JPMorgan has developed DocLLM, a layout-aware generative language model designed to understand visually rich business documents such as forms, invoices, and reports. Unlike multimodal models that rely on expensive image encoders, DocLLM uses only the bounding-box coordinates of text segments, typically obtained through OCR, to capture a document's spatial layout. Its key architectural feature, disentangled spatial attention, extends the standard transformer attention mechanism with separate projections for text and layout. As a result, each attention score combines text-to-text, text-to-layout, layout-to-text, and layout-to-layout interactions.

To cope with irregular layouts and heterogeneous content, DocLLM is pre-trained with an objective that learns to infill whole text segments. The pre-training data came from two main sources: the IIT-CDIP Test Collection 1.0 and DocBank. After instruction tuning on four core document-intelligence tasks, DocLLM outperformed comparable state-of-the-art LLMs on 14 of 16 datasets and generalized well to 4 of 5 previously unseen datasets. JPMorgan plans to further enhance DocLLM by incorporating vision features in a lightweight manner.
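The disentangled spatial attention described above can be illustrated with a small plain-Python sketch. It assumes the formulation as we read it from the DocLLM paper: text embeddings and bounding-box embeddings get separate query and key projections, the score sums text-text, text-spatial, spatial-text, and spatial-spatial terms (the last three weighted by hyperparameters λ), and the values come from the text stream only. The helper names (embed_box, disentangled_spatial_attention), the linear box encoder, the random weights, and the λ values are hypothetical illustrations, not JPMorgan's implementation.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def embed_box(box: Sequence[float], proj: Matrix) -> List[float]:
    """Hypothetical box encoder: linearly project a normalized
    (left, top, right, bottom) box into the model dimension."""
    return matmul([list(box)], proj)[0]


def disentangled_spatial_attention(
    text: Matrix,
    boxes: Matrix,
    w: Dict[str, Matrix],
    lambdas: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> Matrix:
    """Single-head causal attention whose scores mix text and layout.

    text:    n x d text token embeddings
    boxes:   n x d bounding-box embeddings, one per token
    w:       d x d projections keyed "tq", "tk", "tv" (text) and "sq", "sk" (spatial)
    lambdas: weights of the text-spatial, spatial-text and spatial-spatial terms
    """
    q_t, k_t, v_t = (matmul(text, w[name]) for name in ("tq", "tk", "tv"))
    q_s, k_s = (matmul(boxes, w[name]) for name in ("sq", "sk"))
    lam_ts, lam_st, lam_ss = lambdas
    # Four disentangled score matrices, each n x n.
    tt = matmul(q_t, transpose(k_t))
    ts = matmul(q_t, transpose(k_s))
    st = matmul(q_s, transpose(k_t))
    ss = matmul(q_s, transpose(k_s))
    n, d = len(text), len(text[0])
    out = []
    for i in range(n):
        # Causal mask: token i attends only to positions j <= i.
        scores = [
            (tt[i][j] + lam_ts * ts[i][j] + lam_st * st[i][j] + lam_ss * ss[i][j])
            / math.sqrt(d)
            for j in range(i + 1)
        ]
        attn = softmax(scores)
        # Values come from the text stream only; layout only shapes the weights.
        out.append([sum(attn[j] * v_t[j][c] for j in range(i + 1)) for c in range(d)])
    return out


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4
weights = {name: rand_matrix(d, d) for name in ("tq", "tk", "tv", "sq", "sk")}
box_proj = rand_matrix(4, d)
tokens = rand_matrix(3, d)  # stand-in embeddings for three OCR tokens
boxes = [
    embed_box(b, box_proj)
    for b in [(0.10, 0.10, 0.30, 0.15), (0.35, 0.10, 0.50, 0.15), (0.10, 0.60, 0.40, 0.65)]
]
hidden = disentangled_spatial_attention(tokens, boxes, weights, lambdas=(0.5, 0.5, 0.5))
print(len(hidden), len(hidden[0]))  # 3 4
```

Setting all three λ weights to zero reduces the score to ordinary causal text attention, which makes it easy to see how much the layout terms change each token's representation without any image input.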
