ILLUIN Technology and CentraleSupélec are proud to present ColPali: Efficient Document Retrieval with Vision Language Models, a new approach to Retrieval-Augmented Generation (RAG) over complex document corpora.
Find information in complex documents
Searching efficiently for information in complex documents, which often combine text with images, tables and diagrams, is difficult, so we set out to develop a new solution. This approach is integrated into our products (ILLUIN Search and ILLUIN Dialogue), as well as into the custom GenAI projects we carry out.
Traditional document indexing pipelines have two main stages:
- 🔄 Running numerous computer vision models to understand the document structure and extract the text.
- 🗂️ Indexing the extracted text using text representations for subsequent retrieval.
However, this method has its limitations: it is slow, errors made in one stage propagate to the next, and the visual elements of a document are poorly understood. To overcome these drawbacks, we have developed a more suitable representation of the document.
The main contributions of this breakthrough
Two main contributions are presented in this publication:
- 📚 The ViDoRe (Visual Document Retrieval) benchmark: the first open-source benchmark to evaluate how well retrievers find visually rich information within complex documents.
- 🤖 The ColPali model: an approach based on Google's PaliGemma VLM that creates a multi-vector representation of the document. This model uses ColBERT's "late interaction" mechanism for precise and efficient matching of query tokens with document patches during inference.
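To make the "late interaction" idea more concrete, here is a minimal sketch of ColBERT-style MaxSim scoring between query-token embeddings and document-patch embeddings. It is an illustration only, not the ColPali implementation: the function names (`late_interaction_score`, `rank_pages`), the use of NumPy, and the tiny 2-dimensional toy embeddings are all assumptions made for this example. A real system would take its embeddings from the model itself.

```python
import numpy as np


def late_interaction_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """ColBERT-style MaxSim score (illustrative sketch).

    query_emb: (n_query_tokens, dim) array, one vector per query token.
    doc_emb:   (n_patches, dim) array, one vector per document patch.
    """
    # Similarity of every query token with every document patch.
    sim = query_emb @ doc_emb.T  # shape: (n_query_tokens, n_patches)
    # For each query token keep its best-matching patch, then sum over tokens.
    return float(sim.max(axis=1).sum())


def rank_pages(query_emb: np.ndarray, page_embs: list) -> list:
    """Return page indices sorted from most to least relevant."""
    scores = [late_interaction_score(query_emb, p) for p in page_embs]
    return sorted(range(len(page_embs)), key=lambda i: scores[i], reverse=True)


# Toy data (hypothetical values, chosen only to make the example easy to follow).
query = np.array([[1.0, 0.0], [0.0, 1.0]])               # 2 query tokens
page_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # 3 patches
page_b = np.array([[1.0, 0.0], [0.6, 0.0]])              # 2 patches

print(late_interaction_score(query, page_a))
print(late_interaction_score(query, page_b))
print(rank_pages(query, [page_b, page_a]))
```

Because each page is stored as one vector per patch rather than a single pooled vector, every query token can match the most relevant region of the page independently, which is what the multi-vector representation is meant to enable.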
Promising results
ColPali outperforms the other methods it was compared with in both retrieval performance and speed, including those based on image captioning with Anthropic's Claude Sonnet model. This breakthrough demonstrates the potential of Vision Language Models (VLMs) for document retrieval. 📈
To find out more, read the full publication on arxiv.org and discover more about:
- The HuggingFace organization
- Manuel Faysse's blogpost

Thanks
Congratulations to all the contributors: Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Celine Hudelot and Pierre Colombo, and thanks to the CINES team for the ADASTRA compute resources. 👏
CC: Robert VESOUL, Wacim Belblidia, Paul-Henry Cournède, Renaud Monnet











