RAG-Anything
"RAG-Anything: All-in-One RAG Framework"
Python framework for building question-answering systems that handle complex documents with text, images, tables, charts, and equations all together.
RAG-Anything is an all-in-one Python framework for building question-answering systems that work with complex, mixed-content documents, not just plain text. RAG stands for Retrieval-Augmented Generation, a technique in which an AI model answers a question by first searching a document collection for relevant passages, then using those passages as context when generating the answer. Text-only RAG pipelines often struggle with documents that contain images, charts, tables, or mathematical equations alongside text. RAG-Anything is designed specifically to handle all of these content types together.
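The retrieve-then-generate idea can be illustrated with a minimal, library-free sketch. Everything here is an assumption made for illustration: the function names `tokenize`, `retrieve`, and `build_prompt` are hypothetical, the sample passages are made up, and simple word overlap stands in for the embedding or graph-based retrieval a real system would use. The final prompt would normally be sent to a language model, which is omitted.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k passages sharing the most tokens with the query.

    Word overlap is a toy stand-in for real retrieval (an assumption).
    """
    q = tokenize(query)

    def score(doc: str) -> int:
        d = tokenize(doc)
        return sum(min(q[t], d[t]) for t in q)

    return sorted(docs, key=score, reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved passages and the question into one prompt string."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


# Made-up passages standing in for an indexed document collection.
docs = [
    "Table 2 reports quarterly revenue by region.",
    "Figure 3 plots training loss against epochs.",
    "Equation 1 defines the attention weights.",
]
question = "What does the loss figure show?"
top = retrieve(question, docs)
prompt = build_prompt(question, top)
```

In this sketch the generation step only sees what retrieval returns, which is why content that never makes it into the index (for example, an unparsed chart) cannot inform the answer.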
The framework processes documents end to end: it ingests PDFs, Office files, and images; parses them into their component parts (text, tables, figures, equations); builds a multimodal knowledge graph that captures relationships between these elements; and lets users query across all of them through a single interface. It is built on top of LightRAG, another project from the same research group at the University of Hong Kong. A recent addition is VLM-Enhanced Query mode, which routes visual content through a vision-language model (VLM) for deeper analysis when images are relevant to a query.
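The stages described above (parse into typed elements, link them in a graph, query across modalities, and involve a VLM only when images are retrieved) can be sketched conceptually as follows. This is not the raganything API: `Item`, `KnowledgeGraph`, `build_graph`, and `query` are hypothetical names, the "same page" linking rule and keyword matching are simplifying assumptions, and the parsed items are made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    """One parsed document element (hypothetical structure)."""
    item_id: str
    modality: str  # "text", "table", "image", or "equation"
    content: str   # text, or a caption/description for non-text items
    page: int


@dataclass
class KnowledgeGraph:
    items: dict[str, Item] = field(default_factory=dict)
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add(self, item: Item) -> None:
        self.items[item.item_id] = item
        self.edges.setdefault(item.item_id, set())

    def link(self, a: str, b: str) -> None:
        # Undirected relationship between two elements.
        self.edges[a].add(b)
        self.edges[b].add(a)


def build_graph(parsed: list[Item]) -> KnowledgeGraph:
    graph = KnowledgeGraph()
    for item in parsed:
        graph.add(item)
    # Toy relationship rule (an assumption): elements on the same page are related.
    for a in parsed:
        for b in parsed:
            if a.item_id < b.item_id and a.page == b.page:
                graph.link(a.item_id, b.item_id)
    return graph


def query(graph: KnowledgeGraph, question: str) -> tuple[list[Item], bool]:
    """Return matching items plus their graph neighbours, and a VLM flag."""
    words = set(question.lower().split())
    hits = [i for i in graph.items.values()
            if words & set(i.content.lower().split())]
    expanded = {h.item_id for h in hits}
    for h in hits:
        expanded |= graph.edges[h.item_id]
    results = sorted((graph.items[i] for i in expanded), key=lambda it: it.item_id)
    # Only involve a vision-language model when an image is among the results.
    use_vlm = any(it.modality == "image" for it in results)
    return results, use_vlm


# Made-up parser output for a two-page report.
parsed = [
    Item("t1", "text", "Revenue grew in the third quarter", 1),
    Item("img1", "image", "bar chart of quarterly revenue", 1),
    Item("tab1", "table", "headcount by department", 2),
]
graph = build_graph(parsed)
results, use_vlm = query(graph, "revenue trend")
```

Here the question about revenue retrieves both the paragraph and the chart on the same page, so the sketch flags the query for VLM handling; a question that matches only the table would not.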
This system is aimed at research and enterprise scenarios where documents contain rich mixed content — academic papers with figures and equations, financial reports with charts and tables, or technical documentation with diagrams. A Python package called raganything is available on PyPI, and the project has an accompanying academic paper on arXiv (2510.12323).
Where it fits
- Build a question-answering system for academic papers that extracts answers from text, figures, and equations together.
- Create a financial document analyzer that answers questions by searching across tables, charts, and narrative text in reports.
- Develop a technical documentation search tool that understands diagrams, code snippets, and explanatory text as a unified knowledge base.