LightRAG

Python ★ 37k updated 2d ago

[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

Python library for building question-answering systems over large document collections using knowledge graphs instead of flat text search.

Python, Neo4j, PostgreSQL, MongoDB, OpenSearch, Language Models | setup: hard | complexity: 3/5

LightRAG is a Python library for building Retrieval-Augmented Generation (RAG) systems that can answer questions about large document collections. In RAG, the language model does not answer purely from its training data; the system first searches a document collection for relevant passages and then passes those passages to the model as context, so the generated answer is grounded in the documents. LightRAG's distinguishing feature is that it structures its document knowledge as a knowledge graph rather than a flat list of text chunks.
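The retrieve-then-generate flow described above can be sketched with the flat-chunk baseline that LightRAG is contrasted with. This is a minimal illustration, not LightRAG code: `retrieve`, `build_prompt`, and the sample chunks are hypothetical, and word overlap stands in for the embedding similarity a real system would use.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by shared words with the question (stand-in for vector similarity)."""
    question_words = Counter(tokenize(question))
    scored = []
    for chunk in chunks:
        overlap = sum((question_words & Counter(tokenize(chunk))).values())
        if overlap:
            scored.append((overlap, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the grounded prompt that would be sent to a language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical document chunks, invented for this example.
CHUNKS = [
    "The billing service retries failed payments three times.",
    "The search service indexes documents every night.",
    "Office plants are watered on Fridays.",
]

question = "How often does the billing service retry failed payments?"
passages = retrieve(question, CHUNKS, top_k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

The final step, sending `prompt` to a language model, is left out because it depends on which model provider is used.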

When you feed documents into LightRAG, it uses a language model to extract entities (people, places, concepts) and the relationships between them, building a graph where nodes are entities and edges capture how they connect. When you ask a question, LightRAG can retrieve relevant information at two levels: local, focusing on specific entities and their direct neighbors in the graph, and global, reasoning about high-level patterns and themes across the entire document set. This dual-level retrieval lets it handle both narrow factual questions and broad, synthesizing questions better than approaches that rely only on flat text-similarity search.
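The two retrieval levels can be illustrated on a tiny hand-built graph. This is a sketch under stated assumptions, not LightRAG's implementation: the `Relation` type, the `theme` field, and both functions are hypothetical, and the relations are written by hand where LightRAG would extract them with a language model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One graph edge: two entities, how they connect, and a broad theme label."""
    source: str
    target: str
    description: str
    theme: str  # hypothetical high-level label attached to the relationship


# Hand-written stand-in for the output of LLM-based entity/relation extraction.
RELATIONS = [
    Relation("Ada Lovelace", "Analytical Engine", "wrote a program for", "computing history"),
    Relation("Charles Babbage", "Analytical Engine", "designed", "computing history"),
    Relation("Ada Lovelace", "Lord Byron", "was the daughter of", "family"),
]


def as_fact(relation: Relation) -> str:
    """Render an edge as a sentence that can be placed in a prompt."""
    return f"{relation.source} {relation.description} {relation.target}"


def local_retrieve(entity: str, relations: list[Relation]) -> list[str]:
    """Local level: facts on edges that touch one specific entity."""
    return [as_fact(r) for r in relations if entity in (r.source, r.target)]


def global_retrieve(theme: str, relations: list[Relation]) -> list[str]:
    """Global level: facts across the whole graph that share a theme."""
    return [as_fact(r) for r in relations if r.theme == theme]


print(local_retrieve("Lord Byron", RELATIONS))
print(global_retrieve("computing history", RELATIONS))
```

A narrow question such as "Who was Lord Byron's daughter?" maps to the local lookup, while "What does the corpus say about computing history?" maps to the global one, which gathers facts from entities that are not direct neighbors of each other.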

Storage backends are pluggable: you can store the knowledge graph and vector embeddings in Neo4j, PostgreSQL, MongoDB, or OpenSearch, giving flexibility to choose the database that fits your infrastructure. There is also a web UI for inserting documents, querying, and visualizing the knowledge graph interactively.
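One common way to make storage pluggable is to code the indexing logic against a small interface and swap in a backend behind it. The sketch below shows that pattern only; `GraphStorage`, `InMemoryGraphStorage`, and `index_relations` are hypothetical names, and LightRAG's actual storage classes and configuration options are not shown here.

```python
from typing import Protocol


class GraphStorage(Protocol):
    """Hypothetical interface that any graph backend would have to satisfy."""

    def upsert_edge(self, source: str, target: str, description: str) -> None: ...

    def neighbors(self, node: str) -> list[str]: ...


class InMemoryGraphStorage:
    """Dictionary-backed backend; a database-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._edges: dict[str, dict[str, str]] = {}

    def upsert_edge(self, source: str, target: str, description: str) -> None:
        # Store the edge in both directions so neighbor lookups are symmetric.
        self._edges.setdefault(source, {})[target] = description
        self._edges.setdefault(target, {})[source] = description

    def neighbors(self, node: str) -> list[str]:
        return sorted(self._edges.get(node, {}))


def index_relations(storage: GraphStorage, triples: list[tuple[str, str, str]]) -> None:
    """Indexing code depends only on the interface, not on a specific database."""
    for source, target, description in triples:
        storage.upsert_edge(source, target, description)


storage = InMemoryGraphStorage()
index_relations(
    storage,
    [
        ("LightRAG", "Neo4j", "can store its graph in"),
        ("LightRAG", "PostgreSQL", "can store its graph in"),
    ],
)
print(storage.neighbors("LightRAG"))
```

Because `index_relations` accepts anything that matches `GraphStorage`, moving from the in-memory backend to a database would mean writing one new class rather than changing the indexing code.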

You would use LightRAG when building a question-answering system over a large corpus of documents (internal company knowledge bases, research literature, legal documents, or technical documentation) where the system needs to handle both specific detail questions and broad thematic questions well. The library is published as the Python package lightrag-hku, and the underlying research was presented at the EMNLP 2025 natural language processing conference.

Where it fits

Build a question-answering system over internal company documentation that understands both specific facts and broad themes.
Create a research paper search tool that retrieves relevant studies and synthesizes answers across multiple papers.
Set up a legal document search system that finds relevant clauses and explains how they relate to a query.
Develop a technical documentation assistant that answers both narrow how-to questions and high-level architecture questions.
