OmniRetrieval

Python ★ 31 updated 18d ago

Official Code Repository for OmniRetrieval

A research system that takes a plain-English question and automatically searches across databases, knowledge graphs, and text documents at the same time, translating your question into SQL, SPARQL, Cypher, or keyword search as needed.

Python · SQL · SPARQL · Cypher · vLLM · OpenAI API · Anthropic API · setup: hard · complexity 4/5

OmniRetrieval is a research system that lets you ask a natural-language question and have it automatically search across fundamentally different types of data sources at once. The problem it addresses is that information lives in many forms: some in plain text documents, some in relational databases (queried with SQL), some in knowledge graphs (databases that represent facts as linked entities, queried with SPARQL), and some in graph databases queried with a different language called Cypher. Each type normally requires someone who knows how to write queries in that language. OmniRetrieval uses a language model to write those queries automatically.

When you give it a question, the system first decides which data sources are relevant, then translates the question into the native query language of each selected source (SQL for a relational database, SPARQL for one kind of knowledge graph, Cypher for another, or a standard keyword search for text), runs the queries, and finally picks the best answer from the results. The code organizes this process into four steps: route, generate, execute, and select.
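The four steps can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `Candidate` class, the function signatures, and the callables `pick_sources`, `write_query`, `executors`, and `score` are all hypothetical stand-ins for whatever the project uses to call a language model and the underlying databases.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One generated query against one source (hypothetical structure)."""
    source: str
    language: str  # e.g. "SQL", "SPARQL", "Cypher", "keyword"
    query: str
    results: list = field(default_factory=list)


def route(question, sources, pick_sources):
    """Step 1: ask the model which sources look relevant; drop unknown names."""
    chosen = pick_sources(question, list(sources))
    return [name for name in chosen if name in sources]


def generate(question, sources, chosen, write_query):
    """Step 2: write one query per chosen source, in that source's language."""
    return [
        Candidate(name, sources[name], write_query(question, sources[name]))
        for name in chosen
    ]


def execute(candidates, executors):
    """Step 3: run each query; a failing query simply yields no results."""
    for cand in candidates:
        try:
            cand.results = list(executors[cand.language](cand.query))
        except Exception:
            cand.results = []
    return candidates


def select(question, candidates, score):
    """Step 4: keep the highest-scoring candidate that returned anything."""
    answered = [c for c in candidates if c.results]
    if not answered:
        return None
    return max(answered, key=lambda c: score(question, c))


def answer(question, sources, pick_sources, write_query, executors, score):
    """Chain route -> generate -> execute -> select."""
    chosen = route(question, sources, pick_sources)
    candidates = generate(question, sources, chosen, write_query)
    return select(question, execute(candidates, executors), score)
```

Here `sources` is assumed to be a mapping from source name to query language, and `executors` a mapping from query language to a function that runs a query and returns rows. In a real run, `pick_sources`, `write_query`, and `score` would each wrap a model call.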

The benchmark built into the project covers 13 datasets and 309 separate knowledge bases drawn from a range of fields including news fact-checking, financial questions, scientific papers, and general-knowledge questions. You can run the pipeline with different AI model providers as the reasoning backbone, including OpenAI, Anthropic, Google, or a locally hosted open-source model via a tool called vLLM.

Setup involves installing Python dependencies, downloading and preprocessing the datasets using provided scripts, and adding API keys for whichever model provider you choose. Each evaluation run saves its results and metrics to a timestamped folder, and there is a separate script for rescoring saved runs without re-running the full pipeline. The project is a research codebase rather than a packaged library.
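The save-then-rescore workflow might look something like the sketch below. The file names (`predictions.json`, `metrics.json`), the folder naming scheme, the record fields, and both function names are assumptions made for illustration; the repository's own scripts may lay things out differently.

```python
import json
from datetime import datetime
from pathlib import Path


def save_run(predictions, metrics, root="runs"):
    """Write one evaluation run to a new timestamped folder (hypothetical layout)."""
    run_dir = Path(root) / datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "predictions.json").write_text(json.dumps(predictions, indent=2))
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return run_dir


def rescore(run_dir, metric):
    """Recompute a mean score from saved predictions without re-running retrieval."""
    saved = json.loads((Path(run_dir) / "predictions.json").read_text())
    scores = [metric(item["prediction"], item["gold"]) for item in saved]
    return sum(scores) / len(scores) if scores else 0.0
```

Because the predictions are stored on disk, a new metric can be applied by calling `rescore` on an existing run folder, which avoids repeating the model and database calls.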

Where it fits

Ask a natural-language question that requires combining answers from a SQL database and a knowledge graph, without writing any queries yourself.
Benchmark how well different AI models (OpenAI, Anthropic, Google, or a local model) handle multi-source retrieval across 13 research datasets.
Rescore previously saved evaluation runs with a different model without re-running the full pipeline from scratch.
