Search-R1

Python · ★ 5.0k · updated 7mo ago

Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL

A research framework for training AI language models to search the web or a local document database mid-reasoning using reinforcement learning, so the model learns on its own when and how to query for information before answering.

Python · PyTorch · CUDA · Llama · Qwen · setup: hard · complexity 5/5

Search-R1 is a research framework for training AI language models to search the web (or a local document database) as part of their reasoning process. The core idea is that a language model should not just answer questions from what it already knows. Instead, it should learn when to pause, issue a search query, read the results, and then continue reasoning with that new information, all in one continuous process. Search-R1 provides the training infrastructure to teach a model this behavior using reinforcement learning.
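The pause, search, read, continue cycle described above can be sketched as a small rollout loop. This is a minimal illustration and not the framework's code: the tag names (`<search>`, `<information>`, `<answer>`), the `rollout` function, and the `toy_generate` and `toy_search` stand-ins are all assumptions made for the example.

```python
import re
from typing import Callable, List, Optional, Tuple

# Assumed tag conventions for this sketch; check the repository for the real format.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def rollout(
    question: str,
    generate: Callable[[str], str],      # hypothetical: context -> next model segment
    search: Callable[[str], List[str]],  # hypothetical: query -> retrieved passages
    max_turns: int = 4,
) -> Tuple[Optional[str], str]:
    """Alternate between generation and retrieval until an answer appears."""
    context = question
    for _ in range(max_turns):
        step = generate(context)
        context += step
        answer = ANSWER_RE.search(step)
        if answer:
            return answer.group(1).strip(), context
        query = SEARCH_RE.search(step)
        if query:
            # Feed the retrieved passages back so the model can keep reasoning.
            passages = search(query.group(1).strip())
            context += "<information>" + " ".join(passages) + "</information>"
    return None, context  # turn budget exhausted without an answer


# Toy stand-ins for a language model and a search backend.
def toy_generate(context: str) -> str:
    if "<information>" in context:
        return "<answer>Paris</answer>"
    return "<think>I should look this up.</think><search>capital of France</search>"


def toy_search(query: str) -> List[str]:
    return ["Paris is the capital of France."]


answer, transcript = rollout("What is the capital of France?", toy_generate, toy_search)
print(answer)
```

In a real training run the `generate` callable would be the policy model and `search` would call a retrieval server. The loop shape (generate, detect a query, append results, continue) is the part the paragraph describes.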

Reinforcement learning here means the model is rewarded when it produces correct answers and penalized when it does not, without being given explicit step-by-step instructions on how to reason. Over many training rounds, the model figures out on its own when and how to call the search engine effectively. The framework builds on the ideas behind DeepSeek-R1 and is described by its authors as an open alternative to OpenAI's Deep Research product.
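To make the reward idea concrete, here is a minimal sketch of an outcome-only reward: the final answer is compared with the reference answers and nothing else is scored. The 0/1 exact-match rule, the `<answer>` tag, and the function names are assumptions for illustration; the framework's actual reward function may differ.

```python
import re
import string
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def outcome_reward(response: str, gold_answers: Iterable[str]) -> float:
    """Return 1.0 if the last <answer>...</answer> span matches a gold answer, else 0.0."""
    spans = re.findall(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if not spans:
        return 0.0  # no final answer was produced
    prediction = normalize(spans[-1])
    return 1.0 if any(prediction == normalize(g) for g in gold_answers) else 0.0


good = outcome_reward("<think>...</think><answer>The Eiffel Tower.</answer>", ["Eiffel Tower"])
bad = outcome_reward("<answer>Big Ben</answer>", ["Eiffel Tower"])
print(good, bad)
```

Because only the final answer is scored, nothing in this signal says when to search. Under this assumption, any search behavior has to emerge because it makes correct answers more likely.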

The framework supports several popular language models as starting points, including Llama and Qwen variants, and works with multiple types of search backends: a local sparse retriever (keyword matching), a local dense retriever (meaning-based similarity search with a vector index), or online search engines. Researchers can also plug in their own datasets and their own retrieval systems.
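The difference between the sparse and dense backends, and the idea of plugging in your own retriever, can be illustrated with a toy common interface. Everything here is hypothetical: the `Retriever` protocol, both classes, and `toy_embed` (a hand-made two-dimensional "embedding" standing in for a real encoder) are assumptions for the sketch, not the framework's API.

```python
import math
import re
from typing import Callable, List, Protocol, Sequence


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class Retriever(Protocol):
    """Hypothetical interface any backend would satisfy."""
    def search(self, query: str, k: int = 3) -> List[str]: ...


class KeywordRetriever:
    """Sparse-style: rank documents by how many query words they share."""
    def __init__(self, docs: Sequence[str]) -> None:
        self.docs = list(docs)

    def search(self, query: str, k: int = 3) -> List[str]:
        q = set(tokenize(query))
        scored = [(len(q & set(tokenize(d))), d) for d in self.docs]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [d for score, d in ranked if score > 0][:k]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class DenseRetriever:
    """Dense-style: rank documents by vector similarity to the query."""
    def __init__(self, docs: Sequence[str], embed: Callable[[str], List[float]]) -> None:
        self.docs = list(docs)
        self.embed = embed
        self.vectors = [embed(d) for d in self.docs]  # a tiny in-memory "index"

    def search(self, query: str, k: int = 3) -> List[str]:
        qv = self.embed(query)
        scored = [(cosine(qv, v), d) for v, d in zip(self.vectors, self.docs)]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [d for _, d in ranked[:k]]


# Toy embedding: one dimension per topic, counting topic words (not a real encoder).
TOPICS = [{"paris", "france", "capital"}, {"python", "programming", "language"}]


def toy_embed(text: str) -> List[float]:
    words = tokenize(text)
    return [float(sum(w in topic for w in words)) for topic in TOPICS]


docs = ["Paris is the capital of France.", "Python is a programming language."]
sparse_hits = KeywordRetriever(docs).search("capital of France", k=1)
dense_hits = DenseRetriever(docs, toy_embed).search("France capital", k=1)
print(sparse_hits, dense_hits)
```

A real sparse retriever would typically use a weighting scheme such as BM25, and a real dense one would use a trained encoder and an approximate nearest-neighbor index. The shared `search` method is what would let the training loop stay unaware of which backend is in use.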

The quick-start workflow shown in the README involves downloading a Wikipedia index, preparing a question-answering dataset, launching a local retrieval server, and then running a training script. After training, you can run inference to ask the trained model questions and watch it search and reason in real time.

The project is primarily aimed at AI researchers who want to experiment with tool-calling and search-augmented reasoning in language models. It requires GPU hardware and familiarity with setting up Python environments, model weights, and vector indexes. Two research papers are linked in the README with detailed results comparing the trained models against baselines.

Where it fits

Train a language model to automatically issue search queries during reasoning and incorporate results before answering.
Run inference on a trained search-augmented model and watch it query a local Wikipedia index in real time.
Plug in a custom retrieval system or dataset to experiment with different search backends during training.
Compare search-augmented reasoning against baseline models using the datasets and training scripts provided.
