gitmyhub

SearchSwarm

Python ★ 72 updated 3d ago

SearchSwarm trains a 30B-parameter agent to delegate complex research tasks to subagents, each of which gathers cited evidence in an isolated context, and then to synthesize a final answer without holding the full process in memory.

Python · ms-swift · Megatron · Ray · torchrun · Kubernetes · vLLM · JSON · setup: hard · complexity 5/5

SearchSwarm is a research project about teaching AI language models to handle complex, long-running research tasks more effectively. The core idea is training a main agent to break a large question into smaller pieces and hand those pieces off to helper agents called subagents. Each subagent works in its own isolated context, gathers relevant evidence, and returns a short, citation-backed report. The main agent then combines all those reports into a final answer without needing to hold the entire research process in memory at once.
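The delegation pattern described above can be illustrated with a small, self-contained sketch. It is not the project's code: `Report`, `run_subagent`, `main_agent`, and `fake_search` are hypothetical names, the subtasks are passed in by hand rather than produced by a trained model, and the search tool is a stub. The only point it makes is structural: each subagent sees just its own subtask, and the main agent keeps only the compact reports.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Report:
    """Short, citation-backed summary returned by one subagent (hypothetical type)."""
    subtask: str
    summary: str
    citations: List[str]


def run_subagent(subtask: str, search: Callable[[str], List[Dict[str, str]]]) -> Report:
    # Each call builds its context from scratch: nothing from other
    # subagents or from the main agent's history is visible here.
    hits = search(subtask)
    summary = "; ".join(hit["snippet"] for hit in hits)
    citations = [hit["url"] for hit in hits]
    return Report(subtask, summary, citations)


def main_agent(question: str, subtasks: List[str],
               search: Callable[[str], List[Dict[str, str]]]) -> str:
    # The main agent holds only the reports, never the raw search results.
    reports = [run_subagent(task, search) for task in subtasks]
    lines = [f"Q: {question}"]
    for i, report in enumerate(reports, 1):
        refs = ", ".join(report.citations)
        lines.append(f"[{i}] {report.subtask}: {report.summary} (sources: {refs})")
    return "\n".join(lines)


def fake_search(query: str) -> List[Dict[str, str]]:
    # Stand-in for a real search tool; returns one canned hit per query.
    slug = query.replace(" ", "-")
    return [{"snippet": f"finding about {query}", "url": f"https://example.com/{slug}"}]
```

Calling `main_agent("Example question?", ["subtask one", "subtask two"], fake_search)` returns a three-line string: the question followed by one cited line per subtask. In the real system, both the decomposition and the synthesis would be done by the trained model rather than by string formatting.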

The project includes a training pipeline to teach the main agent when to delegate work, how to give subagents clear instructions, and how to verify what they return. High-quality training data was built from cleaned agent trajectories that show the delegation process step by step. The result is a 30-billion-parameter model called SearchSwarm-30B-A3B, which the authors report achieves strong results compared to other open-source research agents of similar size on benchmarks like BrowseComp, GAIA, and xbench-DeepSearch.

The repository contains two main components: an evaluation framework and training scripts. The evaluation tool reads from a configuration file and supports two inference modes, one where the model is served by an external API-compatible endpoint, and one where it runs locally across eight GPU servers. Benchmark datasets are not bundled with the code; users need to obtain them from their official sources and convert them to a specific JSON format before pointing the tool at those files.
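As a rough illustration of the dataset-conversion step, the sketch below rewrites a JSONL dump into a single JSON array. The output schema here (`question` and `answer` fields) is an assumption made for the example only; the format the evaluation tool actually expects is defined by the repository and should be checked there. The function name and its parameters are likewise hypothetical.

```python
import json
from pathlib import Path


def convert_jsonl_to_json(src: Path, dst: Path,
                          question_key: str, answer_key: str) -> int:
    """Rewrite a JSONL file as one JSON array of records.

    The {"question", "answer"} output fields are placeholders, not the
    schema required by the SearchSwarm evaluation tool.
    """
    records = []
    with src.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            row = json.loads(line)
            records.append({
                "question": row[question_key],
                "answer": row[answer_key],
            })
    dst.write_text(json.dumps(records, ensure_ascii=False, indent=2),
                   encoding="utf-8")
    return len(records)
```

The source key names are passed in because each benchmark's official release names its fields differently; the function returns the number of records written so a caller can sanity-check the conversion.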

Training uses ms-swift's Megatron backend. The repository offers three launch paths for multi-node setups: a Ray-based option for cloud clusters, an SSH/torchrun path for traditional clusters, and a shared-filesystem path for schedulers like Kubernetes batch jobs. A single-GPU smoke test is also provided to validate the environment before running at full scale.

This repository is primarily a research artifact for people who want to reproduce the paper's results or extend the approach. It assumes access to significant GPU resources and familiarity with large-model training infrastructure.

Where it fits