sr2am
SR²AM: Efficient Agentic Reasoning Through Self-Regulated Simulative Planning
An AI research system that makes language model agents more efficient by deciding upfront how much reasoning a task needs, cutting unnecessary thinking tokens by up to 95% while staying competitive with models many times larger.
SR2AM is a research project from an AI lab focused on making AI agents reason more efficiently through complex tasks. The core idea is that current AI systems often generate far more thinking text than a task needs before taking action, which is slow and costly. SR2AM addresses this by teaching an AI model to decide upfront how much planning a given task actually requires, rather than always reasoning at maximum depth.
The system works by splitting the AI's process into three roles: one handles direct, step-by-step reasoning and action; a second mentally simulates what would happen if the agent took a particular action, acting as an internal planner; and a third decides when, and how much, planning is worth doing before acting. All three roles run within a single language model's chain of thought rather than as three separate systems, which keeps the approach practical.
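The division of labor among the three roles can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not the project's implementation: in SR2AM the roles live inside one model's chain of thought, whereas here plain Python callables stand in for them. The names `run_step`, `Trace`, and the `toy_*` functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trace:
    """Records which role produced each segment of the reasoning trace."""
    segments: List[str] = field(default_factory=list)


def run_step(
    task: str,
    candidate_actions: List[str],
    regulate: Callable[[str], int],
    simulate: Callable[[str, str], str],
    act: Callable[[str, List[str]], str],
    trace: Trace,
) -> str:
    # Regulator role: decide how many candidate actions are worth simulating.
    budget = max(0, min(regulate(task), len(candidate_actions)))
    trace.segments.append(f"regulate: budget={budget}")

    # Simulator role: imagine the outcome of each action within the budget.
    predictions = []
    for action in candidate_actions[:budget]:
        outcome = simulate(task, action)
        predictions.append(outcome)
        trace.segments.append(f"simulate: {action} -> {outcome}")

    # Actor role: reason directly and commit to an action.
    chosen = act(task, predictions)
    trace.segments.append(f"act: {chosen}")
    return chosen


# Hypothetical stand-ins for the three roles (assumptions for illustration).
def toy_regulate(task: str) -> int:
    # Short tasks get no planning; longer ones get two simulations.
    return 0 if len(task.split()) < 5 else 2


def toy_simulate(task: str, action: str) -> str:
    return f"result of {action}"


def toy_act(task: str, predictions: List[str]) -> str:
    return "answer" if not predictions else "search"
```

With these stand-ins, a short task skips simulation entirely and the trace holds only a `regulate` and an `act` segment, while a longer task spends its planning budget before acting. The point of the sketch is the ordering: the planning budget is set before any simulation happens.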
The researchers release two models built on top of Qwen3 (a family of open AI language models): an 8-billion-parameter version that they say is competitive with models 15 to 40 times larger, and a 30-billion-parameter version that they claim matches systems in the 685 billion to 1 trillion parameter range while using 25 to 95 percent fewer reasoning tokens. These claims are based on benchmarks covering math, science, and web-navigation tasks.
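To put the size comparisons in concrete terms, the arithmetic behind them can be worked out directly. The sketch below uses only the parameter counts quoted above; the helper name `scale_ratio` is hypothetical, and the results are plain ratios, not figures taken from the paper.

```python
def scale_ratio(larger_params_b: float, smaller_params_b: float) -> float:
    """Return how many times larger one model is, given sizes in billions."""
    if smaller_params_b <= 0:
        raise ValueError("smaller_params_b must be positive")
    return larger_params_b / smaller_params_b


# 8B model: "15 to 40 times larger" implies comparison models of this size.
low_baseline_b = 8 * 15    # 120 billion parameters
high_baseline_b = 8 * 40   # 320 billion parameters

# 30B model: 685 billion to 1 trillion parameters, expressed as a ratio.
low_ratio = scale_ratio(685, 30)    # roughly 23x
high_ratio = scale_ratio(1000, 30)  # roughly 33x
```

So the 8B comparison corresponds to models of roughly 120 to 320 billion parameters, and the 30B comparison to systems roughly 23 to 33 times its size.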
To run SR2AM on your own questions, you need several external services configured: a web search API (SerpAPI by default), a code execution sandbox, a separate language model for summarizing web pages, and a GPU setup capable of running large models. The 8B model needs roughly 16GB of GPU memory; the 30B model needs four GPUs running in parallel. Input data is a JSONL file with one question per line.
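A JSONL input file with one question per line can be prepared and checked with the standard library alone. This is a minimal sketch: the field name `question` and the helper names are assumptions for illustration, so check the repository for the keys the pipeline actually expects.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_questions(path: Path, questions: List[str]) -> None:
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for q in questions:
            # "question" is an assumed key, not confirmed by the source.
            f.write(json.dumps({"question": q}, ensure_ascii=False) + "\n")


def read_questions(path: Path) -> List[Dict[str, str]]:
    """Read a JSONL file back, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Round-trip two example questions through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "questions.jsonl"
    write_questions(input_path, ["What is 17 * 23?", "Who proved Fermat's Last Theorem?"])
    loaded = read_questions(input_path)
```

Reading the file back before launching a run is a cheap way to catch malformed lines, since each line must parse as a standalone JSON object.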
This is an academic release tied to a paper on arXiv. It is aimed at researchers and engineers working on AI agent systems who want to reproduce the results or test the method on their own benchmarks. The setup is nontrivial and assumes familiarity with running large language models on GPU hardware.
Where it fits
- Run the SR2AM 8B model on math and science benchmarks to compare reasoning efficiency against larger baseline models.
- Test the meta-planning approach on web-navigation tasks to measure reasoning token reduction.
- Reproduce the paper's results on your own GPU cluster using the JSONL question input format.
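When comparing reasoning efficiency against a baseline, the token reduction can be computed as a simple percentage. The sketch below assumes you already have per-question reasoning token counts for both systems; the function names and the example numbers are illustrative and not results from the paper.

```python
from typing import Sequence


def token_reduction(baseline_tokens: int, method_tokens: int) -> float:
    """Percent fewer reasoning tokens used relative to a baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - method_tokens) / baseline_tokens


def total_token_reduction(baseline: Sequence[int], method: Sequence[int]) -> float:
    """Reduction over a whole benchmark, computed on summed token counts."""
    if len(baseline) != len(method):
        raise ValueError("both runs must cover the same questions")
    return token_reduction(sum(baseline), sum(method))


# Illustrative counts only (not measured data).
single = token_reduction(2000, 100)
overall = total_token_reduction([2000, 1000, 1000], [500, 250, 250])
```

Summing tokens before taking the ratio weights long questions more heavily than averaging per-question percentages would. Either choice can be reasonable, as long as the same one is applied to every system being compared.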