gitmyhub

autoresearch

Python ★ 86k updated 2mo ago

AI agents running research on single-GPU nanochat training automatically

An experimental system that lets an AI agent automatically run machine-learning research overnight on a single GPU, modifying training code and iterating to improve model performance.

Python · PyTorch · NVIDIA GPU · uv · setup: hard · complexity 3/5

autoresearch is a small experimental setup that lets an AI agent run machine-learning research on a single GPU automatically, overnight. The idea is to give the agent a working but simplified language-model training pipeline and let it experiment on its own: it modifies the training code, runs a short training job, checks whether the result improved, keeps or discards the change, and repeats. You wake up the next day to a log of experiments and, hopefully, a better model.
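The modify, train, check, keep-or-discard cycle described above can be sketched as a greedy search. This is a minimal illustration, not the project's actual code: research_loop, make_toy_propose, and toy_evaluate are hypothetical names, and the "experiment" here is a toy function of a learning rate rather than a real training run.

```python
import math
import random


def research_loop(baseline, propose, evaluate, n_experiments):
    """Greedy keep-or-discard search; a lower score is better."""
    best_config = baseline
    best_score = evaluate(baseline)
    log = []
    for i in range(n_experiments):
        candidate = propose(best_config)   # "modify the training code"
        score = evaluate(candidate)        # "run a short training job"
        kept = score < best_score          # "check whether it improved"
        if kept:
            best_config, best_score = candidate, score
        log.append({"experiment": i, "score": score, "kept": kept})
    return best_config, best_score, log


def make_toy_propose(seed=0):
    """Hypothetical stand-in for the agent: halve or double the learning rate."""
    rng = random.Random(seed)

    def propose(config):
        return {**config, "lr": config["lr"] * rng.choice([0.5, 2.0])}

    return propose


def toy_evaluate(config):
    """Hypothetical stand-in for training: the score is lowest at lr = 0.01."""
    return 1.0 + abs(math.log10(config["lr"]) + 2.0)


if __name__ == "__main__":
    best, score, log = research_loop({"lr": 1.0}, make_toy_propose(), toy_evaluate, 12)
    kept = sum(entry["kept"] for entry in log)
    print(f"best config: {best}, score: {score:.3f}, kept {kept} of {len(log)}")
```

In the real system the proposal step is the agent editing code and the evaluation step is a full training run, but the control flow has the same shape: a change is kept only when it beats the best score so far.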

The training code is a simplified single-GPU implementation drawn from a related project called nanochat. The repository is deliberately tiny and centers on three files. prepare.py handles one-time data preparation (it downloads the training data and trains a tokenizer) and also provides runtime utilities; the agent does not touch this file. train.py is the only file the agent edits. It contains the full model, optimizer, and training loop, so architecture, hyperparameters, batch size, and similar choices are all fair game. program.md is a short instructions file that you, the human, edit to set up your "research org"; it is what you point the agent at to start a run.

Each training run uses a fixed five-minute wall-clock time budget, no matter the hardware. The metric tracked is val_bpb (validation bits per byte), where lower is better. At five minutes per run, that works out to roughly twelve experiments per hour and around a hundred overnight. Because every experiment gets the same budget, changes to the architecture, batch size, or other settings can be compared on equal footing.
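The arithmetic behind the budget and the metric can be made concrete with a short sketch. It rests on assumptions: the function names are hypothetical, the loss is taken to be a mean cross-entropy in nats per token (the usual PyTorch convention), and bits per byte is computed with the standard definition, total bits divided by total bytes of validation text. How autoresearch itself computes val_bpb is not stated here.

```python
import math
import time


def bits_per_byte(mean_loss_nats, n_tokens, n_bytes):
    """Convert a mean per-token loss in nats into bits per byte of text."""
    total_bits = mean_loss_nats * n_tokens / math.log(2)
    return total_bits / n_bytes


def run_with_budget(step, budget_seconds, clock=time.monotonic):
    """Call step() repeatedly until the wall-clock budget is used up."""
    deadline = clock() + budget_seconds
    steps = 0
    while clock() < deadline:
        step()
        steps += 1
    return steps


def experiments_in(hours, minutes_per_run=5.0):
    """Upper bound on completed runs, ignoring overhead between runs."""
    return int(hours * 60 // minutes_per_run)


if __name__ == "__main__":
    print(experiments_in(1))   # runs per hour at five minutes each
    print(experiments_in(8))   # runs in an eight-hour night
    # Example: mean loss of 2.0 nats per token, 1,000 tokens covering 4,000 bytes.
    print(round(bits_per_byte(2.0, 1_000, 4_000), 3))
```

Measuring per byte rather than per token means the score does not depend on how the tokenizer splits the text. A fixed deadline, rather than a fixed step count, is what allows a larger but slower model to be compared against a smaller but faster one under the same budget.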

autoresearch is aimed at people who want to tinker with autonomous AI research loops or study how an agent iterates on a real training pipeline. It requires a single NVIDIA GPU, Python 3.10 or newer, and the uv project manager.
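A quick preflight check for these requirements might look like the following. It is a sketch and not part of the repository: check_requirements is a hypothetical helper, and finding the nvidia-smi executable on the PATH is used only as a rough proxy for a usable NVIDIA GPU and driver.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10)):
    """Return a name -> bool map for the prerequisites listed above."""
    return {
        "python": sys.version_info >= min_python,
        "uv": shutil.which("uv") is not None,
        # Proxy only: the presence of nvidia-smi does not prove a GPU is usable.
        "nvidia_driver": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```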

Where it fits