gitmyhub

Paper2Code

Python · ★ 4.7k · updated 3mo ago

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

An AI system that reads a machine learning research paper, supplied as a PDF or LaTeX file, and uses a multi-agent pipeline to generate a working code repository implementing the paper's methods.

Python · OpenAI API · vLLM · setup: moderate · complexity 3/5

Paper2Code is a research project, associated with ICLR 2026, that attempts to automatically turn machine learning research papers into working code repositories. The core system, called PaperCoder, takes a paper as input, either as a PDF or as LaTeX source files, and produces a folder of code that implements the methods described in the paper.

The system works in three stages handled by multiple AI agents. A planning agent reads the paper and lays out what needs to be built. An analysis agent examines the technical details of the methods. A code generation agent then writes the actual code. The result is a structured output directory containing planning notes, analysis artifacts, and the final generated repository.
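The three stages above can be sketched as a simple chain in which each stage's output feeds the next. This is a minimal illustration, not PaperCoder's actual implementation: the `LLM` callable, `PipelineResult`, `run_pipeline`, `echo_llm`, and the prompt strings are all hypothetical names introduced here, and a stand-in function replaces a real model call.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: takes a prompt, returns a completion.
LLM = Callable[[str], str]


@dataclass
class PipelineResult:
    plan: str       # planning notes
    analysis: str   # analysis artifacts
    code: str       # generated code (a single string in this sketch)


def run_pipeline(paper_text: str, llm: LLM) -> PipelineResult:
    """Run planning, analysis, and code generation in sequence."""
    # Stage 1: the planning agent reads the paper and outlines what to build.
    plan = llm(f"PLAN\n{paper_text}")
    # Stage 2: the analysis agent examines the methods, guided by the plan.
    analysis = llm(f"ANALYZE\n{paper_text}\n{plan}")
    # Stage 3: the code agent writes code from the plan and the analysis.
    code = llm(f"CODE\n{plan}\n{analysis}")
    return PipelineResult(plan=plan, analysis=analysis, code=code)


def echo_llm(prompt: str) -> str:
    """Stand-in model: returns the stage tag from the prompt's first line."""
    return prompt.splitlines()[0].lower()


result = run_pipeline("paper text goes here", echo_llm)
print(result)
```

Passing the model in as a plain callable keeps the sketch runnable without an API key; a real run would replace `echo_llm` with a call to a hosted or locally served model.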

To use it, you need either an API key for a hosted AI provider or a locally served open-source model. The default hosted option is OpenAI, where running the system on a single paper costs roughly fifty to seventy cents using the o3-mini model. For local use, the project supports open-source language models served through an inference framework called vLLM, with DeepSeek-Coder as the default model for that path. The README walks through converting a PDF into the JSON format the system expects; alternatively, you can feed it LaTeX directly.
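As a rough budgeting aid, the per-paper figure quoted above can be scaled to a batch of papers. The sketch below assumes the fifty-to-seventy-cent range holds for every paper, which is a simplification, since real cost depends on paper length and token usage. The function name `estimate_batch_cost` is hypothetical.

```python
# Per-paper cost range quoted above for o3-mini, kept in integer cents
# to avoid floating-point drift. Assumes every paper falls in this range.
LOW_CENTS = 50
HIGH_CENTS = 70


def estimate_batch_cost(num_papers: int) -> tuple[float, float]:
    """Return the (low, high) estimated cost in dollars for a batch."""
    if num_papers < 0:
        raise ValueError("num_papers must be non-negative")
    return (num_papers * LOW_CENTS / 100, num_papers * HIGH_CENTS / 100)


low, high = estimate_batch_cost(20)
print(f"20 papers: ${low:.2f} to ${high:.2f}")  # 20 papers: $10.00 to $14.00
```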

The README includes an example using the well-known "Attention Is All You Need" paper, which introduced the Transformer model. It also describes an evaluation framework for scoring how well the generated code matches a reference implementation, using either a reference-free approach (judged by the AI alone against the paper) or a reference-based approach (compared against the original authors' published code).
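The difference between the two evaluation modes comes down to what the judge is shown. The following is a minimal sketch of that distinction only, not the project's evaluation code: `build_eval_request` and its dictionary layout are hypothetical, and it assumes the paper is shown to the judge in both modes.

```python
from typing import Optional


def build_eval_request(paper: str, generated_code: str,
                       reference_code: Optional[str] = None) -> dict:
    """Assemble the materials a judge model would see in each mode."""
    # Assumption: the judge sees the paper and the generated code in both modes.
    materials = {"paper": paper, "generated_code": generated_code}
    if reference_code is None:
        # Reference-free: the judge scores the code against the paper alone.
        mode = "reference-free"
    else:
        # Reference-based: the authors' published code is added for comparison.
        mode = "reference-based"
        materials["reference_code"] = reference_code
    return {"mode": mode, "materials": materials}


free = build_eval_request("paper text", "generated repo")
based = build_eval_request("paper text", "generated repo", "official repo")
print(free["mode"], sorted(free["materials"]))
print(based["mode"], sorted(based["materials"]))
```

Reference-free scoring is usable for any paper, while reference-based scoring is only possible when the original authors have published their code.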

The authors also released a benchmark dataset on Hugging Face called paper2code, which pairs machine learning papers with their corresponding official code repositories for evaluation purposes.

Where it fits