gitmyhub

nanoGPT

Python ★ 59k updated 7mo ago

The simplest, fastest repository for training/finetuning medium-sized GPTs.

A minimal, readable Python implementation of GPT-2 that trains language models to predict and generate text. Learn how GPT works by reading ~600 lines of clean code.

PythonPyTorchCUDAsetup: moderatecomplexity 3/5

nanoGPT is a minimal Python codebase for training and fine-tuning GPT-style language models, designed to be readable and hackable rather than production-hardened. GPT models are neural networks that learn to predict the next word in a sequence and can be fine-tuned to generate text that continues a given prompt. The project was written to reimplement the original GPT-2 architecture in as few lines as possible while still achieving the same training results, making the internals easy to understand and modify. The README notes that this repository is now deprecated and that its successor, nanochat, is the recommended alternative for new users.

The entire project consists of two main files: a roughly 300-line training loop and a roughly 300-line model definition. Despite this simplicity, it can reproduce GPT-2 with 124 million parameters on standard benchmark datasets when run on appropriate hardware, around 4 days on 8 high-end GPUs. For experimentation on smaller hardware, it includes examples for training a character-level model on Shakespeare's works in about 3 minutes on a single GPU, or more slowly on a CPU or Apple Silicon Mac. The code supports distributed training across multiple GPUs using PyTorch's built-in parallelism tools, and can also load pre-trained GPT-2 weights from OpenAI as a starting point for fine-tuning. The tech stack is Python with PyTorch as the deep learning framework. You would use nanoGPT when you want to understand how GPT training works from first principles by reading clean, commented code, when you want a starting point for language model research, or when you need to fine-tune a GPT-style model on a custom dataset without wading through a large framework.

Where it fits