llama2.c

C ★ 20k updated 1y ago

Inference Llama 2 in one file of pure C

Run Llama 2 language models from a single 700-line C file with no dependencies. Download tiny pre-trained models or export Meta's weights, compile, and generate text locally.

C, Python, PyTorch. Setup: easy. Complexity: 2/5

llama2.c is a minimalist project that runs Llama 2, Meta's large language model, from a single file of plain C with no external dependencies. It aims to make language models approachable for learning and experimentation: instead of a large, complex codebase, you get one readable 700-line file that handles inference (running a trained model to generate text), plus PyTorch code for training smaller models from scratch.

You either download one of the pre-trained "TinyLlamas" (small models trained on short stories, ranging from 15M to 110M parameters) or export Meta's official Llama 2 weights into the project's format. You then compile and run the C file, which reads the model and generates text. The small models run at around 110 tokens per second on an M1 MacBook Air. Given a text prompt, the model continues the story or responds in kind.

You'd use this if you want to understand how language models work at a low level, run a tiny model locally without Python or heavy frameworks, or experiment with text generation for educational purposes. The tech stack is C for inference, and Python with PyTorch for training.

Where it fits

Run a small AI language model on your laptop without Python or heavy frameworks installed.
Learn how language model inference works by reading and modifying a single readable C file.
Train and experiment with tiny versions of Llama 2 for educational projects or prototyping.
