llama

Python ★ 59k updated 1y ago

Inference code for Llama models

Deprecated repository that originally provided inference code for Meta's Llama 2 language models, with the weights downloaded separately from Meta. It now redirects to newer, maintained repositories.

Python · PyTorch · CUDA · torchrun · setup: hard · complexity 3/5

This repository was the original home of the inference code for Llama 2, Meta's open-weights large language model, and it is now deprecated. The README itself explains that Meta has consolidated its model repositories and no longer maintains this one. Its original purpose was to provide the minimal Python code needed to load and run Llama 2, together with access to the model weights, for models ranging from 7 billion to 70 billion parameters. A large language model (LLM) is an AI system trained on vast amounts of text that can generate coherent, contextually appropriate responses to prompts and questions.

When this repository was active, you would download the model weights from Meta's website after accepting a license agreement, then use torchrun, PyTorch's launcher for distributed jobs, to start the model and send it text prompts to complete or answer. The inference code used PyTorch as the deep learning framework and required CUDA-capable hardware for the larger model sizes. Each model size needed a different number of GPUs: the smallest, 7-billion-parameter version fit on a single GPU, while the 70-billion-parameter version required eight. The project's main value was giving researchers and developers a capable open-weights model they could run locally and adapt without API costs. The README now directs users to newer, actively maintained repositories, including llama-models, PurpleLlama for safety tooling, and llama-cookbook for practical usage examples. Today you would only encounter this repository when following older tutorials or tracing the history of the Llama model family.
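The torchrun launch described above can be sketched as a small helper that picks the process count from the model size. This is only an illustration, not the repository's own tooling: the helper name `build_torchrun_command` is hypothetical, the 13B entry is an assumption recalled from the original README rather than stated here, and the script name and the `--ckpt_dir` / `--tokenizer_path` arguments are likewise assumptions about how the deprecated example scripts were invoked. Only `--nproc_per_node` is a standard torchrun option.

```python
import shlex

# GPUs (model-parallel processes) per model size. The 7B and 70B values come
# from the description above; the 13B value is an assumption.
GPUS_PER_MODEL = {"7B": 1, "13B": 2, "70B": 8}


def build_torchrun_command(model_size, ckpt_dir, tokenizer_path,
                           script="example_text_completion.py"):
    """Return an argv list that launches one process per required GPU.

    The script name and its two arguments are assumed, not confirmed.
    """
    if model_size not in GPUS_PER_MODEL:
        raise ValueError(f"unknown model size: {model_size!r}")
    return [
        "torchrun",
        "--nproc_per_node", str(GPUS_PER_MODEL[model_size]),
        script,
        "--ckpt_dir", ckpt_dir,
        "--tokenizer_path", tokenizer_path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no GPU is needed here.
    cmd = build_torchrun_command("7B", "llama-2-7b/", "tokenizer.model")
    print(shlex.join(cmd))
```

Building the command as a list and printing it keeps the sketch runnable on any machine; passing the list to `subprocess.run` would only make sense on hardware with the weights and enough GPUs available.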

Where it fits

Run Llama 2 language models locally on your own hardware without API costs.
Fine-tune or adapt Llama 2 for custom tasks, using the provided inference code as a starting point.
Research and experiment with open-weights large language models at different scales.
