codellama

Python ★ 16k updated 1y ago ▣ archived

Inference code for CodeLlama models

The official Python inference code for Meta's Code Llama AI models, specialized for understanding and writing code, available in sizes from 7B to 70B parameters for self-hosting on your own GPU hardware.

Python · PyTorch · CUDA · setup: hard · complexity 4/5

Code Llama is a family of large language models (AI systems trained on vast amounts of text and code) released by Meta, specialized for understanding and generating code. This repository contains the Python inference code — the scripts needed to load Code Llama model weights and run them locally to get predictions.

The family comes in multiple flavors: base models (Code Llama) for code completion, Python-specialized models (Code Llama - Python) tuned further on Python code, and instruction-following models (Code Llama - Instruct) that you can prompt in conversational style to ask coding questions. Each flavor is available in sizes of 7 billion, 13 billion, 34 billion, and 70 billion parameters — larger models are generally more capable but require more memory and hardware. The 7B model requires about 12.55 GB of storage, while the 70B model requires about 131 GB.
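The storage figures are roughly what a 16-bit checkpoint would occupy. As a back-of-the-envelope sketch, assuming 2 bytes per parameter and the nominal parameter counts (the helper name is illustrative and not part of the repository):

```python
def estimate_weight_size_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough checkpoint size in GiB, assuming every parameter is stored at the same width."""
    return n_params * bytes_per_param / 2**30


# Nominal sizes only; the real parameter counts differ slightly from these labels.
for label, n_params in [("7B", 7e9), ("13B", 13e9), ("34B", 34e9), ("70B", 70e9)]:
    print(f"{label}: ~{estimate_weight_size_gib(n_params):.1f} GiB")
```

Treat the output as a sanity check rather than an exact figure. Running the models also needs memory beyond the weights themselves, for activations and the attention cache.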

A notable feature is code infilling: the 7B and 13B base and instruct models can fill in a gap in existing code based on the surrounding context, which is useful for autocomplete-style features. The models are trained on sequences of 16,000 tokens and show improvements on inputs of up to 100,000 tokens, so they can take large amounts of existing code into account when generating.
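To make the infilling idea concrete, here is a minimal sketch of how a caller might split a source snippet into the prefix and suffix around a gap. The `<FILL>` marker and the `split_at_gap` helper are assumptions for illustration; how the two halves are then handed to the model depends on the repository's API and should be checked against its README.

```python
from typing import Tuple

FILL_MARKER = "<FILL>"  # hypothetical placeholder marking the gap to fill


def split_at_gap(source: str, marker: str = FILL_MARKER) -> Tuple[str, str]:
    """Return (prefix, suffix): the text before and after a single gap marker."""
    if source.count(marker) != 1:
        raise ValueError(f"expected exactly one {marker!r} in the source")
    prefix, suffix = source.split(marker)
    return prefix, suffix


snippet = "def add(a, b):\n    <FILL>\n    return result\n"
prefix, suffix = split_at_gap(snippet)
print(repr(prefix))
print(repr(suffix))
```

An infilling-capable model would then be asked to generate the text that belongs between `prefix` and `suffix`, and the caller would stitch the three pieces back together.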

To use the models, you request download access via Meta's website, download the weights, and run inference locally using PyTorch with CUDA (NVIDIA's GPU computing platform). This is for developers who want to run Code Llama on their own infrastructure rather than calling a hosted API.
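The run step can be sketched as building a `torchrun` launch command. The script name `example_completion.py`, the flag names, the checkpoint directory layout, and the mapping from model size to process count are recalled from the repository's README and should be treated as assumptions to verify there; `build_launch_command` is a hypothetical helper.

```python
import shlex
from typing import List

# Assumed number of model-parallel shards (one process and GPU each) per model size.
MODEL_PARALLEL = {"7b": 1, "13b": 2, "34b": 4, "70b": 8}


def build_launch_command(size: str, ckpt_root: str = ".") -> List[str]:
    """Build an argv list that launches the completion example for one model size."""
    size = size.lower()
    if size not in MODEL_PARALLEL:
        raise ValueError(f"unknown model size: {size!r}")
    ckpt_dir = f"{ckpt_root}/CodeLlama-{size}"  # assumed download directory name
    return [
        "torchrun", "--nproc_per_node", str(MODEL_PARALLEL[size]),
        "example_completion.py",  # assumed example script name
        "--ckpt_dir", ckpt_dir,
        "--tokenizer_path", f"{ckpt_dir}/tokenizer.model",
        "--max_seq_len", "128",
        "--max_batch_size", "4",
    ]


print(shlex.join(build_launch_command("7b")))
```

On a machine with the weights downloaded and enough GPUs, the returned list could be passed to `subprocess.run`; here it is only printed so the sketch runs anywhere.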

Where it fits

Run a local AI code completion model that fills in gaps in existing code without sending data to a third-party API
Self-host a 7B coding model for fast autocomplete in a private development environment
Use Code Llama Instruct in a conversational style to ask coding questions and get detailed answers
Build a custom code review or generation pipeline using the 34B or 70B model for higher accuracy
