torchtune

Python ★ 5.8k updated 18h ago

PyTorch native post-training library

A Python library from Meta for fine-tuning large language models using methods like LoRA, QLoRA, and DPO. Active development ended in 2025 but the code remains publicly available.

Python, PyTorch, LoRA, QLoRA, Hugging Face, YAML; setup: hard; complexity: 4/5

torchtune was a Python library built by Meta's PyTorch team for fine-tuning and experimenting with large language models. Development wound down in 2025, but the code remains publicly available and was shaped by contributions from over 150 people during its active period.

The central concept is post-training: taking a pretrained model and adapting it to new tasks, datasets, or behaviors. torchtune supported several methods for doing this. Supervised fine-tuning updates the model's weights directly using labeled examples. LoRA and QLoRA are lighter alternatives that freeze the original weights and train only a small set of added low-rank parameters, which cuts GPU memory requirements considerably; QLoRA additionally keeps the frozen weights in quantized form. Knowledge distillation trains a smaller model to reproduce the outputs of a larger one. DPO is a preference-optimization method, and PPO and GRPO are reinforcement-learning methods; all three are used to align a model's responses with human preferences.
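To make the "small fraction of the model's parameters" claim concrete, here is a rough arithmetic sketch of LoRA applied to a single projection layer. It is not torchtune code: the layer size and rank are hypothetical values chosen for illustration, and the helper names are invented for this example.

```python
# Illustrative arithmetic only. The layer size and rank are assumptions,
# not values taken from any torchtune config.

def full_params(d_in: int, d_out: int) -> int:
    """Weights in a dense d_out x d_in projection (bias ignored)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights when the frozen matrix W is augmented with B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * d_in + d_out * rank


d_in = d_out = 4096  # hypothetical hidden size
rank = 8             # hypothetical LoRA rank

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)

print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (rank {rank}): {lora:,} trainable weights")
print(f"fraction trained: {lora / full:.4%}")
```

Under these assumed sizes, the adapter trains 1/256 of the layer's weights. Most of the memory saving comes from not keeping gradients and optimizer state for the frozen weights.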

Running a training job meant picking a recipe (the training method) and a config file (YAML format), then calling the tune run command. The library shipped ready-made configs for a range of well-known models: Llama 4, Llama 3.x, Mistral, Gemma 2, Phi-4, Qwen 2.5, and others. Model weights were loaded from Hugging Face Hub or Kaggle Hub.
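The recipe-plus-config workflow can be sketched as a small Python helper that assembles a `tune run` command line without executing it. The helper `build_tune_command` is hypothetical. The recipe and config names, the `key=value` override style, and the `--nproc_per_node` flag are written as recalled from the project's README, so treat them as assumptions and check them against your installed version.

```python
import shlex


def build_tune_command(recipe, config, overrides=None, nproc_per_node=None):
    """Assemble a `tune run` invocation as an argv list. Nothing is executed."""
    argv = ["tune", "run"]
    if nproc_per_node is not None:
        # Assumed distributed-launch flag, placed before the recipe name.
        argv += ["--nproc_per_node", str(nproc_per_node)]
    argv += [recipe, "--config", config]
    # Assumed override style: key=value pairs appended after the config,
    # each replacing the matching field from the YAML file.
    for key, value in (overrides or {}).items():
        argv.append(f"{key}={value}")
    return argv


# Recipe and config names below are assumptions, not verified identifiers.
cmd = build_tune_command(
    recipe="lora_finetune_single_device",
    config="llama3_1/8B_lora_single_device",
    overrides={"batch_size": 4},
)
print(shlex.join(cmd))
```

Building the argv list separately from running it makes the command easy to inspect or log before handing it to something like `subprocess.run`.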

The library was designed to run on a single GPU, multiple GPUs on one machine, or multiple machines at once. Its focus was on memory efficiency and performance using PyTorch's built-in APIs, keeping the training code readable and modifiable rather than hiding it behind heavy abstractions.

Because active development has ended, there is no ongoing support. The README links to a GitHub issue that explains why the project was wound down.

Where it fits

Fine-tune Llama or Mistral on your own dataset using LoRA to cut GPU memory requirements.
Run multi-GPU distributed training jobs using PyTorch's native APIs.
Apply DPO or GRPO to align a language model's outputs with human preferences.
Distill a large language model into a smaller, faster version using the built-in recipes.
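For the preference-alignment use case in the list above, the DPO objective for a single preference pair can be sketched in plain Python. This is a minimal illustration of the standard DPO loss, not torchtune's implementation. The function name, the example log-probabilities, and the `beta` value of 0.1 are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen - ref_chosen
    rejected_gain = policy_rejected - ref_rejected
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), computed as softplus(-margin) in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the margin is zero, so the loss is log(2).
baseline = dpo_loss(-12.0, -12.0, -12.0, -12.0)

# Policy has shifted probability toward the chosen response: the loss drops.
improved = dpo_loss(-10.0, -15.0, -12.0, -12.0)

print(f"baseline loss: {baseline:.4f}")
print(f"improved loss: {improved:.4f}")
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model. No separate reward model is needed.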
