peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
PEFT is a Python library that lets you fine-tune large AI models cheaply by training only a tiny fraction of their parameters, so you can customize a model on your own data without needing expensive multi-GPU hardware.
PEFT, short for Parameter-Efficient Fine-Tuning, is a Python library from Hugging Face for adapting very large pretrained AI models to new tasks without retraining the entire model. Fine-tuning a modern large language model or image-generation model normally means updating billions of parameters, which is slow, expensive, and demanding on GPU memory. PEFT instead trains only a small number of extra parameters on top of the frozen base model, which the README says gives quality comparable to full fine-tuning while using a fraction of the compute and storage.

The library packages several PEFT methods: adapters, LoRA, soft prompts, and IA3 are the ones called out in the README, with conceptual guides for each. You install it with pip install peft and use it by wrapping a regular Transformers model with the helper get_peft_model and a configuration object such as LoraConfig. The example in the README adapts the Qwen2.5-3B-Instruct model with LoRA at rank 16, ends up training only about 0.12% of the model's parameters, and saves a small adapter checkpoint that can be reloaded for inference with PeftModel.from_pretrained.

PEFT is designed to plug into the rest of the Hugging Face stack: it works with Transformers for training and inference, Diffusers for managing adapters in image models like Stable Diffusion, and Accelerate for distributed training. The README also shows memory tables in which a 3B-parameter model that needs about 47GB of GPU memory for full fine-tuning fits in roughly 14GB with LoRA, and notes that PEFT pairs well with quantization techniques like QLoRA to fine-tune big LLMs on a single consumer GPU.

This is useful for anyone who wants to specialize a large open model on their own data: researchers, hobbyists with a single GPU, or teams that need many task-specific variants without storing many giant checkpoints.
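The wrap, save, and reload workflow described above can be sketched roughly as follows. This is a minimal sketch, not the README's exact code: the Hub id "Qwen/Qwen2.5-3B-Instruct", the lora_alpha value, the adapter directory name, and the three helper function names are assumptions made for illustration. Only LoraConfig, get_peft_model, and PeftModel.from_pretrained come from the description above.

```python
# Assumed Hub id for the model named in the text, and a hypothetical output folder.
MODEL_ID = "Qwen/Qwen2.5-3B-Instruct"
ADAPTER_DIR = "qwen2.5-3b-lora"


def build_lora_model(model_id: str = MODEL_ID):
    """Load a base model and wrap it with a rank-16 LoRA adapter."""
    # Imported inside the function so this file can be loaded even on a
    # machine where transformers and peft are not installed yet.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, TaskType, get_peft_model

    base = AutoModelForCausalLM.from_pretrained(model_id)
    config = LoraConfig(
        r=16,                          # LoRA rank, as in the README example
        lora_alpha=32,                 # scaling factor: an assumed value
        task_type=TaskType.CAUSAL_LM,  # text-generation style model
    )
    model = get_peft_model(base, config)  # base weights are frozen
    model.print_trainable_parameters()    # reports the trainable share
    return model


def save_adapter(model, adapter_dir: str = ADAPTER_DIR) -> None:
    """Write only the adapter weights and config, not the base model."""
    model.save_pretrained(adapter_dir)


def load_for_inference(model_id: str = MODEL_ID, adapter_dir: str = ADAPTER_DIR):
    """Reload the base model and attach a previously saved adapter."""
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(model_id)
    model = PeftModel.from_pretrained(base, adapter_dir)
    return model.eval()  # switch to inference mode
```

In practice a training loop (for example the Transformers Trainer) would run between build_lora_model and save_adapter. Calling build_lora_model downloads the full base model, so it needs enough disk space and memory for a 3B-parameter checkpoint.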
Where it fits
- Fine-tune a large language model on your own dataset using a single consumer GPU with LoRA instead of full retraining.
- Add task-specific behavior to a Hugging Face Transformers model while keeping the base model frozen and unchanged.
- Save small adapter checkpoints per task instead of storing many full model copies.
- Combine PEFT with quantization techniques like QLoRA to cut the GPU memory needed to fine-tune large models.
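To make the "small adapter checkpoints" point concrete, here is a back-of-the-envelope sketch of how many parameters a rank-16 LoRA adapter adds and what that means for storage. The architecture numbers (hidden size, layer count, key/value head layout), the rough base parameter count, the choice of adapting only the query and value projections, and 2 bytes per parameter are all assumptions about a Qwen2.5-3B-style model, not figures taken from the text above.

```python
# Assumed architecture of a Qwen2.5-3B-style model (illustrative values).
HIDDEN_SIZE = 2048
NUM_LAYERS = 36
NUM_KV_HEADS = 2
HEAD_DIM = 128
BASE_PARAMS = 3_090_000_000  # rough assumed size of the frozen base model
BYTES_PER_PARAM = 2          # assumes 16-bit weights on disk


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two small matrices, (rank x d_in) and (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_params(rank: int) -> int:
    """Adapter size when only the query and value projections are adapted."""
    q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, rank)
    v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, rank)
    return NUM_LAYERS * (q_proj + v_proj)


def storage_bytes(num_tasks: int, rank: int, use_adapters: bool) -> int:
    """Disk space for num_tasks variants: full copies vs. one base + adapters."""
    if use_adapters:
        total_params = BASE_PARAMS + num_tasks * adapter_params(rank)
    else:
        total_params = num_tasks * BASE_PARAMS
    return total_params * BYTES_PER_PARAM


trainable = adapter_params(16)
percent = 100 * trainable / (BASE_PARAMS + trainable)
print(f"trainable parameters: {trainable:,} ({percent:.2f}% of the model)")

for use_adapters in (False, True):
    gigabytes = storage_bytes(10, 16, use_adapters) / 1e9
    label = "one base + 10 adapters" if use_adapters else "10 full copies"
    print(f"{label}: {gigabytes:.2f} GB")
```

Under these assumptions the adapter holds 3,686,400 trainable parameters, in line with the roughly 0.12% figure quoted from the README, and ten task-specific adapters add only tens of megabytes on top of a single base checkpoint.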