peft

Python ★ 21k updated 1d ago

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

PEFT is a Python library that lets you fine-tune large AI models cheaply by training only a tiny fraction of their parameters, so you can customize a model on your own data without needing expensive multi-GPU hardware.

Python · PyTorch · Transformers · Diffusers · Accelerate · setup: moderate · complexity 3/5

PEFT, short for Parameter-Efficient Fine-Tuning, is a Python library from Hugging Face for adapting very large pretrained AI models to new tasks without retraining every weight. Fine-tuning a modern large language model or image-generation model normally means updating billions of parameters, which is slow, expensive, and hungry for GPU memory. PEFT instead trains only a small number of extra parameters on top of the frozen base model, which the README says can reach quality comparable to full fine-tuning while using a fraction of the compute and storage.

The library packages several PEFT methods: adapters, LoRA, soft prompts, and IA3 are the ones called out in the README, with conceptual guides for each. You install it with pip install peft and use it by wrapping a regular Transformers model with a configuration object such as LoraConfig and a helper called get_peft_model. The example in the README adapts the Qwen2.5-3B-Instruct model with LoRA at rank 16, ends up training only about 0.12% of the model's parameters, and saves a small adapter checkpoint that can be reloaded for inference with PeftModel.from_pretrained.

PEFT is designed to plug into the rest of the Hugging Face stack: it works with Transformers for training and inference, Diffusers for managing adapters in image models like Stable Diffusion, and Accelerate for distributed training. The README also shows memory tables where a 3B-parameter model that needs about 47GB of GPU memory for full fine-tuning fits in roughly 14GB with LoRA, and notes that PEFT pairs well with quantization, as in QLoRA, to fine-tune big LLMs on a single consumer GPU.

This is useful for anyone who wants to specialize a large open model on their own data: researchers, hobbyists with a single GPU, or teams that need many task-specific variants without storing many giant checkpoints.
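The 0.12% figure can be sanity-checked with a little arithmetic. LoRA attaches two low-rank matrices to each adapted weight, so a layer mapping d_in inputs to d_out outputs gains rank * (d_in + d_out) trainable parameters. The sketch below is a back-of-the-envelope estimate, not PEFT code: it assumes LoRA is applied only to the query and value projections, and the layer count, hidden size, key/value width, and base parameter count are assumed values for Qwen2.5-3B rather than numbers taken from the text above.

```python
# Back-of-the-envelope count of LoRA's trainable parameters.
# ASSUMPTIONS (not stated in the summary): LoRA is attached to the query and
# value projections only, and the shapes below describe Qwen2.5-3B.


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two matrices per layer: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


RANK = 16                    # rank used in the README example
NUM_LAYERS = 36              # assumed number of transformer blocks
HIDDEN = 2048                # assumed hidden size (query projection is HIDDEN x HIDDEN)
KV_DIM = 256                 # assumed value projection width (2 KV heads x 128 dims)
BASE_PARAMS = 3_090_000_000  # assumed, approximate size of the frozen base model

# Each block gets one adapter on the query projection and one on the value projection.
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
trainable = per_layer * NUM_LAYERS
fraction = trainable / (BASE_PARAMS + trainable)

print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of all parameters:   {fraction:.2%}")
```

Under these assumptions the estimate comes to a few million trainable parameters, which is consistent with the roughly 0.12% share quoted for the README example.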

Where it fits

Fine-tune a large language model on your own dataset using a single consumer GPU with LoRA instead of full retraining.
Add task-specific behavior to a Hugging Face Transformers model while keeping the base model frozen and unchanged.
Save small adapter checkpoints per task instead of storing many full model copies.
Combine PEFT with quantization, as in QLoRA, to fit fine-tuning of a large model into far less GPU memory.
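The storage case in the list above can be made concrete with rough numbers. This is an illustrative sketch only: the 16-bit weight size, the adapter parameter count, and the number of tasks are all assumptions chosen for the example, not measurements from the PEFT README.

```python
# Rough storage comparison: one full checkpoint per task versus one shared
# base model plus a small adapter per task. All sizes are illustrative.

BYTES_PER_PARAM = 2          # assumed 16-bit weights
BASE_PARAMS = 3_000_000_000  # a 3B-parameter base model
ADAPTER_PARAMS = 3_700_000   # assumed adapter size, roughly 0.12% of the base
NUM_TASKS = 10               # hypothetical number of task-specific variants


def gigabytes(num_params: int) -> float:
    """Convert a parameter count to gigabytes at BYTES_PER_PARAM per weight."""
    return num_params * BYTES_PER_PARAM / 1e9


# Option 1: store a complete fine-tuned copy of the model for every task.
full_copies = NUM_TASKS * gigabytes(BASE_PARAMS)

# Option 2: store the frozen base once, plus one small adapter per task.
with_adapters = gigabytes(BASE_PARAMS) + NUM_TASKS * gigabytes(ADAPTER_PARAMS)

print(f"{NUM_TASKS} full checkpoints: {full_copies:.1f} GB")
print(f"base + {NUM_TASKS} adapters:  {with_adapters:.2f} GB")
```

With these assumed sizes, the adapters add well under a tenth of a gigabyte on top of the single base model, so the total stays close to the size of one checkpoint no matter how many tasks are added.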
