PaddleFormers

Python ★ 13k updated 20h ago

PaddleFormers is an easy-to-use model zoo of pre-trained large language models, built on PaddlePaddle.

PaddleFormers is a high-performance library for training and fine-tuning 100+ large AI models on PaddlePaddle, with distributed multi-GPU support. Its README reports higher training throughput than Megatron-LM for key models such as DeepSeek-V3 and GLM-4.5-Air.

Python · PaddlePaddle · CUDA · Docker · LoRA · setup: hard · complexity 5/5

PaddleFormers is a library for training large AI models, built on top of PaddlePaddle, which is Baidu's deep learning framework. It provides a model library and training toolkit similar in purpose to the widely known Hugging Face Transformers, but optimized for PaddlePaddle's ecosystem and for high-performance distributed training across many GPUs.

The library supports over 100 models, covering both large language models (models that process and generate text) and vision-language models (models that handle both images and text together). Supported model families include DeepSeek-V3, Qwen2 and Qwen3, Llama-3, GLM-4.5, Baidu's own ERNIE-4.5 series, and several others. The README is written primarily in Chinese.

The main technical focus is training efficiency. The library implements strategies for splitting training across many GPUs and machines at once, including tensor parallelism (splitting individual weight matrices across devices), pipeline parallelism (assigning consecutive groups of layers to different devices), and expert parallelism (placing the experts of a mixture-of-experts model on different devices). It also uses lower-precision arithmetic and other optimizations to reduce memory and compute usage during training. According to the README, training speed for key models such as DeepSeek-V3 and GLM-4.5-Air exceeds that of Megatron-LM, a widely used framework that often serves as the baseline for large-scale training performance.
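To make tensor parallelism concrete, here is a minimal pure-Python sketch. It is not PaddleFormers code: the function names `matmul`, `split_columns`, and `tensor_parallel_matmul` are hypothetical, and the "devices" are simulated in a single process. The weight matrix is split column-wise into shards, each shard computes its slice of the output independently, and the slices are concatenated, which stands in for the gather step a real framework would run across GPUs.

```python
# Illustrative sketch of column-wise tensor parallelism; not PaddleFormers API.
# All names here are hypothetical and the "devices" are simulated in-process.

def matmul(x, w):
    """Multiply matrix x (n x d) by matrix w (d x k), both given as lists of rows."""
    return [[sum(xi * w[j][c] for j, xi in enumerate(row)) for c in range(len(w[0]))]
            for row in x]

def split_columns(w, num_shards):
    """Split w column-wise into num_shards equal shards (one per simulated device)."""
    k = len(w[0])
    if k % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    width = k // num_shards
    return [[row[s * width:(s + 1) * width] for row in w] for s in range(num_shards)]

def tensor_parallel_matmul(x, w, num_shards):
    """Compute x @ w by letting each shard produce a slice of the output columns."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Concatenate each row's slices in shard order (stands in for an all-gather).
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
# The sharded result matches the single-device result.
assert tensor_parallel_matmul(x, w, num_shards=2) == matmul(x, w)
```

Each shard only ever holds a fraction of the weight matrix, which is why this kind of split lets a layer that is too large for one GPU be spread across several.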

Beyond training from scratch, the library supports the full workflow, including alignment training and fine-tuning with techniques such as LoRA (a method that adapts a model by training a small set of added low-rank parameters instead of updating all of its weights). Models trained with PaddleFormers can be saved in a format compatible with other tools such as vLLM and SGLang, so they can be deployed outside of PaddlePaddle.
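The saving from LoRA can be shown with simple arithmetic. The sketch below is illustrative only and does not use any PaddleFormers API; the function names and the example sizes (a 4096 x 4096 weight matrix, rank 8) are assumptions picked for the example. LoRA keeps the original weight frozen and trains two small factors, B (d_out x rank) and A (rank x d_in), whose product is added to it.

```python
# Illustrative arithmetic only; not PaddleFormers API. Names and sizes are assumed.

def full_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Trainable parameters for the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

d_out, d_in, rank = 4096, 4096, 8  # assumed example sizes
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  LoRA: {lora:,}  fraction: {lora / full:.4%}")
```

With these assumed sizes the LoRA factors hold 1/256 of the parameters of the full matrix (about 0.39%), which is where the reduction in optimizer state and gradient memory comes from.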

Installation is via Docker image or pip, and requires Python 3.10 or later along with CUDA-enabled GPUs for training.
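Before installing, it can help to confirm that the environment meets these requirements. The following is a small preflight sketch using only the Python standard library; the helper names `python_ok` and `nvidia_smi_found` are hypothetical, and looking for `nvidia-smi` on the PATH is only a heuristic that an NVIDIA driver is present, not a check that PaddlePaddle can actually use the GPU.

```python
# Hypothetical preflight check; not part of PaddleFormers.
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated above

def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter version meets the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum

def nvidia_smi_found():
    """Heuristic: True if the nvidia-smi tool is on PATH (suggests an NVIDIA driver)."""
    return shutil.which("nvidia-smi") is not None

if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    print("nvidia-smi on PATH:", nvidia_smi_found())
```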

Where it fits

Fine-tune a large language model like Llama-3 or Qwen2 on your own dataset using LoRA, which needs far fewer GPU resources than full fine-tuning.
Train large AI models across many GPUs using tensor and pipeline parallelism for faster throughput.
Export a PaddleFormers-trained model to a format compatible with vLLM or SGLang for production deployment.
