transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Transformers is Hugging Face's Python library with ready-to-use definitions of thousands of state-of-the-art AI models for text, images, audio, video, and multimodal tasks. Load a pretrained model from the Hub and run it in a few lines of code.
Transformers is a Python library from Hugging Face that provides ready-to-use definitions of state-of-the-art machine learning models for text, computer vision, audio, video, and multimodal tasks, for both inference and training. The README calls it the model-definition framework: instead of each project rewriting the architecture of a large language model or an image classifier from scratch, it imports a shared definition from Transformers.
Centralizing how models are described lets the same definition plug into the wider ecosystem. The README says that if a model definition is supported, it will be compatible with most training frameworks (such as Axolotl, Unsloth, DeepSpeed, FSDP, and PyTorch-Lightning), inference engines (such as vLLM, SGLang, and TGI), and libraries such as llama.cpp and mlx. Models come from the Hugging Face Hub, which the README says holds over one million Transformers checkpoints; a checkpoint is the saved weights of a trained model.
The easiest entry point is the Pipeline API, shown in the Quickstart, which handles preprocessing and returns output for tasks like text-generation. The Quickstart example loads a model from the Hub by name and runs a prompt through it in a few lines of Python. The README also references a command-line chat option.
Use Transformers when you want to run a pretrained AI model in your own code, fine-tune one on your data, or build a chatbot, image classifier, transcription tool, or multimodal app without writing the network yourself. It requires Python 3.10 or newer and PyTorch 2.4 or newer, and it installs with pip or uv.
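The Pipeline API entry point described above can be sketched roughly as follows. This is a minimal illustration, not the README's exact Quickstart: the helper names `build_generator` and `complete` are hypothetical, and the `max_new_tokens` default is an arbitrary choice. The import of `transformers` is deferred so that the small wrapper can be exercised without downloading any weights.

```python
from typing import Any, Callable


def build_generator(model_name: str) -> Callable[..., Any]:
    """Create a text-generation pipeline for a Hub checkpoint.

    Weights are downloaded from the Hub the first time a checkpoint is used.
    """
    # Imported lazily (an illustrative choice) so `complete` works offline
    # with any callable that mimics a pipeline.
    from transformers import pipeline

    return pipeline(task="text-generation", model=model_name)


def complete(prompt: str, generator: Callable[..., Any], max_new_tokens: int = 40) -> str:
    """Run one prompt through a text-generation pipeline and return the text."""
    # A text-generation pipeline returns a list of dicts, one per sequence,
    # each carrying the result under the "generated_text" key.
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]
```

To try it, pass any text-generation checkpoint name from the Hub, for example `complete("The secret to baking a really good cake is ", build_generator("Qwen/Qwen2.5-1.5B"))`. The checkpoint name here is an assumption chosen for illustration; the first call needs network access and enough memory to hold the model.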
Where it fits
- Run a pretrained language model on your own data for text generation, classification, or summarization in a few lines of Python.
- Fine-tune a Hugging Face model on your own dataset to build a custom text classifier or chatbot.
- Load an image classifier or audio transcription model from the Hugging Face Hub without writing the network architecture yourself.
- Build a multimodal AI app that processes text and images together using a shared Transformers model definition.
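As a rough sketch of how the use cases above map onto the Pipeline API, the table below pairs each one with a pipeline task identifier. The dictionary `USE_CASE_TASKS` and the helpers `task_for` and `load_for` are hypothetical names introduced for illustration; the task strings are pipeline task identifiers as I understand them, and the checkpoint to load is left to the caller.

```python
# Hypothetical lookup from the use cases listed above to pipeline task names.
USE_CASE_TASKS = {
    "text generation": "text-generation",
    "text classification": "text-classification",
    "image classification": "image-classification",
    "audio transcription": "automatic-speech-recognition",
    "text and images together": "image-text-to-text",
}


def task_for(use_case: str) -> str:
    """Return the pipeline task identifier for a use case, or raise ValueError."""
    try:
        return USE_CASE_TASKS[use_case]
    except KeyError:
        known = ", ".join(sorted(USE_CASE_TASKS))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None


def load_for(use_case: str, model_name: str):
    """Build a pipeline for a use case from a Hub checkpoint chosen by the caller."""
    # Imported lazily so the lookup above can be used without the library installed.
    from transformers import pipeline

    return pipeline(task=task_for(use_case), model=model_name)
```

Calling `load_for("audio transcription", model_name)` with a speech-recognition checkpoint of your choice would return a pipeline that accepts an audio file path. Fine-tuning, the second use case in the list, goes beyond pipelines and is not covered by this sketch.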