mistral-inference
Official inference library for Mistral models
The official Python library for running Mistral AI's open-weight language models locally on your own GPU, with support for chat, function calling, image input via Pixtral, and fine-tuning with LoRA.
Mistral AI is a French AI company that releases open-weight large language models, meaning the model files are publicly available for download and can be run locally rather than only through a cloud API. Mistral Inference is the company's official Python library for running those models on your own hardware.
The library requires a GPU (graphics card) to install and run, because it depends on a GPU-acceleration package called xformers. Once it is installed, you download the weights for the model you want and point the library's command-line tools at the folder that contains them. The main commands are mistral-demo for a quick test and mistral-chat for an interactive conversation. Larger models such as the 8x7B and 8x22B Mixtral variants need multiple GPUs and are launched through the torchrun command.
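The launch pattern described above can be sketched as a small Python helper that assembles the command line for either a single-GPU or a multi-GPU run. The helper name build_command and the example paths are hypothetical. The flags shown (--instruct, --max_tokens, --nproc-per-node, --no-python) follow the upstream README as best recalled, so check them against your installed version before relying on them.

```python
import shlex
from pathlib import Path


def build_command(model_dir, tool="mistral-chat", num_gpus=1, extra_args=()):
    """Return the argv list for launching a mistral-inference CLI tool.

    build_command is a hypothetical helper, not part of the library.
    """
    base = [tool, str(Path(model_dir).expanduser()), *extra_args]
    if num_gpus > 1:
        # Multi-GPU runs go through torchrun; --no-python makes torchrun
        # execute the console script directly instead of a Python file.
        return ["torchrun", "--nproc-per-node", str(num_gpus), "--no-python", *base]
    return base


if __name__ == "__main__":
    # Interactive chat with a 7B instruct model on one GPU (example path).
    chat = build_command(
        "~/mistral_models/7B-Instruct-v0.3",
        extra_args=("--instruct", "--max_tokens", "256"),
    )
    print(shlex.join(chat))

    # Quick demo of Mixtral 8x7B spread over two GPUs (example path).
    demo = build_command("~/mistral_models/8x7B", tool="mistral-demo", num_gpus=2)
    print(shlex.join(demo))
```

Returning a list rather than a single string means the result can be passed straight to subprocess.run without shell quoting issues.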
Model weights can be obtained in two ways: as direct downloads (tar archives from Mistral's servers) or from the Hugging Face Hub using a Python download helper. The library supports the full range of Mistral's model lineup, including Mistral 7B, the Mixtral mixture-of-experts models, Codestral (a code-focused variant), Mathstral (math-focused), Mistral Nemo, Mistral Large, and Pixtral (which can process images). Most models allow commercial use, but Codestral and Mistral Large carry a non-commercial research license.
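As an illustration of the Hugging Face route, the sketch below wraps snapshot_download from the huggingface_hub package. The repository ID, the file list, and the ~/mistral_models location are assumptions based on the upstream README's 7B Instruct example, and the helper names local_model_dir and download_model are hypothetical. Other models may need different files.

```python
from pathlib import Path

# Files assumed to be required for the 7B Instruct v0.3 checkpoint.
MODEL_FILES = ["params.json", "consolidated.safetensors", "tokenizer.model.v3"]


def local_model_dir(name, root=None):
    """Return the folder where a model's files are stored (hypothetical layout)."""
    base = Path(root) if root is not None else Path.home() / "mistral_models"
    return base / name


def download_model(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",
    name="7B-Instruct-v0.3",
    root=None,
):
    """Download only the needed files from the Hugging Face Hub."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(name, root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, allow_patterns=MODEL_FILES, local_dir=target)
    return target
```

Calling download_model() fetches several gigabytes, and some repositories require accepting a license on the Hub and logging in first. The returned folder is what you would pass to mistral-demo or mistral-chat.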
Beyond chatting, the library supports function calling (letting the model invoke tools you define), fine-tuning on your own data using a technique called LoRA, and image input for the Pixtral models. Tutorials are included as Jupyter notebooks and can be opened directly in Google Colab.
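To make the function-calling feature concrete, here is a minimal sketch modeled on the example in the upstream README. It assumes the mistral_inference and mistral_common packages are installed and that model_dir holds an instruct model with a tokenizer.model.v3 file. The get_current_weather tool and the ask_with_tool wrapper are purely illustrative.

```python
# JSON Schema describing the arguments of the illustrative weather tool.
WEATHER_PARAMETERS = {
    "type": "object",
    "properties": {
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        },
        "format": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "The temperature unit to use.",
        },
    },
    "required": ["location", "format"],
}


def ask_with_tool(model_dir, question, max_tokens=64):
    """Run one prompt with a tool definition attached and return the raw output."""
    # Lazy imports: these need a GPU-enabled install of the library.
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    from mistral_common.protocol.instruct.tool_calls import Function, Tool
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_inference.generate import generate
    from mistral_inference.transformer import Transformer

    tokenizer = MistralTokenizer.from_file(f"{model_dir}/tokenizer.model.v3")
    model = Transformer.from_folder(model_dir)

    request = ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters=WEATHER_PARAMETERS,
                )
            )
        ],
        messages=[UserMessage(content=question)],
    )
    tokens = tokenizer.encode_chat_completion(request).tokens
    out_tokens, _ = generate(
        [tokens],
        model,
        max_tokens=max_tokens,
        temperature=0.0,
        eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
    )
    return tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
```

The model does not execute the tool itself. It emits a structured call (tool name plus JSON arguments) that your own code must parse, run, and feed back as a follow-up message.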
Documentation is at docs.mistral.ai and community support is available via a Discord server.
Where it fits
- Run a local Mistral 7B chatbot on your own GPU without sending data to any external API.
- Fine-tune a Mistral model on a custom dataset using LoRA to specialize it for a specific task.
- Use Pixtral's image-input support to build a local pipeline that describes or reasons about images.