oumi
Easily fine-tune, evaluate and deploy Gemma 4, Qwen3.5, Qwen3.6, gpt-oss, DeepSeek-R1, or any open source LLM / VLM!
An open-source platform covering the full large language model lifecycle (data prep, training, fine-tuning, evaluation, and deployment), supporting models from 10M to 405B parameters on laptops, clusters, and major cloud providers.
Oumi is an open-source platform for working with large language models (LLMs) from start to finish. It covers the full process: preparing training data, training a model or fine-tuning an existing one, evaluating how well it performs, and deploying it so others can use it. The project is aimed at researchers, developers, and teams who want to work with state-of-the-art AI models without building all the infrastructure themselves.
The platform supports a wide range of popular models, including Llama, DeepSeek, Qwen, Phi, and others. It handles both text-only models and models that can process images alongside text. You can train models ranging from very small (10 million parameters) to very large (405 billion parameters). Supported training techniques include standard (full) fine-tuning; LoRA and QLoRA, which adapt a pre-trained model to a specific task while keeping compute and memory costs down; and GRPO, a reinforcement-learning method.
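To make the compute-cost point concrete, here is a minimal sketch of the arithmetic behind LoRA. It is not Oumi code. It assumes a single square weight matrix of width 4096 and a LoRA rank of 8, both chosen only for illustration. LoRA freezes the original matrix and trains two small low-rank factors instead, so the number of trainable parameters drops sharply. QLoRA uses the same adapters and additionally stores the frozen base weights in quantized form.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative values only (assumptions, not taken from any Oumi recipe).
HIDDEN = 4096
RANK = 8

full = full_finetune_params(HIDDEN, HIDDEN)
lora = lora_params(HIDDEN, HIDDEN, RANK)
fraction = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {RANK}): {lora:,} trainable parameters")
print(f"LoRA trains {fraction:.2%} of the matrix")
```

For this one matrix, LoRA trains 65,536 parameters instead of 16,777,216, which is about 0.39% of the original count. The real savings for a full model depend on which layers receive adapters and on the chosen rank.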
Oumi is designed to run in many environments. You can experiment on a laptop, scale up to a computing cluster, or run jobs on major cloud providers like AWS, Azure, and GCP. It integrates with both open-source models and commercial AI providers such as OpenAI and Anthropic, all through a single consistent interface. This means you can swap out which model or provider you use without rewriting your workflow.
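The idea of swapping providers without rewriting a workflow can be sketched as follows. This is a generic illustration of the pattern, not Oumi's actual API: `InferenceBackend`, `LocalModelBackend`, `RemoteApiBackend`, `summarize`, and the model names are all hypothetical, and the backends return placeholder strings instead of calling a real model.

```python
from dataclasses import dataclass
from typing import Protocol


class InferenceBackend(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class LocalModelBackend:
    """Stand-in for an open-source model running locally (placeholder logic)."""

    model_name: str

    def generate(self, prompt: str) -> str:
        return f"[local:{self.model_name}] {prompt}"


@dataclass
class RemoteApiBackend:
    """Stand-in for a hosted commercial provider (placeholder logic)."""

    provider: str
    model_name: str

    def generate(self, prompt: str) -> str:
        return f"[{self.provider}:{self.model_name}] {prompt}"


def summarize(backend: InferenceBackend, text: str) -> str:
    """Workflow code depends only on the interface, never on a concrete backend."""
    return backend.generate(f"Summarize: {text}")


# Swapping providers changes one line; summarize() stays the same.
local = LocalModelBackend(model_name="example-open-model")
remote = RemoteApiBackend(provider="example-provider", model_name="example-hosted-model")
print(summarize(local, "release notes"))
print(summarize(remote, "release notes"))
```

The design choice this illustrates is that the workflow function accepts any object with a matching `generate` method, so changing the model or provider is a configuration decision and not a code change.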
The platform also includes tools for building and cleaning training datasets using AI-based quality checks, running models in production using popular inference engines, and measuring model quality across standard benchmarks. A recent addition is an MCP server that lets you connect Oumi models directly to tools like Claude and Cursor.
Installation is via pip, and the project provides a quickstart guide along with a set of Jupyter notebooks covering common tasks. Documentation and community support are available on the project website and a Discord server.
Where it fits
- Fine-tune a Llama or Qwen model on your own dataset using LoRA or QLoRA to reduce compute costs.
- Evaluate a language model across standard benchmarks without writing custom evaluation code.
- Deploy a fine-tuned model to AWS or GCP and connect it to Claude or Cursor via the built-in MCP server.
- Build and clean a training dataset using AI-based quality checks before starting a training run.