ai-toolkit
The ultimate training toolkit for finetuning diffusion models
An open-source toolkit for fine-tuning AI image, video, and audio generation models on consumer Nvidia GPUs, with both a browser UI and a command-line YAML config workflow.
AI Toolkit is a free, open-source training suite for AI image, video, and audio generation models. Training in this context means taking an existing AI model that can already generate images or video and teaching it new styles, subjects, or behaviors by showing it additional examples. The toolkit is designed to run on consumer-grade Nvidia graphics cards, with the goal of making this kind of AI customization accessible without requiring expensive specialized hardware.
The project supports a long list of image generation models, including several versions of FLUX, Stable Diffusion 1.5 and XL, and a range of others from various research groups. For video, it supports the Wan 2.1 and 2.2 series of models in various sizes, as well as LTX-2. There is also experimental support for audio generation through a model called Ace Step. The scope of supported models appears to update frequently based on what is currently popular in the AI generation community.
You can interact with the toolkit in two ways. The first is a web-based UI that runs locally in your browser and lets you start, stop, and monitor training jobs through a visual interface. The UI does not need to stay open for jobs to continue; you can close it and training keeps running in the background.
The second is a command-line workflow: you edit a configuration file in YAML format (a structured text file of settings and values) and run a Python script that reads it.
Both approaches follow the same underlying process: you point the tool at a folder of training images, configure options such as how long to train and how large a model to create, and let it run.
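To make the command-line workflow concrete, here is a minimal Python sketch that turns a few settings into YAML text and builds the command that would run it. Several names here are assumptions for illustration and do not come from ai-toolkit's documented config schema: the key names `job_name`, `dataset_folder`, `train_steps` and `network_rank`, and the entry-point script name `run.py`. Check the repository's own example configs and README for the real layout.

```python
# Hypothetical sketch: key names and the run.py script name are assumptions,
# not ai-toolkit's documented schema.

def build_config(name: str, dataset_dir: str, steps: int, rank: int) -> dict:
    """Collect a few illustrative training settings into a flat dict."""
    return {
        "job_name": name,
        "dataset_folder": dataset_dir,  # folder of training images
        "train_steps": steps,           # how long to train
        "network_rank": rank,           # rough proxy for "how large a model"
    }


def to_yaml(config: dict) -> str:
    """Emit a flat dict of strings and numbers as YAML 'key: value' lines."""
    lines = []
    for key, value in config.items():
        if isinstance(value, str):
            # Double-quote strings (escaping \ and ") so paths with colons
            # or spaces stay valid YAML.
            value = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def training_command(config_path: str) -> list[str]:
    """Build the command line; the script name is an assumption."""
    return ["python", "run.py", config_path]


if __name__ == "__main__":
    cfg = build_config("my_style", "/data/my_style_images", steps=2000, rank=16)
    print(to_yaml(cfg))
    print(" ".join(training_command("my_style.yaml")))
```

Keeping the settings in a plain dict and serializing them separately means the same values can be written to disk, diffed, or committed without touching the training code.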
Installation requires Python 3.10 or newer, Git, and an Nvidia GPU (apart from the experimental Apple Silicon path). The setup involves cloning the repository, creating a virtual environment (an isolated Python workspace), and installing the required packages. Instructions are provided for Linux and Windows, and experimentally for Apple Silicon Macs. A community Discord server is the primary support channel.
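Before cloning the repository and creating the virtual environment, a short script can check the listed prerequisites. This is a hypothetical helper, not something shipped with ai-toolkit. It treats the presence of `nvidia-smi` (installed with the Nvidia driver) on the PATH as a rough hint that an Nvidia GPU is available. That is an assumed heuristic, not a guarantee that CUDA training will work. On an Apple Silicon Mac this check is expected to report `MISSING`.

```python
# Hypothetical prerequisite check; not part of ai-toolkit.
# Postponed annotations keep this file importable on older Pythons,
# so the version check itself can run and report the problem.
from __future__ import annotations

import shutil
import sys

MIN_PYTHON = (3, 10)


def check_prerequisites() -> dict[str, bool]:
    """Report whether the basic install prerequisites appear to be present."""
    return {
        # Python 3.10 or newer, per the install requirements.
        "python>=3.10": sys.version_info >= MIN_PYTHON,
        # Git is needed to clone the repository.
        "git": shutil.which("git") is not None,
        # nvidia-smi ships with the Nvidia driver; finding it is only a
        # heuristic hint that an Nvidia GPU is present.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:14} {'ok' if ok else 'MISSING'}")
```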
Where it fits
- Fine-tune a FLUX or Stable Diffusion model on your own images to generate art in a custom style.
- Train a video generation model on example clips to produce new video content that matches your visual style.
- Run training jobs overnight from the web UI; jobs keep running even after you close the browser tab.
- Use a YAML config file to automate and version-control your model training settings for repeatable experiments.
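For the repeatable, version-controlled experiments mentioned in the last item, one approach is to generate config variants from a shared base. Each run is then named after a hash of its settings, so identical settings always map to the same run name. This is a hypothetical sketch, not an ai-toolkit feature, and the key names are illustrative assumptions. Each resulting dict could then be written out as its own YAML file and committed.

```python
# Hypothetical experiment-sweep helper; key names are illustrative only.
import hashlib
import itertools
import json

BASE = {"dataset_folder": "/data/my_style_images", "train_steps": 2000, "network_rank": 16}


def fingerprint(config: dict) -> str:
    """Short deterministic ID: the same settings always give the same ID."""
    # sort_keys makes the JSON text independent of dict insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def sweep(base: dict, grid: dict) -> list[dict]:
    """Return one config per combination of the values listed in `grid`."""
    keys = sorted(grid)
    variants = []
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = {**base, **dict(zip(keys, values))}
        # Name the run after its settings (computed before job_name is added).
        cfg["job_name"] = f"run-{fingerprint(cfg)}"
        variants.append(cfg)
    return variants


if __name__ == "__main__":
    for cfg in sweep(BASE, {"network_rank": [8, 16], "train_steps": [1000, 2000]}):
        print(cfg["job_name"], cfg["network_rank"], cfg["train_steps"])
```

Deriving the name from the settings rather than from a timestamp means rerunning an identical configuration reproduces the same run name, which makes duplicates easy to spot in version control.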