ms-swift

Python ★ 15k updated 2d ago

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).

A Python framework for training, fine-tuning, evaluating, and deploying large AI language and multimodal models, supporting 600+ models on hardware from consumer GPUs to multi-GPU clusters, with a full pipeline from raw weights to a running API.

PythonPyTorchLoRAvLLMCUDAsetup: hardcomplexity 5/5

ms-swift is a Python framework from the ModelScope community that makes it easier to train and fine-tune large AI language models. Fine-tuning means taking an existing model that was already trained on large amounts of text and then training it further on your own data so it performs better for a specific task. The project covers the full pipeline from training through evaluation, quantization, and deployment so you can take a model from raw weights to a running API endpoint.

The library supports over 600 text-only models and over 400 multimodal models (ones that can handle images, video, and audio alongside text). It works with well-known model families such as Qwen3, Llama4, DeepSeek-R1, InternLM3, GLM4.5, and many others. The hardware requirements are flexible: you can run it on consumer GPUs like RTX cards, datacenter GPUs like A100 and H100, Apple MPS, CPU-only machines, and Ascend NPUs.

Training large models normally requires enormous amounts of GPU memory. ms-swift addresses this through several techniques. It supports lightweight fine-tuning methods such as LoRA and QLoRA that only update a small fraction of a model weights, keeping memory use low enough that a 7-billion-parameter model can be trained on as little as 9 GB of GPU memory. It also includes options such as Flash Attention, gradient checkpointing, and parallel training strategies that split work across multiple GPUs or machines.

Beyond standard instruction fine-tuning, the framework supports reinforcement learning alignment, preference learning methods such as DPO and KTO, embedding and reranker training, and several GRPO-family algorithms used to improve model reasoning. Once a model is trained, ms-swift can quantize it to a smaller size and deploy it behind an OpenAI-compatible API endpoint using vLLM, SGLang, or LmDeploy for faster inference.

A web interface is included if you prefer not to work on the command line. The project was accepted at AAAI 2025 and has an associated academic paper. The full README is longer than what was shown.

Where it fits

Fine-tune a 7-billion-parameter language model on your own dataset using a single consumer GPU with 9 GB of memory.
Train a multimodal model that understands images, video, and audio alongside text using LoRA or QLoRA.
Deploy a fine-tuned model behind an OpenAI-compatible API endpoint using vLLM or SGLang for fast inference.
Apply DPO or GRPO preference learning to improve a model's reasoning and alignment with human feedback.

Open on GitHub → Full breakdown on explaingit →