Wan2.2

Python ★ 16k updated 3mo ago

Wan: Open and Advanced Large-Scale Video Generative Models

Wan2.2 is an open-source AI model that generates short video clips from a text description or a starting image. Its 5B variant runs on consumer GPUs, and the project also offers specialized variants for audio-driven video and character animation.

Python · PyTorch · ComfyUI · Diffusers · CUDA · setup: hard · complexity 4/5

Wan2.2 is an open-source AI system that generates videos from text descriptions or still images. You type what you want to see — or provide a starting image — and the model produces a short video clip. It is written in Python and released by Wan-AI.

The 2.2 release introduces several improvements over earlier versions. It uses a Mixture-of-Experts (MoE) architecture, in which specialist sub-models ("experts") handle different stages of the denoising process that turns random noise into a video. Only one expert runs at each step, so total model capacity grows without a proportional increase in computing cost. The model was trained on a substantially larger dataset than its predecessor (about 65% more images and 83% more videos), which improves the realism and variety of motion. The 5B (five-billion-parameter) model generates 720P video at 24 frames per second and is designed to run on consumer graphics cards.
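The expert split described above can be pictured as a routing rule inside a diffusion sampling loop. This is a minimal toy sketch, not Wan2.2's code. It assumes two experts divided by denoising stage (a high-noise expert for early, noisy steps and a low-noise expert for later refinement). The names `Expert`, `pick_expert` and `sample`, the toy update rule and the switch point `boundary=875` are all hypothetical, chosen only for illustration. The point is that each step calls exactly one expert, so per-step compute stays at the cost of a single expert even though the model holds two.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expert:
    """Hypothetical stand-in for one denoising network ("expert").

    A real expert would be a large transformer; this one only records
    which timesteps it handled and applies a placeholder update.
    """
    name: str
    calls: List[int] = field(default_factory=list)

    def denoise(self, latent: float, t: int) -> float:
        self.calls.append(t)
        # Placeholder for a real denoising step: shrink the "noise" a bit.
        return latent * 0.9


def pick_expert(t: int, boundary: int, high_noise: Expert, low_noise: Expert) -> Expert:
    """Route a timestep to exactly one expert.

    Early (noisy) timesteps at or above the hypothetical boundary go to the
    high-noise expert; later timesteps go to the low-noise expert.
    """
    return high_noise if t >= boundary else low_noise


def sample(timesteps: List[int], boundary: int,
           high_noise: Expert, low_noise: Expert, latent: float = 1.0) -> float:
    """Toy sampling loop: timesteps run from high (noise) to low (clean)."""
    for t in timesteps:
        expert = pick_expert(t, boundary, high_noise, low_noise)
        latent = expert.denoise(latent, t)  # only ONE expert runs per step
    return latent


if __name__ == "__main__":
    high, low = Expert("high_noise"), Expert("low_noise")
    steps = list(range(1000, 0, -100))  # 1000, 900, ..., 100 (10 steps)
    sample(steps, boundary=875, high_noise=high, low_noise=low)
    print(high.calls)      # [1000, 900]
    print(len(low.calls))  # 8
```

Moving `boundary` shifts work between the two experts, but the total number of expert calls always equals the number of sampling steps, which is why the combined capacity does not multiply the per-step cost.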

Beyond basic text-to-video and image-to-video, the project includes specialized models: one for audio-driven video (generating cinematic video from a speech recording), and one for character animation (replicating a person's movement and expressions from reference footage).

You would use this if you want to generate video content from text prompts or images without relying on a commercial service: for creative projects, research, or building AI-powered video tools. The model integrates with popular AI toolkits including ComfyUI and Diffusers.

Where it fits

Generate short video clips from text prompts for creative projects without relying on a paid commercial video AI service.
Animate a still image into a short video clip using the image-to-video model on a local consumer GPU.
Create audio-driven cinematic video from a speech recording using the specialized audio-driven model.
Build a custom AI video generation pipeline by integrating Wan2.2 with ComfyUI or the Diffusers library.
