Wan2.2
Wan: Open and Advanced Large-Scale Video Generative Models
Wan2.2 is an open-source AI model that generates short video clips from a text description or a starting image. Its 5B variant is designed for consumer GPUs, and the project also ships audio-driven and character animation variants.
Wan2.2 is an open-source AI system that generates videos from text descriptions or still images. You describe what you want to see, or provide a starting image, and the model produces a short video clip. It is written in Python and released by Wan-AI.
The 2.2 version introduces several improvements over earlier releases. It uses a "Mixture-of-Experts" (MoE) architecture, a design in which specialist sub-models each handle a different part of the video generation process, so capability increases without a proportional increase in computing cost. The model was trained on a substantially larger dataset than its predecessor, with about 65% more images and 83% more videos, which improves the realism and variety of motion. It can generate 720P video at 24 frames per second, and the 5B (five-billion-parameter) version of the model is designed to run on consumer graphics cards.
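The MoE idea can be illustrated with a toy sketch. It assumes, purely for illustration, two experts that split the denoising steps by noise level. The expert functions, the `boundary` value, and the numeric "latents" are hypothetical stand-ins, not code or settings from the Wan2.2 repository.

```python
from typing import Callable, List, Tuple

# An "expert" here is any function that refines a value at a given noise level.
Expert = Callable[[float, float], float]


def high_noise_expert(x: float, noise_level: float) -> float:
    # Toy stand-in for a sub-model specialised in early, very noisy steps.
    return x * 0.5


def low_noise_expert(x: float, noise_level: float) -> float:
    # Toy stand-in for a sub-model specialised in late, detail-refining steps.
    return x * 0.9


def pick_expert(noise_level: float, boundary: float = 0.5) -> Expert:
    # Hypothetical routing rule: exactly one expert is chosen per step.
    return high_noise_expert if noise_level >= boundary else low_noise_expert


def denoise(x: float, schedule: List[float], boundary: float = 0.5) -> Tuple[float, List[str]]:
    used = []
    for noise_level in schedule:
        expert = pick_expert(noise_level, boundary)
        x = expert(x, noise_level)  # only the selected expert runs this step
        used.append(expert.__name__)
    return x, used


schedule = [1.0, 0.75, 0.5, 0.25, 0.0]  # noise level falls as generation proceeds
result, used = denoise(8.0, schedule)
print(used)
```

Because each step calls a single expert, the per-step cost stays close to that of one sub-model even though the system as a whole holds the parameters of both.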
Beyond basic text-to-video and image-to-video, the project includes specialized models: one for audio-driven video (generating cinematic video from a speech recording), and one for character animation (replicating a person's movement and expressions from reference footage).
You would use this if you want to generate video content from text prompts or images without relying on a commercial service, for example for creative projects, research, or building AI-powered video tools. The model integrates with popular AI toolkits including ComfyUI and Diffusers.
Where it fits
- Generate short video clips from text prompts for creative projects without relying on a paid commercial video AI service.
- Animate a still image into a short video clip using the image-to-video model on a local consumer GPU.
- Create audio-driven cinematic video from a speech recording using the specialized audio-driven model.
- Build a custom AI video generation pipeline by integrating Wan2.2 with ComfyUI or the Diffusers library.
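
As a rough sketch of the Diffusers route, the snippet below wraps text-to-video generation in one function. It uses `WanPipeline` and `export_to_video` from the Diffusers library. The Hub model ID, the resolution, and the frame count are assumptions to check against the official model card, and `generate_clip` and `clip_seconds` are hypothetical helper names introduced here.

```python
def clip_seconds(num_frames: int, fps: int = 24) -> float:
    """Length of a clip in seconds for a given frame count and frame rate."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


def generate_clip(
    prompt: str,
    out_path: str = "clip.mp4",
    model_id: str = "Wan-AI/Wan2.2-TI2V-5B-Diffusers",  # assumed Hub ID; verify before use
    num_frames: int = 121,  # assumed value; about 5 seconds at 24 fps
    fps: int = 24,
) -> str:
    # Heavy imports are deferred so clip_seconds works without a GPU stack installed.
    import torch
    from diffusers import WanPipeline
    from diffusers.utils import export_to_video

    pipe = WanPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    # Height and width are assumed settings for 720P-class output.
    frames = pipe(
        prompt=prompt,
        height=704,
        width=1280,
        num_frames=num_frames,
    ).frames[0]

    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Calling `generate_clip("a red kite drifting over a beach")` would download the weights on first use and needs a CUDA-capable GPU with enough memory for the chosen model. `clip_seconds(121)` gives the expected clip length without loading anything.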