gitmyhub

Wan2.2

Python ★ 16k updated 3mo ago

Wan: Open and Advanced Large-Scale Video Generative Models

Wan2.2 is an open-source AI model that generates short video clips from a text description or a starting image, running on consumer GPUs and supporting audio-driven and character animation variants.

Python · PyTorch · ComfyUI · Diffusers · CUDA · setup: hard · complexity 4/5

Wan2.2 is an open-source AI system that generates videos from text descriptions or still images. You type what you want to see — or provide a starting image — and the model produces a short video clip. It is written in Python and released by Wan-AI.

The 2.2 version introduces several improvements over earlier releases. It uses a "Mixture-of-Experts" (MoE) architecture — a design where different specialist sub-models handle different parts of the video generation process, increasing capability without proportionally increasing computing cost. The model was trained on a substantially larger dataset than its predecessor, with about 65% more images and 83% more videos, improving the realism and variety of motion. It can generate video at 720P resolution at 24 frames per second, and the 5B (five-billion parameter) version of the model is designed to run on consumer graphics cards.
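The Mixture-of-Experts idea can be illustrated with a toy sketch. The summary above only says that specialist sub-models handle different parts of the generation process. Routing by denoising step, the two expert functions, and the `switch_step` parameter are assumptions made for illustration, not the project's actual code.

```python
from typing import Dict, List, Tuple

Latent = List[float]


def high_noise_expert(latent: Latent) -> Latent:
    # Hypothetical stand-in for a network specialised in early, very noisy steps.
    return [0.5 * v for v in latent]


def low_noise_expert(latent: Latent) -> Latent:
    # Hypothetical stand-in for a network specialised in late, fine-detail steps.
    return [0.9 * v for v in latent]


def denoise(latent: Latent, num_steps: int = 10, switch_step: int = 4) -> Tuple[Latent, Dict[str, int]]:
    """Run a toy denoising loop that calls exactly one expert per step."""
    calls = {"high_noise": 0, "low_noise": 0}
    for step in range(num_steps):
        if step < switch_step:  # assumed routing rule: early steps go to one expert
            latent = high_noise_expert(latent)
            calls["high_noise"] += 1
        else:  # later steps go to the other expert
            latent = low_noise_expert(latent)
            calls["low_noise"] += 1
    return latent, calls


final_latent, expert_calls = denoise([1.0, -2.0, 0.5])
print(expert_calls)
```

Because only one expert runs at each step, the per-step compute in this sketch matches a single expert even though two sets of weights exist in total. That is the sense in which capability can grow without a proportional increase in cost.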

Beyond basic text-to-video and image-to-video, the project includes specialized models: one for audio-driven video (generating cinematic video from a speech recording), and one for character animation (replicating a person's movement and expressions from reference footage).

You would use this if you want to generate video content from text prompts or images without relying on a commercial service, whether for creative projects, research, or building AI-powered video tools. The model integrates with popular AI toolkits including ComfyUI and Diffusers.
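As a rough idea of what the Diffusers route might look like, here is a minimal text-to-video sketch. It assumes a recent Diffusers release that ships `WanPipeline`, a CUDA-capable GPU, and a Diffusers-format Wan2.2 checkpoint whose Hub ID you supply yourself. The `generate_clip` helper and its parameters are hypothetical and do not come from the project's README.

```python
def generate_clip(model_id: str, prompt: str, out_path: str = "clip.mp4", fps: int = 24) -> str:
    """Generate one clip from a text prompt and save it as a video file."""
    # Heavy imports stay inside the function so this file can be imported
    # on a machine without PyTorch or Diffusers installed.
    import torch
    from diffusers import WanPipeline
    from diffusers.utils import export_to_video

    # model_id is an assumption: pass the Hub ID (or local path) of a
    # Diffusers-format Wan2.2 checkpoint; none is hard-coded here.
    pipe = WanPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    # The pipeline returns a batch of videos; take the first one.
    frames = pipe(prompt=prompt).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Calling `generate_clip("<your-checkpoint-id>", "a red fox running through snow")` would download the weights on first use and write `clip.mp4`. Resolution, frame count, and sampling steps are left at the pipeline defaults here and would likely need tuning for the 720P, 24 fps output described above.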

Where it fits