magic-animate
[CVPR 2024] Official repository for "MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model"
MagicAnimate turns a still photo of a person into a video by animating them to follow a body motion sequence, using a Stable Diffusion model extended to produce temporally consistent frames.
MagicAnimate is a research project from the National University of Singapore and ByteDance that turns a still photo of a person into a short video in which the person follows a sequence of body movements. You provide two inputs: a reference image of the person you want to animate, and a motion sequence (a series of pose frames describing how a body should move). The system then generates a video in which the person in your photo performs those movements while their appearance stays consistent across frames.
The technique is built on top of Stable Diffusion, a widely used image generation system. Stable Diffusion generates an image by starting from random noise and removing that noise step by step, and researchers have found it can be extended to produce video frames as well. MagicAnimate adds components that keep the person's appearance stable over time and that read body pose information from DensePose, a representation that maps the pixels of a person in an image onto the surface of a 3D human body model.
To run the code, you need a machine with a CUDA-capable GPU, Python 3.8 or higher, and ffmpeg for video processing. Setup involves downloading several pretrained model files from Hugging Face (a platform for hosting AI models) and placing them in a specific directory structure before running the included scripts. The README gives the folder layout step by step, along with commands for running on one or multiple GPUs. There is also an online demo hosted on Hugging Face Spaces where you can try it without installing anything.
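The prerequisites above can be checked before launching anything heavy. The following is a minimal sketch, not part of the repository: `preflight` is a hypothetical helper, and the folder names in `EXPECTED_DIRS` are illustrative placeholders rather than the layout the README prescribes, so replace them with the directories listed there. It does not check for a GPU.

```python
import shutil
import sys
from pathlib import Path

# ASSUMPTION: placeholder folder names for illustration only.
# Substitute the directory names given in the repository README.
EXPECTED_DIRS = ("base-model", "vae", "magic-animate-weights")


def preflight(model_root, expected_dirs=EXPECTED_DIRS, min_python=(3, 8)):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    # Python 3.8 or higher is required.
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")

    # ffmpeg must be reachable on PATH for video processing.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # Each pretrained model is expected in its own subdirectory.
    root = Path(model_root)
    for name in expected_dirs:
        if not (root / name).is_dir():
            problems.append(f"missing model directory: {root / name}")

    return problems


if __name__ == "__main__":
    # "pretrained_models" is also an assumed location.
    for problem in preflight("pretrained_models"):
        print("WARNING:", problem)
```

Returning a list of problems instead of raising on the first one lets you see every missing piece in a single run, which is handy when several large downloads are still pending.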
This work was presented at CVPR 2024, one of the main academic conferences for computer vision research. The code was released in late 2023. The README links to the paper and models but does not state a license.
Where it fits
- Animate a portrait photo to follow a dance or body movement sequence using DensePose pose data
- Generate consistent person animations for creative video projects without motion capture hardware
- Experiment with human video generation research using pretrained appearance encoder models
- Try human animation in the browser via the Hugging Face Spaces demo without any local setup