Wav2Lip

Python · ★ 13k · updated 1y ago

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, try Sync Labs.

Wav2Lip is a research tool that replaces the lip movements in a video to match a different audio track, enabling cross-language dubbing on real or animated faces using pretrained AI models.

Python · PyTorch · setup: moderate · complexity 3/5

Wav2Lip is a research project that automatically synchronizes the lip movements in a video to match a separate audio track. In plain terms, you give it a video of a person talking and a different audio file, and it produces a new video where the person's mouth moves to match the new audio. This can work across different languages, voices, and identities, including animated or computer-generated faces.

The project came out of a research paper published at ACM Multimedia 2020, and the repository contains the full training and inference code along with pretrained model weights. For someone who wants to try it without writing code, a Google Colab notebook is provided, which lets you run the process in a browser using cloud computing resources without installing anything locally.

The README describes two separate paths for using the technology. The first is the original open-source version, which is free for non-commercial use and requires setting up Python, downloading pretrained models, and running inference scripts locally. The second is a commercial API offered by Sync Labs (sync.so), which the README now promotes prominently as a higher-quality option. The commercial version requires creating an account, getting an API key, and calling the API from Python or TypeScript code. The two paths are independent.

For the open-source path, the setup involves downloading model checkpoints, installing Python dependencies, and running a command-line script that takes a video file and an audio file as inputs. The output is a video file with the lips resynced. The README also covers how to train the model from scratch using your own data, and how to evaluate the quality of results.
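As a rough illustration of that command-line step, the sketch below assembles an inference command in Python. The script name `inference.py` and the flags `--checkpoint_path`, `--face`, and `--audio` are given here as recalled from the repository README, not from this summary, so check them against the current README before relying on them. The helper name `build_inference_command` and all file paths are hypothetical.

```python
import shlex


def build_inference_command(checkpoint, face_video, audio, script="inference.py"):
    """Return the argument list for one lip-sync inference run.

    Assumption: the script name and flag names mirror the Wav2Lip README
    as recalled; verify them against the repository before use.
    """
    return [
        "python", script,
        "--checkpoint_path", str(checkpoint),  # pretrained model weights
        "--face", str(face_video),             # video containing the face to resync
        "--audio", str(audio),                 # replacement speech track
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_inference_command(
        "checkpoints/wav2lip.pth", "input/speaker.mp4", "input/dubbed.wav"
    )
    # Print the command rather than running it, so the sketch works
    # without the repository or model weights present.
    print(shlex.join(cmd))
```

To actually run the command from inside a checkout of the repository, the returned list could be passed to `subprocess.run(cmd, check=True)`.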

The open-source code is limited to research and other non-commercial use. The commercial Sync Labs product operates under separate terms from the original research code.

Where it fits

Dub a video into a different language by providing new audio while keeping the original speaker's face.
Replace or retime speech in a talking-head video so mouth movements match new recorded audio.
Run lip-sync inference on custom video and audio pairs using the provided pretrained model weights.
