BackgroundMattingV2

Python ★ 7.2k updated 2y ago

Real-Time High-Resolution Background Matting

Background Matting V2 is a research model that removes backgrounds from photos and videos in real time at up to 4K resolution. It needs no green screen, only a clean reference photo of the empty background.

Python, PyTorch, TensorFlow, ONNX, TorchScript, CUDA | setup: hard | complexity 3/5

Background matting is the process of separating a person or object from its background in a photo or video, without a green screen. This repository contains the code and pre-trained model weights for a research paper from the University of Washington on doing that separation in real time at high resolution. The paper received a Best Student Paper Honorable Mention at CVPR 2021, a major computer vision research conference.

The approach requires one extra step compared to green-screen setups: before filming, you take a photo of the background without anyone in it. The model then compares that reference image to each frame of the video or image you want to process, using the difference to figure out what is subject and what is background. The model runs at 4K resolution at 30 frames per second, or at HD resolution at 60 frames per second, on a high-end consumer graphics card.
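To make the reference-photo idea concrete, here is a deliberately naive sketch in plain Python. It is not the repository's method: the real model is a trained neural network, while this toy only thresholds the per-pixel difference between a frame and the empty-background photo. The function name `difference_matte`, the `threshold` value, and the tiny grayscale images are all assumptions made up for illustration.

```python
def difference_matte(frame, background, threshold=30):
    """Return a binary mask: 1 where the frame differs from the reference background.

    Both images are lists of rows of grayscale values (0-255).
    This is a toy illustration, not the learned model from the repository.
    """
    same_shape = len(frame) == len(background) and all(
        len(f_row) == len(b_row) for f_row, b_row in zip(frame, background)
    )
    if not same_shape:
        raise ValueError("frame and background must have the same shape")

    mask = []
    for frame_row, bg_row in zip(frame, background):
        # A pixel counts as "subject" when it differs enough from the reference.
        mask.append([1 if abs(f - b) > threshold else 0
                     for f, b in zip(frame_row, bg_row)])
    return mask


# Hypothetical 3x4 grayscale images: a flat background and a frame with a bright subject.
background = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
frame = [
    [10, 12, 10, 10],
    [10, 200, 190, 10],
    [10, 210, 10, 9],
]
mask = difference_matte(frame, background)
for row in mask:
    print(row)
```

A hard threshold like this breaks down on shadows, camera noise, and subjects whose colour is close to the background, which is the kind of case a learned matting model is meant to handle.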

The repository includes three scripts: one for processing a folder of images, one for processing a video file, and one for running interactively with a webcam. Google Colab notebooks are also provided so you can try the model online without installing anything locally. The model runs through PyTorch, TorchScript, TensorFlow, or ONNX, depending on what your existing workflow uses.

Two datasets are published alongside the code: VideoMatte240K and PhotoMatte85. The README notes that the video processing scripts are for testing and experimentation only. They do not include hardware-accelerated encoding or decoding, so production use would require additional engineering beyond what the repository provides.

A follow-up paper called Robust Video Matting improved on this work by removing the requirement for a background reference image entirely. A Linux plugin is also available that pipes webcam footage through the model for use in video calls. The project is released under the MIT License, which permits commercial use.

Where it fits

Remove the background from a recorded video using only a reference photo of the empty background.
Run real-time background removal on webcam footage at HD resolution for use in video calls.
Try the model in Google Colab without installing anything locally to evaluate whether it fits your use case.
Integrate the model into an existing ML pipeline using PyTorch, TensorFlow, or ONNX depending on your stack.
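For the first use case in the list above, removing a background usually ends with placing the subject over a new one. The sketch below shows the standard compositing formula, out = alpha * foreground + (1 - alpha) * new_background, in plain Python. It assumes the matting step has already produced a per-pixel alpha value between 0 and 1 and a foreground colour. The function name `composite` and the sample values are hypothetical, and nothing here calls the repository's code.

```python
def composite(alpha, foreground, new_background):
    """Blend a matted foreground over a new background, pixel by pixel.

    alpha: rows of floats in [0, 1], where 1 means fully subject.
    foreground, new_background: rows of intensity values with the same shape.
    """
    out = []
    for a_row, f_row, b_row in zip(alpha, foreground, new_background):
        out.append([a * f + (1.0 - a) * b
                    for a, f, b in zip(a_row, f_row, b_row)])
    return out


# Hypothetical single-row example: background pixel, soft edge pixel, subject pixel.
alpha = [[0.0, 0.5, 1.0]]
foreground = [[200.0, 200.0, 200.0]]
new_background = [[100.0, 100.0, 100.0]]

result = composite(alpha, foreground, new_background)
print(result)
```

Fractional alpha values matter at hair and motion-blurred edges, where a pixel is part subject and part background. A binary mask cannot represent that mix.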
