MV-RoMa
[CVPR 2026 Highlight] MV-RoMa: From Pairwise Matching into Multi-View Track Reconstruction
A Python research library that finds matching points across multiple photos simultaneously, keeping correspondences consistent across the whole set, to enable cleaner 3D reconstruction from ordinary images.
MV-RoMa is a Python library and research project from a group of computer vision researchers, presented as a Highlight paper at CVPR 2026, a major computer vision conference. It finds matching points between photographs, a core step in building 3D models from ordinary images.
When you take several photos of the same object or scene from different angles, software can reconstruct a 3D model by figuring out which spot in one photo corresponds to which spot in another. Most existing tools compare two photos at a time. MV-RoMa does this with multiple photos simultaneously, keeping matches consistent across the whole set rather than treating each pair independently. The result is cleaner point tracks, meaning a single real-world location can be reliably followed across many photos.
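To see why independent pairwise matching can produce inconsistent tracks, here is a minimal sketch that chains pairwise matches into tracks with a generic union-find. It is not MV-RoMa's method or API; the names `build_tracks` and `is_consistent` and the `(image_id, point_id)` representation are assumptions made for illustration.

```python
from collections import defaultdict


def build_tracks(pairwise_matches):
    """Merge pairwise matches into multi-view tracks with union-find.

    pairwise_matches: iterable of ((image_id, point_id), (image_id, point_id)).
    Returns a sorted list of tracks, each a sorted list of (image_id, point_id).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Every match links two observations into the same track.
    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return sorted(sorted(group) for group in groups.values())


def is_consistent(track):
    """A track is consistent if it contains at most one point per image."""
    images = [image_id for image_id, _ in track]
    return len(images) == len(set(images))


# Hypothetical matches: A-B and B-C agree, but A-C lands on a different point in C.
conflicting = [
    (("A", 0), ("B", 3)),
    (("B", 3), ("C", 7)),
    (("A", 0), ("C", 8)),
]
tracks = build_tracks(conflicting)
```

In this toy input the three pairs collapse into a single track that contains two different points in image C, so `is_consistent(tracks[0])` is `False`. Matching all views jointly is meant to avoid this kind of conflict at the source rather than filtering it out afterwards.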
The library comes with pre-trained neural network weights for outdoor scenes (trained on a dataset called MegaDepth) and for indoor scenes. You give the model one source image and several target images, and it returns a map showing where each pixel in the source lands in each target, along with a confidence score for each prediction.
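The per-pixel map and confidence scores described above can be turned into a sparse list of matches by thresholding. The sketch below uses NumPy only and does not call MV-RoMa itself; the array layout (`warp` of shape `(N, H, W, 2)` holding target `(x, y)` pixel coordinates, `confidence` of shape `(N, H, W)`), the function name `confident_matches`, and the default threshold are all assumptions, so check the project's demo script for the actual output format.

```python
import numpy as np


def confident_matches(warp, confidence, threshold=0.5):
    """Keep only the correspondences whose confidence reaches the threshold.

    warp:       (N, H, W, 2) array, assumed to hold the target (x, y) location
                of every source pixel for each of N target images.
    confidence: (N, H, W) array of per-pixel confidence scores.
    Returns a list of N arrays of shape (K_i, 4): [src_x, src_y, tgt_x, tgt_y].
    """
    num_targets, height, width, _ = warp.shape
    ys, xs = np.mgrid[0:height, 0:width]  # source pixel grid
    matches = []
    for i in range(num_targets):
        mask = confidence[i] >= threshold
        src = np.stack([xs[mask], ys[mask]], axis=1)  # (K, 2) source coords
        tgt = warp[i][mask]                           # (K, 2) target coords
        matches.append(np.concatenate([src, tgt], axis=1).astype(np.float32))
    return matches


# Synthetic stand-in for model output: 2 targets, a 2x3 source image.
warp = np.zeros((2, 2, 3, 2), dtype=np.float32)
confidence = np.zeros((2, 2, 3), dtype=np.float32)
warp[0, 1, 2] = [5.0, 6.0]   # source pixel (x=2, y=1) maps to (5, 6) in target 0
confidence[0, 1, 2] = 0.9
result = confident_matches(warp, confidence)
```

Here `result[0]` holds the single confident match and `result[1]` is empty, since no pixel in the second target passes the threshold.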
Running the project requires a computer with a compatible NVIDIA GPU, Python 3.10 or later, and the PyTorch deep learning framework. Setup involves installing several dependencies, including a separate library called UFM. A demo script is included so you can test the model on your own images right after setup.
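Before installing the remaining dependencies, it can help to confirm the basic requirements listed above. The following is a small sketch, not part of MV-RoMa; `check_environment` is a hypothetical helper, and it checks only the Python version, whether PyTorch is importable, and whether PyTorch can see a CUDA device.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {required} or later is required, found {found}")

    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch  # imported lazily so the check also runs without PyTorch

        if not torch.cuda.is_available():
            problems.append("PyTorch cannot see a CUDA-capable GPU")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```

The helper does not verify UFM or any other dependency, because their import names are not stated here.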
This is a research tool intended for computer vision engineers and researchers working on 3D reconstruction pipelines. It is not a consumer product, and using it effectively requires familiarity with deep learning and image processing concepts.
Where it fits
- Match points across 10+ photos of a scene simultaneously to get cleaner, globally consistent correspondences for 3D reconstruction.
- Use pre-trained outdoor or indoor models to get pixel-level matches with confidence scores without training from scratch.
- Replace pairwise feature matching in an existing 3D reconstruction pipeline with MV-RoMa's multi-view consistent approach.
- Test point matching quality on your own images using the included demo script right after setup.