4D_PM

Python · ★ 31 · updated 15d ago

[CVPR 2026 Oral] 4D Primitive-Mâché: Glueing Primitives for Persistent 4D Scene Reconstruction

CVPR 2026 research code that reconstructs 4D scenes from video by fitting geometric primitives to objects and tracking them persistently over time, even when they temporarily leave the frame.

Python · PyTorch · CUDA · SAM 2 · Pi3 · Rerun · setup: hard · complexity 5/5

4D Primitive-Mâché is a research codebase accompanying a paper accepted as an oral presentation at CVPR 2026, a major academic conference in computer vision. The project addresses persistent 4D scene reconstruction: building a model of a physical scene that tracks how objects move and change over time across video frames, rather than capturing a single static snapshot.

The core idea is to represent scenes using geometric primitives: simple shapes like ellipsoids or similar building blocks that can be positioned, oriented, and deformed. The paper proposes a method for fitting these primitives to video footage and tracking them persistently over time, even when objects temporarily disappear from view (for example, when a drawer closes and hides its contents). This property, called object permanence in the demo configurations, is what the word "persistent" in the title refers to.
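To make the idea of persistence concrete, here is a minimal Python sketch of a primitive that keeps its last known pose while it is out of view. Everything in it is hypothetical: the class name `PrimitiveTrack`, its fields, and the "hold the last pose" rule are illustrative assumptions, not the repository's data structures or the paper's method.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PrimitiveTrack:
    """Hypothetical record for one primitive tracked across frames."""
    name: str
    center: Vec3            # last known position of the primitive
    radii: Vec3             # ellipsoid semi-axes (assumed shape model)
    visible: bool = True
    history: List[Tuple[int, Vec3, bool]] = field(default_factory=list)

    def update(self, frame: int, observed_center: Optional[Vec3] = None) -> None:
        """Record one frame; keep the previous pose when nothing is observed."""
        if observed_center is not None:
            self.center = observed_center
            self.visible = True
        else:
            # Object is hidden (e.g. inside a closed drawer): do not delete it,
            # just mark it as not visible and hold the last known pose.
            self.visible = False
        self.history.append((frame, self.center, self.visible))


# Usage: a mug that is seen, hidden for one frame, then seen again.
mug = PrimitiveTrack(name="mug", center=(0.0, 0.0, 0.0), radii=(0.04, 0.04, 0.05))
mug.update(0, observed_center=(0.10, 0.20, 0.30))
mug.update(1)                                   # drawer closed, no observation
mug.update(2, observed_center=(0.12, 0.20, 0.30))
```

The point of the sketch is only that a hidden object stays in the scene model with a pose and a visibility flag, instead of being dropped and re-created when it reappears.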

The codebase is organized into three main parts. The frontend handles geometry estimation and object segmentation, using two external models, Pi3 and SAM 2. The core module handles the mathematical optimization, specifically a Gauss-Newton solver that fits the primitives to the observed data. The object mapper handles motion tracking and assembles what the authors call a 4D replay, a time-extended representation of the scene that can be replayed or inspected after reconstruction.
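As an illustration of what a Gauss-Newton fit looks like, the following self-contained Python sketch fits the simplest possible primitive, a sphere, to 3D points. It is a toy under stated assumptions: the sphere model, the function names `fit_sphere_gauss_newton`, `initial_guess`, and `solve_linear`, and the synthetic data are all made up for this example, and the repository's solver (which runs on PyTorch and CUDA) will differ.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def initial_guess(points):
    """Start from the centroid and the mean distance to it."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    rad = sum(math.dist(p, c) for p in points) / n
    return [c[0], c[1], c[2], rad]


def fit_sphere_gauss_newton(points, iterations=20, tol=1e-12):
    """Minimise sum_i (||x_i - c|| - r)^2 over p = (cx, cy, cz, r)."""
    p = initial_guess(points)
    for _ in range(iterations):
        jtj = [[0.0] * 4 for _ in range(4)]   # J^T J
        jtr = [0.0] * 4                       # J^T residual
        for x, y, z in points:
            dx, dy, dz = x - p[0], y - p[1], z - p[2]
            d = math.sqrt(dx * dx + dy * dy + dz * dz)
            if d < 1e-12:
                continue                      # gradient undefined at the centre
            res = d - p[3]
            row = [-dx / d, -dy / d, -dz / d, -1.0]   # d(res)/d(p)
            for i in range(4):
                jtr[i] += row[i] * res
                for j in range(4):
                    jtj[i][j] += row[i] * row[j]
        # Gauss-Newton step: solve (J^T J) delta = -J^T residual.
        delta = solve_linear(jtj, [-v for v in jtr])
        p = [p[i] + delta[i] for i in range(4)]
        if max(abs(v) for v in delta) < tol:
            break
    return tuple(p)


# Usage: noise-free samples from a sphere centred at (1, 2, 3) with radius 2.
points = [
    (1 + 2 * math.sin(t) * math.cos(f),
     2 + 2 * math.sin(t) * math.sin(f),
     3 + 2 * math.cos(t))
    for t in (0.3, 0.9, 1.5, 2.1)
    for f in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
]
fit = fit_sphere_gauss_newton(points)
```

Each iteration linearises the residuals around the current parameters and solves the resulting normal equations. Fitting richer primitives follows the same loop with more parameters per primitive and a different residual.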

Running the system requires an NVIDIA graphics card, CUDA, and the PyTorch deep learning library. An install script sets up the environment, downloads model checkpoints, and configures paths automatically. Demo configurations for a robot arm dataset and two object-permanence scenarios (a drawer and a fridge) are included.
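Before running the install script, a quick preflight check of these requirements can save time. The sketch below is not part of the repository; the function name `check_environment` and the report keys are assumptions made for illustration. It relies only on the standard library plus `torch.cuda.is_available()`, which is called only if PyTorch is already installed.

```python
import importlib.util
import shutil


def check_environment():
    """Report whether the basic GPU prerequisites appear to be present."""
    report = {
        # The NVIDIA driver normally ships the nvidia-smi command-line tool.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Look up PyTorch without importing it yet.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `cuda_available` is false on a machine with an NVIDIA card, the usual suspects are a missing driver or a CPU-only PyTorch build.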

The README is technical and assumes familiarity with computer vision research. There is no graphical interface; results are visualized using an external tool called Rerun.

Where it fits