MoVerse
MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold
MoVerse is a research project that turns a single ordinary photo into a navigable 3D world you can move through in real time. Feed it one image taken with a normal camera and the system constructs a full 360-degree environment, then lets a virtual camera roam through it while rendering video on the fly.
The process runs in three stages. First, a diffusion-based generation model expands the input photo into a complete 360-degree panorama. Second, the panorama is converted into a 3D structure called a Gaussian scaffold: a representation built from many small blobs positioned in three-dimensional space, which can be rendered very efficiently. Third, as the virtual camera moves along a user-specified path, the system generates photorealistic video frames from the scaffold in real time, at eight frames per second (about 125 ms per frame) on a single high-end graphics card (an RTX 4090 in the tests).
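To make the second stage more concrete, here is a minimal sketch of one common way to lift an equirectangular panorama into 3D points that could serve as blob centers. It is not MoVerse's method: the code has not been released, and this summary does not say how the scaffold is built. The per-pixel distance map and the names `equirect_pixel_to_direction`, `lift_panorama_to_points`, and `frame_budget_ms` are assumptions made purely for illustration.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Map the center of pixel (u, v) in an equirectangular image to a unit
    direction. Convention (an assumption): +z is the panorama center, +y is up."""
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi    # longitude in (-pi, pi)
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi   # latitude in (-pi/2, pi/2)
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def lift_panorama_to_points(distance, width, height):
    """Scale each pixel's viewing direction by a hypothetical per-pixel distance
    (a list of `height` rows with `width` values each) to get 3D points."""
    points = []
    for v in range(height):
        for u in range(width):
            dx, dy, dz = equirect_pixel_to_direction(u, v, width, height)
            d = distance[v][u]
            points.append((d * dx, d * dy, d * dz))
    return points


def frame_budget_ms(fps):
    """Time available to produce one frame at a given frame rate."""
    return 1000.0 / fps


# Tiny usage example: a 5x3 panorama where every pixel is 2 units away.
toy_distance = [[2.0] * 5 for _ in range(3)]
toy_points = lift_panorama_to_points(toy_distance, 5, 3)
```

In this toy example every point lies on a sphere of radius 2 around the camera, and the center pixel maps to the point straight ahead on the +z axis. `frame_budget_ms(8)` gives the per-frame time budget implied by the reported eight frames per second.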
The output is meant to look like a real video walkthrough rather than an obviously computer-generated scene. The paper demonstrates results across many types of environments: indoor rooms, outdoor plazas, ancient ruins, and stylized settings like anime landscapes and cyberpunk streets.
This is an academic research repository. A preprint is available on arXiv, and the project page hosts demo videos, interactive panorama viewers, and 3D scaffold visualizations. The code and trained model weights have not yet been released. The team attributes the delay to a corporate compliance and security review, which is expected to take about one month from the date the repository was created.