gitmyhub

ml-sharp

Python ★ 8.6k updated 6mo ago

Sharp Monocular View Synthesis in Less Than a Second

SHARP is Apple's research tool that generates realistic novel viewpoints of a scene from a single photograph using 3D Gaussian splatting, producing a full 3D representation in under a second on a GPU.

Python · PyTorch · CUDA · 3D Gaussian Splatting · setup: hard · complexity 4/5

SHARP is a research project from Apple that takes a single photograph as input and generates realistic images of the same scene from nearby camera angles. In other words, you give it one picture, and it produces what the scene would look like from slightly different positions, creating a sense of three-dimensional depth from a flat image.

A trained neural network takes the photo and predicts a three-dimensional representation of the scene using a technique called 3D Gaussian splatting. This representation stores the scene as a large collection of small fuzzy blobs (Gaussians) in three-dimensional space, each with color and opacity information. Once the representation is built, a rendering engine can produce new viewpoints in real time. Going from photo to 3D representation takes under a second on a standard graphics card, which the paper describes as three orders of magnitude (roughly 1,000 times) faster than previous approaches. The output files are compatible with existing 3D Gaussian rendering tools.
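To make the "fuzzy blobs with color and opacity" idea concrete, here is a minimal sketch of how such a representation can be stored and blended along one viewing ray. It is not SHARP's code: the `Gaussian` class, the `composite_along_ray` function, the isotropic blobs, and the orthographic ray are all simplifying assumptions chosen for illustration. Real splatting renderers use anisotropic Gaussians and perspective projection.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gaussian:
    """Hypothetical, simplified splat: an isotropic blob in 3D space."""
    mean: Tuple[float, float, float]   # center (x, y, z); z is depth along the ray
    scale: float                       # standard deviation of the blob
    color: Tuple[float, float, float]  # RGB, each in [0, 1]
    opacity: float                     # peak opacity in [0, 1]


def composite_along_ray(gaussians: List[Gaussian], x: float, y: float):
    """Blend blobs front to back along an orthographic ray through (x, y).

    Returns the accumulated RGB color and the remaining transmittance
    (the fraction of light that passed through every blob).
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0
    # Nearest blobs first, so closer blobs occlude farther ones.
    for g in sorted(gaussians, key=lambda g: g.mean[2]):
        dx, dy = x - g.mean[0], y - g.mean[1]
        # Opacity falls off with distance from the blob center.
        alpha = g.opacity * math.exp(-(dx * dx + dy * dy) / (2.0 * g.scale ** 2))
        for c in range(3):
            rgb[c] += transmittance * alpha * g.color[c]
        transmittance *= 1.0 - alpha
    return tuple(rgb), transmittance


# A half-transparent red blob in front of an opaque blue blob.
scene = [
    Gaussian(mean=(0.0, 0.0, 2.0), scale=1.0, color=(0.0, 0.0, 1.0), opacity=1.0),
    Gaussian(mean=(0.0, 0.0, 1.0), scale=1.0, color=(1.0, 0.0, 0.0), opacity=0.5),
]
pixel, remaining = composite_along_ray(scene, 0.0, 0.0)
```

In this toy scene the ray passes through both blob centers, so the red blob contributes half of the final color and the blue blob behind it supplies the rest.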

The project accompanies a research paper and includes a command-line tool called sharp. After installing the Python dependencies, you point it at a folder of input images and it writes the resulting 3D representation to an output folder. The model weights are downloaded automatically on the first run. A separate render command can then produce video along a camera path, though that step currently requires an NVIDIA GPU.

The representation uses real-world scale, so camera movements correspond to actual distances rather than arbitrary units. The authors report measurable improvements over previous methods on several image quality benchmarks.
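One way to see why real-world scale matters: with metric depth, a sideways camera move of a known distance produces a predictable pixel shift for each point in the scene. The sketch below uses the textbook pinhole-camera relation (shift = focal length × camera translation / depth). It is not taken from the SHARP code, and the function name `parallax_shift_px` and the focal length of 1000 pixels are assumptions made only for this example.

```python
def parallax_shift_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Pixel shift of a point at depth_m when a pinhole camera
    translates sideways by baseline_m (textbook pinhole geometry)."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


# Assumed values: 1000 px focal length, camera moved 5 cm to the side.
near_shift = parallax_shift_px(focal_px=1000.0, baseline_m=0.05, depth_m=2.0)
far_shift = parallax_shift_px(focal_px=1000.0, baseline_m=0.05, depth_m=10.0)
```

With these numbers a point 2 m away shifts by about 25 pixels and a point 10 m away by about 5 pixels. If the representation were only defined up to an unknown scale, the same 5 cm move could not be mapped to a specific shift.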

The code and the model are released under separate licenses, each with its own terms.

Where it fits