sam2

Jupyter Notebook ★ 19k updated 20d ago

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

SAM 2 (Segment Anything Model 2) is an AI model from Meta's research lab that can identify and outline any object in a photo or video, a task called "segmentation." You indicate an object with a prompt (one or more clicks, a bounding box, or a rough mask), and it traces the boundary of that object. The key upgrade over the original SAM is that it works on video too, tracking the object frame by frame across the clip, even when the object moves or is temporarily hidden.
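To make the prompt types concrete, here is a minimal sketch of how clicks and a box might be collected before being handed to a predictor. The `build_prompt` helper is hypothetical and not part of the repository. The key names `point_coords`, `point_labels`, and `box` follow the argument names used in the repository's image predictor examples, and should be checked against the installed version.

```python
from typing import Optional

# Label convention for SAM-style point prompts:
# 1 = foreground click (on the object), 0 = background click (off the object).
FOREGROUND, BACKGROUND = 1, 0


def build_prompt(
    points: Optional[list[tuple[float, float, int]]] = None,
    box: Optional[tuple[float, float, float, float]] = None,
) -> dict:
    """Hypothetical helper: collect clicks and/or a box into one prompt dict.

    points: (x, y, label) triples in pixel coordinates.
    box: (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    if not points and box is None:
        raise ValueError("provide at least one point or a box")
    prompt: dict = {}
    if points:
        for _, _, label in points:
            if label not in (FOREGROUND, BACKGROUND):
                raise ValueError(f"invalid point label: {label}")
        prompt["point_coords"] = [[x, y] for x, y, _ in points]
        prompt["point_labels"] = [label for _, _, label in points]
    if box is not None:
        x0, y0, x1, y1 = box
        if x0 >= x1 or y0 >= y1:
            raise ValueError("box must satisfy x_min < x_max and y_min < y_max")
        prompt["box"] = [x0, y0, x1, y1]
    return prompt


# One positive click on the object plus one negative click on the background.
prompt = build_prompt(points=[(500, 375, FOREGROUND), (120, 40, BACKGROUND)])
```

In real use, these lists would typically be converted to NumPy arrays before being passed to the predictor.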

Under the hood, it uses a transformer architecture (the same family of neural networks behind modern language models) plus a "streaming memory" system that stores information about the object from previous frames and uses it to keep tracking the object in later ones. Meta also released a large new video segmentation dataset (SA-V) that was used to train the model. Multiple size variants are available (tiny, small, base plus, large), and the model can be compiled (via PyTorch's torch.compile) for faster video processing.
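The streaming-memory idea can be illustrated with a toy sketch: a bounded first-in, first-out memory of recent frames that is consulted when the object is not visible. This is only an analogy. SAM 2's actual memory holds learned feature embeddings and is read through attention, whereas this toy stores plain (x, y) centers and averages them. The names `FrameMemory` and `track` are hypothetical.

```python
from collections import deque


class FrameMemory:
    """Toy bounded memory: remembers the object's position in the last few frames."""

    def __init__(self, capacity=3):
        # A deque with maxlen drops the oldest entry automatically when full.
        self.bank = deque(maxlen=capacity)

    def update(self, frame_idx, center):
        self.bank.append((frame_idx, center))

    def recall(self):
        """Estimate the position as the mean of the remembered centers."""
        if not self.bank:
            return None
        xs = [c[0] for _, c in self.bank]
        ys = [c[1] for _, c in self.bank]
        return (sum(xs) / len(xs), sum(ys) / len(ys))


def track(observations, capacity=3):
    """observations: per-frame (x, y) center, or None when the object is hidden."""
    memory = FrameMemory(capacity)
    estimates = []
    for frame_idx, center in enumerate(observations):
        if center is not None:
            memory.update(frame_idx, center)
            estimates.append(center)
        else:
            # Object not visible: fall back on what the memory holds.
            estimates.append(memory.recall())
    return estimates


# The object is hidden in the third frame; the memory supplies an estimate.
estimates = track([(10.0, 10.0), (12.0, 10.0), None, (16.0, 10.0)])
```

Processing frames one at a time with a fixed-size memory keeps the cost per frame bounded regardless of clip length, which is the property the word "streaming" refers to.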

You'd use this when you need to isolate objects in photos or videos: cutting out subjects for video editing, training other AI models that need labeled object data, analyzing medical scans, or building apps that need to "understand" where things are in an image. It requires Python 3.10 or higher, PyTorch 2.5.1 or higher, and a GPU. Usage examples are provided as Jupyter notebooks.
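The version requirements above can be checked before installing. The following is a small sketch using only the standard library; the helper names are hypothetical, and the minimum versions are the ones stated in this summary.

```python
import re
import sys

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 5, 1)


def parse_version(text):
    """Extract leading numeric components, e.g. '2.5.1+cu121' -> (2, 5, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def torch_is_new_enough(version_text):
    """Compare a PyTorch version string against the stated minimum."""
    return parse_version(version_text) >= MIN_TORCH


def python_is_new_enough(version_info=sys.version_info):
    """Compare the (major, minor) interpreter version against the stated minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON
```

With PyTorch installed, `torch.__version__` can be passed to `torch_is_new_enough`, and `torch.cuda.is_available()` reports whether a usable GPU is present.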
