segment-anything
This repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Segment Anything Model (SAM) is an AI model from Meta AI Research (FAIR) that can identify and cut out any object in an image, even objects it was never specifically trained to recognize. Traditional image segmentation tools require training on labeled examples of the exact type of object you want to detect. SAM works differently: it accepts a simple prompt, such as a point click or a bounding box drawn around an object, and generates a precise mask (a pixel-level outline) of the corresponding object. Text prompts are explored in the SAM paper, but that capability is not part of the released code. SAM can also generate masks for every distinct object in an image automatically, without any user-supplied prompt.
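The two modes described above, prompted and automatic, can be sketched roughly as follows. This is a minimal sketch, not the repository's reference usage: the helper names (`build_point_prompt`, `segment_at_point`, `segment_everything`) are hypothetical, the checkpoint path is whatever file you downloaded, and the image is assumed to be an H x W x 3 `uint8` RGB array. The `segment_anything` imports are deferred into the functions so that the module loads even where the package is not installed.

```python
def build_point_prompt(x, y, foreground=True):
    """Hypothetical helper: one click as (coords, labels).

    Coordinates are pixel (X, Y); label 1 marks a foreground point,
    label 0 a background point.
    """
    coords = [[float(x), float(y)]]
    labels = [1 if foreground else 0]
    return coords, labels


def segment_at_point(image_rgb, checkpoint_path, x, y, model_type="vit_b"):
    """Prompted mode: return the highest-scoring mask for a single click."""
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)  # expects HxWx3 uint8, RGB by default

    coords, labels = build_point_prompt(x, y)
    masks, scores, _ = predictor.predict(
        point_coords=np.asarray(coords, dtype=np.float32),
        point_labels=np.asarray(labels),
        multimask_output=True,  # several candidate masks for an ambiguous click
    )
    return masks[int(np.argmax(scores))]  # boolean HxW array


def segment_everything(image_rgb, checkpoint_path, model_type="vit_b"):
    """Automatic mode: return a list of dicts, one per detected mask."""
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    return SamAutomaticMaskGenerator(sam).generate(image_rgb)
```

Each dict returned by the automatic generator includes a `segmentation` entry (the boolean mask) along with metadata such as `area` and `bbox`.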
Under the hood, SAM was trained on a dataset of 11 million images and over 1 billion masks, giving it broad visual knowledge. The model architecture uses a Vision Transformer (a type of neural network designed for image understanding) to encode each image into an embedding, which a lightweight mask decoder then combines with the prompt to produce masks. The model is available in three sizes (ViT-B, ViT-L, and ViT-H image encoders) with different accuracy and speed tradeoffs. The mask decoder can also be exported to ONNX, an open model format supported by many runtimes, so it can run in environments other than Python, including web browsers.
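One practical consequence of the encoder/decoder split is that the expensive image encoding can be done once and reused for many prompts. The sketch below illustrates this under a few assumptions: `segment_boxes` and `MODEL_TYPES` are hypothetical names introduced here, boxes are given in pixel XYXY order, and the checkpoint file must match the chosen model type.

```python
# Registry keys for the three released backbones, smallest to largest.
MODEL_TYPES = ("vit_b", "vit_l", "vit_h")


def segment_boxes(image_rgb, checkpoint_path, boxes_xyxy, model_type="vit_b"):
    """Hypothetical helper: encode the image once, then decode one mask per box."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")

    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)

    # Heavy step: runs the Vision Transformer encoder a single time.
    predictor.set_image(image_rgb)

    masks_out = []
    for box in boxes_xyxy:
        # Light step: only the mask decoder runs for each prompt.
        masks, _, _ = predictor.predict(
            box=np.asarray(box, dtype=np.float32),
            multimask_output=False,  # a box is usually unambiguous
        )
        masks_out.append(masks[0])  # boolean HxW array
    return masks_out
```

Larger backbones generally trade speed for mask quality, so `vit_b` is a reasonable starting point for experimentation and `vit_h` for best results.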
You would use SAM if you are a computer vision researcher or developer who needs flexible, zero-shot image segmentation for tasks like photo editing, medical imaging, satellite image analysis, robotics perception, or any application where you need to isolate objects in images. The tech stack is Python with PyTorch and torchvision, and example Jupyter notebooks are included. A newer version, SAM 2, extends these capabilities to video.