notebooks

Jupyter Notebook ★ 9.5k updated 1mo ago

A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL.

A collection of 59 step-by-step computer vision tutorials covering object detection, segmentation, tracking, OCR, and pose estimation, each runnable for free in Google Colab with no local install needed.

Python, Jupyter Notebook, PyTorch, YOLO, Google Colab
setup: easy
complexity: 3/5

This repository is a collection of step-by-step tutorials for working with computer vision models, meaning models that analyze and interpret images or video. There are currently 59 notebook tutorials covering a wide range of tasks: detecting objects in images, segmenting images by drawing precise outlines around objects, tracking objects across video frames, reading text from images (OCR), classifying images by category, and estimating body poses.

Each tutorial is a Jupyter Notebook, which is a document that mixes explanatory text with runnable code. You do not need to install anything on your own computer. Each notebook has a button to open it directly in Google Colab, which is a free online environment where you can run the code in a browser. Kaggle and SageMaker Studio Lab are also offered as alternatives.
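As a rough illustration of what an "Open in Colab" button points to, the sketch below builds a Colab link for a notebook hosted on GitHub. It relies on Colab's URL scheme for GitHub-hosted notebooks. The owner name `roboflow`, the `main` branch, and the notebook path are assumptions made for the example; the path in particular is hypothetical and does not name a real file.

```python
from urllib.parse import quote

COLAB_BASE = "https://colab.research.google.com/github"


def colab_url(owner: str, repo: str, notebook_path: str, branch: str = "main") -> str:
    """Build a URL that opens a GitHub-hosted notebook in Google Colab."""
    # Percent-encode the path but keep "/" so directory separators survive.
    safe_path = quote(notebook_path.lstrip("/"), safe="/")
    return f"{COLAB_BASE}/{owner}/{repo}/blob/{branch}/{safe_path}"


# Hypothetical notebook path, used only for illustration.
url = colab_url("roboflow", "notebooks", "notebooks/example-tutorial.ipynb")
print(url)
```

Opening the printed link in a browser would load the notebook in a Colab session, provided a notebook actually exists at that path.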

The tutorials cover many of the most widely used models in the field, including several versions of YOLO (a fast object detection model family), SAM (a model that can segment any object in an image when you point at it), Florence, PaliGemma, Qwen, and RF-DETR. For each model, the notebook typically walks through loading the model, running it on sample data, and often fine-tuning it on a custom dataset using images you supply or fetch from Roboflow, the company that maintains this repository.
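The load, run, and fine-tune flow described above might look roughly like the following for a YOLO model. This is a minimal sketch, not code taken from the repository: it assumes the `ultralytics` package is installed, and the weights name `yolo11n.pt`, the image path, and the dataset YAML path are illustrative choices. Individual notebooks may use different libraries and steps.

```python
def run_and_finetune(image_path: str, dataset_yaml: str, epochs: int = 10):
    """Load a pretrained detector, run it on one image, then fine-tune it."""
    # Imported lazily so the function can be defined without the package installed.
    from ultralytics import YOLO

    # Load pretrained weights (assumed name; downloaded on first use).
    model = YOLO("yolo11n.pt")

    # Run inference on a sample image; one Results object comes back per image.
    results = model.predict(source=image_path, conf=0.25)
    for box in results[0].boxes:
        class_name = model.names[int(box.cls)]
        print(class_name, float(box.conf), box.xyxy[0].tolist())

    # Fine-tune on a custom dataset described by a YOLO-format data.yaml file.
    model.train(data=dataset_yaml, epochs=epochs, imgsz=640)
    return model
```

In a Colab session with a GPU runtime, a call such as `run_and_finetune("sample.jpg", "dataset/data.yaml")` would print the detections for the sample image and then start training; both file names here are placeholders.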

Roboflow builds tools for managing computer vision datasets and deploying models, so many notebooks connect to its platform at some point, for example to fetch a dataset. The core model usage, however, generally works without a paid account.


Where it fits

Fine-tune a YOLO model on your own images to detect custom objects using free Google Colab GPUs
Segment any object in a photo by pointing at it with a SAM model notebook
Track objects across video frames using a pre-trained detection and tracking model
Read text from images using an OCR model tutorial without installing anything locally
