yolov10

Python · ★ 11k · updated 1y ago

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]

YOLOv10 is a real-time object detection model that draws labeled boxes around things it spots in images or video, running faster than earlier YOLO versions by skipping a post-processing cleanup step.

Python · PyTorch · Hugging Face · setup: moderate · complexity: 4/5

YOLOv10 is a computer vision model that identifies and locates objects within images and video. You show it a picture and it draws bounding boxes around things it recognizes, such as cars, people, or animals, along with a label for each. It is part of the YOLO family of models, which are built around speed and can run in real time. This version was developed by researchers at Tsinghua University and published at NeurIPS 2024.

The main technical contribution is removing NMS (non-maximum suppression), a post-processing step that previous YOLO versions had to run on their raw predictions. NMS is a cleanup pass that filters out duplicate detections of the same object, but it adds latency and complicates deployment on certain hardware. YOLOv10 is trained to avoid producing duplicates in the first place, so no NMS step is needed at inference time. The paper reports that YOLOv10-S is 1.8 times faster than RT-DETR-R18 at similar accuracy on COCO.
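To make concrete what YOLOv10 leaves out, here is a minimal sketch of the classic greedy NMS pass in plain Python. It is an illustration only, not code from the YOLOv10 repository: the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 overlap threshold are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # illustrative default, not a YOLOv10 setting
) -> List[int]:
    """Return indices of the boxes kept, highest score first."""
    # Visit candidates from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not heavily overlap a box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one object, plus one separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

In this toy input the second box overlaps the first almost entirely and is suppressed, while the third is untouched. The pairwise comparison loop is the work a detector that needs NMS must do for every image, and it is the step an NMS-free model such as YOLOv10 is trained to make unnecessary.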

Several model sizes are provided, from a compact version suited for devices with limited computing power to larger versions aimed at higher accuracy. Pre-trained checkpoints are available on Hugging Face. The model can be tested through a browser-based demo, run in a Google Colab notebook, or installed as a Python package for integration into custom projects.

The README also promotes a follow-up project called YOLOE, which extends this work to open-vocabulary detection: recognizing objects outside a fixed, predefined list of categories by accepting text or visual prompts at inference time.

This is a research codebase intended for practitioners familiar with Python and machine learning workflows.

Where it fits

Run real-time object detection on security camera footage to spot and label people, vehicles, or other objects.
Add bounding box detection to a Python app that processes product photos or medical images.
Fine-tune the model on a custom image dataset to recognize objects specific to your industry.
Deploy the compact model variant on edge hardware with limited computing power for on-device inference.
