yolov7

Jupyter Notebook · ★ 14k · updated 1y ago

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

An official real-time object detection system that identifies and labels objects in images and video at up to 161 frames per second, with pre-trained models ready to use immediately and support for pose estimation and instance segmentation.

Python · PyTorch · CUDA · Jupyter Notebook · Docker · setup: moderate · complexity 4/5

YOLOv7 is an object detection system that can identify what is in an image or video frame and mark each recognized object with a bounding box, all in real time. Object detection means automatically finding and labeling things like people, cars, or animals within a photo or video. This repository contains the official research code from a 2022 paper that introduced the YOLOv7 architecture.

The model runs on PyTorch and is built around speed. The base YOLOv7 variant processes images at 161 frames per second on a single GPU, fast enough for live video feeds. Several size variants are included, from the standard model to larger ones like YOLOv7-E6E that trade some speed for better accuracy. All variants come with pre-trained weight files you can download and use immediately without training anything yourself.
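To put the 161 frames per second figure in perspective, the short calculation below converts a frame rate into a per-frame time and compares it with the time budget of a video stream. The 30 fps stream rate is an assumption chosen for illustration, not a figure from the repository, and the calculation ignores video decoding, pre-processing, and display overhead.

```python
def ms_per_frame(fps: float) -> float:
    """Convert a frame rate into milliseconds per frame."""
    return 1000.0 / fps


model_fps = 161.0   # throughput quoted above for the base YOLOv7 variant
stream_fps = 30.0   # assumed camera frame rate (illustrative only)

model_ms = ms_per_frame(model_fps)     # time the model needs per frame
budget_ms = ms_per_frame(stream_fps)   # time between incoming frames
headroom = budget_ms / model_ms        # how many times over the budget is met

print(f"model:    {model_ms:.1f} ms per frame")   # 6.2 ms
print(f"budget:   {budget_ms:.1f} ms per frame")  # 33.3 ms
print(f"headroom: {headroom:.1f}x")               # 5.4x
```

Under these assumptions the model needs roughly a fifth of the time available between frames, which is the sense in which it is fast enough for live video. Actual throughput depends on the GPU, the input resolution, and the batch size.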

If you want to train on your own images, the repository provides scripts for single-GPU and multi-GPU setups. You describe your labeled images in a dataset configuration file, pass it to the training script together with a model configuration, and run it. Transfer learning is supported too: you can start from one of the provided checkpoints rather than from scratch, which is useful when your dataset is small.
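As a rough illustration of the training workflow, the sketch below assembles a command line for the repository's train.py and prints it without running anything. The flag names follow the examples in the repository's README as best recalled and should be checked against `python train.py --help`. The helper `build_train_command`, the file paths, and the run name are placeholders for illustration.

```python
import shlex
from typing import List


def build_train_command(
    data_yaml: str,
    model_cfg: str,
    hyp_yaml: str,
    weights: str = "",        # empty string = train from scratch
    devices: str = "0",       # comma-separated GPU ids, e.g. "0,1,2,3"
    batch_size: int = 32,
    img_size: int = 640,
    run_name: str = "yolov7-custom",
) -> List[str]:
    """Hypothetical helper: return the argument list for a training run."""
    n_gpus = len(devices.split(","))
    if n_gpus > 1:
        # Multi-GPU: start one process per GPU through PyTorch's launcher.
        launcher = ["python", "-m", "torch.distributed.launch",
                    "--nproc_per_node", str(n_gpus), "train.py"]
    else:
        launcher = ["python", "train.py"]
    return launcher + [
        "--device", devices,
        "--batch-size", str(batch_size),
        "--data", data_yaml,                      # dataset configuration
        "--img", str(img_size), str(img_size),    # train and test image size
        "--cfg", model_cfg,                       # model architecture file
        "--weights", weights,                     # checkpoint to start from
        "--name", run_name,
        "--hyp", hyp_yaml,                        # hyperparameter file
    ]


# Transfer learning: start from a provided checkpoint (placeholder paths).
cmd = build_train_command(
    data_yaml="data/custom.yaml",
    model_cfg="cfg/training/yolov7-custom.yaml",
    hyp_yaml="data/hyp.scratch.custom.yaml",
    weights="yolov7_training.pt",
)
print(shlex.join(cmd))
```

Passing `devices="0,1,2,3"` switches the sketch to the multi-GPU launcher, and leaving `weights` empty corresponds to training from scratch. To actually start a run, the printed command would be executed from the repository root.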

The code also includes support for pose estimation (detecting human body keypoints in images) and instance segmentation (drawing the exact outline of each detected object rather than just a box around it). A live web demo is hosted on Hugging Face Spaces if you want to try the model without any local setup. The recommended installation route is Docker, with pip as an alternative for environments where Docker is not practical.

Where it fits

Run real-time object detection on a live video feed to identify people, cars, or animals without training anything.
Fine-tune YOLOv7 on your own labeled image dataset to detect custom objects specific to your use case.
Use the pose estimation feature to detect and overlay human body keypoints on images or video.
Apply instance segmentation to draw exact outlines around each detected object rather than just bounding boxes.
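For the first use case above, detection with pre-trained weights and no training, a minimal sketch is to assemble a call to the repository's detect.py. The flags mirror the inference example in the repository's README as best recalled and should be verified with `python detect.py --help`. The helper name and the video file name are placeholders, and how live camera sources are specified is not covered here.

```python
import shlex
from typing import List


def build_detect_command(
    source: str,
    weights: str = "yolov7.pt",   # downloaded pre-trained weight file
    conf: float = 0.25,           # minimum confidence for a reported detection
    img_size: int = 640,          # inference resolution in pixels
) -> List[str]:
    """Hypothetical helper: return the argument list for an inference run."""
    return [
        "python", "detect.py",
        "--weights", weights,
        "--conf", str(conf),
        "--img-size", str(img_size),
        "--source", source,       # image or video file to process
    ]


detect_cmd = build_detect_command("yourvideo.mp4")
print(shlex.join(detect_cmd))
```

The sketch only prints the command. Running it from the repository root, with the weight file downloaded, is what performs the detection.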
