YOLOX

Python ★ 11k updated 1y ago

YOLOX is a high-performance anchor-free YOLO detector that exceeds YOLOv3 through YOLOv5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/

YOLOX is a Python library that automatically draws labeled boxes around objects in images and video in real time, offering model sizes from phone-friendly Nano to GPU-powered Extra-Large, without needing pre-defined anchor boxes.

Python · PyTorch · ONNX · TensorRT · OpenVINO · CUDA · setup: moderate · complexity 3/5

YOLOX is a Python library for object detection, meaning it lets you take an image or video and automatically draw boxes around the objects in it, labeling each one (a car, a person, a dog, and so on). It belongs to the YOLO family of detectors, which have been popular for years because they are fast enough to process video in real time. YOLOX was released by Megvii in 2021 and presented improvements over earlier YOLO versions (v3 through v5).

The key design change in YOLOX is that it removed a concept called anchors. Older YOLO models used a set of pre-defined box shapes to help them guess where objects might be. YOLOX predicts boxes directly without that pre-defined set, which simplifies the training process and makes the model easier to adapt to new tasks. The README states this approach achieves higher accuracy than the anchor-based versions it replaces.
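To make the anchor-free idea concrete, here is a minimal sketch of how a single grid-cell prediction can be turned into a pixel-space box without any pre-defined box shapes. It assumes the decoding commonly described for YOLOX-style heads: centre offsets are added to the cell's grid position and scaled by the stride, and width and height are predicted on a log scale. The function names decode_anchor_free and to_corners are illustrative and are not part of the YOLOX API.

```python
import math


def decode_anchor_free(raw, grid_x, grid_y, stride):
    """Turn one raw cell prediction into a (cx, cy, w, h) box in pixels.

    raw is (dx, dy, dw, dh): centre offsets relative to the grid cell and
    log-scale width/height. No anchor shapes are consulted anywhere.
    """
    dx, dy, dw, dh = raw
    cx = (grid_x + dx) * stride   # centre x in input-image pixels
    cy = (grid_y + dy) * stride   # centre y in input-image pixels
    w = math.exp(dw) * stride     # width, always positive
    h = math.exp(dh) * stride     # height, always positive
    return cx, cy, w, h


def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2) for drawing."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# One cell at grid position (3, 2) on a stride-8 feature map.
box = decode_anchor_free((0.5, 0.5, 0.0, 0.0), grid_x=3, grid_y=2, stride=8)
print(box)              # (28.0, 20.0, 8.0, 8.0)
print(to_corners(box))  # (24.0, 16.0, 32.0, 24.0)
```

An anchor-based head would instead multiply the exponentiated size by one of several pre-defined anchor widths and heights, which is the extra set of hyperparameters the anchor-free design avoids.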

The library ships several model sizes: Nano and Tiny for devices with very limited computing power (like a phone or a small embedded board), and Small, Medium, Large, and Extra-Large for servers with GPUs. A table in the README shows the speed and accuracy numbers for each size, tested on the COCO benchmark dataset that researchers commonly use to compare object detectors.

Once you install YOLOX from source using pip, you can run a demo on a single image or on a video file with one command. Training your own model on the COCO dataset or a custom dataset is also supported. The library works with multiple export formats so you can deploy a trained model in different environments: ONNX for cross-platform compatibility, TensorRT for fast inference on Nvidia hardware, ncnn for mobile devices, and OpenVINO for Intel hardware.
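As a rough illustration of the "one command" demo, the sketch below assembles a command line for the tools/demo.py script from a source checkout. The flags shown follow the demo invocation given in the YOLOX README at the time of writing, so check them against your own checkout before relying on them. The helper build_demo_command, the checkpoint name yolox_s.pth, and the default values are assumptions chosen for illustration. The code only builds and prints the command and does not run it.

```python
import shlex


def build_demo_command(kind, model_name, checkpoint, source,
                       conf=0.25, nms=0.45, size=640, device="cpu"):
    """Build (but do not run) a demo command for a YOLOX source checkout.

    kind is "image" or "video"; source is the image or video path.
    Flag names are taken from the README and may differ between versions.
    """
    if kind not in ("image", "video"):
        raise ValueError("kind must be 'image' or 'video'")
    return [
        "python", "tools/demo.py", kind,
        "-n", model_name,        # model name, e.g. yolox-s
        "-c", checkpoint,        # path to downloaded weights (assumed name)
        "--path", source,        # input image or video
        "--conf", str(conf),     # confidence threshold
        "--nms", str(nms),       # non-maximum suppression threshold
        "--tsize", str(size),    # test image size
        "--save_result",         # write the annotated output to disk
        "--device", device,      # "cpu" or "gpu"
    ]


cmd = build_demo_command("image", "yolox-s", "yolox_s.pth", "assets/dog.jpg")
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run from the repository root would launch the demo; switching kind to "video" and pointing source at a video file covers the video case.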

The codebase is written in PyTorch, a framework widely used in AI research. A separate version built on MegEngine (Megvii's own framework) is maintained in a different repository.

Where it fits

Run real-time object detection on video footage to automatically identify and draw boxes around cars, people, and other objects.
Train a custom object detection model on your own labeled image dataset using the YOLOX training pipeline.
Deploy a YOLOX Nano model on a mobile or embedded device for lightweight on-device object detection.
Export a trained YOLOX model to TensorRT for fast inference on Nvidia GPU hardware in production.
