gluon-cv

Python ★ 5.9k updated 1y ago

Gluon CV Toolkit

GluonCV is a Python toolkit for computer vision tasks built on top of the MXNet deep learning library, with support for PyTorch as well. Computer vision here means teaching a program to understand images and videos: recognizing what objects appear in a photo, drawing boxes around them, labeling individual pixels, estimating human body poses, or identifying actions in video clips.

The toolkit ships with over 50 pre-trained models covering six main tasks: image classification, object detection, semantic segmentation, instance segmentation, human pose estimation, and video action recognition. Pre-trained means these models were already trained on large public datasets and are ready to use without starting from scratch. A developer can load one of these models with a few lines of code and start making predictions immediately.
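As a rough illustration of what "a few lines of code" can look like, the sketch below loads an ImageNet classifier from the model zoo and returns its most likely labels. It assumes the MXNet backend with `gluoncv` and `mxnet` installed; the model name `resnet50_v1b` is just one plausible choice, and `top_k` and `classify` are hypothetical helper names introduced here, not part of GluonCV.

```python
import math


def top_k(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs from raw scores."""
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def classify(image_path, model_name="resnet50_v1b", k=5):
    """Hypothetical wrapper: classify one image with a pre-trained model."""
    # Imported lazily so top_k above stays usable without MXNet installed.
    from mxnet import image
    from gluoncv.model_zoo import get_model
    from gluoncv.data.transforms.presets.imagenet import transform_eval

    # Assumption: model_name is an ImageNet classifier from the model zoo.
    net = get_model(model_name, pretrained=True)  # downloads weights on first use
    img = transform_eval(image.imread(image_path))  # resize, crop, normalize
    logits = net(img)[0].asnumpy().tolist()  # raw scores for the single image
    return top_k(logits, net.classes, k)
```

Calling `classify("photo.jpg")` (with a placeholder path) would return a list of label and probability pairs. Detection, segmentation, and pose models take different preprocessing and return different outputs, so this pattern covers classification only.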

Beyond inference, GluonCV includes the full training scripts used to produce the results reported in published research papers. This lets a researcher or engineer reproduce a known result, or adapt the training process to their own dataset. The APIs are designed to reduce the amount of setup code needed, so teams can go from a raw dataset to a trained model without writing boilerplate infrastructure.

The project is maintained by the DMLC open-source community. It is released under the Apache 2.0 license and is installable via PyPI. The documentation includes tutorials covering each supported task. The README also points to AutoGluon as an alternative for users who want a more automated, lower-configuration approach to image classification and object detection with a broader range of underlying model architectures.
