cvat

Python ★ 16k updated 1d ago

Computer Vision Annotation Tool (CVAT) is a leading platform for building high-quality visual datasets for vision AI. It offers open-source, cloud, and enterprise products, as well as labeling services, for image, video, and 3D annotation with AI-assisted labeling, quality assurance, team collaboration, analytics, and developer APIs.

A platform for drawing labels on images and videos to create training data for computer vision AI models, available as a free hosted service, a paid cloud plan, or a self-hosted Docker deployment.

Python, Docker, Datumaro; setup: moderate; complexity: 3/5

CVAT, which stands for Computer Vision Annotation Tool, is a platform for labeling images and videos so they can be used to train computer-vision models. When someone is building an AI that needs to recognise objects in pictures (say, cars in traffic footage or defects on a production line), the model first needs thousands of examples where a human has drawn boxes around the objects and tagged them. CVAT is the workspace where that drawing and tagging happens.
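To make "drawing boxes and tagging them" concrete, here is a minimal sketch of what one labeled example might look like as data. The BoxLabel class, its field names, and the file name are hypothetical and chosen for illustration; this is not CVAT's internal schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BoxLabel:
    """One human-drawn rectangle on one image (hypothetical schema)."""

    image: str   # file the box was drawn on
    label: str   # class tag chosen by the annotator
    xtl: float   # top-left x, in pixels
    ytl: float   # top-left y, in pixels
    xbr: float   # bottom-right x, in pixels
    ybr: float   # bottom-right y, in pixels

    def area(self) -> float:
        """Box area in square pixels; degenerate boxes count as zero."""
        return max(0.0, self.xbr - self.xtl) * max(0.0, self.ybr - self.ytl)


# Two annotations on the same (made-up) traffic-camera frame.
labels = [
    BoxLabel("frame_0001.jpg", "car", 40.0, 120.0, 200.0, 220.0),
    BoxLabel("frame_0001.jpg", "pedestrian", 310.0, 95.0, 350.0, 215.0),
]

for box in labels:
    print(asdict(box))
```

A training set is, in essence, thousands of records like these paired with the images they describe.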

It is offered in three ways. You can use the free hosted version at cvat.ai, where you can create up to ten annotation tasks and upload up to 500 MB of data. You can pay for a cloud subscription that lifts those limits and unlocks features like auto-annotation and integrations with Roboflow and HuggingFace. Or you can self-host the tool on your own servers using prebuilt Docker images (server and UI), which the README says have been downloaded more than a million times.

The platform supports image, video, and 3D annotation, and it can import and export many common dataset formats so the labels work with other tools in the machine-learning pipeline. A related project called Datumaro is included for transforming datasets further. CVAT also exposes a server API, a Python SDK installable with pip install cvat-sdk, and a command-line tool installable with pip install cvat-cli, so teams can automate work or plug CVAT into their own scripts.
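Format conversion matters because different tools store the same box differently. As a rough illustration, the sketch below converts a box given by its corners into a COCO-style [x, y, width, height] list and then into YOLO-style normalised centre coordinates. The function names and the sample numbers are assumptions made for this example; it is not how CVAT or Datumaro implement conversion.

```python
def corners_to_coco(xtl, ytl, xbr, ybr):
    """Corner box -> COCO-style [x, y, width, height] in pixels."""
    return [xtl, ytl, xbr - xtl, ybr - ytl]


def coco_to_yolo(bbox, image_width, image_height):
    """COCO-style box -> YOLO-style (cx, cy, w, h), each scaled to 0..1."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / image_width,   # box centre x as a fraction of width
        (y + h / 2) / image_height,  # box centre y as a fraction of height
        w / image_width,
        h / image_height,
    )


# A box around a car in a hypothetical 640x480 frame.
coco = corners_to_coco(40, 120, 200, 220)
yolo = coco_to_yolo(coco, image_width=640, image_height=480)

print(coco)
print(yolo)
```

In practice you would let CVAT's export or Datumaro handle this, since real formats also carry class lists, image metadata, and file layout conventions.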

CVAT is useful when you have raw images or videos and need labeled training data, whether you are a researcher, a startup training a model, or an enterprise running a labeling team. The codebase is primarily Python. This summary is based on only part of the README.

Where it fits

Label a dataset of traffic camera images with bounding boxes around cars and pedestrians to train an object detection model.
Set up a self-hosted CVAT instance with Docker so your team can annotate proprietary images without sending data to a third party.
Use the cvat-sdk Python library to automate annotation import and export within an existing ML pipeline.
Annotate 3D point cloud data or video frame-by-frame for instance segmentation in a computer vision research project.
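The SDK automation use case above could look roughly like the following sketch. It assumes the make_client, tasks.create_from_data, and export_dataset calls as they exist in recent cvat-sdk releases; signatures can differ between versions, so check the SDK documentation for the one you install. The helper names, the task name, and the label list are hypothetical.

```python
def build_task_spec(name, label_names):
    """Build the dict describing a new task and its labels."""
    return {"name": name, "labels": [{"name": n} for n in label_names]}


def create_task_and_export(host, user, password, image_paths, out_zip):
    """Upload local images as a task, then export its annotations.

    Sketch only: assumes a reachable CVAT server and valid credentials.
    """
    # Imported lazily so this file still loads where cvat-sdk is not installed.
    from cvat_sdk import make_client
    from cvat_sdk.core.proxies.tasks import ResourceType

    spec = build_task_spec("traffic-cameras", ["car", "pedestrian"])
    with make_client(host=host, credentials=(user, password)) as client:
        task = client.tasks.create_from_data(
            spec=spec,
            resource_type=ResourceType.LOCAL,
            resources=image_paths,
        )
        # Annotators would label the task in the CVAT UI before this point;
        # exporting straight away yields an empty annotation set.
        task.export_dataset(
            format_name="COCO 1.0",
            filename=out_zip,
            include_images=False,
        )
        return task.id


# The spec builder is pure Python and can be checked without a server.
print(build_task_spec("traffic-cameras", ["car", "pedestrian"]))
```

Calling create_task_and_export with your server URL, credentials, a list of image paths, and an output file name would run the whole round trip; keeping the spec builder separate makes the part that needs no server easy to test.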
