rf-detr
RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO, designed for fine-tuning. [ICLR 2026]
RF-DETR is a Python library from Roboflow for detecting objects and segmenting their shapes in images and video. Object detection means locating specific things in an image and drawing a box around each one. Instance segmentation goes further and traces the precise outline of each detected object. RF-DETR does both through the same simple interface.
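To make the difference between the two tasks concrete, the sketch below represents a segmentation mask as a small grid of 0/1 pixels and derives the detection-style bounding box from it. This is not RF-DETR code; the list-of-lists mask format and the helper names `mask_to_box` and `fill_ratio` are hypothetical, chosen only for illustration. The point it shows is that a box keeps just the extremes of an object, while a mask records exactly which pixels belong to it.

```python
from typing import List, Optional, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max), inclusive pixel indices.
Box = Tuple[int, int, int, int]


def mask_to_box(mask: List[List[int]]) -> Optional[Box]:
    """Return the tightest box around all nonzero pixels, or None for an empty mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


def fill_ratio(mask: List[List[int]]) -> float:
    """Fraction of the bounding box area that the object's own pixels actually cover."""
    box = mask_to_box(mask)
    if box is None:
        return 0.0
    x0, y0, x1, y1 = box
    box_area = (x1 - x0 + 1) * (y1 - y0 + 1)
    object_area = sum(v != 0 for row in mask for v in row)
    return object_area / box_area


# A triangular object: its box spans 9 pixels, but the object covers only 6 of them.
triangle = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_box(triangle), fill_ratio(triangle))
```

For non-rectangular objects the box necessarily includes background pixels, which is why instance segmentation is the better fit when the exact outline matters.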
The model was accepted at ICLR 2026, a major academic machine learning conference, and reports state-of-the-art results on COCO, the standard benchmark used to compare object detection models. It is built on a vision transformer, a type of neural network that processes an image as a sequence of small patches, and uses DINOv2, a backbone developed by Meta. Among models with similar inference speed, it offers a strong trade-off between speed and accuracy. Multiple size variants are available: smaller ones run faster and use less memory, while larger ones achieve higher accuracy. All variants except the two largest are released under the Apache 2.0 open-source license; the two largest use a more restrictive commercial license.
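COCO results are scored by matching predicted boxes to ground-truth boxes by their intersection over union (IoU), and the headline COCO metric averages precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the IoU part of that procedure; it is not the official evaluation code, and the helper names `iou` and `matched_at` are hypothetical. Real COCO evaluation additionally matches detections per class, in order of confidence, and integrates precision over recall.

```python
from typing import List, Tuple

# Box format assumed for this sketch: (x_min, y_min, x_max, y_max) in continuous coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds used by COCO's primary AP metric: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_at(pred: Box, truth: Box) -> List[bool]:
    """For each COCO threshold, whether this prediction would count as a hit for this truth box."""
    score = iou(pred, truth)
    return [score >= t for t in COCO_IOU_THRESHOLDS]


# A prediction shifted 2 units right of a 10x10 ground-truth box has IoU 80/120.
print(iou((2, 0, 12, 10), (0, 0, 10, 10)))
print(matched_at((2, 0, 12, 10), (0, 0, 10, 10)))
```

Because a prediction must clear stricter thresholds to score fully, models that localize boxes more tightly earn higher COCO numbers, not just models that find the right objects.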
To use RF-DETR, install it with a single command, `pip install rfdetr`, in a Python 3.10 or newer environment. You can then load a pre-trained model and run detection on your own images or video in a few lines of code. The library also supports fine-tuning on your own labeled dataset when you want the model to specialize in particular objects. Roboflow provides a Google Colab notebook that walks through fine-tuning end to end, and you can try the model without installing anything through a Hugging Face web interface.
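The workflow described above might look roughly like the sketch below. It assumes the usage shown in the project's documentation at the time of writing: an `RFDETRBase` model class, a `predict(image, threshold=...)` method that returns a `supervision` `Detections` object, and a `train(...)` method that reads a COCO-format dataset laid out as `train/`, `valid/` and `test/` folders, each holding images plus an `_annotations.coco.json` file. Check these names and the layout against the current docs before relying on them. The wrapper functions `detect`, `fine_tune` and `write_coco_split` are hypothetical helpers, and the hyperparameter values are placeholders rather than recommendations.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def detect(image_path: str, threshold: float = 0.5):
    """Run a pre-trained RF-DETR model on one image (assumes `pip install rfdetr`)."""
    # Imported lazily so the dataset helper below works without the library installed.
    from PIL import Image
    from rfdetr import RFDETRBase

    model = RFDETRBase()  # loads pre-trained weights
    image = Image.open(image_path)
    # Assumed to return a supervision Detections object (.xyxy, .confidence, .class_id).
    return model.predict(image, threshold=threshold)


def fine_tune(dataset_dir: str, output_dir: str = "output") -> None:
    """Fine-tune on a COCO-format dataset directory (assumes `pip install rfdetr`)."""
    from rfdetr import RFDETRBase

    model = RFDETRBase()
    # Placeholder hyperparameters; tune them for your data and hardware.
    model.train(dataset_dir=dataset_dir, epochs=10, batch_size=4,
                grad_accum_steps=4, lr=1e-4, output_dir=output_dir)


def write_coco_split(split_dir: Path, images: List[Dict], annotations: List[Dict],
                     categories: List[Dict]) -> Path:
    """Write one split's annotation file in COCO JSON format."""
    split_dir.mkdir(parents=True, exist_ok=True)
    path = split_dir / "_annotations.coco.json"
    path.write_text(json.dumps({"images": images, "annotations": annotations,
                                "categories": categories}))
    return path


# Example: one image with one labeled object, written to a throwaway directory.
demo_dir = Path(tempfile.mkdtemp()) / "dataset"
demo_file = write_coco_split(
    demo_dir / "train",
    images=[{"id": 1, "file_name": "img_001.jpg", "width": 640, "height": 480}],
    # COCO boxes are [x, y, width, height]; area here is simply width * height.
    annotations=[{"id": 1, "image_id": 1, "category_id": 1,
                  "bbox": [10, 20, 30, 40], "area": 1200, "iscrowd": 0}],
    categories=[{"id": 1, "name": "widget"}],
)
print(demo_file)
```

Keeping the model calls inside functions with local imports means the dataset-preparation part can be run and checked on a machine without a GPU or the library installed; `detect` and `fine_tune` only need `rfdetr` when they are actually called.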
The library integrates with Roboflow's broader tooling for building computer vision applications, but the core detection and segmentation features work independently. A Discord community is available for questions and support.