IDEA-Research ORG

@IDEA-Research · China · www.idea.edu.cn

The International Digital Economy Academy (“IDEA”).

49 repos
2.9k followers
0 following

Python 82%
Jupyter Notebook 9%
TypeScript 5%
C++ 2%
HTML 2%

All public repos (49)


Grounded-Segment-Anything ★ PINNED

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Grounded-Segment-Anything combines text-prompt object detection and pixel-level masking to find and outline any object in an image just by typing its name, with optional editing via Stable Diffusion.

Jupyter Notebook ★ 18k 1y ago
detrex ★ PINNED

detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.

Python ★ 2.3k 9mo ago
GroundingDINO ★ PINNED

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

An AI model that finds and locates objects in images based on text descriptions you write, instead of being limited to a fixed list of pre-trained categories, published at ECCV 2024.

Python ★ 10k 1y ago
OpenSeeD ★ PINNED

[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

Python ★ 759 2y ago
MaskDINO ★ PINNED

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"

Python ★ 1.5k 2y ago
DINO ★ PINNED

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Python ★ 2.8k 1y ago
Grounded-SAM-2

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

Jupyter Notebook ★ 3.6k 7mo ago
DWPose

"Effective Whole-body Pose Estimation with Two-stages Distillation" (ICCV 2023, CV4Metaverse Workshop)

Python ★ 2.8k 2y ago
T-Rex

[ECCV 2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Python ★ 2.7k 8mo ago
Rex-Omni

[CVPR 2026] Detect Anything via Next Point Prediction

Jupyter Notebook ★ 1.5k 3mo ago
awesome-detection-transformer

A collection of papers on transformers for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV).

★ 1.4k 1y ago
DINO-X-API

DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding

Python ★ 1.4k 11mo ago
Grounding-DINO-1.5-API

Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Python ★ 1.1k 1y ago
Motion-X

[NeurIPS 2023] Official implementation of the paper "Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset"

Python ★ 874 1y ago
X-Pose

[ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints"

Python ★ 810 1y ago
OSX

[CVPR 2023] Official implementation of the paper "One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer"

Python ★ 793 1y ago
DN-DETR

[CVPR 2022 Oral] Official implementation of DN-DETR

Python ★ 605 2y ago
DAB-DETR

[ICLR 2022] Official implementation of the paper "DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR"

Jupyter Notebook ★ 579 3y ago
MotionLLM

[arXiv 2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Python ★ 386 1y ago
HumanTOMATO

[ICML 2024] 🍅HumanTOMATO: Text-aligned Whole-body Motion Generation

Python ★ 364 2y ago
HumanSD

[ICCV 2023] The official implementation of the paper "HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation"

Python ★ 307 2y ago
HumanArt

[CVPR 2023] The official implementation of the CVPR 2023 paper "Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes"

★ 282 2y ago
TAPTR

[ECCV 2024 & NeurIPS 2024 & ICLR 2026] Official implementation of the paper TAPTR & TAPTRv2 & TAPTRv3

★ 280 4mo ago
deepdataspace

The Go-To Choice for CV Data Visualization, Annotation, and Model Analysis.

TypeScript ★ 263 2mo ago
Stable-DINO

[ICCV 2023] Official implementation of the paper "Detection Transformer with Stable Matching"

Python ★ 242 2y ago
ChatRex

Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Python ★ 214 8mo ago
Lite-DETR

[CVPR 2023] Official implementation of the paper "Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR"

Python ★ 209 3y ago
DreamWaltz

[NeurIPS 2023] Official implementation of the paper "DreamWaltz: Make a Scene with Complex 3D Animatable Avatars".

Python ★ 190 1y ago
ED-Pose

[ICLR 2023] Official implementation of the paper "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation"

Python ★ 188 2y ago
3D-deformable-attention

[ICCV 2023] Official implementation of the paper "DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting"

Python ★ 186 1y ago
RexSeek

[ICCV 2025] Refer to any person or object given a natural-language description. Code base for RexSeek and the HumanRef Benchmark.

Python ★ 184 8mo ago
Rex-Thinker

[ICLR 2026] Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning

Python ★ 151 11mo ago
MP-Former

[CVPR 2023] Official implementation of the paper: MP-Former: Mask-Piloted Transformer for Image Segmentation

Python ★ 142 2y ago
SceneMaker

[CVPR 2026] Implementation of paper "SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model"

Python ★ 137 1mo ago
DINO-X-MCP

Official DINO-X Model Context Protocol (MCP) server that empowers LLMs with real-world visual perception through image object detection, localization, and captioning APIs.

TypeScript ★ 112 3d ago
Click-Pose

[ICCV 2023] Official implementation of the paper "Neural Interactive Keypoint Detection"

Python ★ 88 2y ago
DiffHOI

Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model"

Python ★ 68 2y ago
SegDINO3D

[AAAI 2026] Official implementation of the paper "SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features"

Python ★ 59 5mo ago
DQ-DETR

[AAAI 2023] DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

★ 59 3y ago
DisCo-CLIP

Official PyTorch implementation of the paper "DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training".

Python ★ 59 2y ago
V-Reflection

Related code, checkpoints and project page for V-Reflection

Python ★ 58 2mo ago
LipsFormer

No description.

Python ★ 44 3y ago
TOSS

[ICLR 2024] Official implementation of the paper "Toss: High-quality text-guided novel view synthesis from a single image"

Python ★ 24 2y ago
hana

Implementation and checkpoints of Imagen, Google's text-to-image synthesis neural network, in PyTorch

Python ★ 18 3y ago
MotionCLR

[arXiv 2024] MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms

Python ★ 17 1y ago
SegVGGT

Official implementation of the paper "SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images"

Python ★ 14 1mo ago
IYFC

No description.

C++ ★ 10 2y ago
detrex-storage

No description.

★ 4 1y ago
HandOSweb

No description.

HTML ★ 2 1y ago