GFPGAN
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
Python ★ 37k 1y ago
PhotoMaker
PhotoMaker [CVPR 2024]
Jupyter Notebook ★ 10k 1y ago
InstantMesh
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Python ★ 4.4k 1y ago
T2I-Adapter
T2I-Adapter
Python ★ 3.8k 2y ago
Pixal3D
[SIGGRAPH 2026] Pixal3D: Pixel-Aligned 3D Generation from Images
Python ★ 1.8k 28d ago
BrushNet
[ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
Python ★ 1.7k 1y ago
MotionCtrl
Official Code for MotionCtrl [SIGGRAPH 2024]
Python ★ 1.5k 1y ago
SEED-Voken
SEED-Voken: A Series of Powerful Visual Tokenizers
Python ★ 1.0k 6mo ago
SEED-Story
SEED-Story: Multimodal Long Story Generation with Large Language Model
Python ★ 885 1y ago
MasaCtrl
[ICCV 2023] Consistent Image Synthesis and Editing
Python ★ 845 1y ago
VideoPainter
[SIGGRAPH2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control"
Python ★ 611 1y ago
BrushEdit
[under review] The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing"
Python ★ 588 9mo ago
ToonComposer
[ICLR 2026] Streamlining Cartoon Production with Generative Post-Keyframing
Python ★ 575 10mo ago
LLaMA-Pro
[ACL 2024] Progressive LLaMA with Block Expansion.
Python ★ 513 2y ago
StereoCrafter
A framework to convert any 2D videos to immersive stereoscopic 3D
Python ★ 492 25d ago
ColorFlow
The official implementation of the paper "ColorFlow: Retrieval-Augmented Image Sequence Colorization".
Python ★ 462 6mo ago
GeometryCrafter
[ICCV 2025] GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
Python ★ 449 8mo ago
Mix-of-Show
NeurIPS 2023, Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Python ★ 430 2y ago
RollingForcing
[ICLR 2026] Official Repo for Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
Python ★ 424 7mo ago
VerseCrafter
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
Python ★ 400 3mo ago
SmartEdit
Official code of SmartEdit [CVPR-2024 Highlight]
Python ★ 375 2y ago
AnimeSR
Codes for "AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos"
Python ★ 368 2y ago
VQFR
ECCV 2022, Oral, VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder
Python ★ 355 3y ago
AnimeGamer
[ICCV 2025] AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
Python ★ 346 1y ago
DiTCtrl
[CVPR 2025] Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation"
Python ★ 325 1y ago
AudioStory
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
Jupyter Notebook ★ 302 9mo ago
CustomNet
No description.
Python ★ 290 1y ago
FreeSplatter
[ICCV 2025] FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
JavaScript ★ 240 10mo ago
UMT
UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.
Python ★ 240 2y ago
Track4World
[ECCV 2026] Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels
Python ★ 238 3d ago
ARC-Hunyuan-Video-7B
Structured Video Comprehension of Real-World Shorts
Python ★ 238 9mo ago
TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Python ★ 236 10mo ago
ViT-Lens
[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
Python ★ 190 1y ago
Moto
[ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
Python ★ 179 8mo ago
MM-RealSR
Codes for "Metric Learning based Interactive Modulation for Real-World Super-Resolution"
Python ★ 179 2y ago
MotionCrafter
[CVPR 2026 Highlight🔥] MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
Python ★ 169 9d ago
IC-Custom
[ICLR'26] IC-Custom: Diverse Image Customization via In-Context Learning
Python ★ 162 9mo ago
GenCompositor
[ICLR 2026] GenCompositor: Generative Video Compositing with Diffusion Transformer
Python ★ 158 3mo ago
ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
Python ★ 155 1y ago
TimeLens
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
Python ★ 146 1mo ago
DeSRA
Official codes for DeSRA (ICML 2023)
Python ★ 143 2y ago
MCQ
Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).
Python ★ 142 3y ago
MindOmni
[NeurIPS2025] The official implementation of MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Python ★ 140 8mo ago
DI-PCG
Code release of our paper "DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation".
Python ★ 138 1y ago
NVComposer
[CVPR 2025] Boosting Generative Novel View Synthesis with Sparse and Unposed Images
Python ★ 130 1y ago
CubeComposer
[CVPR 2026] Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video
Python ★ 123 2mo ago
FAIG
NeurIPS 2021, Spotlight, Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution
Python ★ 121 3y ago
FluxKits
No description.
Python ★ 110 1y ago
ArcNerf
NeRF and its extensions, all in one
Jupyter Notebook ★ 108 3y ago
SEED-Bench-R1
No description.
Python ★ 101 1y ago
mllm-npu
mllm-npu: training multimodal large language models on Ascend NPUs
Python ★ 95 1y ago
Video-Holmes
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
Python ★ 94 11mo ago
Divot
Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)
Python ★ 87 1y ago
GRPO-CARE
[ACL2026 Findings] GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
Python ★ 84 1y ago
SurfelNeRF
SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes
★ 79 2y ago
RepSR
Codes for "RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization"
★ 76 4y ago
DSR_Suite
No description.
Jupyter Notebook ★ 74 2mo ago
HOSNeRF
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video
Python ★ 70 2y ago
FastRealVSR
Codes for "Mitigating Artifacts in Real-World Video Super-Resolution Models"
★ 60 3y ago
GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
Python ★ 60 3y ago
ConMIM
Official codes for ConMIM (ICLR 2023)
Python ★ 59 3y ago
TVTS
Turning to Video for Transcript Sorting
Jupyter Notebook ★ 50 2y ago
BEBR
Official code for "Binary embedding based retrieval at Tencent"
Python ★ 45 2y ago
ARC-Chapter
Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
★ 43 7mo ago
ViSFT
No description.
Python ★ 39 2y ago
SGAT4PASS
[IJCAI 2023] official implementation of the paper SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation
Python ★ 37 3y ago
pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
Python ★ 34 2y ago
BTS
BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild
★ 34 2y ago
FLM
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
Python ★ 33 3y ago
Efficient-VSR-Training
Codes for "Accelerating the Training of Video Super-Resolution"
★ 31 4y ago
DTN
Official code for "Dynamic Token Normalization Improves Vision Transformer", ICLR 2022.
Python ★ 30 4y ago
OpenCompatible
OpenCompatible provides a standard compatible training benchmark, covering practical training scenarios.
Python ★ 26 4y ago
Plot2Code
No description.
Python ★ 24 1y ago
BlobCtrl
[SIGGRAPH ASIA'25] BlobCtrl: Taming Controllable Blob for Element-level Image Editing
Python ★ 24 7mo ago
SFDA
No description.
Python ★ 22 3y ago
OmniScript
OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
★ 17 1mo ago
TaCA
Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".
★ 17 3y ago
common_trainer
Common template for PyTorch projects. Easy to extend and modify for new projects.
Python ★ 14 3y ago
TransFusion
The code repo for the ACM MM paper "TransFusion: Multi-Modal Fusion for Video Tag Inference via Translation-based Knowledge Embedding".
★ 10 4y ago
ArcVis
Interactive visualization of 3D and 2D components.
Jupyter Notebook ★ 7 3y ago
BasicVQ-GEN
No description.
★ 7 3y ago
Sculpt4D
[CVPR 2026] Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers
Python ★ 6 15d ago
VTLayout
No description.
★ 4 2y ago
vllm
vllm for ARC-Hunyuan-Video-7B
Python ★ 3 8mo ago
TencentARC.github.io
No description.
HTML ★ 1 10mo ago
Forward-Warp ⑂
An optical-flow forward-warp library with backpropagation, using PyTorch.
Python ★ 0 1mo ago