Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, and various other applications.
★ 5.7k 3d ago
Tune-A-Video
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Python ★ 4.4k 2y ago
Paper2Video
Automatic Video Generation from Scientific Papers
Python ★ 2.3k 3mo ago
Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
Python ★ 2.0k 5mo ago
computer_use_ootb
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Python ★ 1.9k 1y ago
ShowUI
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Python ★ 1.9k 1mo ago
Code2Video
[ICML 2026] Video generation via code
Python ★ 1.8k 19d ago
Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
★ 1.2k 10mo ago
Show-1
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Python ★ 1.1k 9mo ago
MotionDirector
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Python ★ 1.0k 1y ago
Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
★ 1.0k 8mo ago
Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
★ 827 8mo ago
Image2Paragraph
[Image 2 Text Para] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
Python ★ 824 3y ago
X-Adapter
[CVPR 2024] X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Python ★ 772 1y ago
videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Python ★ 669 6mo ago
VLog
[CVPR 2025] Video Narration as Vocabulary & Video as Long Document
Python ★ 589 1y ago
DragAnything
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
Python ★ 506 1y ago
livecc
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
Python ★ 457 7mo ago
PhotoDoodle
[ICCV 2025] Code Implementation of "ArtEditor: Learning Customized Instructional Image Editor from Few-Shot Examples"
Python ★ 430 1y ago
OmniConsistency
The official code implementation of the paper "OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data."
Python ★ 419 1y ago
VideoSwap
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Python ★ 405 1y ago
UniVTG
[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding
Python ★ 378 2y ago
MovieAgent
MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning
Python ★ 344 1y ago
Awesome-Robotics-Diffusion
A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
★ 342 4d ago
DatasetDM
[NeurIPS 2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
Python ★ 332 2y ago
ShowUI-Aloha
Human-taught Computer-use Agent Designed for Real Windows and macOS Desktops.
Python ★ 312 5mo ago
FAR
Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
Python ★ 308 1y ago
Kiwi-Edit
A unified and fully open-source framework for instruction-guided and reference-guided video editing using natural language.
Python ★ 292 1mo ago
all-in-one
[CVPR 2023] All in One: Exploring Unified Video-Language Pre-training
Python ★ 281 3y ago
BoxDiff
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Python ★ 274 1y ago
EgoVLP
[NeurIPS 2022] Egocentric Video-Language Pretraining
Python ★ 260 2y ago
MakeAnything
Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"
Python ★ 208 1y ago
DeVRF
The PyTorch implementation of "DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes"
Python ★ 188 3y ago
VideoLISA
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Python ★ 147 1y ago
whisperVideo
Find out who said what in the video.
Jupyter Notebook ★ 146 4mo ago
D-AR
The official repo for "D-AR: Diffusion via Autoregressive Models"
Python ★ 139 4mo ago
VisorGPT
[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
Python ★ 138 2y ago
showui-pi
[CVPR 2026] ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
Python ★ 128 1mo ago
WorldGUI
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
Python ★ 124 10mo ago
OmniPSD
Official code implementation of "OmniPSD: Layered PSD Generation with Diffusion Transformer"
Python ★ 117 25d ago
ROICtrl
Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation
Python ★ 111 1y ago
Olaf-World
[ICML 2026] Orienting Latent Actions for Video World Modeling
Python ★ 108 2mo ago
Soap2Soap
The official code implementation of the paper “Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration”.
Python ★ 102 26d ago
MovieBench
[CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation
Python ★ 98 1y ago
LayerTracer
Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"
Python ★ 95 1y ago
LOVA3
(NeurIPS 2024) Official PyTorch implementation of LOVA3
Python ★ 91 1y ago
EVOLVE-VLA
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
HTML ★ 86 6mo ago
Adv-GRPO
[CVPR 2026] An official implementation of Adv-GRPO. The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation.
Python ★ 85 3mo ago
ShowAnything
No description.
Jupyter Notebook ★ 83 2y ago
Impossible-Videos
ICML 2025 - Impossible Videos
Python ★ 83 11mo ago
T2VScore
T2VScore: Towards A Better Metric for Text-to-Video Generation
★ 81 2y ago
loveu-tgve-2023
Official GitHub repository for the Text-Guided Video Editing (TGVE) competition of LOVEU Workshop @ CVPR'23.
Python ★ 78 2y ago
cosmo
No description.
Python ★ 75 2y ago
sparseformer
(ICLR 2024, CVPR 2024) SparseFormer
Python ★ 75 1y ago
Multi-human-Talking-Video-Dataset
Multi-human Interactive Talking Dataset
Python ★ 74 10mo ago
RobotSeg
[CVPR 2026 Oral] RobotSeg
Python ★ 71 1mo ago
SMS
[ICCV 2025] Balanced Image Stylization with Style Matching Score
Python ★ 70 3mo ago
assistgpt
No description.
JavaScript ★ 66 3y ago
Exo2Ego-V
No description.
Python ★ 60 1y ago
FQGAN
FQGAN: Factorized Visual Tokenization and Generation
Python ★ 59 1y ago
datacentric.vlp
Compress conventional Vision-Language Pre-training data
Python ★ 52 2y ago
videogui
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
JavaScript ★ 52 3mo ago
UniRL
The code repository of UniRL
Python ★ 52 1y ago
EvolveDirector
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
Python ★ 52 1y ago
afformer
Affordance Grounding from Demonstration Video to Target Image (CVPR 2023)
Python ★ 46 1y ago
X-Humanoid
No description.
★ 45 6mo ago
AUI
Computer-Use Agents as Judges for Generative UI
Python ★ 45 6mo ago
Edit2Perceive
[CVPR 2026] Official Implementation of Edit2Perceive
Python ★ 44 3mo ago
MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
Jupyter Notebook ★ 44 1y ago
Region_Learner
The PyTorch implementation for "Video-Text Pre-training with Learned Regions"
Python ★ 43 3y ago
CLVQA
[AAAI 2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral)
Python ★ 42 2y ago
RingID
No description.
Python ★ 40 1y ago
SAM-I2V
[CVPR 2025] SAM-I2V
Jupyter Notebook ★ 38 5mo ago
mist
No description.
Jupyter Notebook ★ 37 2y ago
FocusUI
[CVPR 2026] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
Python ★ 36 12d ago
macosworld
No description.
Python ★ 36 4mo ago
WorldWander
Official PyTorch Code of the Paper "WorldWander: Bridging Egocentric and Exocentric Worlds in Video Generation"
Python ★ 35 15h ago
PANDA
[NeurIPS 2025] PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer
★ 33 8mo ago
BYOC
[IEEE-VR 2024] Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters
C# ★ 33 2y ago
DoraCycle
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Python ★ 32 3mo ago
Awesome-Long-Context
A curated list of resources about long-context in large-language models and video understanding.
★ 32 2y ago
Long-form-Video-Prior
No description.
Python ★ 32 2y ago
DiffSim
[ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
Python ★ 31 11mo ago
assistgui
No description.
JavaScript ★ 30 2y ago
World-VLA-Loop
GitHub repository for World-VLA-Loop.
JavaScript ★ 29 3mo ago
DIM
[ICLR 2026] Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
Python ★ 29 1mo ago
VisInContext
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Python ★ 28 1y ago
Dream.exe
Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?
★ 27 16d ago
ShowRoom3D
This is the project page of ShowRoom3D
★ 26 2y ago
TPDiff
TPDiff: Temporal Pyramid Video Diffusion Model
★ 25 1y ago
H2R-Grounder
A V2V framework that translates human interaction videos into robot manipulation videos.
★ 24 6mo ago
Q2A
[ECCV 2022] AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
Python ★ 23 4mo ago
Efficient-CLS
[ICCV 2023] Label-Efficient Online Continual Object Detection in Streaming Video
Python ★ 23 2y ago
IDProtector
The code implementation of "IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation".
Python ★ 22 11mo ago
DemoVLP
[arXiv 2022] Revitalize Region Feature for Democratizing Video-Language Pre-training
Python ★ 22 4y ago
AVA-AVD
No description.
Python ★ 21 3y ago
Mitty
Official code implementation of "Mitty: Diffusion-based Human-to-Robot Video Generation"
Python ★ 19 5mo ago
GEB-Plus
[ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
Python ★ 17 3y ago
HOSNeRF
This is the project page for the HOSNeRF
JavaScript ★ 16 2y ago
Tune-An-Ellipse
[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want
Python ★ 14 1y ago
headshot
No description.
★ 14 4y ago
Sparkle
Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance
HTML ★ 13 1mo ago
Demo2Tutorial
No description.
Python ★ 13 16d ago
Show-Anything-3D
Edit and Generate Anything in 3D world!
★ 13 3y ago
TrustScorer
ACM MM 2025 Can I Trust You? Advancing GUI Task Automation with Action Trust Score
Python ★ 13 6mo ago
GUI-Narrator
Repository of GUI Action Narrator
JavaScript ★ 13 1y ago
StreamingEffect
Implementation of StreamingEffect: Real-Time Human-Centric Video Effect Generation
Python ★ 12 25d ago
SCT
[IJCV 2023] Official implementation of "SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels"
Python ★ 12 2y ago
UniMoD
The code repository of UniMoD
★ 11 1y ago
watermark-steganalysis
No description.
Python ★ 11 1y ago
SOIS
The PyTorch implementation of "Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization"
★ 9 3y ago
OmniHumanoid
Implementation of OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation
Python ★ 7 1mo ago
PICO
[ArXiv 2025] Personalized Vision via Visual In-Context Learning
★ 7 8mo ago
VISTA
The official code implementation of the paper “VISTA: Triplet-Supervised Video Style Transfer with Diffusion Transformers”.
Python ★ 6 29d ago
VC2L
No description.
Python ★ 6 8mo ago
T2F-Bench
A comprehensive benchmark for evaluating text-to-film generation performance.
★ 6 4mo ago
SAM-I2VPP
[TPAMI 2026] SAM-I2V++
Jupyter Notebook ★ 5 5mo ago
ColonNeRF
This is the project page for ColonNeRF.
JavaScript ★ 5 2y ago
PAI-Studio
No description.
★ 4 1d ago
ActionMap
No description.
Python ★ 4 10d ago
DynVideo-E
This is the project page for DynVideo-E.
JavaScript ★ 3 2y ago
SWEET
Official Code of SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution
Python ★ 2 25d ago
UENR-600K
No description.
★ 2 2mo ago
World-VLA-Loop-Pages
World-VLA-Loop Project GitHub Pages
JavaScript ★ 2 4mo ago
P-Flow
P-Flow: Prompting Visual Effects Generation
★ 2 2mo ago
WMAdapter
A watermark plugin for latent diffusion models.
★ 2 1y ago
cvpr2024-tutorial-video-diffusion-models
No description.
HTML ★ 2 1y ago
AssistGaze
No description.
Python ★ 2 7mo ago
TTC-Tuning
Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm
★ 2 3y ago
magicanimate
No description.
JavaScript ★ 2 2y ago
assistq
No description.
SCSS ★ 1 4y ago
Aloha_Page
The website for aloha introduction
HTML ★ 0 5mo ago
omg
Open Multimodal Gathering workshop @ NUS
JavaScript ★ 0 1y ago
InterFeedback
No description.
★ 0 1y ago
xagen
No description.
JavaScript ★ 0 2y ago
Moonshot
No description.
JavaScript ★ 0 2y ago
pv3d
No description.
JavaScript ★ 0 3y ago