Show Lab ORG

@showlab · sites.google.com/view/showlab

137 repos
1.2k followers
0 following

Python 77%
JavaScript 13%
Jupyter Notebook 5%
HTML 4%
C# 1%

All public repos (137)


Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, and various other applications.

★ 5.7k 3d ago
Tune-A-Video

[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Python ★ 4.4k 2y ago
Paper2Video

Automatic Video Generation from Scientific Papers

Python ★ 2.3k 3mo ago
Show-o

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python ★ 2.0k 5mo ago
computer_use_ootb

Out-of-the-box (OOTB) GUI Agent for Windows and macOS

Python ★ 1.9k 1y ago
ShowUI

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Python ★ 1.9k 1mo ago
Code2Video

[ICML 2026] Video generation via code

Python ★ 1.8k 19d ago
Awesome-GUI-Agent

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

★ 1.2k 10mo ago
Show-1

[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

Python ★ 1.1k 9mo ago
MotionDirector

[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.

Python ★ 1.0k 1y ago
Awesome-MLLM-Hallucination

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

★ 1.0k 8mo ago
Awesome-Unified-Multimodal-Models

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

★ 827 8mo ago
Image2Paragraph

[Image 2 Text Para] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.

Python ★ 824 3y ago
X-Adapter

[CVPR 2024] X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Python ★ 772 1y ago
videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python ★ 669 6mo ago
VLog

[CVPR 2025] Video Narration as Vocabulary & Video as Long Document

Python ★ 589 1y ago
DragAnything

[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation

Python ★ 506 1y ago
livecc

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)

Python ★ 457 7mo ago
PhotoDoodle

[ICCV 2025] Code Implementation of "ArtEditor: Learning Customized Instructional Image Editor from Few-Shot Examples"

Python ★ 430 1y ago
OmniConsistency

The official code implementation of the paper "OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data."

Python ★ 419 1y ago
VideoSwap

Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Python ★ 405 1y ago
UniVTG

[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding

Python ★ 378 2y ago
MovieAgent

MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning

Python ★ 344 1y ago
Awesome-Robotics-Diffusion

A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.

★ 342 4d ago
DatasetDM

[NeurIPS 2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

Python ★ 332 2y ago
ShowUI-Aloha

Human-taught Computer-use Agent Designed for Real Windows and MacOS Desktops.

Python ★ 312 5mo ago
FAR

Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"

Python ★ 308 1y ago
Kiwi-Edit

A unified and fully open-source framework for instruction-guided and reference-guided video editing using natural language.

Python ★ 292 1mo ago
all-in-one

[CVPR 2023] All in One: Exploring Unified Video-Language Pre-training

Python ★ 281 3y ago
BoxDiff

[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Python ★ 274 1y ago
EgoVLP

[NeurIPS 2022] Egocentric Video-Language Pretraining

Python ★ 260 2y ago
MakeAnything

Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"

Python ★ 208 1y ago
DeVRF

The PyTorch implementation of "DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes"

Python ★ 188 3y ago
VideoLISA

[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Python ★ 147 1y ago
whisperVideo

Find out who said what in the video.

Jupyter Notebook ★ 146 4mo ago
D-AR

The official repo for "D-AR: Diffusion via Autoregressive Models"

Python ★ 139 4mo ago
VisorGPT

[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT

Python ★ 138 2y ago
showui-pi

[CVPR 2026] ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Python ★ 128 1mo ago
WorldGUI

Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.

Python ★ 124 10mo ago
OmniPSD

Official code implementation of "OmniPSD: Layered PSD Generation with Diffusion Transformer"

Python ★ 117 25d ago
ROICtrl

Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation

Python ★ 111 1y ago
Olaf-World

[ICML 2026] Orienting Latent Actions for Video World Modeling

Python ★ 108 2mo ago
Soap2Soap

The official code implementation of the paper “Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration”.

Python ★ 102 26d ago
MovieBench

[CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation

Python ★ 98 1y ago
LayerTracer

Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"

Python ★ 95 1y ago
LOVA3

(NeurIPS 2024) Official PyTorch implementation of LOVA3

Python ★ 91 1y ago
EVOLVE-VLA

EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

HTML ★ 86 6mo ago
Adv-GRPO

[CVPR 2026] An official implementation of Adv-GRPO. The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation.

Python ★ 85 3mo ago
ShowAnything

No description.

Jupyter Notebook ★ 83 2y ago
Impossible-Videos

ICML 2025 - Impossible Videos

Python ★ 83 11mo ago
T2VScore

T2VScore: Towards A Better Metric for Text-to-Video Generation

★ 81 2y ago
loveu-tgve-2023

Official GitHub repository for the Text-Guided Video Editing (TGVE) competition of LOVEU Workshop @ CVPR'23.

Python ★ 78 2y ago
cosmo

No description.

Python ★ 75 2y ago
sparseformer

(ICLR 2024, CVPR 2024) SparseFormer

Python ★ 75 1y ago
Multi-human-Talking-Video-Dataset

Multi-human Interactive Talking Dataset

Python ★ 74 10mo ago
RobotSeg

[CVPR 2026 Oral] RobotSeg

Python ★ 71 1mo ago
SMS

[ICCV 2025] Balanced Image Stylization with Style Matching Score

Python ★ 70 3mo ago
assistgpt

No description.

JavaScript ★ 66 3y ago
Exo2Ego-V

No description.

Python ★ 60 1y ago
FQGAN

FQGAN: Factorized Visual Tokenization and Generation

Python ★ 59 1y ago
datacentric.vlp

Compress conventional Vision-Language Pre-training data

Python ★ 52 2y ago
videogui

[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos

JavaScript ★ 52 3mo ago
UniRL

The code repository of UniRL

Python ★ 52 1y ago
EvolveDirector

[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.

Python ★ 52 1y ago
afformer

Affordance Grounding from Demonstration Video to Target Image (CVPR 2023)

Python ★ 46 1y ago
X-Humanoid

No description.

★ 45 6mo ago
AUI

Computer-Use Agents as Judges for Generative UI

Python ★ 45 6mo ago
Edit2Perceive

[CVPR 2026] Official Implementation of Edit2Perceive

Python ★ 44 3mo ago
MovieSeq

[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences

Jupyter Notebook ★ 44 1y ago
Region_Learner

The PyTorch implementation for "Video-Text Pre-training with Learned Regions"

Python ★ 43 3y ago
CLVQA

[AAAI 2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral)

Python ★ 42 2y ago
RingID

No description.

Python ★ 40 1y ago
SAM-I2V

[CVPR 2025] SAM-I2V

Jupyter Notebook ★ 38 5mo ago
mist

No description.

Jupyter Notebook ★ 37 2y ago
FocusUI

[CVPR 2026] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

Python ★ 36 12d ago
macosworld

No description.

Python ★ 36 4mo ago
WorldWander

Official PyTorch code of the paper "WorldWander: Bridging Egocentric and Exocentric Worlds in Video Generation"

Python ★ 35 15h ago
PANDA

[NeurIPS 2025] PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer

★ 33 8mo ago
BYOC

[IEEE-VR 2024] Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters

C# ★ 33 2y ago
DoraCycle

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

Python ★ 32 3mo ago
Awesome-Long-Context

A curated list of resources about long-context in large-language models and video understanding.

★ 32 2y ago
Long-form-Video-Prior

No description.

Python ★ 32 2y ago
DiffSim

[ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity

Python ★ 31 11mo ago
assistgui

No description.

JavaScript ★ 30 2y ago
World-VLA-Loop

GitHub repository for World-VLA-Loop.

JavaScript ★ 29 3mo ago
DIM

[ICLR 2026] Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing

Python ★ 29 1mo ago
VisInContext

Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

Python ★ 28 1y ago
Dream.exe

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

★ 27 16d ago
ShowRoom3D

This is the project page of ShowRoom3D

★ 26 2y ago
TPDiff

TPDiff: Temporal Pyramid Video Diffusion Model

★ 25 1y ago
H2R-Grounder

A V2V framework that translates human interaction videos into robot manipulation videos.

★ 24 6mo ago
Q2A

[ECCV 2022] AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant

Python ★ 23 4mo ago
Efficient-CLS

[ICCV 2023] Label-Efficient Online Continual Object Detection in Streaming Video

Python ★ 23 2y ago
IDProtector

The code implementation of "IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation".

Python ★ 22 11mo ago
DemoVLP

[arXiv 2022] Revitalize Region Feature for Democratizing Video-Language Pre-training

Python ★ 22 4y ago
AVA-AVD

No description.

Python ★ 21 3y ago
Mitty

Official code implementation of "Mitty: Diffusion-based Human-to-Robot Video Generation"

Python ★ 19 5mo ago
GEB-Plus

[ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval

Python ★ 17 3y ago
HOSNeRF

This is the project page for HOSNeRF.

JavaScript ★ 16 2y ago
Tune-An-Ellipse

[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want

Python ★ 14 1y ago
headshot

No description.

★ 14 4y ago
Sparkle

Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance

HTML ★ 13 1mo ago
Demo2Tutorial

No description.

Python ★ 13 16d ago
Show-Anything-3D

Edit and Generate Anything in 3D world!

★ 13 3y ago
TrustScorer

[ACM MM 2025] Can I Trust You? Advancing GUI Task Automation with Action Trust Score

Python ★ 13 6mo ago
GUI-Narrator

Repository of GUI Action Narrator

JavaScript ★ 13 1y ago
StreamingEffect

Implementation of StreamingEffect: Real-Time Human-Centric Video Effect Generation

Python ★ 12 25d ago
SCT

[IJCV 2023] Official implementation of "SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels"

Python ★ 12 2y ago
UniMoD

The code repository of UniMoD

★ 11 1y ago
watermark-steganalysis

No description.

Python ★ 11 1y ago
SOIS

The PyTorch implementation of "Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization"

★ 9 3y ago
OmniHumanoid

Implementation of OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation

Python ★ 7 1mo ago
PICO

[ArXiv 2025] Personalized Vision via Visual In-Context Learning

★ 7 8mo ago
VISTA

The official code implementation of the paper “VISTA: Triplet-Supervised Video Style Transfer with Diffusion Transformers”.

Python ★ 6 29d ago
VC2L

No description.

Python ★ 6 8mo ago
T2F-Bench

A comprehensive benchmark for evaluating text-to-film generation performance.

★ 6 4mo ago
SAM-I2VPP

[TPAMI 2026] SAM-I2V++

Jupyter Notebook ★ 5 5mo ago
ColonNeRF

This is the project page for ColonNeRF.

JavaScript ★ 5 2y ago
PAI-Studio

No description.

★ 4 1d ago
ActionMap

No description.

Python ★ 4 10d ago
DynVideo-E

This is the project page for DynVideo-E.

JavaScript ★ 3 2y ago
SWEET

Official Code of SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution

Python ★ 2 25d ago
UENR-600K

No description.

★ 2 2mo ago
World-VLA-Loop-Pages

World-VLA-Loop project GitHub Pages

JavaScript ★ 2 4mo ago
P-Flow

P-Flow: Prompting Visual Effects Generation

★ 2 2mo ago
WMAdapter

A watermark plugin for latent diffusion models.

★ 2 1y ago
cvpr2024-tutorial-video-diffusion-models

No description.

HTML ★ 2 1y ago
AssistGaze

No description.

Python ★ 2 7mo ago
TTC-Tuning

Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm

★ 2 3y ago
magicanimate

No description.

JavaScript ★ 2 2y ago
assistq

No description.

SCSS ★ 1 4y ago
Aloha_Page

The introduction website for Aloha

HTML ★ 0 5mo ago
omg

Open Multimodal Gathering workshop @ NUS

JavaScript ★ 0 1y ago
InterFeedback

No description.

★ 0 1y ago
xagen

No description.

JavaScript ★ 0 2y ago
Moonshot

No description.

JavaScript ★ 0 2y ago
pv3d

No description.

JavaScript ★ 0 3y ago