FoundationVision ORG

@FoundationVision

ByteDance's open-source FoundationVision models

21 repos
1.0k followers
0 following

Python 82%
HTML 12%
Jupyter Notebook 6%

All public repos (21)


VAR

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

VAR, which received a NeurIPS 2024 Best Paper Award, is an autoregressive image generation project that builds images coarse-to-fine across scales using next-scale prediction, outperforming diffusion models on several benchmarks. Pretrained models from 310M to 2.3B parameters are available on Hugging Face.

Jupyter Notebook ★ 8.7k 7mo ago
ByteTrack

[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box

Python ★ 6.5k 2y ago
LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python ★ 2.0k 1y ago
Infinity

[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python ★ 1.6k 2mo ago
GLEE

[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Python ★ 1.2k 1y ago
Waver

Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.

★ 942 9mo ago
InfinityStar

[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation

Python ★ 767 2mo ago
Liquid

(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators

Python ★ 643 19d ago
VNext

Next-generation video instance recognition framework built on top of Detectron2 that supports InstMove (CVPR 2023), SeqFormer (ECCV Oral), and IDOL (ECCV Oral)

Python ★ 617 2y ago
Groma

[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python ★ 587 2y ago
UniTok

[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding

Python ★ 527 7mo ago
FlashVideo

[AAAI 2026] FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

Python ★ 459 1y ago
Alive

[Tech Report] Alive: A Unified Audio-Video Generation Model

★ 457 2mo ago
OmniTokenizer

[NeurIPS 2024] OmniTokenizer: One model and one weight for image-video joint tokenization.

Python ★ 324 1y ago
UniRef

[ICCV 2023] Segment Every Reference Object in Spatial and Temporal Spaces

Python ★ 238 1y ago
GenerateU

[CVPR 2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Python ★ 196 1y ago
vaex

🔥 Stable, simple, state-of-the-art VQVAE toolkit & cookbook

Python ★ 107 2y ago
BitVAE

Official training and inference code for the bitwise tokenizer

Python ★ 72 1y ago
.github

No description.

★ 0 7mo ago
flashvideo-page

No description.

HTML ★ 0 1y ago
infinity.project

No description.

HTML ★ 0 1y ago