Hi there 👋
My name is Niels, I'm 28 years old and I live in Belgium.
I'm currently working as an ML engineer @ 🤗 HuggingFace, where I'm part of the Open-Source team.
I work on HuggingFace Transformers, a Python library implementing several state-of-the-art AI algorithms, all based on the original Transformer by Google.
I love making AI more accessible to everyone. So far, I've contributed the following algorithms to HuggingFace Transformers:
- TAPAS, by Google AI
- ViT, by Google AI
- DeiT, by Facebook AI
- DETR, by Facebook AI
- BEiT, by Microsoft Research
- CANINE, by Google AI
- LUKE, by Studio Ousia
- LayoutLMv2 and LayoutXLM, by Microsoft Research
- DINO, by Facebook AI
- TrOCR, by Microsoft Research
- SegFormer, by NVIDIA
- ImageGPT, by OpenAI
- Perceiver/Perceiver IO, by DeepMind
- MAE, by Facebook AI
- ViLT, by NAVER AI Lab
- ConvNeXT, by Facebook AI
- DiT, by Microsoft Research
- GLPN, by KAIST (Korea Advanced Institute of Science and Technology)
- DPT, by Intel Labs
- YOLOS, by School of EIC, Huazhong University of Science & Technology
- TAPEX, by Microsoft Research
- LayoutLMv3, by Microsoft Research
- VideoMAE, by Multimedia Computing Group, Nanjing University
- Donut, by NAVER AI Lab
- X-CLIP, by Microsoft Research
- Deformable DETR, by SenseTime Research
- MarkupLM, by Microsoft Research
- LiLT, by South China University of Technology
- Table Transformer, by Microsoft Research
- CLIPSeg, by University of Göttingen
- Audio Spectrogram Transformer, by MIT Computer Science and Artificial Intelligence Laboratory, Cambridge
- BiT, by Google AI
- ViT Hybrid, by Google AI
- Swin2SR, by CAIDAS, University of Würzburg
- GIT, by Microsoft Research
- UPerNet, by Peking University
- BLIP-2, by Salesforce
- InstructBLIP, by Salesforce
- FocalNet, by Microsoft Research
- PerSAM, by Shanghai Artificial Intelligence Laboratory
- DINOv2, by Meta AI
- ViTMatte, by HustVL
- Nougat, by Meta AI
- OWLv2, by Google AI
- SigLIP, by Google AI
- SlimSAM, by National University of Singapore
- Depth Anything, by The University of Hong Kong/TikTok
- UDOP, by Microsoft Research
- ZoeDepth, by Intel Research
- DINOv2 with Registers, by Meta
- ViTPose, by University of Sydney
- MetaCLIP 2, by Meta AI
Besides that, I help others add models to the library, including:
- Swin Transformer, by Microsoft Research
- mLUKE, by Studio Ousia
- Nyströmformer, by University of Wisconsin-Madison
- YOSO, by University of Wisconsin-Madison
- PoolFormer, by Sea AI Lab
- CvT, by Microsoft Research
- GroupViT, by NVIDIA
- TextNet, by Nanjing University
- Grounding DINO, by Tsinghua University
- KOSMOS 2.5, by Microsoft Research
I'm mostly working with PyTorch. For IDEs, I work with Visual Studio Code and Google Colab. It's all you need.
I learned everything about deep learning through self-study, mainly thanks to Andrej Karpathy's cs231n course at Stanford University (lectures are free on YouTube and assignment solutions can be found on GitHub).
- Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
Jupyter Notebook ★ 12k 2mo ago
- Vision-Transformer-papers
This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.
★ 201 4y ago
- tutorials
A repository containing general tutorials I'd like to share with the world.
Jupyter Notebook ★ 80 6mo ago
- transformers ⑂
🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.
Python ★ 50 2mo ago
- awesome-huggingface
Repository containing awesome resources regarding Hugging Face tooling.
★ 49 2y ago
- Description2Process
Transforming textual descriptions into process models using deep learning
Jupyter Notebook ★ 15 7y ago
- coco-eval
A tiny package supporting distributed computation of COCO metrics for PyTorch models.
Python ★ 15 3y ago
- NielsRogge
Short README about myself.
★ 13 9mo ago
- arxiv-ocr
No description.
Python ★ 11 2mo ago
- unilm ⑂
UniLM - Unified Language Model Pre-training / Pre-training for NLP and Beyond
★ 11 2y ago
- tapas_utils
A package containing utils for the PyTorch version of the Tapas algorithm.
Python ★ 11 5y ago
- CogVLM ⑂
a state-of-the-art-level open visual language model
★ 8 2y ago
- diffusion-notes
Some notes I took when learning about diffusion models.
★ 8 4y ago
- computer-vision-zero-to-one
A repository showcasing the entire workflow of putting computer vision models in production.
Jupyter Notebook ★ 7 3mo ago
- notebooks ⑂
Notebooks using the Hugging Face libraries 🤗
★ 5 3y ago
- LLaVA ⑂
Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
Python ★ 5 2y ago
- mistral-src ⑂
Reference implementation of Mistral AI 7B v0.1 model.
★ 4 2y ago
- MedSAM ⑂
The official repository for MedSAM: Segment Anything in Medical Images.
★ 4 3y ago
- big_vision ⑂
Official codebase used to develop Vision Transformer, MLP-Mixer, LiT and more.
★ 4 2y ago
- rf-detr ⑂
RF-DETR is a real-time object detection model architecture developed by Roboflow, released under the Apache 2.0 license.
★ 4 1y ago
- ImageBind ⑂
ImageBind One Embedding Space to Bind Them All
★ 4 2y ago
- Matcha-TTS ⑂
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
★ 4 1y ago
- evaluation-parsing
This repository is meant for parsing evaluation results from Hugging Face models, and opening pull requests on the hub to display them at leaderboards.
Python ★ 3 2mo ago
- LlamaGen ⑂
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
★ 3 2y ago
- agents-poc
No description.
Python ★ 3 1y ago
- Deep-Learning-for-Computer-Vision
This repository contains the assignments I made during the 2019 version of the Deep Learning for Computer Vision course taught at the University of Michigan.
Jupyter Notebook ★ 3 5y ago
- micro_diffusion ⑂
Official repository for our work on micro-budget training of large-scale diffusion models.
★ 3 1y ago
- yolov10 ⑂
YOLOv10: Real-Time End-to-End Object Detection
★ 3 2y ago
- table-transformer ⑂
Model training and evaluation code for our dataset PubTables-1M, developed to support the task of table extraction from unstructured documents.
★ 3 2y ago
- nougat ⑂
Implementation of Nougat Neural Optical Understanding for Academic Documents
★ 3 2y ago
- vstar ⑂
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
★ 3 2y ago
- ml-intern ⑂
🤗 ml-intern: an open-source ML engineer that reads papers, trains models, and ships ML models
★ 2 1mo ago
- safetensors ⑂
Simple, safe way to store and distribute tensors
★ 2 2y ago
- agentic-document-ai-baselines
A repository with various baselines for the agentic-document-ai project.
Python ★ 2 3mo ago
- openclaw ⑂
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
★ 2 4mo ago
- verl ⑂
veRL: Volcano Engine Reinforcement Learning for LLM
★ 2 1y ago
- huggingface.js ⑂
Utilities to use the Hugging Face Hub API
★ 2 1mo ago
- releasing-research-code ⑂
Tips for releasing research code in Machine Learning (with official NeurIPS 2020 recommendations)
★ 2 1y ago
- axcell ⑂
Tools for extracting tables and results from Machine Learning papers
★ 2 3y ago
- alltracker ⑂
No description.
★ 2 11mo ago
- VisualQuality-R1 ⑂
VisualQuality-R1 is the first open-sourced NR-IQA model can accurately describe and rate the image quality.
★ 2 1y ago
- whispering ⑂
No description.
★ 2 1y ago
- CS224n-Assignments
My solutions to the practical assignments of CS224n (Natural Language Processing with Deep Learning) [Stanford University-Winter 2019]
★ 2 6y ago
- CS224N-Natural-Language-Processing-with-Deep-Learning ⑂
No description.
JavaScript ★ 2 4y ago
- mmcv ⑂
OpenMMLab Computer Vision Foundation
★ 2 3y ago
- DarkIR ⑂
DarkIR: Robust Low-Light Image Restoration [Official PyTorch Implementation]
★ 2 1y ago
- MagicDriveDiT ⑂
Official implementation of the paper “MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control”
★ 2 1y ago
- GST ⑂
Official implementation of "GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers"
★ 2 1y ago
- unimatch ⑂
[TPAMI'23] Unifying Flow, Stereo and Depth Estimation
★ 2 1y ago
- DocLayout-YOLO ⑂
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
★ 2 1y ago
- segment-anything-2 ⑂
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
★ 2 1y ago
- optimum ⑂
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
★ 2 3y ago
- trl ⑂
Train transformer language models with reinforcement learning.
★ 2 2y ago
- MeshAnythingV2 ⑂
From anything to mesh like human artists. Official impl. of "MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization"
★ 2 1y ago
- VAR ⑂
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction"
★ 2 2y ago
- community-events-1 ⑂
Place where folks can contribute to 🤗 community events
★ 2 3y ago
- diffusers ⑂
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
★ 2 1y ago
- LiLT ⑂
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
★ 2 3y ago
- DETA ⑂
Detection Transformers with Assignment
★ 2 3y ago
- enformer-pytorch ⑂
Implementation of Enformer, Deepmind's attention network for predicting gene expression, in Pytorch
★ 2 4y ago
- DPT ⑂
Dense Prediction Transformers
★ 2 4y ago
- DCGAN-huggingface
An implementation of DCGAN, leveraging the HuggingFace ecosystem for getting data and pushing to the hub.
Python ★ 2 4y ago
- YOLOS ⑂
You Only Look at One Sequence (NeurIPS 2021)
★ 2 4y ago
- pix2seq ⑂
Pix2Seq - A general framework for turning RGB pixels into semantically meaningful sequences
★ 2 4y ago
- datasets ⑂
🤗 Fast, efficient, open-access datasets and evaluation metrics in PyTorch, TensorFlow, NumPy and Pandas
Python ★ 2 1y ago
- Open-Assistant ⑂
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
★ 2 3y ago
- open_lm ⑂
A repository for research on medium sized language models.
★ 2 2y ago
- AudioSep ⑂
Official implementation of "Separate Anything You Describe"
★ 2 2y ago
- scenic ⑂
Scenic: A Jax Library for Computer Vision Research and Beyond
★ 2 2y ago
- scikit-image ⑂
Image processing in Python
★ 2 2y ago
- dinov2 ⑂
PyTorch code and models for the DINOv2 self-supervised learning method.
★ 2 2y ago
- FAST ⑂
Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
★ 2 1y ago
- pytorch-image-models ⑂
PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXT, DPN, EfficientNet, MixNet, MobileNet-V3/V2, MNASNet, Single-Path NAS, FBNet, and more
★ 2 2y ago
- EasyOCR ⑂
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.
★ 2 2y ago
- flights-mcp ⑂
An MCP server to search for flights.
★ 1 1y ago
- terminal-bench ⑂
A benchmark for LLMs on complicated tasks in the terminal
★ 1 5mo ago
- LW-DETR ⑂
This repository is an official implementation of the paper "LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection".
★ 1 1y ago
- efficientsam3 ⑂
EfficientSAM3 compresses SAM3 into lightweight, edge-friendly models via progressive knowledge distillation for fast promptable concept segmentation and tracking.
★ 1 3mo ago
- DEIMv2 ⑂
[DEIMv2] Real Time Object Detection Meets DINOv3
Jupyter Notebook ★ 1 3mo ago
- NeMo ⑂
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
★ 1 7mo ago
- claude-agents-sdk-blog-post
No description.
Python ★ 1 5mo ago
- ai-deadlines-pwc ⑂
⏰ AI conference deadline countdowns
JavaScript ★ 1 1y ago
- openai-agents-python ⑂
A lightweight, powerful framework for multi-agent workflows
★ 1 7mo ago
- MetaCLIP ⑂
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
★ 1 10mo ago
- DepthAnythingAC ⑂
Official code for the paper: Depth Anything At Any Condition
★ 1 11mo ago
- vggt ⑂
[CVPR 2025 Best Paper Award Candidate] VGGT: Visual Geometry Grounded Transformer
★ 1 1y ago
- ByteTrack ⑂
[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
★ 1 3y ago
- perception_models ⑂
Code to repro PE and PLM
★ 1 1y ago
- text-embeddings-inference ⑂
A blazing fast inference solution for text embeddings models
★ 1 1y ago
- cloudsql-jump-start-solution-for-genai ⑂
A jump start solution using GKE or Cloud Run with Cloud SQL and VertexAI
★ 1 1y ago
- SimDINO ⑂
Implementation for SimDINO/SimDINOv2
★ 1 1y ago
- DeepMesh ⑂
Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
★ 1 1y ago
- OmniMamba ⑂
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
★ 1 1y ago
- yoloe ⑂
YOLOE: Real-Time Seeing Anything
★ 1 1y ago
- yolov12 ⑂
YOLOv12: Attention-Centric Real-Time Object Detectors
★ 1 1y ago
- openai-realtime-agents ⑂
This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.
★ 1 1y ago
- open-webui ⑂
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
★ 1 1y ago
- Show-o ⑂
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
★ 1 1y ago
- text-generation-inference ⑂
Large Language Model Text Generation Inference
★ 1 1y ago
- sos-bench ⑂
This codebase stores the complete artifacts and describes how to reproduce or extend the results from the paper "Style over Substance: Failure modes of LLM judges in alignment benchmarking", including the MisMo-Bench meta-benchmark.
★ 1 1y ago
- lerobot ⑂
🤗 LeRobot: End-to-end Learning for Real-World Robotics in Pytorch
★ 1 1y ago
- Apollo ⑂
Music repair method to convert lossy MP3 compressed music to lossless music.
★ 1 1y ago
- mini-omni ⑂
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
★ 1 1y ago
- LightenDiffusion ⑂
Official pytorch implementation for "LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models"
★ 1 1y ago
- LeYOLO ⑂
No description.
★ 1 1y ago
- ZIM ⑂
ZIM: Zero-Shot Image Matting for Anything
★ 1 1y ago
- ml-veclip ⑂
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
★ 1 1y ago
- clip_dinoiser ⑂
Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.
★ 1 1y ago
- Long-CLIP ⑂
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
★ 1 1y ago
- evaluate ⑂
A library for easily evaluating machine learning models and datasets.
★ 1 3y ago
- VidGen ⑂
No description.
★ 1 1y ago
- lightweight-gan ⑂
Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two
★ 1 4y ago
- swin2sr ⑂
Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration at the Advances in Image Manipulation (AIM) workshop ECCV 2022, Tel Aviv
★ 1 3y ago
- ast ⑂
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
★ 1 3y ago
- clipseg ⑂
This repository contains the code of the CVPR 2022 paper "Image Segmentation Using Text and Image Prompts".
★ 1 3y ago
- MaskFormer ⑂
Per-Pixel Classification is Not All You Need for Semantic Segmentation (NeurIPS 2021, spotlight)
★ 1 3y ago
- bros ⑂
No description.
★ 1 3y ago
- GenerativeImage2Text ⑂
GIT: A Generative Image-to-text Transformer for Vision and Language
★ 1 3y ago
- mmsegmentation ⑂
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
★ 1 3y ago
- GLPDepth ⑂
GLPDepth PyTorch Implementation: Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth
★ 1 4y ago
- ConditionalDETR ⑂
This repository is an official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence". (https://arxiv.org/abs/2108.06152)
★ 1 4y ago
- vision ⑂
Datasets, Transforms and Models specific to Computer Vision
★ 1 4y ago
- mmclassification ⑂
OpenMMLab Image Classification Toolbox and Benchmark
★ 1 4y ago
- VideoMAE ⑂
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
★ 1 4y ago
- FocalNet ⑂
[NeurIPS 2022] Official code for "Focal Modulation Networks"
★ 1 3y ago
- H3 ⑂
Language Modeling with the H3 State Space Model
★ 1 3y ago
- segment-anything ⑂
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
★ 1 3y ago
- detectron2 ⑂
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
★ 1 3y ago
- clip-retrieval ⑂
Easily compute clip embeddings and build a clip retrieval system with them
★ 1 3y ago
- BLIP ⑂
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Jupyter Notebook ★ 1 3y ago
- Personalize-SAM ⑂
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
★ 1 3y ago
- ViTMatte ⑂
Boosting Image Matting with Pretrained Plain Vision Transformers
★ 1 3y ago
- mmdeploy ⑂
OpenMMLab Model Deployment Framework
★ 1 3y ago
- Cream ⑂
This is a collection of our NAS and Vision Transformer work.
★ 1 3y ago
- mmdetection ⑂
OpenMMLab Detection Toolbox and Benchmark
★ 1 2y ago
- datacomp ⑂
DataComp: In search of the next generation of multimodal datasets
★ 1 2y ago
- SPTSv2 ⑂
The official implementation of SPTS v2: Single-Point Text Spotting
★ 1 2y ago
- T-MARS ⑂
Code for T-MARS data filtering
★ 1 2y ago
- GroundingDINO ⑂
The official implementation of "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
★ 1 2y ago
- mamba ⑂
No description.
★ 1 1y ago
- olmocr ⑂
Toolkit for linearizing PDFs for LLM datasets/training
★ 0 2mo ago
- sam3 ⑂
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
★ 0 2mo ago
- posterskill ⑂
AI-assisted academic posters.
★ 0 3mo ago
- tax-calc-bench ⑂
Code & data for TaxCalcBench
★ 0 3mo ago
- OSWorld ⑂
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
★ 0 3mo ago
- roboflow-notebooks ⑂
A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL.
★ 0 3mo ago
- skills ⑂
No description.
★ 0 2mo ago
- videomt ⑂
Official code and models for Video Encoder-only Mask Transformer (VidEoMT).
★ 0 3mo ago
- ai-deadlines ⑂
⏰ AI conference deadline countdowns
TypeScript ★ 0 3mo ago
- sglang ⑂
SGLang is a fast serving framework for large language models and vision language models.
★ 0 6mo ago
- cvelistV5 ⑂
CVE cache of the official CVE List in CVE JSON 5 format
★ 0 8mo ago
- eomt ⑂
[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).
★ 0 8mo ago
- gsplat ⑂
CUDA accelerated rasterization of gaussian splatting
★ 0 1y ago
- gemini-fullstack-langgraph-quickstart ⑂
Get started with building Fullstack Agents using Gemini 2.5 and LangGraph
★ 0 1y ago
- anycam ⑂
Official repository for "AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos" (CVPR 2025)
★ 0 1y ago
- FoundationStereo ⑂
[CVPR 2025 Best Paper Nomination] FoundationStereo: Zero-Shot Stereo Matching
★ 0 1y ago
- dspy ⑂
DSPy: The framework for programming—not prompting—language models
★ 0 1y ago
- DiffusionSfM ⑂
[CVPR 2025] "DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion" official implementation.
★ 0 1y ago
- nanoVLM ⑂
The simplest, fastest repository for training/finetuning small-sized VLMs.
★ 0 1y ago
- loftup ⑂
Official Code for "LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models"
★ 0 1y ago
- webssl ⑂
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
★ 0 1y ago
- dia ⑂
A TTS model capable of generating ultra-realistic dialogue in one pass.
★ 0 1y ago
- blt ⑂
Code for BLT research paper
★ 0 1y ago
- DLF ⑂
[AAAI'25] DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
★ 0 1y ago
- DUTrack ⑂
The official implementation for the CVPR'2025 paper Dynamic Updates for Language Adaptation in Visual-Language Tracking
★ 0 1y ago
- Opt_CWM ⑂
Official PyTorch Implementation of Opt-CWM: Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals.
★ 0 1y ago
- TwinLiteNetPlus ⑂
No description.
★ 0 1y ago
- sonata ⑂
[CVPR'25] Official repository of Sonata: Self-Supervised Learning of Reliable Point Representations
★ 0 1y ago
- 1d-tokenizer ⑂
This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation
★ 0 1y ago
- csm ⑂
A Conversational Speech Generation Model
★ 0 1y ago
- HVI-CIDNet ⑂
[CVPR2025] HVI: A New Color Space for Low-light Image Enhancement && "You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement"
★ 0 1y ago
- fractalgen ⑂
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
★ 0 1y ago
- audiobox-aesthetics ⑂
Unified automatic quality assessment for speech, music, and sound.
★ 0 1y ago
- browser-use ⑂
Make websites accessible for AI agents
★ 0 1y ago
- tea-client ⑂
Helper library for writing API clients.
★ 0 1y ago
- AnySat ⑂
No description.
★ 0 1y ago
- OmniSat ⑂
No description.
★ 0 1y ago
- multi-hmr ⑂
Pytorch demo code and models for Multi-HMR
★ 0 1y ago
- OneRestore ⑂
[ECCV 2024] OneRestore: A Universal Restoration Framework for Composite Degradation
★ 0 1y ago
- guardrails ⑂
Adding guardrails to large language models.
★ 0 1y ago
- paperswithcode-client ⑂
API Client for paperswithcode.com
★ 0 1y ago
- chat-ui ⑂
Open source codebase powering the HuggingChat app
★ 0 1y ago
- Lotus ⑂
Official Implementation of Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
★ 0 1y ago
- shic ⑂
Official implementation of the 2024 ECCV paper SHIC: Shape-Image Correspondences with no Keypoint Annotation
★ 0 1y ago
- GenerateCT ⑂
ECCV 2024 & GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
★ 0 1y ago
- FluxMusic ⑂
Text-to-Music Generation with Rectified Flow Transformers
★ 0 1y ago
- PGTFormer ⑂
[IJCAI'24] Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer
★ 0 1y ago
- StreamingT2V ⑂
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
★ 0 1y ago
- EMA-VFI ⑂
[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
★ 0 1y ago
- CoMAE ⑂
[AAAI 2023 Oral] CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
★ 0 1y ago
- VFIMamba ⑂
VFIMamba: Video Frame Interpolation with State Space Models
★ 0 1y ago
- doubletake ⑂
[ECCV 2024] DoubleTake: Geometry Guided Depth Estimation
★ 0 1y ago
- CounTR ⑂
CounTR: Transformer-based Generalised Visual Counting
★ 0 1y ago
- CSD ⑂
No description.
★ 0 1y ago
- AiM ⑂
Official PyTorch Implementation of "Scalable Autoregressive Image Generation with Mamba"
★ 0 1y ago
- count_token_optimization ⑂
No description.
★ 0 1y ago
- NeuFlow_v2 ⑂
No description.
★ 0 1y ago
- silero-vad ⑂
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
★ 0 1y ago
- vggsfm ⑂
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
★ 0 1y ago
- co-tracker ⑂
CoTracker is a model for tracking any point (pixel) on a video.
★ 0 1y ago