streaming-llm ★ PINNED
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Python ★ 7.2k 1y ago
llm-awq ★ PINNED
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Python ★ 3.6k 11mo ago
efficientvit ★ PINNED
Efficient vision foundation models for high-resolution generation and perception.
Python ★ 3.3k 9mo ago
bevfusion ★ PINNED ▣
[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
Python ★ 3.2k 1y ago
temporal-shift-module ★ PINNED
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
Python ★ 2.2k 1y ago
once-for-all ★ PINNED
[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment
Python ★ 2.0k 2y ago
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Python ★ 1.7k 1y ago
torchquantum
A PyTorch-based framework for Quantum Classical Simulation, Quantum Machine Learning, Quantum Neural Networks, Parameterized Quantum Circuits with support for easy deployments on real quantum computers.
Jupyter Notebook ★ 1.6k 7mo ago
torchsparse
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
Cuda ★ 1.5k 1y ago
proxylessnas
[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
C++ ★ 1.4k 1y ago
data-efficient-gans
[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
Python ★ 1.3k 1y ago
tinyml
No description.
Python ★ 1.2k 2y ago
gan-compression
[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs
Python ★ 1.1k 2y ago
streaming-vlm
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Python ★ 1.0k 8mo ago
TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
C++ ★ 956 1y ago
tinyengine
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
C ★ 950 1y ago
omniserve
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
C++ ★ 844 1y ago
anycost-gan
[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing
Python ★ 780 2y ago
distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Python ★ 725 1y ago
fastcomposer
[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Python ★ 716 1y ago
mcunet
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
Python ★ 700 2y ago
pvcnn ▣
[NeurIPS 2019, Spotlight] Point-Voxel CNN for Efficient 3D Deep Learning
Python ★ 679 4y ago
hart
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Python ★ 648 1y ago
spvnas ▣
[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
Python ★ 621 1y ago
kernel-design-agents
No description.
★ 613 19d ago
lite-transformer ▣
[ICLR 2020] Lite Transformer with Long-Short Range Attention
Python ★ 611 1y ago
radial-attention
[NeurIPS 2025] Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation
Python ★ 602 7mo ago
duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Python ★ 539 1y ago
Block-Sparse-Attention
A sparse attention kernel supporting mixed sparse patterns
C++ ★ 527 5mo ago
tiny-training
On-Device Training Under 256KB Memory [NeurIPS'22]
Python ★ 521 2y ago
dlg
[NeurIPS 2019] Deep Leakage From Gradients
Python ★ 485 4y ago
amc
[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Python ★ 450 2y ago
vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Python ★ 425 1y ago
vlash
Real-Time VLAs via Future-state-aware Asynchronous Inference.
Python ★ 418 2mo ago
haq
[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Python ★ 411 5y ago
Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Cuda ★ 396 11mo ago
offsite-tuning
Offsite-Tuning: Transfer Learning without Full Model
Python ★ 388 2y ago
hardware-aware-transformers
[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Python ★ 338 1y ago
litepose
[CVPR'22] Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
Python ★ 326 2y ago
x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
Python ★ 281 11mo ago
KernelWiki
No description.
Python ★ 259 11d ago
flash-moba
No description.
C++ ★ 250 7mo ago
inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
C++ ★ 201 4y ago
fouroversix
Code for the papers: “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” and “Adaptive Block-Scaled Data Types”
Python ★ 194 2mo ago
parallel-computing-tutorial
No description.
C++ ★ 178 2y ago
fastrl
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Python ★ 173 3mo ago
amc-models
[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Python ★ 169 5y ago
apq
[CVPR 2020] APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
Python ★ 161 6y ago
flatformer
[CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
Python ★ 142 2y ago
spatten
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Scala ★ 136 1y ago
ncu-report-skill
No description.
Python ★ 129 27d ago
lpd
[ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
Python ★ 104 1mo ago
patch_conv
Patch convolution to avoid large GPU memory usage of Conv2D
Python ★ 97 1y ago
mlsys2026-flashinfer-contest
No description.
Python ★ 88 7d ago
sparsevit
[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
Python ★ 82 2y ago
foreact
[CVPR 2026 Highlight] ForeAct: Steering Your VLA with Efficient Visual Foresight Planning
Python ★ 79 1mo ago
tinychat-tutorial
No description.
C++ ★ 79 1y ago
bnn-icestick
Binary Neural Network on IceStick FPGA.
Jupyter Notebook ★ 56 8y ago
e3d
Efficient 3D Deep Learning
★ 48 5y ago
neurips-micronet
[JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion
Jupyter Notebook ★ 42 5y ago
pruning-sparsity-publications
No description.
★ 31 3y ago
vcpo
[ICML 2026] Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
Python ★ 27 1mo ago
VisCompare
A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders
Python ★ 27 1y ago
SMEPO
No description.
Python ★ 16 24d ago
iccad-tinyml-open
[ICCAD'22 TinyML Contest] Efficient Heart Stroke Detection on Low-cost Microcontrollers
C ★ 16 3y ago
sparserefine
[ECCV 2024] SparseRefine: Sparse Refinement for Efficient High-Resolution Semantic Segmentation
Python ★ 15 1y ago
calo-cluster
No description.
Jupyter Notebook ★ 9 4y ago
ml-blood-pressure
No description.
Python ★ 9 3y ago
mlsys2026-flashinfer-contest-solution
No description.
Python ★ 6 28d ago
data-efficient-gans-dynamic
No description.
Python ★ 5 5y ago
gan-compression-dynamic
No description.
Python ★ 5 5y ago
mmpose ⑂
OpenMMLab Pose Estimation Toolbox and Benchmark.
★ 2 4y ago