vllm ★ PINNED
A high-throughput and memory-efficient inference and serving engine for LLMs
Python ★ 85k 4h ago
vllm-omni ★ PINNED
A framework for efficient model inference with omni-modality models
Python ★ 5.3k 5h ago
recipes ★ PINNED
Common recipes to run vLLM
JavaScript ★ 884 1d ago
llm-compressor ★ PINNED
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Python ★ 3.5k 1d ago
speculators ★ PINNED
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
Python ★ 534 1d ago
semantic-router ★ PINNED
System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
Go ★ 4.6k 2h ago
aibrix
Cost-efficient and pluggable Infrastructure components for GenAI inference
Go ★ 4.9k 1d ago
production-stack
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Python ★ 2.4k 2d ago
vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
C++ ★ 2.3k 13h ago
vllm-metal
Community maintained hardware plugin for vLLM on Apple Silicon
Python ★ 1.4k 15h ago
guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
Python ★ 1.3k 23h ago
tpu-inference
TPU inference for vLLM, with unified JAX and PyTorch support.
Python ★ 370 2h ago
vime
An LLM post-training framework with vLLM for RL Scaling
Python ★ 309 15h ago
compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
Python ★ 293 9h ago
router
A high-performance and light-weight router for vLLM large scale deployment
Rust ★ 287 1mo ago
flash-attention ⑂
Fast and memory-efficient exact attention
Python ★ 128 2d ago
vllm-skills
Agent skills for vLLM
Shell ★ 86 2mo ago
vllm-openvino
No description.
Python ★ 54 6mo ago
vllm-daily
vLLM Daily Summarization of Merged PRs
★ 51 1d ago
vllm-project.github.io
No description.
HTML ★ 51 11d ago
vllm-xpu-kernels
The vLLM XPU kernels for Intel GPU
C++ ★ 49 3d ago
ci-infra
This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
HCL ★ 44 1d ago
vllm-gaudi
Community maintained hardware plugin for vLLM on Intel Gaudi
Python ★ 43 1d ago
agentic-api
Stateful API logic for agentic applications using vLLM
Rust ★ 37 2d ago
vllm-bench
High-performance Rust benchmark client for vLLM serving endpoints.
Rust ★ 33 6d ago
vllm-neuron
Community maintained hardware plugin for vLLM on AWS Neuron
Python ★ 33 1mo ago
dllm-plugin
vLLM plugin for block-based diffusion language model (dLLM) support
Python ★ 23 1mo ago
vllm-nccl ▣
Manages vllm-nccl dependency
Python ★ 18 2y ago
FlashMLA ⑂
No description.
C++ ★ 14 2mo ago
bart-plugin
vLLM Model plugin for the encoder-decoder BART model
Python ★ 12 4d ago
vllm-gguf-plugin
vLLM Quantization plugin for GGUF
Python ★ 11 16d ago
vLLM-in-PyTorch-Conference-2025
No description.
★ 11 6mo ago
vllm-project.github.io-static ▣
No description.
HTML ★ 10 1y ago
media-kit
vLLM Logo Assets
★ 9 5mo ago
perf-eval
Performance benchmark & accuracy evaluation for vLLM
Python ★ 8 19h ago
vllm-dashboard
No description.
TypeScript ★ 7 4d ago
perf-dashboard
Performance dashboard for vLLM
Python ★ 3 3mo ago
vllm-bnb-plugin
vLLM Quantization plugin for bitsandbytes
Python ★ 1 21d ago
rfcs
No description.
★ 1 1y ago
llm-multimodal
Standalone fork of llm-multimodal from SMG
Rust ★ 0 1d ago
DeepGEMM ⑂
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Cuda ★ 0 1d ago
MSA ⑂
No description.
★ 0 6d ago
vllm-docs
No description.
TypeScript ★ 0 1mo ago