vLLM ORG

43 repos
3.5k followers
0 following

Python 58%
Rust 12%
Go 6%
C++ 6%
TypeScript 6%

All public repos (43)

vllm ★ PINNED

A high-throughput and memory-efficient inference and serving engine for LLMs

Python ★ 85k 4h ago
vllm-omni ★ PINNED

A framework for efficient model inference with omni-modality models

Python ★ 5.3k 5h ago
recipes ★ PINNED

Common recipes to run vLLM

JavaScript ★ 884 1d ago
llm-compressor ★ PINNED

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python ★ 3.5k 1d ago
speculators ★ PINNED

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python ★ 534 1d ago
semantic-router ★ PINNED

System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge

Go ★ 4.6k 2h ago
aibrix

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go ★ 4.9k 1d ago
production-stack

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python ★ 2.4k 2d ago
vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

C++ ★ 2.3k 13h ago
vllm-metal

Community maintained hardware plugin for vLLM on Apple Silicon

Python ★ 1.4k 15h ago
guidellm

Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

Python ★ 1.3k 23h ago
tpu-inference

TPU inference for vLLM, with unified JAX and PyTorch support.

Python ★ 370 2h ago
vime

An LLM post-training framework with vLLM for RL Scaling

Python ★ 309 15h ago
compressed-tensors

A safetensors extension to efficiently store sparse quantized tensors on disk

Python ★ 293 9h ago
router

A high-performance and light-weight router for vLLM large scale deployment

Rust ★ 287 1mo ago
flash-attention ⑂

Fast and memory-efficient exact attention

Python ★ 128 2d ago
vllm-skills

Agent skills for vLLM

Shell ★ 86 2mo ago
vllm-openvino

No description.

Python ★ 54 6mo ago
vllm-daily

vLLM Daily Summarization of Merged PRs

★ 51 1d ago
vllm-project.github.io

No description.

HTML ★ 51 11d ago
vllm-xpu-kernels

The vLLM XPU kernels for Intel GPU

C++ ★ 49 3d ago
ci-infra

This repo hosts code for vLLM CI & Performance Benchmark infrastructure.

HCL ★ 44 1d ago
vllm-gaudi

Community maintained hardware plugin for vLLM on Intel Gaudi

Python ★ 43 1d ago
agentic-api

Stateful API logic for agentic applications using vLLM

Rust ★ 37 2d ago
vllm-bench

High-performance Rust benchmark client for vLLM serving endpoints.

Rust ★ 33 6d ago
vllm-neuron

Community maintained hardware plugin for vLLM on AWS Neuron

Python ★ 33 1mo ago
dllm-plugin

vLLM plugin for block-based diffusion language model (dLLM) support

Python ★ 23 1mo ago
vllm-nccl ▣

Manages vllm-nccl dependency

Python ★ 18 2y ago
FlashMLA ⑂

No description.

C++ ★ 14 2mo ago
bart-plugin

vLLM Model plugin for the encoder-decoder BART model

Python ★ 12 4d ago
vllm-gguf-plugin

vLLM Quantization plugin for GGUF

Python ★ 11 16d ago
vLLM-in-PyTorch-Conference-2025

No description.

★ 11 6mo ago
vllm-project.github.io-static ▣

No description.

HTML ★ 10 1y ago
media-kit

vLLM Logo Assets

★ 9 5mo ago
perf-eval

Performance benchmark & accuracy evaluation for vLLM

Python ★ 8 19h ago
vllm-dashboard

No description.

TypeScript ★ 7 4d ago
perf-dashboard

Performance dashboard for vLLM

Python ★ 3 3mo ago
vllm-bnb-plugin

vLLM Quantization plugin for bitsandbytes

Python ★ 1 21d ago
rfcs

No description.

★ 1 1y ago
llm-multimodal

Standalone fork of llm-multimodal from SMG

Rust ★ 0 1d ago
DeepGEMM ⑂

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda ★ 0 1d ago
MSA ⑂

No description.

★ 0 6d ago
vllm-docs

No description.

TypeScript ★ 0 1mo ago