13-day current streak · 14-day longest streak
vllm ★ PINNED ⑂
A high-throughput and memory-efficient inference and serving engine for LLMs
Python ★ 0 1d ago
llm-d ★ PINNED ⑂
llm-d is a Kubernetes-native high-performance distributed LLM inference framework
Makefile ★ 0 1mo ago
blis ★ PINNED ⑂
BLAS-like Library Instantiation Software Framework
C ★ 1 3y ago
coolS
A LaTeX package to use the Cool S as a symbol in math equations
TeX ★ 9 8y ago
momms
Multilevel Optimized Matrix-matrix Multiplication Sandbox
C ★ 8 7y ago
j-pareto
No description.
Python ★ 3 1mo ago
j-llm-d
Justfile harness for llm-d
Just ★ 2 3d ago
tms_submod
Routines for submodular set function minimization
C++ ★ 2 6y ago
prefill-decode-experiments
No description.
Just ★ 1 1y ago
nvshmem-guide
No description.
★ 1 1y ago
vllm-dp-lws
No description.
Dockerfile ★ 1 11mo ago
blas_gemm_rust_driver
Just drivers to time mkl & blis dgemm written in Rust
Rust ★ 1 9y ago
nightly-eval
No description.
Python ★ 0 8h ago
claudectx
kubectx for AI coding agents — switch paired Claude Code + Codex CLI contexts (settings, tokens, skills, MCP servers) and translate config between them
Go ★ 0 15d ago
dotfiles
No description.
Shell ★ 0 15d ago
tv ⑂
No description.
★ 0 26d ago
DeepGEMM ⑂
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
★ 0 1mo ago
vllm-skills
No description.
Shell ★ 0 2mo ago
flashinfer ⑂
FlashInfer: Kernel Library for LLM Serving
★ 0 1mo ago
llmd-routing-bench
Benchmarking tool for the llm-d routing sidecar (P/D disaggregation overhead)
Go ★ 0 3mo ago
DeepEP ⑂
DeepEP: an efficient expert-parallel communication library
★ 0 2mo ago
vllm-dev-env
No description.
★ 0 5mo ago
combine_traces
No description.
Python ★ 0 5mo ago
llm-d-dev-img
No description.
Just ★ 0 6mo ago
my_pods
No description.
Dockerfile ★ 0 9mo ago
guidellm ⑂
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
★ 0 9mo ago
llm-d-modelservice ⑂
No description.
★ 0 11mo ago
benchmark-pod-interactive ⑂
Pod for benchmarking interactive in llm-d
★ 0 11mo ago
llm-d-infra ⑂
llm-d helm charts and deployment examples
★ 0 10mo ago
ptgq_fp8
No description.
Python ★ 0 11mo ago
llm-d-inference-scheduler ⑂
Inference scheduler for llm-d
★ 0 2mo ago
canhazgpu ⑂
A simple GPU reservation tool for single host shared development systems
★ 0 11mo ago
ci-infra ⑂
This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
★ 0 1y ago
literate-bassoon
No description.
Just ★ 0 1y ago
ibgda-repro
No description.
★ 0 1y ago
llm-d-project-template
No description.
Just ★ 0 1y ago
pd_examples
No description.
Just ★ 0 1y ago
LMCache ⑂
Redis for LLMs
Python ★ 0 1y ago
lmcache-tests ⑂
No description.
★ 0 1y ago
lmcache-server ⑂
No description.
★ 0 1y ago
lmcache-vllm ⑂
The driver for LMCache core to run in vLLM
★ 0 1y ago
torchac_cuda ⑂
No description.
★ 0 1y ago
cutlass ⑂
CUDA Templates for Linear Algebra Subroutines
★ 0 1y ago
transformers ⑂
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Python ★ 0 1y ago
flash-attention ⑂
Fast and memory-efficient exact attention
★ 0 1y ago
flux ⑂
A fast communication-overlapping library for tensor parallelism on GPUs.
C++ ★ 0 1y ago
momms_exper_driver
No description.
Shell ★ 0 9y ago