sglang-nvfp4-kv-sm120
Python
★ 2
updated 12d ago
SGLang NVFP4 (fp4_e2m1) KV cache for Blackwell SM120 (RTX PRO 6000): FlashInfer FA2 kernel patches, a native FP4 KV pool, hybrid-SWA wiring, and per-layer global-scale auto-calibration. 1.778x KV capacity at ~4% decode cost. Validated end-to-end on Step-3.7-Flash 198B (CUDA graphs, TP=2). Small models hit the 4-bit precision floor; use fp8 KV for them.
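The 1.778x figure is consistent with NVFP4 storage arithmetic measured against an fp8 KV cache. Each 4-bit e2m1 code also carries its share of one FP8 (E4M3) scale per 16-element block, so it costs 4 + 8/16 = 4.5 bits, and 8 / 4.5 ≈ 1.778. The plain-Python sketch below shows that arithmetic together with a simplified NVFP4-style block quantizer that uses a per-layer global scale calibrated from the layer's absolute maximum. It is an illustration under stated assumptions, not the repo's FlashInfer kernels or SGLang pool code. All function names are hypothetical, and the calibration rule (global scale = amax / (6 × 448)) is a common NVFP4 convention assumed here.

```python
import math

# Assumed NVFP4-style layout (illustrative constants, not read from the repo):
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # non-negative fp4 e2m1 magnitudes
E2M1_MAX = 6.0      # largest e2m1 magnitude
E4M3_MAX = 448.0    # largest finite FP8 E4M3 magnitude
BLOCK = 16          # elements sharing one E4M3 block scale


def bits_per_element(block: int = BLOCK, value_bits: int = 4, scale_bits: int = 8) -> float:
    """Storage per KV element: the 4-bit code plus its share of the block scale."""
    return value_bits + scale_bits / block


def capacity_ratio(baseline_bits: int = 8) -> float:
    """How many times more KV elements fit in the same memory than at baseline_bits."""
    return baseline_bits / bits_per_element()


def round_e4m3(x: float) -> float:
    """Round a non-negative float to the nearest FP8 E4M3 value, saturating at 448."""
    if x <= 0.0:
        return 0.0
    _, ex = math.frexp(x)        # x = m * 2**ex with 0.5 <= m < 1
    e = max(ex - 1, -6)          # exponent of the leading bit; subnormals use 2**-6
    step = 2.0 ** (e - 3)        # 3 mantissa bits -> 8 steps per binade
    return min(round(x / step) * step, E4M3_MAX)


def round_e2m1(v: float) -> float:
    """Map v to the nearest signed e2m1 value (magnitudes above 6 clamp to 6)."""
    mag = min(E2M1_GRID, key=lambda g: abs(g - abs(v)))
    return math.copysign(mag, v)


def calibrate_global_scale(samples) -> float:
    """Hypothetical per-layer calibration: map the layer amax to E2M1_MAX * E4M3_MAX."""
    amax = max((abs(s) for s in samples), default=0.0)
    return amax / (E2M1_MAX * E4M3_MAX) if amax > 0.0 else 1.0


def quantize_block(values, global_scale: float):
    """Quantize one block; returns (e2m1 codes, E4M3 block scale)."""
    amax = max(abs(v) for v in values)
    scale = round_e4m3(amax / E2M1_MAX / global_scale)
    if scale == 0.0:
        return [0.0] * len(values), 0.0
    eff = scale * global_scale   # effective step size for this block
    return [round_e2m1(v / eff) for v in values], scale


def dequantize_block(codes, scale: float, global_scale: float):
    return [c * scale * global_scale for c in codes]


# Usage: one 16-element block standing in for a layer's K or V values.
layer_kv = [i * 0.37 - 2.9 for i in range(BLOCK)]
gs = calibrate_global_scale(layer_kv)
codes, s = quantize_block(layer_kv, gs)
restored = dequantize_block(codes, s, gs)
print(f"{bits_per_element()} bits/elem, {capacity_ratio():.3f}x vs fp8")  # 4.5 bits/elem, 1.778x vs fp8
```

The sketch ignores the per-layer FP32 global scale in the bits-per-element count because one value per layer is negligible next to the per-block scales. The coarse e2m1 grid (a gap of 2 between 4 and 6) is also why small models can hit the 4-bit precision floor.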