vllm-nvfp4-kv-sm120
NVFP4 KV cache for vLLM on SM120 (RTX PRO 6000) via FlashInfer FA2 explicit-SF-stride patch — ~1.5x fp8 pool at ~95-104% speed
Python ★ 15 14d ago
swarm-agent
High-concurrency goal-driven swarm harness for Step-3.7-Flash: conversational front door, stigmergic board, AIMD admission, recall, skills, web UI.
Python ★ 7 8d ago
glm52-speedup
GLM-5.2 UD-Q2 decode speedup: CPU∥GPU MoE expert-split (harness, specs, de-risk)
C++ ★ 2 2h ago
nvfp4-vs-fp8-kv-cache-terminal-bench
FP8 vs NVFP4 (4-bit) KV cache on Terminal-Bench 2.0 — no measurable accuracy loss, 1.78x more KV capacity. Full results + verify.py.
Python ★ 2 11d ago
sglang-nvfp4-kv-sm120
SGLang NVFP4 (fp4_e2m1) KV cache for Blackwell SM120 (RTX PRO 6000): FlashInfer FA2 kernel patches + native FP4 pool + hybrid-SWA wiring + per-layer global-scale auto-calibration. 1.778x KV capacity, ~4% decode cost. Validated end-to-end on Step-3.7-Flash 198B (cuda-graph, TP=2). Small models hit the 4-bit precision floor (use fp8 KV).
Python ★ 2 12d ago
gemma4-repe
Gemma-4-31B representation engineering: dataset-free personality control + process-level sycophancy ablation
Python ★ 1 6d ago
llama.cpp-glm52
llama.cpp fork: GLM-5.2 CPU∥GPU MoE expert-split decode speedup
C++ ★ 0 2h ago
dsv4-180b-tb2-eval
DeepSeek-V4-Flash-180B (REAP K160) × Terminal-Bench 2.0 — reproducible evaluation deliverable
Python ★ 0 3d ago