turboquant
Python
★ 71
updated 25d ago
First open-source implementation of Google TurboQuant (ICLR 2026): near-optimal KV cache compression for LLM inference, with 5x compression and near-zero quality loss.