
auto-round

Python ★ 1.5k updated 21h ago

A state-of-the-art (SOTA) quantization algorithm for high-accuracy, low-bit LLM inference, optimized for CPU, XPU, and CUDA, with support for multiple data types and full compatibility with vLLM, SGLang, and Transformers.
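The description does not say how the algorithm works, so the snippet below only illustrates what "low-bit" weight quantization means in general. It is a minimal sketch of plain asymmetric round-to-nearest (RTN) quantization with one scale and zero point per group of weights, the usual baseline that low-bit quantization methods are compared against. It is not auto-round's algorithm or API; the function names (`quantize_group`, `dequantize_group`, `fake_quantize`) and the `bits` and `group_size` defaults are hypothetical.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights.

    Hypothetical illustration of generic RTN, not auto-round's method.
    Returns integer codes in [0, 2**bits - 1], plus the scale and zero point.
    """
    qmax = (1 << bits) - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # An all-zero group would give a zero scale; fall back to 1.0.
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    zero_point = round(-w_min / scale)
    # Round to the nearest level, shift by the zero point, clamp to the code range.
    codes = [min(max(round(w / scale) + zero_point, 0), qmax) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate float weights."""
    return [(c - zero_point) * scale for c in codes]


def fake_quantize(weights, bits=4, group_size=128):
    """Quantize and immediately dequantize, one scale/zero point per group."""
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale, zero_point = quantize_group(group, bits)
        out.extend(dequantize_group(codes, scale, zero_point))
    return out


if __name__ == "__main__":
    w = [-1.0, -0.37, 0.0, 0.21, 0.5, 0.99, -0.8, 0.05]
    for b in (2, 4, 8):
        approx = fake_quantize(w, bits=b, group_size=8)
        err = max(abs(a - q) for a, q in zip(w, approx))
        print(f"{b}-bit max abs error: {err:.4f}")
```

Plain RTN keeps each weight within half a quantization step of its original value, but that error grows quickly as `bits` shrinks. This is the accuracy loss that more elaborate low-bit methods try to reduce.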
