
PrefixQuant

Python · ★ 0 · updated 5 months ago · fork

An algorithm for weight-activation quantization of LLMs in W4A4 and W4A8 settings (4-bit weights with 4-bit or 8-bit activations), supporting both static and dynamic quantization.
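To show what "static" and "dynamic" mean here, the following is a minimal sketch. It assumes symmetric signed-integer fake-quantization, a single per-tensor scale fixed from calibration data for the static case, and a per-token scale computed at inference time for the dynamic case. The function names (`quantize`, `static_scale`, `dynamic_scale`) are hypothetical. This is not the PrefixQuant implementation, only the generic distinction it refers to. The same `quantize` could be applied to weights at 4 bits and to activations at 4 or 8 bits.

```python
# Hypothetical illustration only: generic symmetric fake-quantization,
# NOT PrefixQuant's algorithm. Contrasts static vs dynamic activation scales.

def quantize(values, scale, bits):
    """Round values onto a signed integer grid of `bits` bits, then map back."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    qmin = -qmax - 1            # -8 for 4-bit, -128 for 8-bit
    # Python's round() rounds to nearest, with ties going to the even integer.
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]


def static_scale(calibration_tokens, bits):
    """Static: one per-tensor scale, computed ahead of time from calibration data."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for tok in calibration_tokens for v in tok)
    return peak / qmax if peak > 0 else 1.0  # avoid a zero scale


def dynamic_scale(token, bits):
    """Dynamic: a per-token scale, computed at inference time from the token itself."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in token)
    return peak / qmax if peak > 0 else 1.0


# Usage: a calibration set containing one large value inflates the static scale,
# so a small-magnitude token loses more precision than with a dynamic scale.
calib = [[0.5, -1.0, 2.0], [0.1, 0.2, -0.3]]
token = [0.1, 0.2, -0.3]

s_static = static_scale(calib, bits=4)
s_dynamic = dynamic_scale(token, bits=4)

q_static = quantize(token, s_static, bits=4)
q_dynamic = quantize(token, s_dynamic, bits=4)

err_static = max(abs(a - b) for a, b in zip(token, q_static))
err_dynamic = max(abs(a - b) for a, b in zip(token, q_dynamic))
print(f"static max error:  {err_static:.4f}")
print(f"dynamic max error: {err_dynamic:.4f}")
```

The trade-off this sketch hints at: a static scale needs no per-token computation at inference time, but a single outlier in the calibration data coarsens the grid for every token, whereas a dynamic scale adapts to each token at some runtime cost.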
