GPTQModel
★ 0
updated 8mo ago
⑂ fork
LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via Hugging Face, vLLM, and SGLang.