neural-compressor
Python
★ 2.7k
updated 2d ago
State-of-the-art low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) and sparsity; leading model compression techniques for PyTorch, TensorFlow, and ONNX Runtime