sparser-faster-llms
Cuda
★ 245
updated 1mo ago
CUDA kernels that leverage LLM sparsity to improve throughput and reduce memory requirements during inference and training.
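The repository's own kernels are not reproduced here. As a rough illustration of one common kind of LLM sparsity, activation sparsity (many inputs to a layer are zero, for example after a ReLU), the sketch below computes y = W x while reading only the weight columns whose activation is nonzero. This is a minimal sketch under stated assumptions. The kernel name `sparse_activation_gemv`, the helper `run_example`, the column-major weight layout, and the host-side compaction of nonzero indices are all hypothetical choices made for this example, not taken from the repository, which may target a different form of sparsity entirely.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the repository): y = W * x, skipping zero entries of x.
// Wt holds W in column-major order, so Wt[c * rows + r] == W[r][c].
// active_idx lists the n_active column indices c for which x[c] != 0.
__global__ void sparse_activation_gemv(const float* __restrict__ Wt,
                                       const float* __restrict__ x,
                                       const int* __restrict__ active_idx,
                                       int n_active, int rows,
                                       float* __restrict__ y) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per output row
    if (r >= rows) return;
    float acc = 0.0f;
    for (int k = 0; k < n_active; ++k) {
        int c = active_idx[k];
        // Neighbouring threads read neighbouring rows of column c, so loads coalesce.
        acc += Wt[c * rows + r] * x[c];
    }
    y[r] = acc;
}

// Hypothetical helper: builds a tiny example, runs the kernel, returns y (empty on error).
std::vector<float> run_example() {
    const int rows = 4, cols = 8;
    std::vector<float> x = {0, 2, 0, 0, 0, 0, 3, 0};  // only 2 of 8 activations are nonzero
    std::vector<float> Wt(cols * rows);
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < rows; ++r)
            Wt[c * rows + r] = float(r + 1) * float(c + 1);  // W[r][c] = (r+1)(c+1)

    // Compact the indices of nonzero activations on the host.
    std::vector<int> active;
    for (int c = 0; c < cols; ++c)
        if (x[c] != 0.0f) active.push_back(c);

    float *dWt = nullptr, *dx = nullptr, *dy = nullptr;
    int *didx = nullptr;
    cudaMalloc(&dWt, Wt.size() * sizeof(float));
    cudaMalloc(&dx, x.size() * sizeof(float));
    cudaMalloc(&didx, active.size() * sizeof(int));
    cudaMalloc(&dy, rows * sizeof(float));
    cudaMemcpy(dWt, Wt.data(), Wt.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x.data(), x.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(didx, active.data(), active.size() * sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 128;
    sparse_activation_gemv<<<(rows + threads - 1) / threads, threads>>>(
        dWt, dx, didx, static_cast<int>(active.size()), rows, dy);
    cudaError_t launch_err = cudaGetLastError();  // catches invalid launch configurations

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<float> y(rows);
    cudaError_t copy_err = cudaMemcpy(y.data(), dy, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dWt); cudaFree(dx); cudaFree(didx); cudaFree(dy);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return {};
    return y;
}

int main() {
    std::vector<float> y = run_example();
    if (y.empty()) { std::fprintf(stderr, "CUDA error\n"); return 1; }
    for (size_t r = 0; r < y.size(); ++r) std::printf("y[%zu] = %g\n", r, y[r]);
    return 0;
}
```

Because W[r][c] = (r+1)(c+1) and only x[1] = 2 and x[6] = 3 are nonzero, each output is (r+1)(2·2 + 7·3) = 25(r+1), so the program should print 25, 50, 75 and 100. The design choice worth noting is that only 2 of the 8 weight columns are ever read: matrix-vector products at small batch sizes are typically memory-bandwidth bound, so skipping weight loads is where most of the savings would come from. A production kernel would likely compact indices on the GPU, use reduced precision, and tile work across warps, none of which this sketch attempts.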