SageAttention
Cuda
★ 0
updated 5mo ago
⑂ fork
Quantized attention that achieves speedups of 2.1-3.1x over FlashAttention2 and 2.7-5.1x over xformers without losing end-to-end metrics across various models.
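To make "quantized attention" concrete, here is a minimal sketch of the general idea in plain Python. It is not this repository's CUDA kernel. The symmetric per-row INT8 scheme and the helper names (`quantize_int8`, `attention`) are assumptions chosen for illustration. The sketch quantizes the rows of Q and K to 8-bit integers, computes the dot products in integer arithmetic, rescales them to floats, and compares the result with full-precision attention.

```python
import math

def quantize_int8(row):
    """Symmetric per-row INT8 quantization (illustrative assumption).

    Returns the integer values and the scale needed to dequantize them.
    """
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [max(-127, min(127, round(x / scale))) for x in row], scale

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, quantized):
    """Single-head attention; optionally computes Q.K^T from INT8 values."""
    d = len(q[0])
    if quantized:
        qq = [quantize_int8(r) for r in q]
        kq = [quantize_int8(r) for r in k]
    out = []
    for i in range(len(q)):
        scores = []
        for j in range(len(k)):
            if quantized:
                (qi, sq), (kj, sk) = qq[i], kq[j]
                # Integer dot product, then dequantize with both scales.
                dot = sum(a * b for a, b in zip(qi, kj)) * sq * sk
            else:
                dot = sum(a * b for a, b in zip(q[i], k[j]))
            scores.append(dot / math.sqrt(d))
        p = softmax(scores)
        # The P.V product stays in floating point in this sketch.
        out.append([sum(p[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out

q = [[0.1, -0.4, 0.3, 0.8], [0.5, 0.2, -0.7, 0.1]]
k = [[0.3, 0.9, -0.2, 0.4], [-0.6, 0.1, 0.5, 0.2], [0.7, -0.3, 0.0, -0.8]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

exact = attention(q, k, v, quantized=False)
approx = attention(q, k, v, quantized=True)
max_err = max(abs(a - b) for ra, rb in zip(exact, approx)
              for a, b in zip(ra, rb))
print("max abs difference vs. full precision:", max_err)
```

On real hardware the appeal of this approach is that low-precision integer matrix multiplies run faster than their floating-point counterparts. This toy version only shows that the quantized scores stay close to the exact ones for small inputs.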