ffpa-attn-mma

📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)⚡️GPU SRAM complexity for headdim > 256, 1.8x~3x↑🎉 faster vs SDPA EA.
