gitmyhub

ffpa-attn

Python ★ 310 updated 3d ago

🤖 FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3×↑🎉 vs SDPA, up to 430T🎉 on H200.
