native-sparse-attention-pytorch
Python
★ 808
updated 10mo ago
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
No plain-English explanation yet — one is being written right now. Check back in a minute.