
duo-attention

Python ★ 539 updated 1y ago

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
