duo-attention
Python
★ 539
updated 1y ago
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads