tree_attention
Python
★ 134
updated 1y ago
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
No plain-English explanation yet — one is being written right now. Check back in a minute.