MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs