attn_res
Python
★ 30
updated 3mo ago
A clean, single-file PyTorch implementation of Attention Residuals (Kimi Team, MoonshotAI, 2026), integrated with Grouped Query Attention (GQA), SwiGLU feed-forward networks, and Rotary Position Embeddings (RoPE).
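Of the components named above, RoPE is the easiest to show in isolation. The sketch below is not taken from this repository: it is a dependency-free illustration in plain Python (rather than PyTorch) of the standard rotary scheme, and the names `rope` and `dot`, the interleaved pairing of dimensions, and the default `base=10000.0` are all assumptions made for the example. It rotates each pair of coordinates by an angle proportional to the token position, so that the query-key dot product depends only on the relative offset between positions.

```python
import math


def rope(vec, pos, base=10000.0):
    """Apply a rotary position embedding to one vector (hypothetical helper).

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    pos * base ** (-i / d), where i is the even index of the pair and
    d is the vector dimension.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2-D rotation of the pair (x, y) by theta.
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# Both pairs of positions are 2 apart, so the two scores should agree
# up to floating-point error.
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 12), rope(k, 10))
```

Because every pair is only rotated, the norm of the vector is unchanged, and shifting both positions by the same amount leaves the attention score the same. Some implementations pair dimension `i` with `i + d/2` instead of `i + 1`; the two layouts have the same relative-position property but are not interchangeable for weights trained with the other layout.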