memory-efficient-attention-pytorch
Python
★ 392
updated 2y ago
Implementation of memory-efficient multi-head attention, as proposed in the paper "Self-attention Does Not Need O(n²) Memory"