
MoH

Python ★ 311 updated 1y ago

MoH: Multi-Head Attention as Mixture-of-Head Attention
