
Blockwise-Parallel-Transformer

Python ★ 50 updated 3y ago

Enables a context window 32 times longer than vanilla Transformers and up to 4 times longer than memory-efficient Transformers.
