gitmyhub

FlexGen

Python ★ 0 updated 3y ago ⑂ fork

Running large language models like OPT-175B/GPT-3 on a single GPU. Focusing on high-throughput large-batch generation.

No plain-English explanation yet — one is being written right now. Check back in a minute.