FlexGen
Python
★ 0
updated 3y ago
⑂ fork
Running large language models like OPT-175B/GPT-3 on a single GPU. Focusing on high-throughput large-batch generation.
No plain-English explanation yet — one is being written right now. Check back in a minute.