- FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
Python ★ 9.4k · 1y ago
- H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Python ★ 522 · 1y ago
- DejaVu
No description.
Python ★ 358 · 2y ago