Foundation Model Inference ORG

Inference Systems for Foundation Models

3 repos
212 followers
0 following

Python 100%

All public repos (3)


FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.

A Python tool that runs very large language models on a single consumer GPU by spreading the model across GPU memory, RAM, and disk. It is designed for overnight batch jobs, such as classifying thousands of documents, rather than real-time chat.

Python ★ 9.4k 1y ago

H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Python ★ 522 1y ago

DejaVu

No description.

Python ★ 358 2y ago