Tiiny AI ORG

@Tiiny-AI · tiiny.ai

1 repo
127 followers
0 following

C++ 100%

All public repos (1)

PowerInfer

High-speed Large Language Model Serving for Local Deployment

PowerInfer is a C++ inference engine that runs large language models on a single consumer-grade GPU. It keeps frequently activated ("hot") neurons on the GPU and computes rarely activated ("cold") neurons on the CPU, reaching speeds up to 11x faster than llama.cpp.
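The hot/cold split can be pictured as a placement step run before inference. The sketch below is a hypothetical illustration, not PowerInfer's API or its actual placement policy: it assumes per-neuron activation frequencies have already been measured offline and that the GPU budget is counted in neurons, then greedily places the most frequently activated neurons on the GPU. The names `Device` and `place_neurons` are invented for this example, and the greedy rule is a simple stand-in for whatever policy PowerInfer really uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch: where each neuron's weights would live.
enum class Device { CPU, GPU };

// Greedy stand-in for a hot/cold placement policy (assumption, not PowerInfer's code).
// activation_freq[i] is the measured fraction of tokens for which neuron i fires.
// gpu_budget is how many neurons fit in GPU memory (a simplifying assumption).
std::vector<Device> place_neurons(const std::vector<double>& activation_freq,
                                  std::size_t gpu_budget) {
    // Neuron indices 0..n-1.
    std::vector<std::size_t> order(activation_freq.size());
    std::iota(order.begin(), order.end(), std::size_t{0});

    // Sort indices by descending activation frequency; stable_sort keeps ties in index order.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return activation_freq[a] > activation_freq[b];
                     });

    // Everything starts on the CPU ("cold"); the hottest neurons move to the GPU.
    std::vector<Device> placement(activation_freq.size(), Device::CPU);
    const std::size_t n_gpu = std::min(gpu_budget, order.size());
    for (std::size_t i = 0; i < n_gpu; ++i) {
        placement[order[i]] = Device::GPU;
    }
    return placement;
}
```

For example, with frequencies `{0.9, 0.05, 0.6, 0.01}` and a budget of 2, neurons 0 and 2 would be marked for the GPU and neurons 1 and 3 would stay on the CPU. A real engine would also weigh each neuron's memory footprint and the cost of CPU-GPU coordination, which this sketch ignores.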

C++ · ★ 9.6k · updated 1mo ago