@Tiiny-AI · tiiny.ai
High-speed Large Language Model Serving for Local Deployment
PowerInfer is a C++ inference engine that runs large language models on a consumer GPU. It keeps the frequently used parts of the model on the GPU and the rarely used parts on the CPU, reaching speeds up to 11x faster than CPU-only alternatives.
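The hot/cold split described above can be illustrated with a minimal sketch. This is not PowerInfer's actual code or API. It assumes the "model parts" are equally sized units (for example, neurons), that a usage frequency for each unit is already known from some profiling step, and that GPU capacity is expressed as a unit count. The names `Device`, `place_units`, `usage_freq`, and `gpu_budget` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Where a model unit is stored and computed (hypothetical type).
enum class Device { CPU, GPU };

// Hypothetical helper: put the most frequently used units on the GPU
// until the budget is exhausted; everything else stays on the CPU.
//   usage_freq[i] : assumed profiling statistic for unit i (higher = hotter)
//   gpu_budget    : assumed number of units that fit in GPU memory
std::vector<Device> place_units(const std::vector<double>& usage_freq,
                                std::size_t gpu_budget) {
    // Build the index list 0..n-1 so we can sort indices, not values.
    std::vector<std::size_t> order(usage_freq.size());
    std::iota(order.begin(), order.end(), 0);

    // Sort indices from most to least frequently used.
    // stable_sort keeps the original order among ties.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return usage_freq[a] > usage_freq[b];
                     });

    // Default every unit to the CPU, then promote the hottest ones.
    std::vector<Device> placement(usage_freq.size(), Device::CPU);
    const std::size_t n_gpu = std::min(gpu_budget, order.size());
    for (std::size_t i = 0; i < n_gpu; ++i) {
        placement[order[i]] = Device::GPU;
    }
    return placement;
}
```

With frequencies `{0.9, 0.1, 0.5, 0.05}` and a budget of 2, units 0 and 2 would land on the GPU and units 1 and 3 on the CPU. A real engine would also have to account for differing unit sizes, per-layer memory limits, and the cost of moving data between devices, none of which this sketch models.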