ipex-llm

Python · ★ 1 · updated 2y ago · ⑂ fork

Accelerates local LLM inference and fine-tuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPUs and GPUs (e.g., a local PC with an iGPU, or a discrete GPU such as Arc, Flex, or Max); integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
