MiniCPM

Jupyter Notebook ★ 9.5k updated 1d ago

MiniCPM5-1B: A SOTA 1B on-device LLM, small yet powerful.

MiniCPM is a series of compact AI language models (the MiniCPM4 and MiniCPM4.1 releases have 8B parameters) designed to run on phones, laptops, and edge devices without a cloud connection, offering fast text generation and reasoning through techniques like speculative decoding.

Python · Jupyter Notebook · HuggingFace Transformers · vLLM · SGLang · llama.cpp · Ollama · setup: moderate · complexity 4/5

MiniCPM is a series of small but capable language models built by OpenBMB and designed to run on everyday devices rather than large data-center servers. The goal is to pack as much reasoning ability as possible into a compact model size so the AI can work on phones, laptops, and edge hardware without requiring a cloud connection.

The main releases described here are MiniCPM4 and MiniCPM4.1, both with 8 billion parameters. The team claims more than 5x faster text generation than earlier models on typical consumer chips, and more than 3x faster generation on reasoning tasks. The speedup comes from two techniques. The first is speculative decoding, in which a small draft model proposes several tokens ahead and the main model verifies them together in a single forward pass. The second is a trainable sparse attention architecture called SALA, which skips much of the attention computation on long documents.
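The draft-and-verify loop behind speculative decoding can be sketched in plain Python. This is a minimal greedy version under simplifying assumptions, not MiniCPM's own implementation. Each "model" is a hypothetical function that returns its single most likely next token. The target also checks draft tokens one at a time here, where a real system does it in one batched forward pass.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_generate(target: NextToken, draft: NextToken,
                         prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies them."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each proposed position. A real implementation scores
        #    all k positions in one forward pass; this loop is only for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First disagreement: keep the target's own token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token was accepted, so the verify pass yields one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    # The last round may overshoot, so trim to exactly max_new new tokens.
    return tokens[: len(prompt) + max_new]
```

Every emitted token is either a draft token the target agreed with or the target's own choice. The output therefore matches plain greedy decoding with the target alone. The time saving comes from the target checking up to k positions per pass instead of producing one token per pass.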

MiniCPM4.1 adds a hybrid reasoning mode, meaning it can switch between a careful step-by-step thinking process and a faster direct-answer mode depending on the question. This matters because many questions do not need elaborate chains of thought, and forcing the model to reason slowly wastes time and battery.

You can download and run the models through standard tools like HuggingFace Transformers, vLLM, SGLang, llama.cpp, or Ollama. Quantized versions (GPTQ, AWQ, GGUF) are available for further size reduction. An Intel AIPC desktop client is also provided for Windows users who want a standalone app. The repo includes example code for running in Python with or without speculative decoding enabled.
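The following is a minimal sketch of symmetric 4-bit quantization with one shared scale per group of weights. It shows why quantized builds are much smaller. It illustrates the general idea only: GPTQ, AWQ and the GGUF quantization types each use their own storage layouts and rounding strategies (GPTQ and AWQ, for example, use calibration data). The function names below are hypothetical.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float], group_size: int = 32) -> Tuple[List[int], List[float]]:
    """Symmetric per-group 4-bit quantization: each group shares one float scale."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Map the largest magnitude in the group to 7, the top of the signed int4
        # range [-8, 7]. An all-zero group gets scale 1.0 to avoid dividing by zero.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes: List[int], scales: List[float], group_size: int = 32) -> List[float]:
    """Recover approximate weights by multiplying each code by its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage only, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9
```

For weights alone, an 8-billion-parameter model needs about 16 GB at 16 bits per weight (`model_size_gb(8e9, 16)`) and about 4 GB at 4 bits. Storing a 16-bit scale for every 32-weight group raises this to about 4.5 GB. Activations and the KV cache need memory on top of that.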

Beyond plain text chat, the project ships two application examples: MiniCPM4-Survey for generating structured research overviews, and MiniCPM4-MCP for connecting the model to external tools using the Model Context Protocol.
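The Model Context Protocol carries its messages as JSON-RPC 2.0. Tools are discovered with `tools/list` and invoked with `tools/call`. The sketch below builds those two requests. It assumes the model has already chosen a tool name and arguments, and it does not show how MiniCPM4-MCP formats that choice. The tool `get_weather` and its arguments are hypothetical.

```python
import json
from typing import Any, Dict


def jsonrpc_request(request_id: int, method: str, params: Dict[str, Any]) -> str:
    """Serialize one JSON-RPC 2.0 request, the message format MCP is built on."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": method, "params": params})


def tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Build an MCP tools/call request for a tool the model decided to use."""
    return jsonrpc_request(request_id, "tools/call", {"name": tool_name, "arguments": arguments})


# Discover available tools, then call one. "get_weather" is a hypothetical tool.
discover = jsonrpc_request(1, "tools/list", {})
call = tool_call(2, "get_weather", {"city": "Beijing"})
print(call)
```

A real MCP client also performs an `initialize` handshake before these calls. It then reads the server's responses and feeds the tool results back to the model. Both steps are omitted here.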

Where it fits

Run a local AI chatbot on a laptop or phone without any internet connection.
Generate structured research overviews using the MiniCPM4-Survey application example.
Connect a local language model to external tools and services using MiniCPM4-MCP.
Get fast on-device AI responses using quantized GGUF or AWQ model formats.
