bebelm

Rust ★ 57 updated 3d ago

CPU-only, pure-Rust implementation of LiquidAI's LFM2.5-8B-A1B LLM

Run an AI language model entirely on your CPU, no graphics card needed. BebeLM uses a clever model design that keeps computation low enough for real-time responses on regular laptops and desktops.

Rust · LFM2.5-8B-A1B · Cargo · setup: easy · complexity 2/5

BebeLM is a Rust program that runs an AI language model entirely on your CPU. Most tools for running large language models expect a dedicated graphics card (GPU) with several gigabytes of video memory. BebeLM is built around LFM2.5-8B-A1B, a model whose design keeps the computation per generated token low enough that an ordinary desktop or laptop CPU can respond at a pace that feels usable in real time.

The model has 8 billion parameters in total but only activates about 1 billion of them per step, which is what makes CPU-only inference feasible. You download a single file of model weights, roughly 5.2 gigabytes, and then run the tool against that file. The project has very few code dependencies and requires no extra system libraries beyond the Rust toolchain itself.
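The figures above can be turned into two quick ratios. This is only back-of-the-envelope arithmetic, assuming "gigabytes" means 10^9 bytes and that the file is almost entirely weights; neither is stated in the description, and the function names are made up for illustration.

```rust
/// Average storage per parameter, in bits, for a weights file of `file_bytes`.
/// Assumption: the file holds little besides the weights themselves.
fn bits_per_parameter(file_bytes: f64, total_params: f64) -> f64 {
    file_bytes * 8.0 / total_params
}

/// Share of the parameters that take part in a single generation step.
fn active_fraction(active_params: f64, total_params: f64) -> f64 {
    active_params / total_params
}

fn main() {
    let total_params = 8.0e9; // 8 billion parameters in total
    let active_params = 1.0e9; // about 1 billion active per step
    let file_bytes = 5.2e9; // roughly 5.2 GB download (10^9-byte gigabytes assumed)

    println!(
        "bits per parameter: {:.1}",
        bits_per_parameter(file_bytes, total_params)
    );
    println!(
        "active fraction: {:.3}",
        active_fraction(active_params, total_params)
    );
}
```

Under those assumptions the file averages about 5.2 bits per parameter, which suggests the weights are stored at reduced precision, and each step touches roughly one eighth of the parameters.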

There are two ways to use it: as a command-line tool and as a library. The command-line interface gives you a chat mode for back-and-forth conversation and a generate mode for one-shot text completions. The model can show its reasoning process as a separate block before giving its final answer, and you can cap the length of that reasoning block if you want shorter responses. You can also install the binary directly via Cargo, the Rust package manager, without cloning the repository.
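To make the idea of a capped reasoning block concrete, here is a minimal sketch that trims an already-generated token stream. It is not BebeLM's code: the `<think>` and `</think>` markers and the `cap_reasoning` function are hypothetical, and a real implementation would more likely stop generating reasoning tokens at the limit rather than filter them afterwards.

```rust
// Hypothetical delimiters; the model's actual reasoning markers are not
// given in the description above.
const THINK_OPEN: &str = "<think>";
const THINK_CLOSE: &str = "</think>";

/// Copies `tokens` to the output, keeping at most `max_reasoning` tokens
/// between the reasoning markers. Tokens outside the block are untouched.
fn cap_reasoning<'a>(tokens: &[&'a str], max_reasoning: usize) -> Vec<&'a str> {
    let mut out = Vec::new();
    let mut in_block = false;
    let mut used = 0;
    for &tok in tokens {
        if tok == THINK_OPEN {
            in_block = true;
            used = 0;
            out.push(tok);
        } else if tok == THINK_CLOSE {
            in_block = false;
            out.push(tok);
        } else if in_block {
            // Inside the reasoning block: keep tokens only while under budget.
            if used < max_reasoning {
                out.push(tok);
                used += 1;
            }
        } else {
            out.push(tok);
        }
    }
    out
}

fn main() {
    let stream = ["<think>", "a", "b", "c", "</think>", "answer"];
    let capped = cap_reasoning(&stream, 2);
    println!("{}", capped.join(" "));
}
```

With a budget of two, the third reasoning token is dropped while the closing marker and the final answer pass through unchanged.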

Beyond the command-line tool, BebeLM is structured as a Rust library that other programs can import. The API lets you load the model once and run multiple conversations from the same loaded weights, pass a callback function to receive tokens as they are generated rather than waiting for the full response, and control sampling settings like temperature. The library handles all the low-level details of the model format and token handling.
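The three library features described above (weights loaded once, per-token callbacks, a temperature setting) can be illustrated with a toy stand-in. Every name here (`Model`, `Session`, `generate`, `softmax_with_temperature`) is hypothetical and is not BebeLM's actual API; the "model" is a fixed table of logits, and the pick is greedy so the output stays deterministic.

```rust
use std::sync::Arc;

// Hypothetical stand-in for loaded weights; not BebeLM's real type.
struct Model {
    vocab: Vec<&'static str>,
}

// A conversation that shares the loaded weights instead of reloading them.
struct Session {
    model: Arc<Model>,
    temperature: f32,
}

// Softmax over temperature-scaled logits: lower temperatures sharpen the
// distribution, higher ones flatten it. Requires temperature > 0.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

impl Session {
    fn new(model: Arc<Model>, temperature: f32) -> Self {
        Session { model, temperature }
    }

    // Walks placeholder per-step logits and hands each chosen token to
    // `on_token` immediately. A real sampler would draw randomly from
    // `probs`; taking the most likely token keeps this sketch repeatable.
    fn generate<F: FnMut(&str)>(&self, steps: &[Vec<f32>], mut on_token: F) {
        for logits in steps {
            let probs = softmax_with_temperature(logits, self.temperature);
            let mut best = 0;
            for (i, p) in probs.iter().enumerate() {
                if *p > probs[best] {
                    best = i;
                }
            }
            on_token(self.model.vocab[best]);
        }
    }
}

fn main() {
    // Load once, then share the same weights across two sessions.
    let model = Arc::new(Model {
        vocab: vec!["hello", " world", "!"],
    });
    let first = Session::new(Arc::clone(&model), 0.7);
    let second = Session::new(Arc::clone(&model), 1.0);

    let steps = vec![
        vec![2.0, 0.1, 0.0],
        vec![0.0, 3.0, 0.5],
        vec![0.1, 0.2, 4.0],
    ];

    // Collect streamed tokens into a string...
    let mut text = String::new();
    first.generate(&steps, |tok| text.push_str(tok));
    println!("{text}");

    // ...or print each token as it arrives.
    second.generate(&steps, |tok| print!("{tok}"));
    println!();
}
```

Sharing the weights behind an `Arc` is one common way to let several sessions reuse a single load, and accepting an `FnMut(&str)` lets the caller decide whether to print, buffer, or forward each token.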

The project has been tested on Apple M5, AMD Ryzen, and AMD Threadripper processors. The README notes it should also work on Intel CPUs and Raspberry Pi 4 and 5, though those have not been verified.

Where it fits