web-llm
High-performance In-browser LLM Inference Engine
Runs large language models entirely inside a web browser using the device's graphics card, with no server required, as a drop-in replacement for the OpenAI API.
WebLLM is a tool that lets you run large language models — the kind of AI that powers chatbots like ChatGPT — directly inside a web browser, with no server doing the work behind the scenes. Everything happens on the user's own machine, accelerated by WebGPU, a modern browser standard that lets web pages tap into the computer's graphics card for fast computation.
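Since everything depends on WebGPU, a page may want to check for it before downloading a model. The following is a minimal sketch that assumes only the standard `navigator.gpu` entry point and its `requestAdapter()` method; the names `NavigatorLike`, `hasWebGPU`, and `canUseGpu` are illustrative helpers, not part of WebLLM.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
// These are illustrative stand-ins, not WebLLM or DOM types.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// True when the browser exposes the WebGPU entry point at all.
function hasWebGPU(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined;
}

// True only when a GPU adapter can actually be obtained.
// requestAdapter() resolves to null when no suitable adapter exists.
async function canUseGpu(nav: NavigatorLike): Promise<boolean> {
  const gpu = nav.gpu;
  if (gpu === undefined) return false;
  const adapter = await gpu.requestAdapter();
  return adapter !== null && adapter !== undefined;
}
```

In a browser, the global `navigator` would be passed in (with a cast if the project's DOM typings do not include `gpu`), and the app could show a fallback message when the check fails.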
The project is designed as a drop-in replacement for the OpenAI API: it exposes the same chat-completions request and response shapes, so an app that already calls the OpenAI API can switch to a WebLLM engine and keep largely the same code, including features like streaming responses and structured JSON output. Function calling is listed as a work in progress. WebLLM ships with support for several open-source model families, including Llama 3, Phi 3, Gemma, Mistral, and Qwen, and you can compile and load your own custom models in the MLC format.
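Because the interface follows the OpenAI chat-completions shape, request-building and stream-handling code can be written against that shape alone. The sketch below is an illustration under assumptions: `ChatEngineLike`, `buildRequest`, and `streamReply` are hypothetical names, and the engine is typed structurally so the block runs without the package installed. In a real app the engine would come from the `@mlc-ai/web-llm` package (for example through its `CreateMLCEngine` factory) with a model ID of your choosing.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// OpenAI-style streaming request; response_format asks for JSON output.
interface ChatRequest {
  messages: ChatMessage[];
  stream: true;
  response_format?: { type: "json_object" };
}

// OpenAI-style streaming chunk: each chunk carries a small text delta.
interface ChatChunk {
  choices: { delta: { content?: string | null } }[];
}

// Hypothetical structural stand-in for an OpenAI-compatible engine.
interface ChatEngineLike {
  chat: {
    completions: {
      create(req: ChatRequest): Promise<AsyncIterable<ChatChunk>>;
    };
  };
}

// Build a streaming request; optionally ask for structured JSON output.
function buildRequest(prompt: string, wantJson = false): ChatRequest {
  const req: ChatRequest = {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: prompt },
    ],
    stream: true,
  };
  if (wantJson) req.response_format = { type: "json_object" };
  return req;
}

// Send the prompt, forward each text delta to onToken, return the full reply.
async function streamReply(
  engine: ChatEngineLike,
  prompt: string,
  onToken: (piece: string) => void,
): Promise<string> {
  const chunks = await engine.chat.completions.create(buildRequest(prompt));
  let full = "";
  for await (const chunk of chunks) {
    const piece = chunk.choices[0]?.delta.content ?? "";
    if (piece.length > 0) {
      full += piece;
      onToken(piece);
    }
  }
  return full;
}
```

Keeping the helpers typed against the request and response shapes, rather than a specific client, is what lets the same code sit in front of either a hosted API client or an in-browser engine.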
You would reach for WebLLM if you want to ship AI features in a web app without paying for a cloud API or sending user data off the user's device, or if offline use matters for your audience. It can offload work to Web Workers or Service Workers so the user interface stays responsive, and it can be packaged into Chrome extensions. The package is written in TypeScript and is published on NPM; it can also be loaded straight from a CDN for quick prototyping in tools like JSFiddle or CodePen. It is a companion to the broader MLC LLM project, which targets the same models across other hardware environments.
Where it fits
- Add AI chat to a web app without paying for a cloud API or sending any user data to a server.
- Replace OpenAI API calls in an existing web app with on-device AI by swapping in WebLLM with the same code.
- Build a Chrome extension with built-in AI that works offline using local model inference in the browser.
- Prototype an AI-powered web tool quickly by loading WebLLM from a CDN in JSFiddle or CodePen.