whisper.cpp

C++ ★ 51k updated 3d ago

Port of OpenAI's Whisper model in C/C++

A C/C++ port of OpenAI's Whisper speech-to-text model that runs offline on a wide range of devices, from desktops to a Raspberry Pi, without Python or heavy dependencies.

C, C++, CMake, Metal, CUDA, WebAssembly, AVX; setup: moderate; complexity 3/5

whisper.cpp is a C and C++ port of OpenAI's Whisper speech recognition model, which converts spoken audio into text. OpenAI released the original model as a Python implementation, which is convenient but needs Python, PyTorch, and a substantial set of dependencies to run. This project reimplements inference for the same model from scratch in plain C and C++, so speech-to-text can run on almost any device without heavy software dependencies.

The key benefit is that the same model runs efficiently on hardware ranging from a desktop GPU down to a Raspberry Pi, an iPhone, or an Android device, entirely offline and without sending audio to a server. It achieves this through platform-specific optimizations: on Apple Silicon Macs and iPhones it uses Apple's Metal GPU acceleration and the Core ML framework, on NVIDIA GPUs it uses CUDA, on x86 CPUs it uses AVX instructions, and it can be compiled to WebAssembly to run in a browser. The models come in several sizes, from tiny to large; larger models are more accurate but need more memory and run more slowly. To use it, you download a model file in the ggml format, build the project with CMake, and then run the resulting program on an audio file to get a transcript.
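Before an audio file reaches the model it has to be decoded into raw samples. The following is a minimal sketch of that preparation step, assuming the input is interleaved 16-bit PCM and that inference expects mono 32-bit float samples (Whisper models work on 16 kHz audio; resampling is not shown here). The function name `pcm16_to_mono_f32` is hypothetical and not part of whisper.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a whisper.cpp function): convert interleaved
// 16-bit PCM into mono float samples in the range [-1, 1).
// `channels` is the number of interleaved channels (1 = mono, 2 = stereo).
// Assumption: the audio is already at the sample rate the model expects.
std::vector<float> pcm16_to_mono_f32(const std::vector<int16_t>& pcm, int channels) {
    std::vector<float> out;
    if (channels <= 0) {
        return out;  // nothing sensible to do with a non-positive channel count
    }
    const std::size_t ch = static_cast<std::size_t>(channels);
    const std::size_t frames = pcm.size() / ch;  // a trailing partial frame is dropped
    out.reserve(frames);
    for (std::size_t i = 0; i < frames; ++i) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < ch; ++c) {
            // Scale each 16-bit sample to [-1, 1) before mixing.
            sum += static_cast<float>(pcm[i * ch + c]) / 32768.0f;
        }
        out.push_back(sum / static_cast<float>(ch));  // average channels to mono
    }
    return out;
}
```

Averaging the channels is the simplest way to downmix to mono; an application with other requirements (for example, using only one microphone channel) would replace that step.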

You would use whisper.cpp when you need offline, on-device speech-to-text without cloud services, when you want to embed Whisper into a non-Python application, or when you need to run it on a resource-constrained device. Common applications include transcribing recordings, building voice command interfaces, and generating subtitles. The tech stack is C and C++ with no mandatory external dependencies, built using CMake, with optional hardware-acceleration backends for Apple platforms (Metal and Core ML), NVIDIA GPUs (CUDA), and Vulkan.

Where it fits

Transcribe audio files to text on your computer or phone without uploading to a server.
Build voice command interfaces that respond to spoken input entirely offline.
Generate subtitles for videos using speech recognition on your own hardware.
Embed speech-to-text into a non-Python application or resource-constrained device.
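For the subtitle use case listed above, an application has to turn each transcribed segment and its start and end times into a subtitle cue. The sketch below formats SubRip (SRT) cues, assuming segment times arrive as integer ticks of 10 ms each. The helper names `to_srt_timestamp` and `srt_cue` are hypothetical and are not whisper.cpp functions.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a time given in 10 ms ticks as "HH:MM:SS,mmm",
// the timestamp layout used by SRT subtitle files.
std::string to_srt_timestamp(int64_t ticks_10ms) {
    if (ticks_10ms < 0) {
        ticks_10ms = 0;  // clamp negative times to zero
    }
    const int64_t total_ms = ticks_10ms * 10;
    const int64_t ms  = total_ms % 1000;
    const int64_t sec = (total_ms / 1000) % 60;
    const int64_t min = (total_ms / 60000) % 60;
    const int64_t hr  = total_ms / 3600000;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld,%03lld",
                  static_cast<long long>(hr), static_cast<long long>(min),
                  static_cast<long long>(sec), static_cast<long long>(ms));
    return std::string(buf);
}

// Hypothetical helper: build one SRT cue (index line, time range, text,
// then a blank line that separates it from the next cue).
std::string srt_cue(int index, int64_t t0, int64_t t1, const std::string& text) {
    return std::to_string(index) + "\n" +
           to_srt_timestamp(t0) + " --> " + to_srt_timestamp(t1) + "\n" +
           text + "\n\n";
}
```

Concatenating the cues for all segments, numbered from 1, gives the contents of an `.srt` file that most video players can load alongside the video.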
