DeepSpeech

C++ ★ 27k updated 1y ago ▣ archived

DeepSpeech is an open-source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

DeepSpeech was Mozilla's open-source offline speech-to-text engine that ran entirely on-device, even on a Raspberry Pi, but is now discontinued and no longer maintained.

C++, Python, TensorFlow, setup: hard, complexity 4/5

DeepSpeech was Mozilla's open-source speech-to-text engine: software that listens to audio and converts spoken words into written text entirely on-device, without sending anything to the cloud. Because it ran offline, it was attractive for privacy-sensitive applications and for situations where internet access wasn't available.

A key technical achievement was its ability to run on low-power hardware: it could transcribe speech in real time on a Raspberry Pi 4 (a credit-card-sized computer costing around $35), as well as on more powerful GPU servers. This range made it useful for everything from embedded smart home devices to large-scale transcription pipelines.

Note: this project has been discontinued by Mozilla and is no longer actively maintained. For developers looking for a similar capability today, Mozilla's work here influenced several successor projects, and alternatives like Whisper (from OpenAI) have largely taken over this space. The code and pre-trained models remain available for historical reference or for projects that need to build on the existing foundation, but you should not start a new project expecting ongoing updates or support.

Where it fits

Build an offline voice transcription tool that converts speech to text without sending audio to the cloud
Add voice command recognition to a Raspberry Pi project like a local smart home assistant
Transcribe audio files in bulk on a GPU server without per-call API costs
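
As a rough illustration of the first use case, here is a minimal sketch of offline transcription with the project's Python bindings (the deepspeech package). It assumes the package and numpy are installed, that a model file and optional scorer file have been downloaded separately, and that the input is a mono 16-bit WAV at the model's sample rate (assumed here to be 16 kHz by default). The helper names load_pcm16 and transcribe are hypothetical, introduced only for this example.

```python
import wave

# Assumption: the model expects 16 kHz, mono, 16-bit PCM audio.
EXPECTED_RATE = 16000


def load_pcm16(path, expected_rate=EXPECTED_RATE):
    """Return the raw 16-bit mono PCM bytes of a WAV file, or raise ValueError."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz audio")
        return wav.readframes(wav.getnframes())


def transcribe(wav_path, model_path, scorer_path=None):
    """Transcribe one WAV file locally; no audio leaves the machine."""
    # Imported lazily so load_pcm16 works without these packages installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    if scorer_path:
        # The external scorer (language model) is optional.
        model.enableExternalScorer(scorer_path)
    pcm = load_pcm16(wav_path, expected_rate=model.sampleRate())
    # stt() takes the samples as a 1-D int16 array and returns the text.
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

A call such as transcribe("audio.wav", "model.pbmm", "model.scorer") would return the transcript as a string; the three paths are placeholders. For bulk transcription, the same function could be called in a loop over files, ideally after restructuring it to load the model only once.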
