DeepSpeech
DeepSpeech is an open-source embedded (offline, on-device) speech-to-text engine designed to run in real time on devices ranging from a Raspberry Pi 4 to high-powered GPU servers.
DeepSpeech was Mozilla's open-source offline speech-to-text engine that ran entirely on-device, even on a Raspberry Pi, but is now discontinued and no longer maintained.
DeepSpeech was Mozilla's open-source speech-to-text engine: software that converts spoken audio into written text entirely on-device, without sending anything to the cloud. Because it ran offline, it suited privacy-sensitive applications and settings where internet access wasn't available.
A key technical achievement was its ability to run on low-power hardware: it could transcribe speech in real time on a Raspberry Pi 4 (a credit-card-sized computer costing around $35) as well as on high-powered GPU servers. This range made it useful for everything from embedded smart home devices to large-scale transcription pipelines.
Note: Mozilla has discontinued this project, and it is no longer maintained. Its work influenced successor projects such as Coqui STT (a fork started by former DeepSpeech developers), and alternatives like OpenAI's Whisper have largely taken over this space. The code and pre-trained models remain available for historical reference or for projects that build on the existing foundation, but do not start a new project expecting ongoing updates or support.
Where it fits
- Build an offline voice transcription tool that converts speech to text without sending audio to the cloud
- Add voice command recognition to a Raspberry Pi project like a local smart home assistant
- Transcribe audio files in bulk on a GPU server without per-call API costs
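
As a rough illustration of the offline and bulk transcription use cases above, here is a minimal sketch that assumes the last published `deepspeech` Python package (0.9.x) and a folder of 16-bit mono WAV files. The helper names `read_wav_16bit_mono` and `transcribe_folder` are hypothetical and exist only for this example; `Model`, `enableExternalScorer`, `sampleRate`, and `stt` are calls from the package's Python API. Files whose sample rate does not match the model are skipped rather than resampled, which is a simplification.

```python
import wave
from pathlib import Path


def read_wav_16bit_mono(path):
    """Return (raw PCM bytes, sample rate) for a 16-bit mono WAV file."""
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit mono PCM audio")
        return wav.readframes(wav.getnframes()), wav.getframerate()


def transcribe_folder(audio_dir, model_path, scorer_path=None):
    """Transcribe every .wav file in audio_dir; returns {filename: text}."""
    # Imported lazily so the WAV helper above works without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(str(model_path))
    if scorer_path is not None:
        # The optional external scorer (language model) guides decoding.
        model.enableExternalScorer(str(scorer_path))

    results = {}
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        pcm, rate = read_wav_16bit_mono(wav_path)
        if rate != model.sampleRate():
            # Resampling is out of scope for this sketch; skip mismatched files.
            continue
        # stt() expects a NumPy array of 16-bit signed samples.
        results[wav_path.name] = model.stt(np.frombuffer(pcm, dtype=np.int16))
    return results
```

A call might look like `transcribe_folder("recordings", "deepspeech-0.9.3-models.pbmm", "deepspeech-0.9.3-models.scorer")`, where the folder name is a placeholder and the two model files are the ones distributed with the 0.9.3 release. Everything runs locally, so no audio leaves the machine.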