speech_recognition

Python · ★ 9.0k · updated 4d ago

Speech recognition module for Python, supporting several engines and APIs, online and offline.

A Python library that converts spoken audio to text, wrapping many speech recognition services and offline engines behind a single consistent API so you can switch providers without rewriting your code.

Python · PyAudio · Whisper · Vosk · PocketSphinx · setup: easy · complexity 2/5

SpeechRecognition is a Python library that converts spoken audio into text. Its main strength is that it acts as a unified wrapper around many different speech recognition services and engines, so you can switch between them without rewriting your code. You install it with a single pip command and can have audio transcribed with just a few lines of Python.
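As a rough illustration of the "few lines of Python" claim, here is a minimal sketch that transcribes an audio file. It assumes the package was installed with `pip install SpeechRecognition` and uses the free Google Speech Recognition engine, which needs a network connection. The helper name `transcribe_file` is hypothetical and not part of the library.

```python
def transcribe_file(path):
    """Return the transcript of an audio file, or None if no speech was understood.

    Hypothetical helper for illustration; it is not part of the library.
    """
    # Imported here so this file still loads where the package is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:      # WAV, AIFF, or FLAC file
        audio = recognizer.record(source)   # read the whole file into memory
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The engine could not make out any speech in the audio.
        return None
```

Calling `transcribe_file("meeting.wav")` (a made-up file name) would return the recognized text. Network or quota failures surface as `sr.RequestError` and are deliberately left for the caller to handle.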

The services it supports span both online APIs and options that work entirely offline. Online options include Google Speech Recognition, Google Cloud Speech, Microsoft Azure Speech, Wit.ai, IBM Speech to Text, Groq's Whisper API, and the Cohere Transcribe API. Offline options include CMU Sphinx, Vosk, Snowboy (for detecting specific trigger words), and OpenAI Whisper running locally on your own machine. OpenAI-compatible self-hosted servers such as Ollama are also supported through the OpenAI API path.
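One way to picture the "switch providers without rewriting your code" idea: each engine is exposed as a `recognize_*` method on the recognizer object (for example `recognize_google`, `recognize_sphinx`, `recognize_vosk`, `recognize_whisper`), so choosing an engine can come down to choosing a method name. The sketch below is an assumption about how you might wrap that. `transcribe_with` is a hypothetical helper, and not every engine accepts the same keyword options.

```python
def transcribe_with(recognizer, audio, engine="google", **options):
    """Dispatch to recognizer.recognize_<engine>(audio, **options).

    Hypothetical helper: `recognizer` is expected to be a
    speech_recognition.Recognizer (or anything with recognize_* methods).
    """
    method = getattr(recognizer, f"recognize_{engine}", None)
    if method is None:
        raise ValueError(f"unsupported engine: {engine!r}")
    return method(audio, **options)
```

With a real `Recognizer` instance `r`, `transcribe_with(r, audio, "sphinx")` would call `r.recognize_sphinx(audio)`, so moving from an online to an offline engine becomes a one-argument change.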

You can feed audio to the library from a microphone connected to your computer or from an audio file. The library includes tools to help manage microphone input, such as calibrating sensitivity to the ambient noise level in the room and listening in the background while other code keeps running. Examples in the repository demonstrate common tasks: recording from a microphone, transcribing a file, saving audio to disk, and adjusting recognition settings.
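The microphone workflow described above can be sketched roughly as follows, assuming PyAudio is installed and a default input device exists. The function names `listen_once` and `start_background_listening` are hypothetical. The library calls they rely on are `Microphone`, `adjust_for_ambient_noise`, `listen`, and `listen_in_background`.

```python
def listen_once(calibration_seconds=1.0):
    """Capture a single phrase from the default microphone (hypothetical helper)."""
    # Imported here so this file still loads where the package is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample the room briefly so the energy threshold matches ambient noise.
        recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
        return recognizer.listen(source)


def start_background_listening(on_audio):
    """Listen on a background thread and return a function that stops it."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    microphone = sr.Microphone()
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)
    # The microphone is passed outside the `with` block; the library opens it
    # itself and calls on_audio(recognizer, audio) for each phrase it hears.
    return recognizer.listen_in_background(microphone, on_audio)
```

The audio returned by `listen_once()` can be handed to any `recognize_*` method. The function returned by `start_background_listening` stops the background thread when called.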

Not every dependency is required upfront. The core package installs on its own, and you add extra libraries only for the engines you actually use. For instance, microphone input requires PyAudio, Sphinx requires PocketSphinx, local Whisper requires the openai-whisper package, and Vosk requires the vosk package. This keeps the installation lightweight if you need only one or two of the supported engines.
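Because the extras are optional, it can help to check which ones are present before picking an engine. The sketch below uses only the standard library. The feature-to-module mapping is an assumption based on each package's usual import name, and `available_features` is a hypothetical helper rather than anything the library provides.

```python
from importlib.util import find_spec

# Feature -> importable module name (assumed import names, not verified
# against any particular release of these packages).
OPTIONAL_MODULES = {
    "microphone": "pyaudio",
    "sphinx": "pocketsphinx",
    "whisper": "whisper",
    "vosk": "vosk",
}


def available_features(modules=None):
    """Return a sorted list of features whose backing module is installed."""
    if modules is None:
        modules = OPTIONAL_MODULES
    # find_spec returns None for a top-level module that cannot be found.
    return sorted(
        feature for feature, module in modules.items()
        if find_spec(module) is not None
    )
```

Running `available_features()` on a fresh install of only the core package would typically return an empty list, and the list grows as you install each extra.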

The library targets Python 3.9 and above and is distributed via PyPI. Source code and an issue tracker are on GitHub. The README links to a library reference that documents every public class and method.

Where it fits