
livecaption

Python ★ 150 updated 4d ago

Real-time on-device speech transcription + translation for macOS (Apple Silicon). Streaming ASR, speaker diarization, and live English→Chinese translation on the Apple GPU via MLX. Terminal CLI only: no UI, no cloud.

livecaption is a command-line tool for Apple Silicon Macs that listens to audio in real time, transcribes speech to text, and translates that text from English to Chinese. Everything runs locally on the device: once the models have been downloaded on first run, no internet connection or cloud service is required. Output goes to the terminal or to a text file.

The tool can capture audio from a microphone, from system audio (such as the sound coming out of a Zoom or Teams call), or from both at the same time. It can also process a pre-recorded audio file. When listening to a conversation, it automatically identifies up to four different speakers and labels each line with a speaker tag like S1 or S2, so you can tell who said what.
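The speaker tags described above can be illustrated with a small labeler that turns raw diarization cluster IDs into stable S1–S4 tags in order of first appearance. This is a minimal sketch, not livecaption's code: `Segment`, `SpeakerLabeler`, `format_lines`, the `[S1]` line format, and the choice to raise an error on a fifth speaker are all hypothetical; only the four-speaker limit comes from the description.

```python
from dataclasses import dataclass

MAX_SPEAKERS = 4  # limit stated in the description; the rest is hypothetical


@dataclass
class Segment:
    speaker_id: int  # raw cluster ID from a diarization model (hypothetical)
    text: str


class SpeakerLabeler:
    """Assign stable S1..S4 tags in order of first appearance."""

    def __init__(self, max_speakers: int = MAX_SPEAKERS) -> None:
        self.max_speakers = max_speakers
        self._tags: dict[int, str] = {}

    def tag(self, speaker_id: int) -> str:
        if speaker_id not in self._tags:
            # Assumption: a speaker beyond the limit is treated as an error.
            if len(self._tags) >= self.max_speakers:
                raise ValueError(f"more than {self.max_speakers} speakers")
            self._tags[speaker_id] = f"S{len(self._tags) + 1}"
        return self._tags[speaker_id]


def format_lines(segments: list[Segment]) -> list[str]:
    """Prefix each transcript line with its speaker tag."""
    labeler = SpeakerLabeler()
    return [f"[{labeler.tag(s.speaker_id)}] {s.text}" for s in segments]
```

For example, `format_lines([Segment(7, "Hello"), Segment(2, "Hi"), Segment(7, "How are you?")])` returns `["[S1] Hello", "[S2] Hi", "[S1] How are you?"]`: the tag depends on who spoke first, not on the model's arbitrary cluster numbering.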

Under the hood, three separate AI models run in sequence. A speech recognition model converts spoken words into text, a speaker diarization model determines who is talking at each moment, and a translation model converts the transcribed text into Chinese. All three run on the Mac's built-in GPU (via MLX) rather than the CPU, which keeps transcription fast while the machine handles other tasks. The tool also takes a two-pass approach to accuracy: it shows a rough transcript in real time as you speak, then re-processes each completed sentence in the background to produce a cleaner final version.
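The two-pass flow above can be sketched with each model wrapped as a plain Python callable. This is a hypothetical outline, not livecaption's implementation: `TwoPassPipeline`, `fast_asr`, `final_asr`, `diarize`, `translate`, and `is_sentence_end` are illustrative names, the real tool runs MLX models on the GPU, and diarizing once per finished sentence is a simplification of tracking who speaks at each moment.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Audio = list[float]  # stand-in for a buffer of PCM samples


@dataclass
class Caption:
    speaker: str
    text: str
    translation: str


@dataclass
class TwoPassPipeline:
    fast_asr: Callable[[Audio], str]   # pass 1: quick draft decode (hypothetical)
    final_asr: Callable[[Audio], str]  # pass 2: re-decode the whole sentence (hypothetical)
    diarize: Callable[[Audio], str]    # returns a tag such as "S1" (hypothetical)
    translate: Callable[[str], str]    # English -> Chinese (hypothetical)
    is_sentence_end: Callable[[str], bool]
    _buffer: Audio = field(default_factory=list, init=False)

    def feed(self, chunk: Audio) -> tuple[str, Optional[Caption]]:
        """Return the rough live text, plus a final caption if a sentence just closed."""
        self._buffer.extend(chunk)
        draft = self.fast_asr(self._buffer)  # shown immediately
        if not self.is_sentence_end(draft):
            return draft, None
        # Sentence complete: hand its audio to pass 2 and start a fresh buffer.
        sentence_audio, self._buffer = self._buffer, []
        text = self.final_asr(sentence_audio)
        caption = Caption(
            speaker=self.diarize(sentence_audio),  # simplification: one tag per sentence
            text=text,
            translation=self.translate(text),
        )
        return draft, caption
```

The design choice this illustrates is that the draft and the final text come from separate decodes: the draft can be wrong, because it is replaced once the complete sentence is re-processed with full context.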

Setup requires uv, a Python package manager. On first run, the tool automatically downloads its models from Hugging Face, about 3.5 GB in total. Capturing system audio (meeting output rather than just the microphone) takes an extra step: a helper program must be compiled from source, and macOS must be given explicit permission in System Settings under Privacy & Security, Screen & System Audio Recording. The README explains this permission step in detail because macOS sometimes grants it silently and incorrectly.
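Before setup, the two basic prerequisites (an Apple Silicon Mac and uv) can be checked from Python. This is a hypothetical helper, not part of livecaption; it assumes uv is installed on `PATH`, and it does not check the helper program or the Screen & System Audio Recording permission, which must still be granted in System Settings.

```python
import platform
import shutil


def check_prerequisites() -> list[str]:
    """Return human-readable problems; an empty list means the basics look OK."""
    problems = []
    # A native Python on an Apple Silicon Mac reports "Darwin" / "arm64".
    # An x86_64 interpreter running under Rosetta 2 reports "x86_64" instead.
    if platform.system() != "Darwin":
        problems.append("not running on macOS")
    elif platform.machine() != "arm64":
        problems.append("not a native arm64 (Apple Silicon) Python")
    # Assumption: uv is invoked by name, so it must be found on PATH.
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("OK" if not issues else "\n".join(issues))
```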

The README is written primarily in Chinese. The description above is based on the available content.