gitmyhub

MeloTTS

Python ★ 7.5k updated 1y ago

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.

A Python text-to-speech library that converts written text into natural-sounding speech in six languages, with regional accent options for English. It is designed to run fast enough for real-time use on an ordinary CPU, with no GPU required.

Python · PyTorch · HuggingFace · VITS · setup: moderate · complexity 2/5

MeloTTS is a text-to-speech library that converts written text into spoken audio. It was built by researchers at MIT and the company MyShell.ai, and it supports multiple languages and regional accents: English (with American, British, Indian, and Australian accent options), Spanish, French, Chinese, Japanese, and Korean. The Chinese model can also handle sentences that mix Chinese and English words in the same utterance.

The library is designed to run fast enough for real-time use on a standard CPU, meaning you do not need expensive graphics hardware to generate speech with it. This makes it practical for developers building applications on ordinary machines or cloud servers without GPU resources.
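One common way to check a "real-time on CPU" claim on your own hardware is the real-time factor: wall-clock synthesis time divided by the duration of the audio produced, where a value below 1.0 means speech is generated faster than it plays back. The summary gives no benchmark figures, so the sketch below only shows how such a measurement could be taken. The `synthesize` argument is a hypothetical callable (not part of MeloTTS) that runs synthesis and returns the audio length in seconds.

```python
import time
from typing import Callable


def real_time_factor(synthesize: Callable[[str], float], text: str) -> float:
    """Return synthesis wall-clock time divided by seconds of audio produced.

    `synthesize` is a hypothetical wrapper supplied by the caller: it must
    generate speech for `text` and return the audio duration in seconds.
    """
    start = time.perf_counter()
    audio_seconds = synthesize(text)
    elapsed = time.perf_counter() - start
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    # Below 1.0: faster than playback. Above 1.0: slower than playback.
    return elapsed / audio_seconds
```

Averaging the result over several sentences of varying length gives a steadier estimate than a single call, since the first call typically includes model loading and warm-up.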

Users have three main ways to get started: trying it without any installation via a hosted option, installing it locally and using it through a Python API or command line, or training the system on a custom dataset to produce a different voice style. Pre-trained model files are hosted on HuggingFace, a common platform for sharing AI models. A web-based interface is also available for testing speech output interactively.
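For the local Python route, a minimal sketch might look like the following. The summary does not show the API, so the names here are assumptions based on the project's published usage examples: the `melo.api.TTS` class, the `tts_to_file` method, the `hps.data.spk2id` speaker table, and the speaker keys in `ENGLISH_ACCENTS`. Check them against the installed version before relying on them. The helper names `speaker_key` and `synthesize` are illustrative only.

```python
# Assumed speaker keys for the English accents, taken from upstream examples.
ENGLISH_ACCENTS = {
    "american": "EN-US",
    "british": "EN-BR",
    "indian": "EN_INDIA",
    "australian": "EN-AU",
}


def speaker_key(accent: str) -> str:
    """Map a human-readable accent name to the assumed MeloTTS speaker key."""
    normalized = accent.strip().lower()
    if normalized not in ENGLISH_ACCENTS:
        choices = ", ".join(sorted(ENGLISH_ACCENTS))
        raise ValueError(f"unknown accent {accent!r}; choose from: {choices}")
    return ENGLISH_ACCENTS[normalized]


def synthesize(text: str, accent: str = "american",
               out_path: str = "out.wav", speed: float = 1.0) -> str:
    """Write `text` as English speech to `out_path` and return the path."""
    # Imported lazily so this module loads even when MeloTTS is not installed.
    from melo.api import TTS  # assumed import path

    model = TTS(language="EN", device="cpu")   # CPU is enough per the summary
    speaker_ids = model.hps.data.spk2id        # assumed speaker lookup table
    model.tts_to_file(text, speaker_ids[speaker_key(accent)], out_path, speed=speed)
    return out_path
```

With MeloTTS installed, `synthesize("Hello from MeloTTS.", accent="british")` would be expected to download the pre-trained English model on first use and write `out.wav` to the working directory.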

The library is published under the MIT license, which means it is free to use in both personal projects and commercial products. The voice synthesis technology builds on two earlier research systems, VITS and VITS2. The core authors are a small team affiliated with Tsinghua University, MIT, and MyShell.ai, and community contributors added the web and command-line interfaces. The README is short and focuses on getting users started quickly rather than on explaining how the models work.

Where it fits