gitmyhub

clone-voice

Python ★ 9.0k updated 9mo ago ▣ archived

A voice cloning tool with a web interface that uses your own voice or any other sound sample to record audio.

A browser-based voice cloning tool that uses a short audio sample to generate new speech in the sampled voice: type text, or re-dub an existing audio clip in the cloned voice.

Python · xtts_v2 · Hugging Face · CUDA · setup: hard · complexity 3/5

Clone Voice is a voice cloning tool with a browser-based interface that lets you take a short audio recording of any person's voice and use it to generate new speech. You can either type text and have it spoken in the cloned voice, or take an existing audio clip and re-produce it in that voice. The README is written primarily in Chinese, with an English version linked separately.

The tool is built on a speech synthesis model called xtts_v2, developed by coqui.ai, which is licensed for personal learning and research only, not for commercial use. It supports 16 languages including Chinese, English, Japanese, Korean, French, German, and Italian. The README notes that English output quality is good and Chinese quality is acceptable.
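To make the text-to-cloned-speech idea concrete, here is a minimal sketch of calling xtts_v2 directly through the Coqui `TTS` Python package. It is not taken from the clone-voice source, and it assumes the `TTS` package is installed; the function name `clone_to_file` and its argument names are hypothetical, chosen for illustration only.

```python
from pathlib import Path

# Model identifier the Coqui TTS package uses for XTTS v2.
XTTS_V2 = "tts_models/multilingual/multi-dataset/xtts_v2"


def clone_to_file(text, speaker_wav, language, out_path):
    """Speak `text` in the voice of `speaker_wav` and write a WAV to `out_path`.

    Hypothetical helper for illustration; not part of clone-voice.
    """
    sample = Path(speaker_wav)
    if not sample.is_file():
        raise FileNotFoundError(f"reference sample not found: {sample}")
    if not text.strip():
        raise ValueError("text must not be empty")

    # Imported lazily so the input checks above run even when the
    # Coqui TTS package is not installed.
    from TTS.api import TTS

    tts = TTS(XTTS_V2)  # downloads the model on first use if it is not cached
    tts.tts_to_file(
        text=text,
        speaker_wav=str(sample),  # short reference recording of the target voice
        language=language,        # language code such as "en"
        file_path=str(out_path),
    )
    return Path(out_path)
```

A call might look like `clone_to_file("Hello there.", "my_voice.wav", "en", "out.wav")`, where both file names are placeholders. Given the licence noted above, a sketch like this is suitable for personal learning and research only.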

For Windows users, a precompiled version is available as a downloadable package. You double-click an executable file, wait for a web page to open automatically, and then use the interface by clicking through the options. The model files, which are roughly 3 gigabytes, need to be downloaded and placed in a specific folder. No coding is required for the precompiled path.

For users on Linux or macOS, or those who want to run from source, the process requires Python 3.9 through 3.11 and consists of setting up a virtual environment, installing dependencies, and downloading the model files from Hugging Face. Users in China need a working proxy connection for the download, since the service is blocked there. The README includes detailed troubleshooting notes for proxy-related failures, which it identifies as the most common source of errors. If the machine has an Nvidia GPU, CUDA acceleration can be enabled for faster processing.
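Before attempting a source install, the three environment conditions mentioned above (Python version, proxy configuration, GPU availability) can be checked with a short script. The following is a rough preflight sketch, not something shipped with clone-voice; all function names are hypothetical, and the CUDA check assumes PyTorch is the framework in use, reporting `False` if it is not installed.

```python
import sys
import urllib.request

# Version range stated in the text above: Python 3.9 through 3.11.
MIN_PY = (3, 9)
MAX_PY = (3, 11)


def python_supported(version=None):
    """Return True if the (major, minor) version lies within 3.9 to 3.11."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_PY <= major_minor <= MAX_PY


def configured_proxies():
    """Return the proxies urllib detects from environment or OS settings."""
    return urllib.request.getproxies()


def cuda_available():
    """Return True only if PyTorch imports and reports a usable CUDA device."""
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch missing or its native libraries failed to load.
        return False
    return bool(torch.cuda.is_available())


def preflight():
    """Collect the three checks into one dictionary."""
    return {
        "python_ok": python_supported(),
        "proxies": configured_proxies(),
        "cuda": cuda_available(),
    }


if __name__ == "__main__":
    for name, value in preflight().items():
        print(f"{name}: {value}")
```

An empty `proxies` result on a network where Hugging Face is unreachable suggests that the model download will fail until a proxy is configured, for example through the conventional `HTTPS_PROXY` environment variable.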

The same developer also maintains related tools for video translation with dubbing, speech-to-text transcription, and vocal separation from background music.

Where it fits