openclaw-skill-videotranslate
An OpenClaw skill for video translation and auto-dubbing. Translates English subtitles to Chinese, generates TTS audio with automatic duration alignment, and muxes them into lossless multi-track videos. Features a 3D adaptive scheduler that keeps jobs from failing on LLM/TTS API rate limits.
An OpenClaw plugin that translates video subtitles and optionally dubs the audio track into a target language, with a smart scheduler that handles API rate limits automatically for long videos.
This is a plugin for OpenClaw, an AI workflow framework, that translates video subtitles and optionally replaces the audio track with a dubbed version in the target language. It is written in Python and requires ffmpeg to be installed on your system.
The plugin has two modes. In subtitle-only mode, it translates the subtitle file from the source language to the target language, embeds both the original and translated subtitle tracks in the output video, and writes a standalone subtitle file. In the default dubbing mode, it also generates spoken audio in the target language by calling a text-to-speech service, then stretches or compresses each audio clip so it fits within the timing of the original subtitle. Finally, it packages everything into a single video file with two audio tracks and two subtitle tracks, with the translated audio and subtitles set as the default tracks.
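The duration alignment step can be illustrated with a minimal sketch. It assumes the fitting is done with ffmpeg's `atempo` audio filter, which the source does not confirm; the function names and the 0.5 to 2.0 per-stage range are choices made for this example, not the plugin's API.

```python
# Per-stage range assumed here; ffmpeg's atempo filter has historically
# accepted 0.5 to 2.0, so larger changes are expressed as a chain of stages.
ATEMPO_MIN, ATEMPO_MAX = 0.5, 2.0


def tempo_factor(clip_seconds: float, slot_seconds: float) -> float:
    """Return the speed factor that makes the clip last exactly slot_seconds.

    A value above 1 speeds the clip up (compress), below 1 slows it down (stretch).
    """
    if clip_seconds <= 0 or slot_seconds <= 0:
        raise ValueError("durations must be positive")
    return clip_seconds / slot_seconds


def atempo_chain(factor: float) -> str:
    """Split one tempo factor into chained atempo stages that each stay in range."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    stages = []
    remaining = factor
    while remaining > ATEMPO_MAX:
        stages.append(ATEMPO_MAX)
        remaining /= ATEMPO_MAX
    while remaining < ATEMPO_MIN:
        stages.append(ATEMPO_MIN)
        remaining /= ATEMPO_MIN
    stages.append(remaining)
    return ",".join(f"atempo={s:.6f}" for s in stages)
```

For a 3.0 second clip that must fit a 2.0 second subtitle slot, `atempo_chain(tempo_factor(3.0, 2.0))` yields a filter string that could be passed to ffmpeg's `-filter:a` option. A real pipeline would probably also cap the factor so that speech stays intelligible.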
Because translating and generating speech for a long video involves sending many requests to AI and speech APIs, the plugin includes a scheduling system it calls the 3D Adaptive Scheduler. This system controls three dimensions at once: how many subtitle lines to group into each request, how much text to include per request (measured in tokens or characters), and how many requests to send concurrently. When an API responds with a rate limit error, the scheduler backs off and retries with smaller batches, so a temporary API limit does not fail the whole job.
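The back-off behaviour can be sketched as follows. This is an illustration under stated assumptions, not the plugin's scheduler: `BatchLimits`, `RateLimited`, `plan_batches`, and `translate_all` are hypothetical names, halving is one possible back-off policy, and the sketch sends batches sequentially, so the concurrency value is only tracked.

```python
from dataclasses import dataclass


class RateLimited(Exception):
    """Hypothetical error a client raises when the API reports a rate limit."""


@dataclass
class BatchLimits:
    lines_per_request: int   # dimension 1: subtitle lines per request
    chars_per_request: int   # dimension 2: text budget per request
    concurrency: int         # dimension 3: requests in flight at once

    def on_rate_limit(self) -> None:
        """Back off by halving every dimension, never going below 1."""
        self.lines_per_request = max(1, self.lines_per_request // 2)
        self.chars_per_request = max(1, self.chars_per_request // 2)
        self.concurrency = max(1, self.concurrency // 2)


def plan_batches(lines: list[str], limits: BatchLimits) -> list[list[str]]:
    """Greedily pack lines while respecting the line and character budgets."""
    batches, current, used = [], [], 0
    for line in lines:
        too_many = len(current) >= limits.lines_per_request
        too_long = used + len(line) > limits.chars_per_request
        if current and (too_many or too_long):
            batches.append(current)
            current, used = [], 0
        current.append(line)
        used += len(line)
    if current:
        batches.append(current)
    return batches


def translate_all(lines, limits, send, max_retries=5):
    """Send every line; on a rate limit, shrink the limits and re-plan the rest."""
    results, pending, retries = [], list(lines), 0
    while pending:
        batch = plan_batches(pending, limits)[0]
        try:
            results.extend(send(batch))
        except RateLimited:
            retries += 1
            if retries > max_retries:
                raise
            limits.on_rate_limit()
            continue  # retry the same lines with smaller batches
        pending = pending[len(batch):]
    return results
```

Here `send` stands in for any callable that takes a list of lines and returns their translations. A production scheduler would likely also wait before retrying and grow the limits again after sustained success; both are omitted to keep the sketch short.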
You configure the plugin through a manifest file where you specify the video file, the source and target languages, which translation service and text-to-speech service to use, and the API credentials for those services. Both LLM-based endpoints and conventional web API endpoints are supported for translation and speech synthesis.
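As a rough illustration of the settings listed above, the sketch below validates a manifest that has already been parsed into a dictionary. The manifest's real format and key names are not given here, so every field name in this example is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical key names; the plugin's actual manifest schema may differ.
REQUIRED_KEYS = (
    "video",
    "source_lang",
    "target_lang",
    "translator",
    "tts",
    "translator_api_key",
    "tts_api_key",
)


@dataclass(frozen=True)
class JobConfig:
    video: str               # path to the input video file
    source_lang: str         # language of the existing subtitles
    target_lang: str         # language to translate and dub into
    translator: str          # which translation service to call
    tts: str                 # which text-to-speech service to call
    translator_api_key: str  # credential for the translation service
    tts_api_key: str         # credential for the speech service


def load_config(raw: dict) -> JobConfig:
    """Reject manifests with missing or empty settings before any API call."""
    missing = [key for key in REQUIRED_KEYS if not raw.get(key)]
    if missing:
        raise ValueError("manifest is missing: " + ", ".join(missing))
    return JobConfig(**{key: raw[key] for key in REQUIRED_KEYS})
```

Validating up front means a typo in the manifest surfaces immediately rather than partway through a long translation job.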
The output video is muxed into an MKV container without re-encoding losses, so the original video quality is preserved. The test suite uses pytest and Hypothesis, a property-based testing library.
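The multi-track MKV packaging can be sketched as an ffmpeg command built in Python. This is an assumption about how such a mux could be done, not the plugin's actual command line: the function name and argument order are hypothetical, and "lossless" is interpreted here as copying the video and audio streams without re-encoding.

```python
def build_mux_command(video, dubbed_audio, original_subs, translated_subs, output):
    """Build an ffmpeg argument list for a two-audio, two-subtitle MKV."""
    return [
        "ffmpeg", "-y",
        "-i", video,             # input 0: original video and audio
        "-i", dubbed_audio,      # input 1: generated speech track
        "-i", original_subs,     # input 2: source-language subtitles
        "-i", translated_subs,   # input 3: target-language subtitles
        "-map", "0:v:0",         # video stream from the original file
        "-map", "1:a:0",         # output audio 0: dubbed track
        "-map", "0:a:0",         # output audio 1: original track
        "-map", "3:s:0",         # output subtitle 0: translated
        "-map", "2:s:0",         # output subtitle 1: original
        "-c:v", "copy",          # no video re-encode, so quality is unchanged
        "-c:a", "copy",          # keep both audio streams as they are
        "-c:s", "srt",           # store subtitles as SubRip text
        "-disposition:a:0", "default",
        "-disposition:a:1", "0",
        "-disposition:s:0", "default",
        "-disposition:s:1", "0",
        output,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Putting the translated tracks first and marking them `default` is one way to make players select them automatically while keeping the originals available.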
Where it fits
- Translate a video's subtitle file from one language to another and embed both the original and translated tracks in the output video.
- Add a dubbed audio track to a video by generating timed speech from the translated subtitles, packaged with the original audio as a second track.
- Process long videos through translation and text-to-speech APIs without job failures caused by rate limits, using the built-in adaptive scheduler.