video-use

Python ★ 10.0k updated 1mo ago

Edit videos with coding agents

video-use is an open source tool that lets an AI coding agent edit raw video footage through a chat interface. You drop your video files into a folder, describe what you want in plain English, and the agent produces a finished edit as a single output file. It works with Claude Code and other agents that have shell access.

The tool handles several common post-production tasks automatically. It removes filler words like "um" and "uh" along with false starts and dead air between takes. It applies color grading to each segment, adds audio fades at every cut to prevent clicks, and burns in subtitles. It can also generate animated overlays using supported animation libraries, with each animation handled by a parallel sub-agent. After rendering, it runs a self-evaluation pass that checks every cut boundary for visual jumps or audio issues before showing you the result.
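The filler and dead-air removal described above can be worked out from word-level timestamps alone. This is a minimal sketch, not video-use's actual code. The `Word` class, the `FILLERS` set, the 0.5 s `max_gap` threshold, the 30 ms fade length, and the helper names `keep_segments` and `fade_filter` are all hypothetical. The sketch assumes each kept segment has been trimmed so that its timestamps start at 0 before the fade filter is applied. `afade` is ffmpeg's standard audio fade filter.

```python
from dataclasses import dataclass

# Hypothetical filler list; the tool's real list is not documented here.
FILLERS = {"um", "uh", "erm"}


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the take
    end: float    # seconds from the start of the take


def keep_segments(words, max_gap=0.5):
    """Return (start, end) spans to keep after dropping fillers and dead air.

    Consecutive kept words merge into one span unless the silence between
    them exceeds max_gap seconds (dead air) or a filler sat between them.
    """
    segments = []
    cut = True  # True means the next kept word must open a new span
    for w in words:
        if w.text.lower().strip(".,!?") in FILLERS:
            cut = True  # force a cut so the filler's audio is excluded
            continue
        if not cut and w.start - segments[-1][1] <= max_gap:
            segments[-1][1] = w.end  # short pause: extend the current span
        else:
            segments.append([w.start, w.end])  # dead air or filler: new span
        cut = False
    return [(s, e) for s, e in segments]


def fade_filter(duration, fade=0.03):
    """Build an ffmpeg audio filter string that fades one segment in and out.

    Assumes the segment's timestamps start at 0 (e.g. after atrim + asetpts).
    """
    fade = min(fade, duration / 2)  # never let the two fades overlap
    return (f"afade=t=in:st=0:d={fade:.3f},"
            f"afade=t=out:st={duration - fade:.3f}:d={fade:.3f}")


if __name__ == "__main__":
    take = [Word("So", 0.0, 0.2), Word("um", 0.3, 0.6), Word("today", 0.7, 1.0),
            Word("we", 1.05, 1.2), Word("edit", 2.5, 2.8)]
    for start, end in keep_segments(take):
        print(start, end, fade_filter(end - start))
```

In this example the sketch keeps three spans and drops both the "um" and the long pause before "edit". The forced cut after a filler matters: if the filler's neighbors were merged across it, the span would pull the filler's audio back in.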

The AI never watches the video directly. Instead, it reads the video through two layers of structured data. The first is an audio transcript produced by ElevenLabs Scribe, which provides word-level timestamps, speaker identification, and audio event labels for every take. The second is an on-demand visual composite that shows a filmstrip, waveform, and word labels as a PNG image for any specific time range. This approach keeps the token cost low compared to feeding raw video frames to the model.
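A time-range lookup over the word-level transcript lets the agent label a composite for one window without loading the whole take. The sketch below is an illustration under stated assumptions. The `Word` fields mirror the word, timestamp, and speaker information the transcript is described as carrying, but the class names and `words_between` are hypothetical. The sketch also assumes that words do not overlap in time.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float   # seconds
    end: float     # seconds
    speaker: str   # hypothetical speaker label, e.g. "A"


class Transcript:
    """Word-level transcript indexed by start time (hypothetical helper)."""

    def __init__(self, words):
        self.words = sorted(words, key=lambda w: w.start)
        self._starts = [w.start for w in self.words]

    def words_between(self, t0, t1):
        """Return words that overlap the window [t0, t1).

        Assumes words do not overlap each other, so at most one word
        starting before t0 can still be in progress at t0.
        """
        lo = bisect_left(self._starts, t0)
        if lo > 0 and self.words[lo - 1].end > t0:
            lo -= 1  # include a word already in progress at t0
        hi = bisect_left(self._starts, t1)
        return self.words[lo:hi]

    def labels(self, t0, t1):
        """Format the window's words as text labels for a composite image."""
        return [f"{w.start:.2f} {w.speaker}: {w.text}"
                for w in self.words_between(t0, t1)]


if __name__ == "__main__":
    t = Transcript([Word("Hello", 0.0, 0.4, "A"), Word("there", 0.5, 0.9, "A"),
                    Word("Hi", 1.2, 1.4, "B")])
    print(t.labels(0.7, 1.3))
```

The lookup is a binary search over start times, so fetching labels for a short window stays cheap even for long takes. This fits the on-demand design described above.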

Setup requires an ElevenLabs API key, ffmpeg, and Python. The agent can handle the installation itself if you paste a provided setup prompt into your agent session. All output files are written to an edit subfolder next to your source footage, and the agent saves session notes to a project file so future sessions can continue from where the last one left off.
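Before you hand the setup prompt to an agent, a quick preflight check can confirm that the prerequisites are present. This is a hypothetical sketch. The environment variable name `ELEVENLABS_API_KEY` and the function `missing_prerequisites` are assumptions, not documented parts of video-use. `shutil.which` is the standard-library way to locate an executable on `PATH`.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return the names of setup prerequisites that appear to be missing.

    env and which are injectable so the check can be tested without
    touching the real environment.
    """
    env = os.environ if env is None else env
    missing = []
    # Assumed variable name; check the project's setup instructions for the real one.
    if not env.get("ELEVENLABS_API_KEY"):
        missing.append("ELEVENLABS_API_KEY")
    if which("ffmpeg") is None:  # ffmpeg must be on PATH
        missing.append("ffmpeg")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("ready" if not problems else f"missing: {', '.join(problems)}")
```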
