FunClip
Open-source, accurate and easy-to-use video speech recognition & clipping tool. LLM-based AI clipping integrated.
FunClip is a free, locally-run tool that automatically transcribes speech in a video file and then lets you cut out specific clips based on the transcript. The idea is simple: instead of manually scrubbing through a video to find a specific moment, you search the spoken words, select the sentences you want, and the tool cuts out that segment of the video for you.
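The search-then-cut idea can be sketched in a few lines of Python. This is an illustration only, not FunClip's own code: the `Sentence` record, the `find_clip_ranges` helper, and the sample transcript are all hypothetical, and the sketch assumes the recognizer has already produced sentences with start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    start: float  # seconds from the beginning of the video
    end: float
    text: str


def find_clip_ranges(sentences, query, padding=0.0):
    """Return merged (start, end) ranges for sentences containing `query`.

    Matching is a case-insensitive substring test; ranges that touch or
    overlap are merged so adjacent hits become a single clip.
    """
    ranges = []
    needle = query.lower()
    for s in sentences:
        if needle not in s.text.lower():
            continue
        start = max(0.0, s.start - padding)
        end = s.end + padding
        if ranges and start <= ranges[-1][1]:
            # Extend the previous clip instead of starting a new one.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


transcript = [
    Sentence(0.0, 2.5, "Welcome to the show"),
    Sentence(2.5, 6.0, "Today we talk about speech recognition"),
    Sentence(6.0, 9.0, "Speech models need data"),
    Sentence(9.0, 12.0, "Thanks for watching"),
]
print(find_clip_ranges(transcript, "speech"))
```

The resulting time ranges are what a video cutter would then use to extract the segment.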
The transcription step uses speech recognition models developed by Alibaba's research lab, which are particularly strong at Chinese audio. The tool also does speaker identification, so if a video has multiple people talking, you can ask for all the moments where a specific person spoke and clip those out together. Custom vocabulary can be added to improve recognition accuracy for names, brand terms, or technical jargon.
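Clipping by speaker follows the same pattern once each stretch of speech carries a speaker label. The sketch below is a hypothetical illustration: the `(start, end, speaker_id)` tuple layout, the `spk0`-style labels, and the `max_gap` parameter are assumptions, not FunClip's actual data format.

```python
def clips_for_speaker(turns, speaker, max_gap=0.5):
    """Collect (start, end) clips for one speaker.

    `turns` is a list of (start, end, speaker_id) tuples sorted by start
    time. Consecutive turns by the same speaker are joined when the
    silence between them is at most `max_gap` seconds.
    """
    clips = []
    for start, end, spk in turns:
        if spk != speaker:
            continue
        if clips and start - clips[-1][1] <= max_gap:
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips


turns = [
    (0.0, 3.0, "spk0"),
    (3.0, 5.0, "spk1"),
    (5.25, 8.0, "spk0"),
    (8.25, 10.0, "spk0"),
    (10.0, 12.0, "spk1"),
]
print(clips_for_speaker(turns, "spk0"))
```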
A newer feature connects the tool to large language models such as Alibaba's Qwen or OpenAI's GPT. After the transcript is generated, you can prompt one of these models to identify which parts of the video are most interesting or most relevant to a topic, and FunClip then clips those sections automatically. The model proposes the cuts, so you do not have to read through the full transcript yourself.
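For a model's answer to drive clipping, it has to be turned back into time ranges. A minimal sketch follows, assuming the model was asked to reply with bracketed ranges such as `[00:00:17,312-00:00:19,132]`. That reply format and the `parse_llm_ranges` helper are assumptions made for illustration; the prompt and format FunClip actually uses may differ.

```python
import re

# One HH:MM:SS,mmm timestamp (a dot is also accepted before the milliseconds).
_TS = r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
_RANGE_RE = re.compile(r"\[" + _TS + r"\s*-\s*" + _TS + r"\]")


def _to_ms(hours, minutes, seconds, millis):
    """Convert timestamp fields (as strings) to integer milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_llm_ranges(reply):
    """Extract (start_ms, end_ms) pairs from a model reply.

    Ranges whose end is not after their start are skipped, since model
    output cannot be trusted to be well-formed.
    """
    ranges = []
    for match in _RANGE_RE.finditer(reply):
        fields = match.groups()
        start, end = _to_ms(*fields[:4]), _to_ms(*fields[4:])
        if end > start:
            ranges.append((start, end))
    return ranges


reply = (
    "1. [00:00:17,312-00:00:19,132] The speaker introduces the topic.\n"
    "2. [00:01:05,000-00:01:00,000] (malformed: ends before it starts)\n"
)
print(parse_llm_ranges(reply))
```

Using integer milliseconds avoids floating-point rounding when the ranges are compared or later written out as subtitle timestamps.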
The interface runs as a local web page powered by Gradio, a library that turns Python scripts into browser-based forms. You upload a video, wait for transcription, copy the text you want, and click a button to get the clip. The tool can also burn subtitles into the clipped video, and it exports SRT subtitle files for the full video and for each selected segment.
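SRT is a simple text format: a counter, a `start --> end` line with `HH:MM:SS,mmm` timestamps, the subtitle text, and a blank line between cues. The sketch below shows how timed sentences could be written out in that format. The `srt_timestamp` and `to_srt` helpers are hypothetical and are not taken from FunClip.

```python
def srt_timestamp(ms):
    """Format integer milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues, offset_ms=0):
    """Render (start_ms, end_ms, text) cues as SRT text.

    `offset_ms` is subtracted from every timestamp, which is useful when
    writing subtitles for a clip that starts partway through the video.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start - offset_ms)} --> {srt_timestamp(end - offset_ms)}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)


cues = [(60_000, 61_500, "Hello"), (61_500, 63_000, "and welcome")]
print(to_srt(cues))                    # timestamps relative to the full video
print(to_srt(cues, offset_ms=60_000))  # timestamps relative to the clip start
```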
Installation requires Python and a few Python packages. The optional embedded-subtitle feature also needs FFmpeg and ImageMagick. The tool can be used through the browser interface or on the command line, or tried online without any installation via hosted versions on ModelScope and Hugging Face.
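Before enabling embedded subtitles, it can help to confirm that the two external programs are reachable. The following sketch uses only the Python standard library. The `find_optional_tools` helper is hypothetical, and the executable names are assumptions: ImageMagick 7 normally installs `magick`, while older releases install `convert` (on Windows, `convert` may instead resolve to an unrelated system utility).

```python
import shutil


def find_optional_tools():
    """Locate FFmpeg and ImageMagick on PATH.

    Returns a dict mapping tool name to the executable path, or None
    when the tool was not found.
    """
    return {
        "ffmpeg": shutil.which("ffmpeg"),
        # Try the ImageMagick 7 entry point first, then the legacy name.
        "imagemagick": shutil.which("magick") or shutil.which("convert"),
    }


for name, path in find_optional_tools().items():
    print(f"{name}: {path or 'not found'}")
```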