mrcarlsama-social-transcriber-skill

Python · ★ 21 · updated 25d ago

A Codex / Claude Code Skill that turns a single Douyin or Xiaohongshu link into a local video, audio, subtitles, and a verbatim transcript.

A skill for AI coding agents that downloads a single Douyin or Xiaohongshu video, extracts the audio, transcribes the speech locally, and saves a subtitle file plus metadata, with no global installs required.

Python · faster-whisper · yt-dlp · Playwright · ffmpeg · setup: easy · complexity 2/5

This is a skill for Codex and Claude Code that processes a single Douyin (the Chinese counterpart of TikTok) or Xiaohongshu (RedNote) video link and saves the results as local files. Given a video link, the skill downloads the video, extracts the audio, runs speech-to-text transcription locally, and produces a subtitle file plus a transcript with word-level timestamps. It also saves platform metadata such as the title, description, author name, publication time, and engagement counts where the platform returns them.
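
How word timings can become subtitle cues is sketched below. This is a minimal illustration, not the skill's own code: the `Word` type, the fixed eight-word grouping, and the `joiner` parameter are assumptions made for this example. Only the `HH:MM:SS,mmm` timestamp layout comes from the SubRip (.srt) format itself.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its timing (hypothetical type for this sketch)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used in .srt files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 8, joiner: str = " ") -> str:
    """Group words into fixed-size cues and render them as SRT text.

    Fixed-size grouping is an assumption; a real tool would more likely
    split on pauses or punctuation.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = i // max_words + 1  # SRT cue numbers start at 1
        text = joiner.join(w.text for w in chunk)
        span = f"{srt_timestamp(chunk[0].start)} --> {srt_timestamp(chunk[-1].end)}"
        cues.append(f"{index}\n{span}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

For Chinese speech, passing `joiner=""` avoids inserting spaces between words.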

The skill is not a bulk scraper. It handles one link at a time and does not support downloading a creator's full channel, searching for content, or accessing private or paid content.

Setup requires only the uv Python package manager to be installed first. All other dependencies, including yt-dlp for downloading, faster-whisper for transcription, imageio-ffmpeg for audio extraction, and Playwright for generating visitor cookies when needed, are declared inside the scripts themselves and installed automatically by uv on first run. No global Python, ffmpeg, or other tools need to be pre-installed. No GPU is required, though transcribing long videos on CPU will be slow.
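
Dependencies "declared inside the scripts" most plausibly means PEP 723 inline script metadata, which `uv run` reads before executing a file. The sketch below shows what such a header could look like. The package list is taken from the description above, while the Python version bound, the script name `transcribe.py`, and the `uv_command` helper are assumptions made for illustration and may not match the repository.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "yt-dlp",
#     "faster-whisper",
#     "imageio-ffmpeg",
#     "playwright",
# ]
# ///

# The comment block above is inline script metadata. When the file is started
# with `uv run`, uv creates a cached environment containing these packages and
# then executes the script in it. Plain `python` ignores the block.

import shlex


def uv_command(script: str, url: str) -> list[str]:
    """Build the argv an agent could use to run a self-describing script."""
    return ["uv", "run", script, url]


if __name__ == "__main__":
    # Both the script name and the URL are placeholders.
    print(shlex.join(uv_command("transcribe.py", "https://example.com/video/123")))
```

Because the requirements travel with the script, nothing has to be installed globally and each script can carry its own dependency set.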

For cookie handling, the skill first attempts a bare download. If the platform requires cookies, it generates a temporary visitor-state cookie by loading the public page in an isolated browser context. It does not read your browser's saved logins, and the temporary cookie file is deleted after the task. You can also supply your own cookie file explicitly as a fallback.
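
The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: `CookiesRequired`, `download_with_fallback`, and the two injected callables are hypothetical names, with the callables standing in for the real download step and the isolated-browser cookie step.

```python
import os
import tempfile
from typing import Callable, Optional


class CookiesRequired(Exception):
    """Raised by the download step when the platform rejects the request."""


def download_with_fallback(
    url: str,
    download: Callable[[str, Optional[str]], str],
    make_visitor_cookies: Callable[[str, str], None],
    user_cookie_file: Optional[str] = None,
) -> str:
    """Try a bare download, then visitor cookies, then a user-supplied file."""
    # 1. Bare attempt with no cookies at all.
    try:
        return download(url, None)
    except CookiesRequired:
        pass

    # 2. Temporary visitor cookies written to a throwaway file.
    fd, tmp_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        make_visitor_cookies(url, tmp_path)
        return download(url, tmp_path)
    except CookiesRequired:
        pass
    finally:
        # The temporary cookie file is removed whether or not step 2 worked.
        os.remove(tmp_path)

    # 3. Cookie file supplied explicitly by the user, if any.
    if user_cookie_file is None:
        raise CookiesRequired("visitor cookies were not enough and no cookie file was given")
    return download(url, user_cookie_file)
```

Passing the two steps in as callables keeps the ordering logic testable without network access or a browser.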

Output files land in a timestamped folder under outputs/. A video link produces the downloaded video, the extracted audio, the raw ASR transcript, a polished transcript generated by the AI assistant, a .srt subtitle file, and a metadata folder with manifest, report, and word-timing files. Xiaohongshu image posts are also supported and produce only the images and text description without audio processing.

Where it fits

Download a Douyin video and get a local subtitle file plus word-timed transcript for translation or repurposing.
Transcribe a Xiaohongshu video to text without installing Python, ffmpeg, or Whisper globally.
Pull metadata such as title, author, and engagement counts from a Chinese social media video to analyze content.
