
jacky-motion

HTML · ★ 93 · updated 1mo ago

Chinese voiceover script to 16:9 motion HTML skill for creator workflows

An AI agent skill that turns a Chinese voiceover script into a single animated HTML file formatted for 16:9 video recording. It follows a six-step workflow that runs from script review and shot planning through style lock-in, HTML generation, and verification, with optional TTS audio injection.

HTML · setup: easy · complexity: 2/5

This repository is an agent skill that takes a Chinese voiceover script and turns it into an animated HTML file formatted for 16:9 screens. The intended users are content creators making short-form educational or opinion videos, particularly those covering AI tools, business analysis, or research topics.

The skill follows a six-step workflow: it first reviews the script for information density and readability, then breaks it into timed shot beats, locks down a single visual style, generates the HTML, verifies the output, and prepares for voiceover recording. Each shot beat must include a core piece of information, a visual action (such as highlighting, connecting, or expanding), and short on-screen text. These rules are enforced by the skill during review, not left to the creator to invent.
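The three-part rule for a shot beat can be pictured as a small validation check. The sketch below is illustrative only: `ShotBeat`, its field names, the set of action names, and the `max_text_chars` limit are all assumptions made for this example, not the skill's actual schema or thresholds.

```python
from dataclasses import dataclass

# Hypothetical action names, borrowed from the examples in the description.
KNOWN_ACTIONS = {"highlight", "connect", "expand"}


@dataclass
class ShotBeat:
    """One timed beat of the script (field names are assumptions)."""
    core_info: str      # the single piece of information this beat conveys
    visual_action: str  # e.g. "highlight", "connect", "expand"
    screen_text: str    # short on-screen text
    seconds: float      # planned duration of the beat


def review_beat(beat: ShotBeat, max_text_chars: int = 16) -> list[str]:
    """Return a list of problems; an empty list means the beat passes."""
    problems = []
    if not beat.core_info.strip():
        problems.append("missing core information")
    if beat.visual_action not in KNOWN_ACTIONS:
        problems.append(f"unknown visual action: {beat.visual_action!r}")
    if not beat.screen_text.strip():
        problems.append("missing on-screen text")
    elif len(beat.screen_text) > max_text_chars:
        problems.append("on-screen text too long")
    if beat.seconds <= 0:
        problems.append("duration must be positive")
    return problems
```

A beat that supplies all three parts and a positive duration returns an empty list; anything else returns one message per violated rule, which mirrors the idea that the rules are checked during review instead of being left to the creator.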

The output is a single HTML file that you open in a browser. By default you click through the slides manually; if you want them to advance automatically alongside narration, the skill supports a text-to-speech flow that injects audio and enables auto-play mode. Either way, the final video is produced by recording the browser in full screen with a tool such as QuickTime or OBS.
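For auto-play, each slide needs a moment at which to appear. How the skill actually synchronizes slides with the injected audio is not described here, so the following is only a sketch of one plausible approach: derive each slide's start time from the cumulative length of the narration clips before it. The function name `slide_start_times` and the `gap` parameter are hypothetical.

```python
from itertools import accumulate


def slide_start_times(durations: list[float], gap: float = 0.0) -> list[float]:
    """Return the start time, in seconds, of each slide.

    durations: length of the narration clip for each slide, in seconds.
    gap: pause after every clip before the next slide (an assumed knob).
    """
    if not durations:
        return []
    if any(d <= 0 for d in durations):
        raise ValueError("every duration must be positive")
    # Slide i starts once all earlier clips, plus their trailing gaps, are done.
    offsets = accumulate((d + gap for d in durations[:-1]), initial=0.0)
    return list(offsets)
```

With clips of 3.0, 4.5, and 2.0 seconds and no gap, the slides would start at 0.0, 3.0, and 7.5 seconds. The last clip's length does not affect any start time; it only determines when the recording can stop.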

Four built-in visual styles cover most common video types: a minimal presentation style suited to AI and tech topics, an editorial magazine style for research and methodology content, a finance broadcast style for business and industry analysis, and a newspaper evidence style for news events and investigative reporting.

The design philosophy is that layout and animation should serve the clarity of the information, not the other way around: each visual choice is meant to support what the voiceover is saying, not to add decoration for its own sake. The skill is written and documented primarily in Chinese. It is released under the MIT License.

Where it fits