SmartDirector
A research framework for AI video generation that takes multiple keyframes placed at specific points in time and produces cinematic clips that respect both the visual appearance and narrative intent of each keyframe.
SmartDirector is a research project from teams at the Chinese Academy of Sciences, Youku Moku-Lab, and Huazhong University of Science and Technology. It addresses a specific problem in AI video generation: most existing tools accept a text description or a start and end frame, but give you little control over how the story develops inside the video or how its scenes are paced. SmartDirector addresses this by letting you specify multiple keyframes, that is, images placed at particular points along the video, which the system then uses to build a coherent cinematic clip.
The framework works in two stages. The first stage, called Director-Gen, produces a lower-resolution video that is conditioned on all the keyframes you provide. The second stage, called Director-SR, takes that rough output and sharpens it using the high-resolution versions of the keyframes as reference anchors, recovering fine visual detail in the final result. The system can handle a single continuous shot, a multi-shot sequence where scenes change, or extensions of an existing video clip.
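Because the code is not yet released, the following is only a minimal Python sketch of the data flow described above, not the project's actual interface. The `Keyframe` class, the function names, the frame rate, and the downscale factor of 4 are all assumptions made for illustration. It shows how keyframes pinned to time positions might map to frame indices, and how the same anchors would feed a low-resolution first stage and a full-resolution second stage.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Keyframe:
    """Hypothetical container: an image pinned to a time position in the clip."""
    time_s: float      # position in seconds from the start of the clip
    image_path: str    # path to the full-resolution keyframe image


def to_frame_index(time_s: float, fps: int, num_frames: int) -> int:
    """Map a time position to the nearest frame index, clamped to the clip."""
    index = round(time_s * fps)
    return max(0, min(num_frames - 1, index))


def low_res_size(size: Tuple[int, int], factor: int) -> Tuple[int, int]:
    """Resolution the first stage would work at, for an assumed downscale factor."""
    width, height = size
    return (width // factor, height // factor)


def plan_stages(keyframes: List[Keyframe], fps: int, num_frames: int,
                full_size: Tuple[int, int], factor: int = 4) -> dict:
    """Describe what each stage consumes; no model is run here."""
    ordered = sorted(keyframes, key=lambda k: k.time_s)
    anchors = [(to_frame_index(k.time_s, fps, num_frames), k.image_path)
               for k in ordered]
    return {
        # Stage 1 (Director-Gen in the text): all keyframes, reduced resolution.
        "gen": {"size": low_res_size(full_size, factor), "anchors": anchors},
        # Stage 2 (Director-SR in the text): same anchors, full-resolution keyframes.
        "sr": {"size": full_size, "anchors": anchors},
    }


plan = plan_stages(
    [Keyframe(2.0, "b.png"), Keyframe(0.0, "a.png"), Keyframe(4.0, "c.png")],
    fps=24, num_frames=97, full_size=(1920, 1080),
)
print(plan["gen"]["size"], plan["gen"]["anchors"])
```

Keeping one shared anchor list for both stages reflects the description that the second stage reuses the high-resolution versions of the same keyframes; how the released models actually consume them may differ.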
To train the models, the team built a data pipeline that extracts single-shot and multi-shot sequences from movies, covering both tightly framed scenes and longer narrative arcs. According to the paper, SmartDirector outperforms comparable methods in experiments.
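The released pipeline is not available to inspect, so the sketch below only illustrates the general idea of turning detected shot boundaries into single-shot and multi-shot training samples. The function names, the `(start_frame, end_frame)` representation, the minimum length, and the sliding-window grouping are all hypothetical choices, not the authors' method.

```python
from typing import List, Tuple

Shot = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def single_shot_samples(shots: List[Shot], min_len: int) -> List[Shot]:
    """Keep shots long enough to serve as standalone single-shot clips."""
    return [s for s in shots if s[1] - s[0] >= min_len]


def multi_shot_samples(shots: List[Shot], shots_per_sample: int) -> List[List[Shot]]:
    """Slide a window over consecutive shots to form multi-shot sequences."""
    if shots_per_sample < 2:
        raise ValueError("a multi-shot sample needs at least two shots")
    return [shots[i:i + shots_per_sample]
            for i in range(len(shots) - shots_per_sample + 1)]


# Example boundaries, as a shot detector might produce for one movie segment.
shots = [(0, 40), (40, 52), (52, 130), (130, 200)]
print(single_shot_samples(shots, min_len=24))
print(multi_shot_samples(shots, shots_per_sample=2))
```

Here the 12-frame shot is dropped from the single-shot set but still appears inside multi-shot windows, since short shots can still carry a scene transition.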
As of the project page launch in May 2026, the code and dataset are not yet publicly available. The README states they are pending a corporate compliance and security review before release. The repository currently serves as a paper announcement and citation reference. A preprint is available on arXiv under the identifier 2605.27891.
Where it fits
- Generate a multi-shot video sequence where each scene transition is anchored to a specific keyframe you provide at a chosen time position.
- Extend an existing video clip with keyframe-conditioned generation to continue the narrative beyond the original ending.
- Apply the Director-SR stage to sharpen a rough low-resolution video draft using the original high-resolution keyframes as detail-recovery anchors.
- Reproduce the SmartDirector evaluation from the paper on a custom movie sequence, using the data extraction pipeline once the code and dataset are released.