ComfyUI-BerniniStudio
ComfyUI node to assist with Bernini video editing prompting (Ollama + Prompt Presets)
A ComfyUI plugin that simplifies using ByteDance's Bernini video editing model by collapsing complex multi-node workflows into a single node, with optional AI-powered prompt generation from reference images using a local Ollama model.
ComfyUI is a visual tool for building AI image and video generation workflows by connecting nodes together, similar to wiring up a flowchart. This repository is an add-on for ComfyUI that simplifies working with Bernini, a video editing model released by ByteDance. Bernini can take an existing video and modify it based on text instructions, replace objects in a scene with reference images, generate entirely new videos from text descriptions, or animate a still image.
Normally, using Bernini inside ComfyUI requires wiring together five or more separate nodes to handle all the text encoding, video encoding, and image preparation steps. This plugin collapses all of that into a single node, reducing setup time and visual clutter in the workflow.
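As a rough illustration of what "a single node" means in ComfyUI terms, the sketch below follows the standard custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `NODE_CLASS_MAPPINGS`). The class name, input names, category, and output are assumptions made for this example, not the plugin's actual interface, and the body only tidies the prompt instead of calling the model.

```python
class BerniniSingleNodeSketch:
    """Hypothetical node standing in for a multi-node Bernini graph."""

    CATEGORY = "video/bernini"   # assumed menu location
    RETURN_TYPES = ("STRING",)   # a real node would return model inputs, not text
    RETURN_NAMES = ("prompt",)
    FUNCTION = "run"             # name of the method ComfyUI calls

    @classmethod
    def INPUT_TYPES(cls):
        # All input names below are illustrative assumptions.
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True, "default": ""}),
            },
            "optional": {
                "source_video": ("IMAGE",),      # video frames as an image batch
                "reference_images": ("IMAGE",),
            },
        }

    def run(self, prompt, source_video=None, reference_images=None):
        # In a real node, text encoding, video encoding, and image
        # preparation would happen here; this sketch only cleans the prompt.
        return (prompt.strip(),)


# ComfyUI discovers custom nodes through these module-level mappings.
NODE_CLASS_MAPPINGS = {"BerniniSingleNodeSketch": BerniniSingleNodeSketch}
NODE_DISPLAY_NAME_MAPPINGS = {"BerniniSingleNodeSketch": "Bernini (sketch)"}
```

The point of the pattern is that every step that would otherwise be its own node becomes a line inside `run`, so the workflow graph shows one box instead of five or more.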
The plugin also includes a prompt enhancement panel powered by Ollama, a tool for running AI language models on your own computer. When you connect reference images, the panel can send them to a locally running vision model, which returns text descriptions that are folded into your video editing prompt automatically. This helps produce more consistent results when you want the edited video to match a specific person, object, or visual style.
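The reference-image flow described here can be sketched against Ollama's local REST endpoint (`/api/generate`, which accepts base64-encoded images for vision models). This is a minimal sketch, not the plugin's code: the model name `llava`, the instruction text, the function names, and the "Reference N:" format are all assumptions chosen for illustration.

```python
import base64
import json
import urllib.request

# Ollama's default local address; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_describe_request(image_bytes, model="llava"):
    """Build the JSON body for a one-shot image description request."""
    return {
        "model": model,  # assumed: any locally pulled vision model
        "prompt": "Describe the main subject of this image in one sentence.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of a stream
    }


def describe_image(image_bytes, model="llava", url=OLLAMA_URL, timeout=120):
    """Send one image to a running Ollama server and return its description."""
    body = json.dumps(build_describe_request(image_bytes, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"].strip()


def fold_descriptions(edit_prompt, descriptions):
    """Append non-empty descriptions to the editing prompt (hypothetical format)."""
    cleaned = [d.strip() for d in descriptions if d and d.strip()]
    if not cleaned:
        return edit_prompt.strip()
    lines = [f"Reference {i}: {d}" for i, d in enumerate(cleaned, 1)]
    return edit_prompt.strip() + "\n" + "\n".join(lines)
```

With an Ollama server running, `fold_descriptions("Replace the car", [describe_image(open("ref.png", "rb").read())])` would yield a prompt that carries the model's description of the reference image; `ref.png` is a placeholder file name.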
The project supports several editing modes including video-to-video editing, reference-image-guided editing, text-to-video generation, and single-frame image editing. Each mode corresponds to how the Bernini model was trained to accept input.
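One way to picture the relationship between inputs and modes is a small selector that picks a mode from what is connected. The rules below are purely an assumption for illustration; the plugin may let you choose the mode explicitly or apply different rules, and `EditMode` and `pick_mode` are hypothetical names.

```python
from enum import Enum


class EditMode(Enum):
    VIDEO_TO_VIDEO = "video-to-video editing"
    REFERENCE_GUIDED = "reference-image-guided editing"
    TEXT_TO_VIDEO = "text-to-video generation"
    IMAGE_EDIT = "single-frame image editing"


def pick_mode(num_source_frames, num_reference_images):
    """Guess a mode from connected inputs (assumed rules, not the plugin's)."""
    if num_source_frames < 0 or num_reference_images < 0:
        raise ValueError("counts must be non-negative")
    if num_source_frames == 0:
        return EditMode.TEXT_TO_VIDEO      # nothing to edit, so generate
    if num_reference_images > 0:
        return EditMode.REFERENCE_GUIDED   # source plus reference images
    if num_source_frames == 1:
        return EditMode.IMAGE_EDIT         # a single frame is an image edit
    return EditMode.VIDEO_TO_VIDEO
```

For example, `pick_mode(16, 0)` selects video-to-video editing under these assumed rules, while `pick_mode(0, 0)` falls back to text-to-video generation.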
The author describes this as a personal tool built with AI assistance and shared without any support or maintenance commitment. It requires a version of ComfyUI that includes Bernini model support, and the Bernini model weights must be downloaded separately.
Where it fits
- Edit an existing video using text instructions inside ComfyUI without manually wiring together multiple nodes.
- Generate a new video from a text description using the Bernini model through a simple single-node interface.
- Generate video editing prompts automatically by feeding reference images to a local Ollama vision model.
- Animate a still image into a short video using the Bernini model's image animation mode.