FramePack

Python ★ 17k updated 8mo ago

Let's make video diffusion practical!

Desktop software that turns a still image and a text prompt into a video by generating frames one chunk at a time, using a fixed-memory technique that lets a 6 GB laptop GPU produce 60-second clips at 30 fps.

Python · PyTorch · CUDA · Gradio · setup: hard · complexity 4/5

FramePack is the official implementation of a research project on video generation by neural networks — specifically, a technique called "Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models." In plain terms, it is software that turns a still image and a text prompt into a video, generating the video one chunk of frames at a time. The headline promise on the README is "video diffusion, but feels like image diffusion."

The technical idea is that FramePack compresses the past frames it has already generated into a fixed-size context, so the work needed to predict the next frame stays the same no matter how long the video gets. That means a single GPU can keep generating frames for a minute-long clip without running out of memory, and the project ships as functional desktop software with its own sampling system and memory management. The repository also documents follow-up versions FramePack-F1 and the upcoming FramePack-P1, which adds "Planned Anti-Drifting" and "History Discretization" to keep long generations from drifting away from the prompt.
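The fixed-size context idea can be illustrated with a toy calculation. The sketch below is not FramePack's code: it assumes, purely for illustration, that the most recent past frame keeps all of its tokens and each older frame keeps half as many as the one after it. The function names, the 1024 tokens-per-frame figure, and the halving rate are all hypothetical. Under that assumption the packed context converges to just under twice the per-frame token count, while an unpacked context grows linearly with the number of frames.

```python
# Toy illustration only: the function names, the tokens-per-frame value, and the
# halving schedule are assumptions made for this sketch, not FramePack's API.

def unpacked_context_length(num_past_frames: int, tokens_per_frame: int = 1024) -> int:
    """Context size if every past frame is kept at full resolution."""
    return num_past_frames * tokens_per_frame


def packed_context_length(num_past_frames: int, tokens_per_frame: int = 1024,
                          rate: int = 2) -> int:
    """Context size if a frame of age `age` keeps tokens_per_frame // rate**age tokens.

    age 0 is the most recent past frame; older frames are compressed harder.
    """
    total = 0
    for age in range(num_past_frames):
        kept = tokens_per_frame // rate ** age
        if kept == 0:
            # In this toy schedule, frames this old no longer contribute tokens.
            break
        total += kept
    return total


if __name__ == "__main__":
    fps, seconds = 30, 60
    total_frames = fps * seconds  # 1800 frames for a 60-second clip at 30 fps
    for n in (1, 10, 100, total_frames):
        print(n, unpacked_context_length(n), packed_context_length(n))
```

With these illustrative numbers, the packed context stops growing after eleven past frames, which is the sense in which the per-step cost stays the same however long the video gets.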

The audience is anyone curious about generating videos from a still image and a prompt, particularly people without access to a data-center GPU: the README states that a 6 GB Nvidia laptop GPU is enough to generate a 60-second video at 30 frames per second using a 13-billion-parameter model. The software runs on Linux or Windows with an RTX 30, 40, or 50 series card, offers a one-click Windows package, and on Linux installs through pip on Python 3.10 with PyTorch. It exposes a Gradio web GUI.

Where it fits