StreamDiffusion

Python · ★ 11k · updated 1y ago

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

A Python library that makes AI image generation real-time, producing over 100 frames per second on a consumer GPU by treating generation as a continuous stream rather than one-off requests.

Python · PyTorch · CUDA · TensorRT · Docker · Stable Diffusion · setup: hard · complexity 4/5

StreamDiffusion is a Python library that makes AI image generation fast enough to work in real time. Standard diffusion-based image generators (the kind behind tools like Stable Diffusion) take anywhere from a fraction of a second to a few seconds per image, which is too slow for interactive applications. StreamDiffusion restructures the generation process so that images can be produced at dozens of frames per second, and over 100 in the best case, on a high-end consumer GPU.

The core idea is to treat image generation as a continuous stream rather than a series of one-off requests. Several technical approaches contribute to the speed: batching frames together, reusing intermediate computation across frames, and applying filters to skip redundant work when consecutive frames are similar. On an Nvidia RTX 4090, the library can generate over 100 frames per second from a text prompt and around 94 frames per second when transforming an input image.
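To make the frame-skipping idea concrete, here is a minimal sketch of a similarity filter in plain Python. It is not StreamDiffusion's implementation or API: the `SimilarityFilter` class, the `fake_generate` stand-in, the choice of cosine similarity, and the fixed 0.98 threshold are all assumptions made for illustration. The sketch only shows the general pattern of reusing the previous output when a new input frame is nearly identical to the last one that was actually processed.

```python
import math
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]  # a frame flattened to a list of pixel values


def cosine_similarity(a: Frame, b: Frame) -> float:
    """Cosine similarity between two equally sized frames."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Two all-zero frames count as identical; otherwise as unrelated.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


class SimilarityFilter:
    """Hypothetical wrapper that skips generation for near-duplicate frames."""

    def __init__(self, generate: Callable[[Frame], List[float]],
                 threshold: float = 0.98) -> None:
        self.generate = generate      # the expensive step (assumed)
        self.threshold = threshold    # assumed value, tune per application
        self._reference: Optional[List[float]] = None
        self._last_output: Optional[List[float]] = None
        self.skipped = 0

    def process(self, frame: Frame) -> List[float]:
        if (self._reference is not None
                and cosine_similarity(frame, self._reference) >= self.threshold):
            # Input barely changed: reuse the previous result.
            self.skipped += 1
            return self._last_output
        # Input changed enough: run the expensive step and remember it.
        self._reference = list(frame)
        self._last_output = self.generate(frame)
        return self._last_output


# Stand-in for the diffusion model so the sketch runs without a GPU.
calls = []

def fake_generate(frame: Frame) -> List[float]:
    calls.append(list(frame))
    return [2 * x for x in frame]


filt = SimilarityFilter(fake_generate, threshold=0.98)
frames = [[1.0, 0.0, 0.0], [1.0, 0.01, 0.0], [0.0, 1.0, 0.0]]
outputs = [filt.process(f) for f in frames]
print(f"generated {len(calls)} of {len(frames)} frames, skipped {filt.skipped}")
```

In this toy run the second frame is almost identical to the first, so the expensive call runs for only two of the three frames. A real pipeline would compare image tensors on the GPU, and the library's own filter may use a different similarity measure or skipping rule.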

Two interactive demos come with the project. One lets you type a text description and watch the AI generate matching images in real time as you type or tweak the prompt. The other uses a live webcam feed or screen capture and continuously applies an AI style or transformation to whatever the camera sees, updating visually as you move.

Installation requires a CUDA-capable Nvidia GPU, Python 3.10, PyTorch, and an optional TensorRT plugin for maximum speed. Docker support is also included. The library wraps around the Hugging Face Diffusers ecosystem, so any model that works with the standard Stable Diffusion pipeline can be plugged in.
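Before installing, it can help to confirm that the prerequisites listed above are present. The script below is a rough sketch using only the Python standard library. The `check_prerequisites` helper is hypothetical and not part of StreamDiffusion, and the checks are assumptions about how to detect each item: finding `nvidia-smi` on the PATH suggests an NVIDIA driver is installed but does not prove the GPU is usable, and the module names `torch`, `diffusers`, and `tensorrt` are the usual import names for those packages.

```python
import importlib.util
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(required_python: Tuple[int, int] = (3, 10)) -> Dict[str, bool]:
    """Report which of the listed prerequisites appear to be available."""
    return {
        # The listing names Python 3.10; other versions are not covered here.
        "python_3_10": sys.version_info[:2] == required_python,
        # Presence of the driver tool on PATH (a hint, not a guarantee).
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # Importable packages, checked without actually importing them.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "diffusers_installed": importlib.util.find_spec("diffusers") is not None,
        # TensorRT is optional and only needed for maximum speed.
        "tensorrt_installed": importlib.util.find_spec("tensorrt") is not None,
    }


report = check_prerequisites()
for name, ok in report.items():
    print(f"{name:22s} {'yes' if ok else 'no'}")
```

A missing `tensorrt_installed` entry is not a blocker, since TensorRT is described as optional.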

The project comes from a research team and is accompanied by a paper on arXiv. It is designed for developers building interactive creative tools, live video effects, or any application where real-time image generation is needed.

Where it fits