live-music-diffusion-models

Python · ★ 46 · updated 28d ago

A research codebase for generating continuous music in real time using AI diffusion models, producing audio block by block so it can stream indefinitely for live performances.

Python · PyTorch · Stable Audio · Jupyter · setup: hard · complexity 5/5

This is a research codebase from a 2026 academic paper about generating music in real time using AI. The system is called Live Music Diffusion Models, or LMDMs. The core idea is that instead of generating an entire piece of music all at once, the model produces audio in short blocks, one after another, using a sliding window of recent audio as context. This allows the model to keep generating music continuously as long as needed, which is useful for live performance settings.
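The block-by-block loop described above can be sketched in a few lines. This is a toy illustration only, not the repository's code: `toy_generate_block`, `BLOCK_SIZE`, and `CONTEXT_BLOCKS` are hypothetical names, and the "model" here simply continues a sine wave so that the loop runs without any checkpoint.

```python
from collections import deque
import math

BLOCK_SIZE = 4       # samples per block (hypothetical; real blocks are much longer)
CONTEXT_BLOCKS = 2   # number of recent blocks kept as context (hypothetical)


def toy_generate_block(context, block_index):
    """Stand-in for the model. A real LMDM would run diffusion denoising
    conditioned on `context`; this toy ignores it and continues a sine wave."""
    start = block_index * BLOCK_SIZE
    return [math.sin(0.1 * (start + i)) for i in range(BLOCK_SIZE)]


def stream_blocks(num_blocks):
    """Yield (block, context_length) pairs, one block at a time."""
    # deque(maxlen=...) drops the oldest block automatically: the sliding window.
    window = deque(maxlen=CONTEXT_BLOCKS)
    for block_index in range(num_blocks):
        context = [sample for block in window for sample in block]
        block = toy_generate_block(context, block_index)
        window.append(block)
        yield block, len(context)


results = list(stream_blocks(5))
blocks = [block for block, _ in results]
context_lengths = [length for _, length in results]
```

Because the window has a fixed maximum size, the context stops growing after the first few blocks, which is what lets generation continue for as long as the loop runs.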

The approach builds on existing text-to-audio diffusion models from the Stable Audio project. The researchers fine-tuned those base models and added ARC-forcing, a training method that makes autoregressive (block-by-block) generation more consistent and stable. The repository provides four model configurations that cover two ways of attending to context: a bidirectional variant that sees context in both directions, and a causal variant that only looks backward, which is what true streaming output requires.
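The difference between the two context-handling styles is easiest to see as attention masks. The sketch below is a generic illustration under the assumption that "both directions" means full attention and "only looks backward" means a standard causal mask; the function names are hypothetical and the repository's actual masking may differ (for example, it may operate on whole blocks rather than single positions).

```python
def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True for _ in range(n)] for _ in range(n)]


def causal_mask(n):
    """Position q may attend only to itself and earlier positions k <= q."""
    return [[k <= q for k in range(n)] for q in range(n)]


def visible_counts(mask):
    """How many positions each query position can see."""
    return [sum(row) for row in mask]


full = bidirectional_mask(3)
causal = causal_mask(3)
```

With the causal mask, nothing a position sees depends on audio that has not been generated yet, so earlier outputs never need to be revised when a new block arrives.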

Training happens in two steps. First you fine-tune a pre-existing music diffusion model on your own audio data. Then you apply ARC-forcing on top of that fine-tuned model to prepare it for block-by-block streaming generation. The README notes that the memory requirements grow with how far ahead the model is allowed to roll out, so hardware planning matters.

Inference is handled through a function that denoises one block at a time and optionally reuses cached computations for faster streaming. A Jupyter notebook in the repository walks through a complete example from loading a checkpoint to producing audio output.
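To illustrate why reusing cached computations helps during streaming, here is a toy sketch. It is not the repository's inference function: `ToyStreamer`, `_encode`, and `denoise_block` are hypothetical names, the "encoder" is just a mean, and the "denoiser" is a simple pull toward the last context feature. The point is only that, with a cache, each past block is encoded once instead of once per new block.

```python
import random


class ToyStreamer:
    def __init__(self, block_size=4, num_steps=3, use_cache=True):
        self.block_size = block_size
        self.num_steps = num_steps
        self.use_cache = use_cache
        self.cache = {}        # block index -> encoded feature
        self.encode_calls = 0  # counts how often the "expensive" encoder runs

    def _encode(self, block):
        """Stand-in for an expensive per-block computation."""
        self.encode_calls += 1
        return sum(block) / len(block)

    def _context_features(self, history):
        features = []
        for index, block in enumerate(history):
            if self.use_cache and index in self.cache:
                features.append(self.cache[index])
                continue
            feature = self._encode(block)
            if self.use_cache:
                self.cache[index] = feature
            features.append(feature)
        return features

    def denoise_block(self, history, rng):
        """Start from noise and iteratively move toward a context-derived target."""
        features = self._context_features(history)
        target = features[-1] if features else 0.0
        block = [rng.gauss(0.0, 1.0) for _ in range(self.block_size)]
        for _ in range(self.num_steps):
            block = [x + 0.5 * (target - x) for x in block]
        return block


def run(use_cache, num_blocks=4):
    streamer = ToyStreamer(use_cache=use_cache)
    rng = random.Random(0)
    history = []
    for _ in range(num_blocks):
        history.append(streamer.denoise_block(history, rng))
    return streamer.encode_calls, history


cached_calls, cached_audio = run(use_cache=True)
uncached_calls, uncached_audio = run(use_cache=False)
```

In this toy the cached run encodes each past block once while the uncached run re-encodes the whole history for every new block, and both produce identical output.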

This is a public-facing code release. The researchers note that the internal development code used during the project is available on request.

Where it fits

Fine-tune a music diffusion model on your own audio recordings for personalized real-time generation
Run a streaming music session for a live performance using block-by-block AI generation
Experiment with ARC-forcing to study how autoregressive training improves consistency in streaming audio models
