Steerable-music-transformer
Official PyTorch implementation of "Steerable Rhythmic Complexity in Autoregressive Music Generation" (EI Accepted). A bar-level conditional micro-Transformer using REMI+ syntax for exact density control and harmonic decoupling.
A small AI model that generates Bach-style choral music with precise bar-by-bar control over rhythmic density, without breaking the harmonic structure. Trained on 349 Bach pieces using a compact 4-layer Transformer.
This repository contains the code for a research paper on generating music with precise control over how rhythmically busy each bar sounds. The problem the paper addresses is that when AI music systems are asked to produce denser rhythms, they often break the harmonic structure of the music at the same time. This project introduces a method for adjusting rhythmic density independently of harmonic structure.
The approach works by training a small neural network on 349 four-part choral pieces by J.S. Bach. Before training, each bar of music in the dataset is labeled with a complexity tag on a scale from Level 1 to Level 10, indicating how densely packed with notes that bar is. The model learns to generate music according to whichever tag is supplied at the start of each bar, allowing you to dial the rhythmic density up or down on a bar-by-bar basis. The research uses a music encoding format called REMI+ to represent the notes as a sequence of tokens, similar to how text is represented for language models.
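The tagging step can be sketched in a few lines. This is a minimal illustration, not the repository's code. It assumes that density is the number of notes in a bar and that counts are mapped linearly onto Levels 1 to 10 between the sparsest and densest bar. The token names (`Bar`, `Complexity_k`, `Pitch_p`) are hypothetical stand-ins for the real REMI+ vocabulary, which also carries position and duration tokens that are omitted here.

```python
from typing import List

NUM_LEVELS = 10  # Levels 1..10, as described above


def density_to_level(count: int, lo: int, hi: int) -> int:
    """Map a bar's note count linearly onto Levels 1..NUM_LEVELS (assumed binning)."""
    if hi == lo:
        return 1
    frac = (count - lo) / (hi - lo)  # 0.0 for the sparsest bar, 1.0 for the densest
    return min(NUM_LEVELS, 1 + int(frac * NUM_LEVELS))


def tag_bars(bars: List[List[int]]) -> List[str]:
    """Flatten bars of MIDI pitches into tokens, prefixing each bar with its complexity tag.

    Hypothetical simplified vocabulary: real REMI+ sequences also encode
    positions and durations, which are left out of this sketch.
    """
    counts = [len(bar) for bar in bars]
    lo, hi = min(counts), max(counts)
    tokens: List[str] = []
    for bar, count in zip(bars, counts):
        tokens.append("Bar")
        tokens.append(f"Complexity_{density_to_level(count, lo, hi)}")
        tokens.extend(f"Pitch_{p}" for p in bar)
    return tokens


# Three toy bars: moderately sparse, dense, and a single note.
print(tag_bars([[60, 64], [60, 62, 64, 65, 67, 69], [67]]))
```

Because the tag sits directly after each `Bar` token, the model sees the requested level before it predicts any note in that bar, which is what allows bar-by-bar steering.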
The model itself is described as a micro-Transformer: only four layers and eight attention heads, which is small compared to typical language models. The authors argue that with well-prepared, high-purity training data, a much larger model is not needed to achieve reliable, steerable output. The paper reports a Pearson correlation of 0.893 between the requested complexity level and the note density the model actually generates, and shows through a separate analysis that increasing rhythmic density does not measurably increase harmonic noise.
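The controllability metric above is a Pearson correlation between the requested level and the measured density of each generated bar. A minimal standard-library sketch of that computation follows, assuming density is counted as notes per generated bar. The values in the usage lines are toy numbers for illustration, not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Toy example: requested level per generated bar vs. notes actually generated in that bar.
requested = [1, 3, 5, 7, 10]
generated_notes = [4, 7, 9, 14, 18]
print(round(pearson_r(requested, generated_notes), 3))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient; the explicit version is shown here to make the formula visible.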
To reproduce the results, you build the training dataset from MIDI files with a provided script, train the model with another script, generate samples, and then produce evaluation charts. The MIDI source data comes from the open-source music21 library.
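At generation time, steering amounts to writing the requested tag into the sequence whenever a new bar begins and letting the model continue from there. The sketch below shows that control loop for a per-bar complexity schedule. It is an illustration under stated assumptions, not the repository's generation script: `generate_with_schedule` and `toy_next_token` are hypothetical names, the `Bar` / `Complexity_k` / `Pitch_p` tokens are a simplified vocabulary, and the toy sampler stands in for the trained Transformer by obeying the tag exactly.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical sampler signature: given the tokens so far, return the next token.
NextToken = Callable[[List[str]], str]


def generate_with_schedule(schedule: Sequence[int], next_token: NextToken,
                           max_tokens_per_bar: int = 64) -> List[str]:
    """Generate one bar per entry in `schedule`, forcing its complexity tag at each bar start."""
    tokens: List[str] = []
    for level in schedule:
        if not 1 <= level <= 10:
            raise ValueError(f"complexity level must be in 1..10, got {level}")
        # Forced conditioning: the tag is written into the sequence, not sampled.
        tokens += ["Bar", f"Complexity_{level}"]
        for _ in range(max_tokens_per_bar):
            tok = next_token(tokens)
            if tok == "Bar":  # the sampler wants to start a new bar, so end this one
                break
            tokens.append(tok)
    return tokens


def toy_next_token(tokens: List[str]) -> str:
    """Stand-in for the trained model: emits exactly `level` notes, then ends the bar."""
    # Locate the most recent complexity tag and count the notes written since it.
    idx = max(i for i, t in enumerate(tokens) if t.startswith("Complexity_"))
    level = int(tokens[idx].split("_")[1])
    notes_so_far = len(tokens) - idx - 1
    if notes_so_far >= level:
        return "Bar"
    return f"Pitch_{random.randint(48, 72)}"


# Sparse bar, busy bar, medium bar.
print(generate_with_schedule([2, 8, 4], toy_next_token))
```

With a real model, `next_token` would sample from the network's output distribution, and how closely each bar's note count tracks its tag is exactly what the correlation metric measures.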
Where it fits
- Generate Bach-style choral music where you control how note-dense each bar sounds, from sparse to busy.
- Research how to steer AI music generation without accidentally distorting the harmonic quality.
- Reproduce the paper's experiments by building the dataset from MIDI files and training the model yourself.
- Explore how small, well-trained AI models can produce controllable musical output without needing massive compute.