ACE-Step-1.5
The most powerful local music generation model that outperforms almost all commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.
ACE-Step 1.5 is a local AI model that generates complete music tracks from text descriptions, running on your own computer's graphics card and producing results in under 10 seconds on a gaming GPU.
ACE-Step 1.5 is a locally running music generation model that turns text descriptions and optional reference audio into complete music tracks. You describe what you want in plain language, and the model produces an audio file. It is designed to run on a personal computer rather than in the cloud: it needs roughly 4GB of graphics card memory at minimum and gives better results with more.
The system uses two parts working together. A language model acts as a planner: it reads your description and expands it into a detailed blueprint covering song structure, lyrics, tempo, key, and style. A second component, a Diffusion Transformer, then takes that blueprint and generates the actual audio. Because of this two-stage approach, you can give a simple description, such as a genre and mood, and get a full structured song in return, or you can control the details precisely with lyrics and metadata.
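The planner-then-renderer flow can be pictured with a small Python sketch. This is not ACE-Step's code or API: the names `Blueprint`, `plan`, `render`, and `generate`, the default values, and the field layout are all hypothetical, and both stages are stubs. It only illustrates how a short description becomes a structured blueprint, and how user-supplied metadata can override what the planner would otherwise choose.

```python
from dataclasses import dataclass


@dataclass
class Blueprint:
    """Plan handed from stage one to stage two (illustrative fields only)."""
    style: str
    tempo_bpm: int
    key: str
    sections: list[str]
    lyrics: str = ""
    duration_s: float = 30.0


def plan(description: str, lyrics: str = "", **overrides) -> Blueprint:
    """Stage one stand-in: expand a short description into a blueprint.

    In the real system this role is played by a language model; this stub
    fills in fixed defaults so the data flow can be followed end to end.
    """
    fields = {
        "style": description,
        "tempo_bpm": 120,
        "key": "C major",
        "sections": ["intro", "verse", "chorus", "outro"],
        "lyrics": lyrics,
    }
    fields.update(overrides)  # explicit metadata wins over planner defaults
    return Blueprint(**fields)


def render(blueprint: Blueprint, sample_rate: int = 8000) -> list[float]:
    """Stage two stand-in: turn a blueprint into audio samples.

    In the real system this role is played by a Diffusion Transformer; this
    stub returns silence of the requested length.
    """
    return [0.0] * int(blueprint.duration_s * sample_rate)


def generate(description: str, lyrics: str = "", **overrides):
    """Run both stages and return the blueprint alongside the samples."""
    blueprint = plan(description, lyrics, **overrides)
    return blueprint, render(blueprint)
```

Calling `generate("lo-fi hip hop, mellow")` exercises the simple path, where every detail comes from the planner's defaults. Passing extra keyword arguments such as `tempo_bpm=80` or `duration_s=2.0` mirrors the precise-control path, where the caller pins specific fields of the blueprint.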
Generation is fast for an open-source model. A full song takes under two seconds on high-end server hardware and under ten seconds on a consumer gaming GPU. Tracks can range from ten seconds to ten minutes long. The model supports lyrics in over 50 languages.
Beyond basic generation, the README describes a range of editing features. You can generate a cover version of an existing song, edit specific sections of an audio file, add layers to a track, or automatically create background music for a vocal recording. A LoRA training feature lets you fine-tune the model's style on a small collection of your own songs, which the README says takes about an hour on a mid-range GPU.
Installation uses a package manager called uv. After cloning the repository and running a sync command, a web interface opens at a local address. Portable pre-built packages for Windows and macOS are also available. A free online demo exists at the project's website for those who do not want to install anything locally.
Where it fits
- Generate an original music track in any genre and mood by typing a plain-language description.
- Create a cover version of an existing song by providing the original audio as a reference input.
- Fine-tune the model on a small collection of your own songs to generate music that matches your personal style.
- Automatically generate background music for a vocal recording you already have.