AudioBookKJ-v2.1
AudioBook KJ is an experimental local AI studio for turning written scripts into narrated audio and video projects. It is not a polished product; the repository is described as a public source snapshot intended for people who want to study the architecture, borrow ideas, or experiment with the workflow.
The application has a React frontend and a Python backend. The frontend manages a timeline view where you arrange script lines, audio clips, music, and visual assets. The backend handles text-to-speech generation using local AI models built on Torch, Transformers, and a tool called OmniVoice, which can use an NVIDIA GPU for faster generation. Audio mixing uses pydub and FFmpeg. A Chrome extension called FlowKit acts as a bridge between the local backend and Google Flow, a browser-based AI workflow tool from Google Labs, for tasks like script generation and image creation.
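The optional GPU use described above can be sketched as a small device-selection helper. This is a minimal illustration, not code from the repository: the function name `pick_device` is hypothetical, and it assumes the backend would fall back to the CPU when PyTorch is missing or no CUDA device is visible.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu".

    Hypothetical helper: the actual backend may select its device differently.
    """
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Speech generation would run on: {pick_device()}")
```

The returned string could be passed wherever a model is moved to a device, so the same code path would serve both GPU and CPU-only machines.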
The general workflow runs in seven stages: import and clean a script, extract character references and scene metadata using AI helpers, convert script lines to speech audio clips, arrange and mix the audio timeline, manage visual assets tied to scenes, optionally use the FlowKit extension to pull in browser-based AI outputs, and finally export the combined result.
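One way to picture the seven stages is as an ordered list in which only the FlowKit step is optional. The sketch below is an assumption about how such a pipeline could be modeled; the stage identifiers and the `plan_run` function are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    optional: bool = False


# Hypothetical identifiers for the seven stages, in workflow order.
STAGES = [
    Stage("import_and_clean_script"),
    Stage("extract_characters_and_scenes"),
    Stage("generate_speech_clips"),
    Stage("arrange_and_mix_timeline"),
    Stage("manage_visual_assets"),
    Stage("pull_flowkit_outputs", optional=True),
    Stage("export_project"),
]


def plan_run(use_flowkit: bool = False) -> list[str]:
    """Return the stage names to execute, skipping optional stages unless enabled."""
    return [s.name for s in STAGES if use_flowkit or not s.optional]


if __name__ == "__main__":
    for step, name in enumerate(plan_run(use_flowkit=True), start=1):
        print(step, name)
```

Keeping the optional flag on the stage itself means a run without the browser extension drops that one step and leaves the rest of the order unchanged.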
On Windows, the run.bat launcher script handles most of the setup: it checks for Git, Node.js, Python, and FFmpeg, offers to install any missing tools through Windows Package Manager (winget), installs the project dependencies, and opens the app in the browser. Other platforms require more manual setup.
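On platforms without the launcher, a similar prerequisite check can be approximated in a few lines of Python. This is a rough sketch rather than a port of run.bat: the executable names (`git`, `node`, `python`, `ffmpeg`) are assumptions about how the tools appear on the PATH, and `find_missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names; on some systems Python is exposed as "python3" or "py".
REQUIRED_TOOLS = ("git", "node", "python", "ffmpeg")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which) -> list[str]:
    """Return the tools that cannot be found on the PATH.

    The `which` parameter is injectable so the check can be tested without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Unlike the launcher, this sketch only reports what is missing; it does not attempt to install anything.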
Local AI generation has substantial hardware requirements. The README recommends 16 to 32 GB of RAM, an SSD with 20 to 30 GB free, and an NVIDIA GPU with 6 to 8 GB of VRAM. CPU-only generation works but is slower. The first launch can take a long time because the application downloads model weights.
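Because the first launch downloads model weights, it can help to confirm the free disk space beforehand. The following is a minimal sketch using only the standard library; the helper name `has_free_space` is hypothetical, and the 20 GiB default is an assumption taken from the lower bound of the recommendation above (the README's figure may be in decimal GB).

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def has_free_space(path: str = ".", required_gib: float = 20.0) -> bool:
    """Return True when the drive holding `path` has at least `required_gib` free."""
    usage = shutil.disk_usage(path)
    return usage.free >= required_gib * GIB


if __name__ == "__main__":
    if has_free_space():
        print("Enough free space for model downloads.")
    else:
        print("Less than 20 GiB free; model downloads may fail.")
```

The check covers disk space only; RAM and VRAM are not inspected here.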
Private voice reference files and generated media are intentionally excluded from the public repository. The code may need adjustment before it runs on a different machine.