auto-improve
GAN-style self-improvement loop for any text artifact: mutate, grade with a SEPARATE model, keep only verified wins (pairwise-judged), revert the rest. The git history is the improvement log.
auto-improve is a Python tool that automatically rewrites any text file in a loop, keeping only edits that pass a two-round AI quality test. A separate AI judges each change to avoid self-flattery bias, and every accepted edit is saved as a git commit.
auto-improve is a Python tool that automatically rewrites and refines any text file, keeping each edit only if the new version is genuinely better than the previous one. Point it at a file, and it runs a loop: generate a set of proposed edits, score each one against a quality rubric, run a head-to-head comparison between the best candidate and the current version, and keep the change only if it clearly wins. Edits that do not pass that test are discarded and the file reverts to what it was before.
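The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: `improve`, `propose`, `score`, and `beats` are hypothetical names, and the model calls are passed in as plain functions so the control flow stands on its own.

```python
from typing import Callable, List


def improve(
    text: str,
    propose: Callable[[str], List[str]],   # hypothetical: writer model proposes edits
    score: Callable[[str], float],         # hypothetical: rubric score from the grader
    beats: Callable[[str, str], bool],     # hypothetical: head-to-head (candidate, current)
    rounds: int = 5,
) -> str:
    """Return the best version found after `rounds` propose/score/compare cycles."""
    current = text
    for _ in range(rounds):
        candidates = propose(current)
        if not candidates:
            break
        # Pick the highest-scoring candidate, then make it face the current version.
        best = max(candidates, key=score)
        if beats(best, current):
            current = best  # verified win: keep it
        # Otherwise `current` is left untouched, which is the "revert" case.
    return current
```

Because a rejected candidate never replaces `current`, a run in which every comparison fails returns the input unchanged.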
The quality check is done by a different AI model from the one that writes the edits. This matters because a model asked to improve something will almost always describe its own output as an improvement, whether or not it actually is. Separating the writer from the judge avoids that trap. The comparison step goes further: it runs the same head-to-head twice, with the two versions in swapped positions, and keeps the change only if it wins both times. This removes a known bias where AI judges tend to favor whichever option appears first.
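A minimal sketch of the position-swapped comparison, assuming a hypothetical judge function that takes two texts and returns "A" if it prefers the first slot or "B" if it prefers the second. The name `wins_both_orders` and the judge signature are illustrative assumptions, not the tool's API.

```python
from typing import Callable

# Hypothetical judge signature: (text_in_slot_a, text_in_slot_b) -> "A" or "B".
Judge = Callable[[str, str], str]


def wins_both_orders(judge: Judge, current: str, candidate: str) -> bool:
    """Accept the candidate only if it wins regardless of which slot it occupies."""
    won_when_first = judge(candidate, current) == "A"   # candidate shown first
    won_when_second = judge(current, candidate) == "B"  # candidate shown second
    return won_when_first and won_when_second
```

A judge that always prefers the first slot says "A" in both rounds, so the candidate loses the second round and the change is rejected.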
You can give the tool a rubric, which is a markdown file listing the qualities you care about and how much weight each gets. If you do not have one, the tool will infer a rubric from the file itself given a one-line description of the goal. The rubric is what gets optimized, so the more precise it is, the better the results.
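To make the role of the weights concrete, here is one way a weighted rubric score could be computed. The rubric file format is not shown here, so the criterion names, the dictionary representation, and the weighted-average formula are all assumptions for illustration.

```python
def weighted_score(weights: dict[str, float], scores: dict[str, float]) -> float:
    """Weighted average of per-criterion scores (assumed formula, not the tool's)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    # Every criterion in the rubric must have a score; a missing one raises KeyError.
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Hypothetical rubric: clarity matters three times as much as brevity.
rubric = {"clarity": 3.0, "brevity": 1.0}
print(weighted_score(rubric, {"clarity": 8.0, "brevity": 4.0}))  # prints 7.0
```

Under this scheme a candidate that gains on a heavily weighted criterion can outscore one that gains more on a lightly weighted criterion, which is why a precise rubric steers the results.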
Every accepted change is committed to a git branch, so the improvement history is visible as a series of diffs rather than a single rewrite. As a result, the file must be inside a git repository for the tool to work.
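The commit-or-revert step could look something like the sketch below. It uses standard git commands through `subprocess`; the function name `finalize_edit`, its parameters, and the choice of `git checkout --` for the revert are assumptions for illustration, not the tool's actual implementation.

```python
import subprocess
from typing import Callable, List


def finalize_edit(
    path: str,
    accepted: bool,
    message: str,
    run: Callable[..., object] = subprocess.run,  # injectable so it can be dry-run
) -> None:
    """Commit the file if the edit was accepted, otherwise discard the change."""
    if accepted:
        commands: List[List[str]] = [
            ["git", "add", "--", path],
            ["git", "commit", "-m", message],
        ]
    else:
        # Throw away the uncommitted working-tree change to this file.
        commands = [["git", "checkout", "--", path]]
    for cmd in commands:
        run(cmd, check=True)  # check=True raises if git exits non-zero
```

After a run, `git log -p` on the branch shows each accepted edit as its own diff.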
The tool works on emails, landing pages, blog posts, prompts, configuration files, API designs, or any other text. It uses Google Gemini by default, requires Python 3.9 or later, and has one external dependency beyond the standard library.
Where it fits
- Automatically improve a landing page or blog post draft until the writing stops getting better.
- Refine AI prompts iteratively so each version is meaningfully stronger than the last.
- Polish emails or config files using a custom quality rubric that weights what matters to you.
- Review improvement history as clean git diffs instead of one opaque rewrite.