instruct-pix2pix

Python ★ 6.9k updated 2y ago

An AI image editor from UC Berkeley that applies plain-English instructions to photos. Tell it to 'add snow' or 'turn him into a cyborg' and it edits the image using Stable Diffusion.

Python · PyTorch · Stable Diffusion · Gradio · setup: hard · complexity 4/5

InstructPix2Pix is a research project from UC Berkeley that lets you edit images by describing the change you want in plain English. You provide an input image and a text instruction like "turn him into a cyborg" or "add snow," and the model produces a new version of the image with that edit applied. It was published as an academic paper and this repository contains the code to run it and the data used to train it.

The model is built on top of Stable Diffusion, a popular open-source image generation model. The authors fine-tuned Stable Diffusion on paired examples, each showing an image before and after an edit together with the instruction describing that edit. This taught the model to follow editing instructions while leaving the rest of the original image unchanged.

Running the model requires a GPU with more than 18 gigabytes of memory. You can edit a single image from the command line by passing in the image file and your instruction as text. There is also an interactive web application powered by Gradio that lets you upload images and type instructions in a browser interface. Parameters like the number of diffusion steps and guidance strength can be adjusted to tune the quality and faithfulness of the result.
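To make "guidance strength" concrete, here is a minimal sketch of how two guidance scales, one for the input image and one for the text instruction, can be combined at each diffusion step. It follows the two-scale classifier-free guidance described in the InstructPix2Pix paper, but the function name, argument names, and toy numbers are illustrative assumptions, not code from the repository.

```python
def combine_guidance(eps_uncond, eps_image, eps_full, image_scale, text_scale):
    """Blend three noise predictions using separate image and text scales.

    eps_uncond: prediction with neither image nor instruction conditioning
    eps_image:  prediction conditioned on the input image only
    eps_full:   prediction conditioned on both image and instruction
    (All names here are hypothetical; plain lists stand in for tensors.)
    """
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]


# Toy two-element "noise predictions" for illustration only.
eps_uncond = [0.0, 0.0]
eps_image = [1.0, 2.0]
eps_full = [2.0, 4.0]

# With both scales at 1.0 the result is just the fully conditioned prediction.
neutral = combine_guidance(eps_uncond, eps_image, eps_full, 1.0, 1.0)

# A larger text scale pushes the result further toward following the instruction.
strong_text = combine_guidance(eps_uncond, eps_image, eps_full, 1.5, 7.5)
```

In this form, raising the image scale keeps the output closer to the input photo, while raising the text scale makes the edit more pronounced, so the two values trade off against each other.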

The training dataset consists of around 454,000 examples, each containing an original image, an editing instruction, and the edited result. The dataset was built in two stages. First, a fine-tuned GPT-3 took image captions and generated, for each one, an edit instruction and the matching edited caption. Then Stable Diffusion, combined with a technique called Prompt-to-Prompt, converted each pair of captions into a pair of images. Two versions of the dataset are available for download: a full random-sample version and a higher-quality filtered version selected using CLIP scoring.
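The shape of one training example, and the idea of keeping only higher-scoring examples, can be sketched as follows. This is a simplified illustration: the field names, the single `clip_score` value, and the threshold are all hypothetical, and the actual filtering in the project uses its own CLIP-based metrics that this summary does not specify.

```python
from dataclasses import dataclass


@dataclass
class EditExample:
    original_image: str  # path to the "before" image (hypothetical field)
    instruction: str     # plain-English edit instruction
    edited_image: str    # path to the "after" image (hypothetical field)
    clip_score: float    # stand-in quality score; higher is assumed better


def filter_by_score(examples, threshold):
    """Keep only examples whose score meets the (hypothetical) threshold."""
    return [ex for ex in examples if ex.clip_score >= threshold]


# Toy records with made-up paths and scores, for illustration only.
examples = [
    EditExample("a_before.jpg", "add snow", "a_after.jpg", 0.31),
    EditExample("b_before.jpg", "make it look vintage", "b_after.jpg", 0.12),
    EditExample("c_before.jpg", "add a sunset", "c_after.jpg", 0.27),
]

kept = filter_by_score(examples, threshold=0.2)
kept_instructions = [ex.instruction for ex in kept]
```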

Where it fits

Edit photos with plain-English commands like 'make it look vintage' or 'add a sunset' without manual masking.
Run an interactive browser-based image editor using the included Gradio web app.
Download the 454,000-example training dataset of paired before-and-after edits to fine-tune your own model.
