pix2pix
Image-to-image translation with conditional adversarial nets
A 2017 research implementation of image-to-image translation using conditional GANs. It trains a model to convert images from one visual style into another, such as maps to satellite photos or sketches to shoes.
Pix2pix is a research implementation of image-to-image translation, a technique that trains a computer to convert images from one visual style into another. You give it pairs of matched images, for example a map on one side and the corresponding satellite photo on the other, and the model learns to generate the second type of image from the first. The paper describing this work was published at CVPR 2017.
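As a rough illustration of what a training pair looks like, the sketch below assumes each pair is stored as a single side-by-side image, with one style in the left half and the other in the right half. That layout, the `split_pair` helper, and the tiny synthetic image are assumptions made for illustration; check the dataset files themselves before relying on them.

```python
def split_pair(rows):
    """Split a side-by-side image (a list of pixel rows) into left and right halves."""
    if not rows:
        raise ValueError("empty image")
    width = len(rows[0])
    if width % 2 != 0 or any(len(row) != width for row in rows):
        raise ValueError("expected equal-length rows with an even width")
    half = width // 2
    left = [row[:half] for row in rows]
    right = [row[half:] for row in rows]
    return left, right


# Synthetic 2x4 grayscale stand-in for a pair file:
# the left half is all 0, the right half is all 255.
combined = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
image_a, image_b = split_pair(combined)
print(image_a)  # [[0, 0], [0, 0]]
print(image_b)  # [[255, 255], [255, 255]]
```

With real files you would load the pixels through an image library first. Which half serves as the input and which as the target depends on the translation direction you train.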
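The approach uses a type of neural network architecture called a conditional generative adversarial network (conditional GAN). Two networks train against each other: the generator produces an image from the input, and the discriminator tries to tell generated images apart from real ones. It is "conditional" because both networks see the input image, so the output has to match that input and not merely look realistic. Over many training steps, the generator improves until its output is difficult to distinguish from real images in the target style.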
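The two competing objectives can be sketched with plain numbers. This is a simplified illustration, not the repository's code: the scalar probabilities stand in for discriminator outputs, the function names are hypothetical, and the extra L1 term with a weight of 100 follows the pix2pix paper and is not described in this text.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for a single probability and a 0/1 target."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants real pairs scored near 1 and generated pairs near 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake, generated, target, l1_weight=100.0):
    """The generator wants its output scored near 1 and close to the target in L1."""
    l1 = sum(abs(g - t) for g, t in zip(generated, target)) / len(target)
    return bce(d_fake, 1.0) + l1_weight * l1


# A discriminator that separates real from generated well has a low loss.
print(discriminator_loss(0.9, 0.1) < discriminator_loss(0.5, 0.5))  # True
# A generator that fools the discriminator has a lower loss, all else equal.
print(generator_loss(0.9, [0.2, 0.4], [0.2, 0.4]) < generator_loss(0.1, [0.2, 0.4], [0.2, 0.4]))  # True
```

Here `d_real` stands for the discriminator's score on an input paired with its true target image, and `d_fake` for its score on the same input paired with the generator's output.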
The repository includes several example datasets you can download and train on directly. These include building facade labels mapped to facade photos, city street annotations mapped to street scene photographs, pencil-edge sketches mapped to shoe or handbag photos, and daytime outdoor scenes mapped to their nighttime equivalents. Pre-trained model weights for these pairs are also available so you can test the results without training from scratch.
This version is written in Lua using the Torch deep learning framework and needs an NVIDIA GPU with CUDA to train at a reasonable speed. The README notes that a newer, more actively maintained Python (PyTorch) implementation exists in a companion repository for anyone who prefers that setup.
To use it, install Torch and two additional packages, download a dataset, run the training script with your data folder as its input, and then run the test script to generate translated images. The results are saved as image files along with an HTML page for browsing them. Training a basic example like the facades dataset takes roughly two hours on a capable GPU.
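The download, train, and test steps can be sketched as a small Python wrapper that builds the three commands. The script names (`download_dataset.sh`, `train.lua`, `test.lua`) and the environment variables (`DATA_ROOT`, `name`, `which_direction`, `phase`) are recalled from the upstream README and are not stated in this text, so treat them as assumptions and verify them against the repository. The `pix2pix_commands` and `run_all` helpers are hypothetical.

```python
import os
import subprocess


def pix2pix_commands(dataset="facades", direction="BtoA"):
    """Return (env_overrides, argv) pairs for the download, train, and test steps."""
    # Assumed option names; confirm them in the repository's README.
    common = {
        "DATA_ROOT": f"./datasets/{dataset}",
        "name": f"{dataset}_generation",
        "which_direction": direction,
    }
    return [
        ({}, ["bash", "./datasets/download_dataset.sh", dataset]),
        (dict(common), ["th", "train.lua"]),
        ({**common, "phase": "val"}, ["th", "test.lua"]),
    ]


def run_all(steps, cwd="."):
    """Run each step in order from the repository root, stopping on the first failure."""
    for env_overrides, argv in steps:
        subprocess.run(argv, cwd=cwd, env={**os.environ, **env_overrides}, check=True)


# Print the plan without executing anything.
for env_overrides, argv in pix2pix_commands():
    prefix = " ".join(f"{key}={value}" for key, value in env_overrides.items())
    print((prefix + " " + " ".join(argv)).strip())
```

Calling `run_all(pix2pix_commands(), cwd="path/to/pix2pix")` would execute the steps. It requires Torch to be installed and the working directory to be a checkout of the repository.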
Where it fits
- Train a model to convert architectural label maps into realistic building facade photos using the included facades dataset.
- Generate nighttime scenes from daytime outdoor photographs by training on matched day/night image pairs.
- Turn pencil-edge sketches of shoes or handbags into photorealistic product images.
- Test pre-trained pix2pix models on the included example datasets without training a model yourself.