pix2pixHD

Python ★ 6.9k updated 1y ago

Synthesizing and manipulating 2048x1024 images with conditional GANs

pix2pixHD is a research project from NVIDIA that turns simple label maps into photorealistic images at high resolution, up to 2048 by 1024 pixels. A label map is a diagram where each region is painted a flat color corresponding to a category, such as sky, road, car, or building. The model learns to translate those color-coded regions into convincing photographs that look like real street scenes or real faces.
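The flat colors in a label map are mainly a visualization. Models of this kind usually work with an integer class ID per pixel, expanded into one binary mask per category (one-hot encoding). The sketch below shows that conversion. The palette, colors, and function names are hypothetical and are not taken from the repository. Plain Python lists stand in for image arrays so the example runs without extra libraries.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette for illustration only: each flat color stands for one category.
PALETTE: Dict[RGB, int] = {
    (0, 0, 255): 0,      # sky
    (128, 128, 128): 1,  # road
    (255, 0, 0): 2,      # car
    (0, 255, 0): 3,      # building
}


def colors_to_ids(color_map: List[List[RGB]], palette: Dict[RGB, int]) -> List[List[int]]:
    """Replace each pixel's RGB color with its integer class ID."""
    return [[palette[pixel] for pixel in row] for row in color_map]


def one_hot(id_map: List[List[int]], num_classes: int) -> List[List[List[int]]]:
    """Return num_classes binary masks; mask c is 1 wherever the pixel has class c."""
    return [
        [[1 if cls == c else 0 for cls in row] for row in id_map]
        for c in range(num_classes)
    ]


if __name__ == "__main__":
    # A tiny 2x3 "label map": sky on top, road with one car pixel below.
    color_map = [
        [(0, 0, 255), (0, 0, 255), (0, 0, 255)],
        [(128, 128, 128), (255, 0, 0), (128, 128, 128)],
    ]
    ids = colors_to_ids(color_map, PALETTE)
    masks = one_hot(ids, num_classes=len(PALETTE))
    print(ids)       # [[0, 0, 0], [1, 2, 1]]
    print(masks[2])  # [[0, 0, 0], [0, 1, 0]]
```

Each category gets its own channel, so the model never has to interpret a category from a color value.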

The technique is based on conditional generative adversarial networks (GANs) and was published in a research paper at the CVPR 2018 conference. The repository contains the code that accompanied that paper, along with pre-trained model weights for city street images, so you can test it without training from scratch.
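The basic idea behind a conditional GAN can be written as a single objective. A generator G turns a label map s into an image. A discriminator D receives the label map paired with either a real photo x or the generated image G(s) and tries to tell them apart, while G tries to fool it. The formula below is the standard conditional GAN objective that pix2pix-style models start from. The full pix2pixHD objective adds further components, such as multiple discriminators at different image scales and a feature-matching loss, which are not shown here.

```latex
\min_G \max_D \; \mathcal{L}_{\mathrm{GAN}}(G, D), \qquad
\mathcal{L}_{\mathrm{GAN}}(G, D) =
  \mathbb{E}_{(\mathbf{s}, \mathbf{x})}\big[\log D(\mathbf{s}, \mathbf{x})\big]
  + \mathbb{E}_{\mathbf{s}}\big[\log\big(1 - D(\mathbf{s}, G(\mathbf{s}))\big)\big]
```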

Two main use cases are demonstrated. The first uses the Cityscapes dataset to convert label maps of city street layouts into photorealistic images of urban scenes. The second converts label maps of facial parts into photorealistic portraits. It includes an interactive editing interface that lets you change individual features, such as hair color, while keeping the rest of the face consistent.

Running the pre-trained model requires Linux or macOS and an NVIDIA GPU with at least 11 GB of video memory. Training at the full 2048x1024 resolution requires a GPU with 24 GB of memory. A 12 GB GPU can still be used for training at lower resolution or with images cropped during training. The code also supports training on your own images if you provide a paired dataset of label maps and matching photos.
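When preparing your own paired dataset, one practical step is to make sure every label map has a matching photo. The sketch below is a hypothetical pre-flight check and is not part of the repository. It assumes label maps and photos live in two separate folders, named train_A and train_B here purely for illustration, and that each pair shares a file-name stem. Any file without a partner causes a loud failure.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def pair_label_and_photo(label_dir: Path, photo_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair each label map with the photo that shares its file stem.

    Raises ValueError listing any stems found in only one folder, so an
    incomplete dataset is caught before a long training run starts.
    """
    labels = {p.stem: p for p in label_dir.iterdir() if p.is_file()}
    photos = {p.stem: p for p in photo_dir.iterdir() if p.is_file()}
    unpaired = sorted(labels.keys() ^ photos.keys())  # stems present on one side only
    if unpaired:
        raise ValueError(f"unpaired files: {unpaired}")
    return [(labels[stem], photos[stem]) for stem in sorted(labels)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: label maps in train_A, photos in train_B.
        label_dir, photo_dir = Path(tmp, "train_A"), Path(tmp, "train_B")
        label_dir.mkdir()
        photo_dir.mkdir()
        for stem in ("frame_001", "frame_002"):
            (label_dir / f"{stem}.png").touch()
            (photo_dir / f"{stem}.jpg").touch()
        for label, photo in pair_label_and_photo(label_dir, photo_dir):
            print(label.name, "->", photo.name)
```

Matching on the stem rather than the full file name lets a PNG label map pair with a JPEG photo that has the same base name.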

This repository is a research release and is not actively developed. It uses Python and PyTorch and is intended for researchers exploring image synthesis techniques rather than for production deployment.
