guided-diffusion

Python ★ 7.4k updated 1y ago

OpenAI's 2021 research code for generating high-quality images with classifier-guided diffusion models. It includes pre-trained models at multiple resolutions and scripts for sampling from them or training your own.

Python · PyTorch · CUDA · setup: hard · complexity 5/5

This is the research code released by OpenAI alongside a 2021 paper showing that diffusion models could produce higher-quality images than GANs, which had been the dominant approach for AI image generation. Diffusion models work by learning to gradually remove noise from a random starting image until a realistic picture emerges. This repository added two important improvements over OpenAI's earlier diffusion work: classifier guidance and architecture refinements.

Classifier guidance is a technique in which a separately trained image classifier steers generation toward a specific category. You pick a target class, and at each denoising step the classifier's gradient nudges the partially denoised image toward that class. A scale parameter controls how strongly the classifier influences the output: a higher scale produces samples that match the target class more closely, at the cost of less diversity among them.
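The nudging step can be illustrated with a toy one-dimensional example. This is a minimal sketch, not the repository's code: the logistic "classifier" and the function names `log_prob_grad` and `guided_mean` are hypothetical and exist only for illustration. It assumes the common formulation in which the mean of each denoising step is shifted by the scale times the step variance times the gradient of the classifier's log-probability for the target class.

```python
import math


def log_prob_grad(x, target):
    """Gradient of log p(target | x) for a toy 1-D logistic classifier.

    Hypothetical stand-in for a real image classifier: p(y=1 | x) = sigmoid(x).
    """
    p1 = 1.0 / (1.0 + math.exp(-x))
    # d/dx log sigmoid(x) = 1 - sigmoid(x); d/dx log(1 - sigmoid(x)) = -sigmoid(x)
    return (1.0 - p1) if target == 1 else -p1


def guided_mean(mean, variance, x, target, scale):
    """Shift the denoising step's mean toward the target class.

    `scale` plays the role of the guidance scale: 0 disables guidance,
    larger values push harder toward the target class.
    """
    return mean + scale * variance * log_prob_grad(x, target)


# Compare the unguided mean with increasingly strong guidance.
for scale in (0.0, 1.0, 4.0):
    print(scale, guided_mean(mean=0.0, variance=1.0, x=0.0, target=1, scale=scale))
```

With a scale of zero the mean is unchanged; raising the scale moves the step further in the direction the classifier associates with the target class, which is the diversity-for-fidelity trade-off described above.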

The repository provides pre-trained model weights for several image sizes: 64x64, 128x128, 256x256, and 512x512. There are also upsampler models that take a low-resolution generated image and increase its resolution (64x64 to 256x256 and 128x128 to 512x512). Additionally, models trained on three LSUN categories (bedroom, cat, and horse) are available for download.

To use the code, you download the model files and run Python scripts with command-line flags specifying the model architecture and sampling settings. The README includes exact commands for each pre-trained model. Training your own models is also supported, with scripts for both standard diffusion training and classifier training.
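As a rough illustration of that workflow, the sketch below assembles a sampling command as an argument list without running it. Treat everything in it as an assumption to check against the README: the script name `classifier_sample.py`, the flags shown, and the checkpoint paths are recalled from the repository rather than taken from this summary, and the helper `build_sample_command` is hypothetical. The per-model architecture flags are deliberately left out and must be copied from the README.

```python
import shlex


def build_sample_command(model_path, classifier_path, classifier_scale=1.0,
                         num_samples=100, batch_size=4, extra_flags=()):
    """Assemble (but do not run) a classifier-guided sampling command.

    Hypothetical helper. The script name and flag names are assumptions
    to verify against the README before use.
    """
    cmd = [
        "python", "classifier_sample.py",
        "--model_path", model_path,
        "--classifier_path", classifier_path,
        "--classifier_scale", str(classifier_scale),
        "--batch_size", str(batch_size),
        "--num_samples", str(num_samples),
    ]
    # Architecture flags differ per checkpoint; pass them in from the README.
    cmd.extend(extra_flags)
    return cmd


cmd = build_sample_command(
    model_path="models/64x64_diffusion.pt",        # assumed download location
    classifier_path="models/64x64_classifier.pt",  # assumed download location
    classifier_scale=1.0,
)
print(shlex.join(cmd))
```

Building the command as a list keeps each flag and value separate, so it can be passed directly to `subprocess.run` once the flags have been confirmed.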

This repository is primarily a research artifact. It requires familiarity with Python and running scripts from a terminal. No graphical interface is provided.

Where it fits

Generate images at 64x64 to 512x512 resolution using pre-trained diffusion models with classifier guidance to steer output toward a target category.
Train your own diffusion model or image classifier on a custom dataset using the provided training scripts.
Upscale low-resolution generated images to higher resolution using the included upsampler models.
