PhotoMaker [CVPR 2024]
An AI tool from Tencent that generates new photos of a real person in any scene or style from just a few reference photos, keeping their face consistent across images without any model training step.
PhotoMaker is a research project from Tencent that generates new photos of a specific real person, learning what they look like from a small set of reference images. You give it one or more photos of someone along with a text description of the scene you want, and it produces new images showing that person in the described setting or style. The central claim of the work is that the person's facial identity stays consistent across the generated images without a lengthy training step for each new person. The paper was presented at CVPR 2024.
The system is built on top of Stable Diffusion XL, a popular open-source image generation model. PhotoMaker adds an adapter that encodes the identity from the reference photos and injects it into the generation process using what the authors call a stacked ID embedding. Two versions exist: V1, aimed at realistic-looking output, and V2 (supported by Tencent's HunyuanDiT team), which preserves facial details more accurately. The tool can be combined with customization add-ons called LoRA modules, and it supports additional control tools such as ControlNet and T2I-Adapter.
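To make the idea of a stacked ID embedding more concrete, here is a toy sketch in plain Python. It assumes that each reference photo has already been encoded into a fixed-length vector, and that these vectors are stacked and spliced into the prompt's token embeddings at the position of a trigger token. The function names, the trigger token `img`, and the tiny two-dimensional vectors are hypothetical and chosen for illustration only; this is not PhotoMaker's actual code, and the learned image encoder and fusion layers of the real system are omitted.

```python
def stack_id_embeddings(per_image_embeddings):
    """Stack one embedding per reference photo into an N x D list of rows."""
    if not per_image_embeddings:
        raise ValueError("at least one reference embedding is required")
    dim = len(per_image_embeddings[0])
    for emb in per_image_embeddings:
        if len(emb) != dim:
            raise ValueError("all embeddings must share the same dimension")
    return [list(emb) for emb in per_image_embeddings]


def inject_into_prompt(tokens, token_embeddings, trigger, stacked):
    """Replace the trigger token's single embedding with the stacked ID rows."""
    idx = tokens.index(trigger)
    return token_embeddings[:idx] + stacked + token_embeddings[idx + 1:]


# Hypothetical prompt with a placeholder token marking where identity goes.
tokens = ["a", "photo", "of", "img", "hiking"]
token_embeddings = [[0.0, 0.0] for _ in tokens]

# Hypothetical identity vectors, one per reference photo.
reference_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

stacked = stack_id_embeddings(reference_embeddings)
conditioned = inject_into_prompt(tokens, token_embeddings, "img", stacked)

# 5 token rows - 1 trigger row + 3 identity rows = 7 rows
print(len(conditioned))
```

The point of the sketch is that the identity is carried by several rows rather than one, so adding more reference photos lengthens the conditioning sequence instead of requiring any retraining.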
Running PhotoMaker requires a GPU with at least 11 GB of memory, Python 3.8 or higher, and PyTorch 2.0 or higher. It installs with pip, and the model weights download automatically from Hugging Face the first time you use it. The code integrates with the standard diffusers Python library, so developers already familiar with that workflow can add PhotoMaker as an adapter without rebuilding a pipeline from scratch.
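The requirements above can be turned into a small pre-flight check. The following is a minimal sketch using only the Python standard library; the helper names `parse_version` and `check_environment` are hypothetical and not part of PhotoMaker, and the thresholds are simply the ones stated in the text.

```python
# Minimums as stated in the text above.
MIN_PYTHON = (3, 8)
MIN_TORCH = (2, 0)
MIN_GPU_MEM_GB = 11


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into (2, 1): major and minor only."""
    core = text.split("+")[0]
    parts = core.split(".")
    return int(parts[0]), int(parts[1])


def check_environment(python_version, torch_version, gpu_mem_gb):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.8 or higher is required")
    if parse_version(torch_version) < MIN_TORCH:
        problems.append("PyTorch 2.0 or higher is required")
    if gpu_mem_gb < MIN_GPU_MEM_GB:
        problems.append("at least 11 GB of GPU memory is required")
    return problems


# Example with made-up values for an under-provisioned machine.
for problem in check_environment((3, 7, 9), "1.13.1", 8):
    print(problem)
```

On a real machine the three arguments could be filled in from `sys.version_info`, `torch.__version__`, and `torch.cuda.get_device_properties(0).total_memory / 1024**3`, assuming PyTorch is installed and a CUDA device is present.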
Live demos are available on Hugging Face Spaces for both the realistic and stylization modes, and community members have built implementations for ComfyUI, Replicate, and Windows environments. The stylization mode produces illustrated or artistic renderings of the same person by swapping the base model and enabling style LoRA modules.
Where it fits
- Generate realistic photos of a specific person in different settings using just a handful of reference snapshots.
- Create illustrated or artistic portraits of a real person by combining PhotoMaker with a style LoRA module.
- Build a custom portrait generation pipeline by adding PhotoMaker as an adapter on top of an existing diffusers workflow.
- Run experiments in ComfyUI or Replicate using community-built PhotoMaker nodes without writing Python code.