HY-WU


HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing

HY-WU is a research framework for AI-powered image editing using text instructions. The core capability it demonstrates is transferring visual attributes between images — for example, taking an outfit from one photo and placing it onto a person in a different photo, or transferring a face's identity while keeping the background and pose unchanged.

The technical problem it addresses is a longstanding tradeoff in AI image generation: customizing a model to handle a specific input (like a particular person's outfit) normally requires fine-tuning the model on that input, which is slow and expensive. HY-WU instead generates lightweight adapter weights on the fly for each individual request, based on the input images and a text instruction. These adapters modify the image generation process without permanently altering the underlying model, so each editing request is handled independently, with no per-request training or optimization step at inference time.
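The idea of generating adapter weights per request can be illustrated with a toy sketch. This is not HY-WU's code or architecture: the low-rank (LoRA-style) form of the adapter, the `AdapterGenerator` class, the `adapted_forward` helper, and the condition vectors standing in for an embedding of the input images and instruction are all assumptions made for illustration, and the tiny dimensions and random weights are placeholders.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for trained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class AdapterGenerator:
    """Hypothetical generator: maps a condition vector to low-rank factors A and B.

    Assumption: the adapter is a low-rank update B @ A added to a frozen layer.
    """

    def __init__(self, cond_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Linear maps from the condition vector to the flattened factors.
        self.to_a = random_matrix(rank * d_in, cond_dim, rng)
        self.to_b = random_matrix(d_out * rank, cond_dim, rng)

    def __call__(self, cond):
        flat_a = matvec(self.to_a, cond)
        flat_b = matvec(self.to_b, cond)
        # Reshape the flat outputs into A (rank x d_in) and B (d_out x rank).
        a = [flat_a[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        b = [flat_b[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return a, b


def adapted_forward(base_w, a, b, x):
    """Compute base_w @ x + b @ (a @ x) without modifying base_w."""
    base = matvec(base_w, x)
    delta = matvec(b, matvec(a, x))
    return [p + q for p, q in zip(base, delta)]


rng = random.Random(1)
base_w = random_matrix(3, 4, rng)          # frozen base layer
frozen_copy = [row[:] for row in base_w]   # kept only to check it never changes
gen = AdapterGenerator(cond_dim=5, d_in=4, d_out=3, rank=2)

x = [1.0, 0.5, -0.5, 2.0]
cond_1 = [1.0, 0.0, 0.0, 0.0, 0.0]  # stand-in embedding for request 1
cond_2 = [0.0, 1.0, 0.0, 0.0, 0.0]  # stand-in embedding for request 2

# Each request gets its own adapter from a single forward pass of the generator.
out_1 = adapted_forward(base_w, *gen(cond_1), x)
out_2 = adapted_forward(base_w, *gen(cond_2), x)
```

In this sketch the two requests produce different outputs from the same frozen `base_w`, and no gradient step is taken for either one, which is the property the paragraph above describes.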

The framework is designed to scale to very large models, and the main checkpoint is large enough to require multiple high-end GPUs. It is aimed at AI researchers studying image editing, personalization, or parameter generation, and at anyone who wants to run text-guided image editing that transfers visual elements between images. It is written in Python and was published by Tencent's Hunyuan research team.
