Z-Image

Python ★ 12k updated 4mo ago

Z-Image is an open-source family of 6-billion-parameter AI image generation models that create images from text prompts. The family includes a fast turbo variant, a flexible base for fine-tuning, and an image editing model.

Python · Hugging Face · ModelScope · setup: hard · complexity 4/5

Z-Image is an AI image generation model family developed by Tongyi-MAI and released in late 2025. Built on a 6-billion-parameter architecture, the family generates images from text descriptions and is available as open-source checkpoints on Hugging Face and ModelScope.

The family includes four variants. Z-Image is the foundation model, focused on high visual quality, aesthetic range, and variety across artistic styles, identities, poses, and compositions. It supports negative prompting and is designed to be easy to fine-tune for custom applications.

Z-Image-Turbo is a faster, distilled version that generates images in under a second on enterprise-grade GPUs and fits within 16GB of GPU memory, so it can run on consumer graphics cards. It is optimized for photorealistic output, text rendering in both English and Chinese, and close adherence to written instructions. As of December 2025, Z-Image-Turbo ranked as the top open-source model on a third-party text-to-image leaderboard.
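Negative prompting in diffusion models is most commonly implemented with classifier-free guidance: the model is evaluated once with the positive prompt and once with the negative prompt, and the two predictions are blended so the sample is pushed away from the negative one. The sketch below assumes Z-Image follows this common approach (the README summary does not say so explicitly), and the function name `guided_prediction` is hypothetical. Plain Python lists stand in for the model's prediction tensors.

```python
from typing import Sequence


def guided_prediction(
    pred_pos: Sequence[float], pred_neg: Sequence[float], scale: float
) -> list[float]:
    """Blend positive- and negative-prompt predictions (classifier-free guidance).

    Hypothetical helper for illustration: result = neg + scale * (pos - neg).
    With scale == 1.0 the result equals pred_pos; larger scales move the
    output further away from the negative prompt's prediction.
    """
    if len(pred_pos) != len(pred_neg):
        raise ValueError("predictions must have the same length")
    return [n + scale * (p - n) for p, n in zip(pred_pos, pred_neg)]


if __name__ == "__main__":
    # Toy two-element "predictions"; the second element agrees between
    # prompts, so guidance leaves it unchanged.
    print(guided_prediction([1.0, 0.5], [0.0, 0.5], scale=4.0))
```

Each guided sampling step needs two model evaluations instead of one, which is part of why guidance adds noticeable cost at inference time.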

Z-Image-Omni-Base is a general checkpoint capable of both generating and editing images. It is intended as the most flexible starting point for researchers and developers who want to build their own fine-tuned variants. Z-Image-Edit is a version specifically fine-tuned for editing existing images based on natural-language instructions.

Model weights for Z-Image and Z-Image-Turbo are available on Hugging Face and ModelScope, with live demo spaces provided for both. The Omni-Base and Edit variants were listed as coming soon when the README was written. A technical report covering the architecture and training process is available on arXiv.

Where it fits

Generate photorealistic images from text prompts using Z-Image-Turbo on a consumer GPU with 16GB of VRAM
Edit existing images using natural-language instructions with the Z-Image-Edit variant (listed as coming soon in the README)
Fine-tune the Z-Image-Omni-Base checkpoint (listed as coming soon in the README) to build a custom image generation model for a specific visual style or domain
Try Z-Image-Turbo in the live demo on Hugging Face Spaces and compare its output informally with other open-source text-to-image models
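The 16GB VRAM figure can be sanity-checked with back-of-the-envelope arithmetic on the weight size alone. This is a rough sketch: it counts only the 6 billion parameters stated above, assumes a 16-bit format such as bf16 for the deployed weights, and ignores the text encoder, VAE, activations, and framework overhead. The helper name `weight_memory_gib` is hypothetical.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes).

    Hypothetical helper: excludes activations, other model components,
    and any runtime overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    params = 6e9  # 6 billion parameters, per the description above
    for dtype in ("fp32", "bf16", "int8"):
        print(f"{dtype}: {weight_memory_gib(params, dtype):.1f} GiB")
```

Under these assumptions, 16-bit weights take about 11.2 GiB, leaving some headroom on a 16GB card, while 32-bit weights alone (about 22.4 GiB) would not fit.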
