Add Z-Image Text-to-Image Generation Support #3261
Merged
ivarflakstad merged 9 commits into huggingface:main on Jan 2, 2026
awesome! stoked.
Contributor
Author
Consistency Test
I also ran a consistency test, comparing ModelScope's online inference against the example from this Rust implementation with the same prompts and CFG settings. The generated images were almost identical, which indicates that the current candle implementation is consistent with the original diffusers implementation.
Member
ivarflakstad
left a comment
This is great! 🙌
I've verified the output on cuda and it looks great.
Most of my comments are nits or just that documentation is slightly off. Solid work.
Contributor
Author
Thank you for your review. I will fix the relevant content later.
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

Summary
This PR introduces support for Z-Image, Alibaba's ~24B parameter text-to-image generation model using Flow Matching. The implementation follows Candle's architecture conventions and includes the full inference pipeline.
Model Overview
Z-Image is a state-of-the-art text-to-image model featuring:
Model Links:
🔧 Usage Examples
Basic Usage (CUDA)
cargo run --features cuda --example z_image --release -- \
    --model-path weights/Z-Image-Turbo \
    --prompt "A beautiful landscape with mountains and a lake" \
    --width 1024 --height 768 \
    --num-steps 8
Using Metal (macOS)
cargo run --features metal --example z_image --release -- \
    --model-path weights/Z-Image-Turbo \
    --prompt "A futuristic city at night with neon lights" \
    --width 1024 --height 1024 \
    --num-steps 9
Files Changed
New Files
candle-transformers/src/models/z_image/mod.rs
candle-transformers/src/models/z_image/transformer.rs
candle-transformers/src/models/z_image/text_encoder.rs
candle-transformers/src/models/z_image/vae.rs
candle-transformers/src/models/z_image/scheduler.rs
candle-transformers/src/models/z_image/sampling.rs
candle-transformers/src/models/z_image/preprocess.rs
candle-examples/examples/z_image/main.rs
candle-examples/examples/z_image/README.md
Modified Files
candle-transformers/src/models/mod.rs: adds pub mod z_image;
Implementation Highlights
1. Optimized Patchify/Unpatchify
The implementation uses optimized 6D tensor operations for the F=1 (single frame) case, avoiding Candle's 7D+ dimension limitations:
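As a rough sketch of what patchify does in the single-frame case, here is the index arithmetic in plain Rust with no candle types. In tensor terms it is equivalent to reshaping a `[c, h, w]` latent to `[c, h/p, p, w/p, p]` and permuting to `[h/p, w/p, p, p, c]`, which stays at 6 dimensions once a batch axis is added. The function names and the (row-in-patch, column-in-patch, channel) feature order are assumptions for illustration, not taken from the PR.

```rust
/// Patchify a single-frame latent stored row-major as [c, h, w] into
/// tokens of shape [(h/p) * (w/p), p * p * c].
/// Feature order inside each token is (py, px, channel); this ordering is
/// an assumption made for the sketch.
pub fn patchify(latent: &[f32], c: usize, h: usize, w: usize, p: usize) -> Vec<f32> {
    assert_eq!(latent.len(), c * h * w, "latent must hold c*h*w values");
    assert!(p > 0 && h % p == 0 && w % p == 0, "h and w must be multiples of p");
    let (ht, wt) = (h / p, w / p);
    let mut out = Vec::with_capacity(latent.len());
    for ty in 0..ht {
        for tx in 0..wt {
            for py in 0..p {
                for px in 0..p {
                    for ch in 0..c {
                        let (y, x) = (ty * p + py, tx * p + px);
                        out.push(latent[(ch * h + y) * w + x]);
                    }
                }
            }
        }
    }
    out
}

/// Inverse of `patchify`: scatter token features back into a [c, h, w] latent.
pub fn unpatchify(tokens: &[f32], c: usize, h: usize, w: usize, p: usize) -> Vec<f32> {
    assert_eq!(tokens.len(), c * h * w, "tokens must hold c*h*w values");
    assert!(p > 0 && h % p == 0 && w % p == 0, "h and w must be multiples of p");
    let (ht, wt) = (h / p, w / p);
    let mut out = vec![0.0f32; c * h * w];
    let mut i = 0;
    for ty in 0..ht {
        for tx in 0..wt {
            for py in 0..p {
                for px in 0..p {
                    for ch in 0..c {
                        let (y, x) = (ty * p + py, tx * p + px);
                        out[(ch * h + y) * w + x] = tokens[i];
                        i += 1;
                    }
                }
            }
        }
    }
    out
}
```

A real implementation would express the same mapping with tensor reshape and permute calls so it runs on the device; the loops above only make the layout explicit.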
2. 3D RoPE Position Encoding
Implements 3D Rotary Position Embeddings with pre-computed sin/cos caches:
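To illustrate the idea of a pre-computed sin/cos cache, the following is a minimal plain-Rust sketch, not the PR's code. It assumes the head dimension is split into three chunks (one per axis of a (frame, row, column) position), that each chunk rotates interleaved pairs, and that `theta` and the per-axis sizes are supplied by the caller. `RopeCache` and `apply_rope_3d` are hypothetical names.

```rust
/// Per-axis cache of cos/sin values, laid out as [max_pos, axis_dim / 2].
pub struct RopeCache {
    half: usize,
    cos: Vec<f32>,
    sin: Vec<f32>,
}

impl RopeCache {
    /// Pre-compute the rotation angles for positions 0..max_pos on one axis.
    pub fn new(axis_dim: usize, max_pos: usize, theta: f32) -> Self {
        assert!(axis_dim % 2 == 0, "axis_dim must be even");
        let half = axis_dim / 2;
        let mut cos = Vec::with_capacity(max_pos * half);
        let mut sin = Vec::with_capacity(max_pos * half);
        for pos in 0..max_pos {
            for i in 0..half {
                let inv_freq = 1.0 / theta.powf((2 * i) as f32 / axis_dim as f32);
                let angle = pos as f32 * inv_freq;
                cos.push(angle.cos());
                sin.push(angle.sin());
            }
        }
        Self { half, cos, sin }
    }

    /// Look up the cached cos/sin rows for one position (panics if out of range).
    fn row(&self, pos: usize) -> (&[f32], &[f32]) {
        let start = pos * self.half;
        let end = start + self.half;
        (&self.cos[start..end], &self.sin[start..end])
    }
}

/// Rotate one attention-head vector in place. The vector is treated as three
/// consecutive chunks, one per axis, each rotated by that axis's position.
pub fn apply_rope_3d(x: &mut [f32], caches: &[RopeCache; 3], pos: [usize; 3]) {
    let total: usize = caches.iter().map(|c| 2 * c.half).sum();
    assert_eq!(x.len(), total, "head_dim must equal the sum of the axis dims");
    let mut offset = 0;
    for (cache, &p) in caches.iter().zip(pos.iter()) {
        let (cos, sin) = cache.row(p);
        for i in 0..cache.half {
            let (a, b) = (x[offset + 2 * i], x[offset + 2 * i + 1]);
            x[offset + 2 * i] = a * cos[i] - b * sin[i];
            x[offset + 2 * i + 1] = a * sin[i] + b * cos[i];
        }
        offset += 2 * cache.half;
    }
}
```

Because the tables depend only on position and axis size, they can be built once and reused at every denoising step, so the per-step cost reduces to table lookups and multiplies.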
3. AdaLN Modulation with Tanh Gate
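A minimal sketch of the general pattern the heading names, in plain Rust: the normalized input is scaled by `1 + scale`, passed through a sublayer (attention or MLP), and added back through a gate squashed by `tanh`. The RMS norm, the single norm placement, and the names `rms_norm` and `adaln_residual` are simplifying assumptions; the PR's block may differ in detail.

```rust
/// RMS-normalize a vector (no learned weight, for brevity).
pub fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|&v| v * v).sum::<f32>() / x.len() as f32;
    let inv = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|&v| v * inv).collect()
}

/// One gated residual branch: x + tanh(gate) * sublayer(norm(x) * (1 + scale)).
/// `scale` and `gate` would come from a conditioning projection of the
/// timestep embedding; here they are passed in directly.
pub fn adaln_residual(
    x: &[f32],
    scale: &[f32],
    gate: &[f32],
    sublayer: impl Fn(&[f32]) -> Vec<f32>,
) -> Vec<f32> {
    assert!(x.len() == scale.len() && x.len() == gate.len());
    let normed = rms_norm(x, 1e-5);
    let modulated: Vec<f32> = normed
        .iter()
        .zip(scale.iter())
        .map(|(&n, &s)| n * (1.0 + s))
        .collect();
    let y = sublayer(&modulated);
    assert_eq!(y.len(), x.len(), "sublayer must preserve the length");
    x.iter()
        .zip(y.iter())
        .zip(gate.iter())
        .map(|((&xi, &yi), &g)| xi + g.tanh() * yi)
        .collect()
}
```

The `tanh` keeps each gate value in (-1, 1), so the branch contribution stays bounded by the sublayer output, and a zero gate leaves the residual stream untouched.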
4. Dynamic Timestep Shifting
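One common formulation of dynamic shifting in flow-matching schedulers, sketched in plain Rust: a shift parameter `mu` is interpolated linearly from the image token count, then each timestep `t` in (0, 1] is warped by an exponential shift. The function names and the sample constants mentioned after the block are assumptions for illustration; the values the PR uses are not stated here.

```rust
/// Linearly interpolate the shift parameter `mu` from the image sequence length.
pub fn calculate_shift(
    seq_len: f32,
    base_seq_len: f32,
    max_seq_len: f32,
    base_shift: f32,
    max_shift: f32,
) -> f32 {
    let slope = (max_shift - base_shift) / (max_seq_len - base_seq_len);
    let intercept = base_shift - slope * base_seq_len;
    seq_len * slope + intercept
}

/// Exponential time shift: t' = e^mu / (e^mu + (1/t - 1)), for t in (0, 1].
/// With mu = 0 this is the identity; larger mu pushes timesteps toward 1,
/// spending more of the schedule at high noise levels.
pub fn time_shift(mu: f32, t: f32) -> f32 {
    let e = mu.exp();
    e / (e + (1.0 / t - 1.0))
}
```

For instance, with hypothetical bounds of 256 and 4096 tokens and shifts of 0.5 and 1.15, a larger image gets a larger `mu`, so its schedule is shifted further toward the noisy end.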
Image Size Requirements
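The two rules in this section (dimensions divisible by 16, and latent = 2 × (image_size ÷ 16)) can be checked with a small helper. This is a sketch; `latent_size` is a hypothetical name, not a function from the PR.

```rust
/// Return the latent size for one image dimension, or None when the
/// dimension is zero or not a multiple of 16.
pub fn latent_size(image_size: usize) -> Option<usize> {
    if image_size == 0 || image_size % 16 != 0 {
        return None;
    }
    Some(2 * (image_size / 16))
}
```

With the 1024 × 768 image from the usage example, the helper gives a 128 × 96 latent, while a dimension such as 1000 is rejected.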
Image dimensions must be divisible by 16:
Latent size formula:
latent = 2 × (image_size ÷ 16)
📝 Testing Status
cargo check --features metal
cargo clippy --workspace --tests --examples --benches -- -D warnings
cargo fmt --all -- --check
Sample Output
Metal
Cuda
Checklist
cargo clippy --workspace --tests --examples --benches -- -D warnings
cargo fmt --all -- --check
References
Z-Image
Diffusers
Additional Fix: Clippy Warning in candle-nn
While implementing SDPA support for Z-Image, I discovered a minor clippy warning in candle-nn/src/ops.rs:1040 introduced by PR #3196. @EricLBuehler
Issue: clippy::nonminimal_bool warning