Add Z-Image Text-to-Image Generation Support by SpenserCai · Pull Request #3261 · huggingface/candle

SpenserCai · 2025-12-24T03:26:14Z

Summary

This PR introduces support for Z-Image, Alibaba's ~24B parameter text-to-image generation model using Flow Matching. The implementation follows Candle's architecture conventions and includes the full inference pipeline.

Model Overview

Z-Image is a state-of-the-art text-to-image model featuring:

Transformer: 24B parameter DiT with 30 main layers + 2 noise refiner + 2 context refiner
Text Encoder: Qwen3-based encoder (outputs second-to-last hidden states)
VAE: AutoEncoderKL with diffusers format weights
Scheduler: FlowMatchEulerDiscreteScheduler with dynamic timestep shifting
Position Encoding: 3D RoPE (Frame/Height/Width axes)


🔧 Usage Examples

Basic Usage (CUDA)

cargo run --features cuda --example z_image --release -- \
    --model-path weights/Z-Image-Turbo \
    --prompt "A beautiful landscape with mountains and a lake" \
    --width 1024 --height 768 \
    --num-steps 8

Using Metal (macOS)

cargo run --features metal --example z_image --release -- \
    --model-path weights/Z-Image-Turbo \
    --prompt "A futuristic city at night with neon lights" \
    --width 1024 --height 1024 \
    --num-steps 9

Files Changed

New Files

File	Lines	Description
`candle-transformers/src/models/z_image/mod.rs`	34	Module exports
`candle-transformers/src/models/z_image/transformer.rs`	940	Core Transformer (Config, TimestepEmbedder, RopeEmbedder, ZImageAttention, ZImageTransformerBlock, FinalLayer, ZImageTransformer2DModel)
`candle-transformers/src/models/z_image/text_encoder.rs`	453	Qwen3-based Text Encoder
`candle-transformers/src/models/z_image/vae.rs`	684	AutoEncoderKL (diffusers format)
`candle-transformers/src/models/z_image/scheduler.rs`	237	FlowMatchEulerDiscreteScheduler
`candle-transformers/src/models/z_image/sampling.rs`	133	Sampling utilities (noise generation, shift calculation)
`candle-transformers/src/models/z_image/preprocess.rs`	169	Input preprocessing (image postprocessing)
`candle-examples/examples/z_image/main.rs`	393	Complete inference example
`candle-examples/examples/z_image/README.md`	128	Example documentation

Modified Files

File	Change
`candle-transformers/src/models/mod.rs`	Added `pub mod z_image;`

Implementation Highlights

1. Optimized Patchify/Unpatchify

The implementation uses optimized 6D tensor operations for the F=1 (single frame) case, avoiding Candle's 7D+ dimension limitations:

// Patchify: (B, C, 1, H, W) → (B, num_patches, patch_dim)
// Matches Python: permute(1, 3, 5, 2, 4, 6, 0)
let x = x.permute((0, 2, 4, 3, 5, 1))?;  // (B, H_t, W_t, pH, pW, C)
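To make the target layout concrete, here is a minimal sketch of the same index mapping on a flat buffer, using only the standard library. It assumes a single-frame latent stored row-major as (C, H, W) and a square patch size `p`. The function names and the flat-slice representation are hypothetical and are not the PR's Candle tensor code.

```rust
/// Patchify one single-frame latent stored row-major as (C, H, W).
/// Returns (H/p) * (W/p) tokens, each of length p*p*C, laid out as (pH, pW, C).
pub fn patchify(x: &[f32], c: usize, h: usize, w: usize, p: usize) -> Vec<f32> {
    assert_eq!(x.len(), c * h * w, "input length must equal C*H*W");
    assert!(h % p == 0 && w % p == 0, "H and W must be divisible by p");
    let (ht, wt) = (h / p, w / p);
    let mut out = Vec::with_capacity(x.len());
    // Loop order mirrors the permuted layout (H_t, W_t, pH, pW, C).
    for th in 0..ht {
        for tw in 0..wt {
            for ph in 0..p {
                for pw in 0..p {
                    for ch in 0..c {
                        let (row, col) = (th * p + ph, tw * p + pw);
                        out.push(x[ch * h * w + row * w + col]);
                    }
                }
            }
        }
    }
    out
}

/// Inverse of `patchify`: scatter tokens back into a (C, H, W) buffer.
pub fn unpatchify(tokens: &[f32], c: usize, h: usize, w: usize, p: usize) -> Vec<f32> {
    assert_eq!(tokens.len(), c * h * w, "token buffer must hold C*H*W values");
    assert!(h % p == 0 && w % p == 0, "H and W must be divisible by p");
    let (ht, wt) = (h / p, w / p);
    let mut out = vec![0.0f32; c * h * w];
    let mut i = 0;
    for th in 0..ht {
        for tw in 0..wt {
            for ph in 0..p {
                for pw in 0..p {
                    for ch in 0..c {
                        let (row, col) = (th * p + ph, tw * p + pw);
                        out[ch * h * w + row * w + col] = tokens[i];
                        i += 1;
                    }
                }
            }
        }
    }
    out
}
```

A round trip through `patchify` and `unpatchify` should reproduce the input exactly, which is a cheap sanity check for any tensor-based version of the same permutation.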

2. 3D RoPE Position Encoding

Implements 3D Rotary Position Embeddings with pre-computed sin/cos caches:

pub struct RopeEmbedder {
    axes_dims: Vec<usize>,  // [32, 48, 48] for Frame/H/W
    axes_lens: Vec<usize>,  // [1536, 512, 512] max positions
    cos_cached: Vec<Tensor>,
    sin_cached: Vec<Tensor>,
}
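As an illustration of what a pre-computed cache per axis could contain, the sketch below builds cos/sin tables with the standard RoPE frequency formula, using plain `Vec`s instead of Candle tensors. The function names and the `theta` parameter are hypothetical; the base value the PR uses is not stated in the description, so it is left as an argument.

```rust
/// Precompute cos/sin tables for one RoPE axis.
/// Both tables have shape [max_len][dim / 2]: indexed by position, then frequency.
pub fn build_axis_cache(
    dim: usize,
    max_len: usize,
    theta: f64, // assumption: RoPE base, not taken from the PR
) -> (Vec<Vec<f32>>, Vec<Vec<f32>>) {
    assert!(dim % 2 == 0, "rotary dimension must be even");
    let half = dim / 2;
    // Inverse frequencies: theta^(-2i/dim) for i in 0..dim/2.
    let inv_freq: Vec<f64> = (0..half)
        .map(|i| theta.powf(-(2.0 * i as f64) / dim as f64))
        .collect();
    let mut cos: Vec<Vec<f32>> = Vec::with_capacity(max_len);
    let mut sin: Vec<Vec<f32>> = Vec::with_capacity(max_len);
    for pos in 0..max_len {
        let p = pos as f64;
        cos.push(inv_freq.iter().map(|f| (p * f).cos() as f32).collect());
        sin.push(inv_freq.iter().map(|f| (p * f).sin() as f32).collect());
    }
    (cos, sin)
}

/// Build one (cos, sin) cache per axis, e.g. Frame / Height / Width.
pub fn build_caches(
    axes_dims: &[usize],
    axes_lens: &[usize],
    theta: f64,
) -> Vec<(Vec<Vec<f32>>, Vec<Vec<f32>>)> {
    assert_eq!(axes_dims.len(), axes_lens.len(), "one length per axis");
    axes_dims
        .iter()
        .zip(axes_lens)
        .map(|(&d, &l)| build_axis_cache(d, l, theta))
        .collect()
}
```

At lookup time, each token's (frame, row, column) position would select one row from each axis table, and the three rows would be concatenated along the head dimension.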

3. AdaLN Modulation with Tanh Gate

// Z-Image specific: tanh gate instead of sigmoid
let gate_msa = gate_msa.tanh()?;
let gate_mlp = gate_mlp.tanh()?;
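The effect of a tanh gate on a residual branch can be sketched on plain slices as follows. This is an illustration only: `gated_residual` is a hypothetical helper, and the snippet above does not show exactly where the PR applies the gate or which normalization surrounds it.

```rust
/// Gated residual update: out[i] = x[i] + tanh(gate[i]) * branch[i].
/// tanh keeps the gate in (-1, 1), so the branch can be scaled down or sign-flipped.
pub fn gated_residual(x: &[f32], branch: &[f32], gate: &[f32]) -> Vec<f32> {
    assert!(
        x.len() == branch.len() && x.len() == gate.len(),
        "all inputs must have the same length"
    );
    x.iter()
        .zip(branch)
        .zip(gate)
        .map(|((&xi, &bi), &gi)| xi + gi.tanh() * bi)
        .collect()
}
```

A gate of zero leaves the residual stream unchanged, while a large positive gate saturates near 1 and passes the branch output through at almost full strength.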

4. Dynamic Timestep Shifting

pub fn calculate_shift(seq_len: usize, base_seq: usize, max_seq: usize, base_shift: f64, max_shift: f64) -> f64 {
    // Convert to f64 before subtracting so that seq_len < base_seq cannot underflow usize.
    let m = (max_shift - base_shift) / (max_seq as f64 - base_seq as f64);
    base_shift + m * (seq_len as f64 - base_seq as f64)
}
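For context, the shift value (often called `mu`) is typically fed into an exponential time shift that warps the sigma schedule, as in the diffusers FlowMatchEulerDiscreteScheduler. The sketch below assumes that form and a linearly spaced unshifted schedule; both function names are hypothetical, and whether the PR's scheduler matches this exactly is not confirmed by the description.

```rust
/// Exponential time shift: exp(mu) / (exp(mu) + (1/t - 1)^sigma), for t in (0, 1].
pub fn time_shift(mu: f64, sigma: f64, t: f64) -> f64 {
    let e = mu.exp();
    e / (e + (1.0 / t - 1.0).powf(sigma))
}

/// Shifted sigma schedule for `num_steps` steps.
/// Assumption: the unshifted schedule runs linearly from 1.0 down to 1/num_steps.
pub fn shifted_sigmas(num_steps: usize, mu: f64) -> Vec<f64> {
    (0..num_steps)
        .map(|i| {
            let t = 1.0 - i as f64 / num_steps as f64;
            time_shift(mu, 1.0, t)
        })
        .collect()
}
```

With `mu = 0` the shift is the identity. Larger `mu` (longer image sequences) pushes the sigmas toward 1, so more of the step budget is spent at high noise levels.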

Image Size Requirements

Image dimensions must be divisible by 16:

✅ 1024×1024, 1024×768, 768×1024, 512×512, 1280×720
❌ 1920×1080 (1080 is not divisible by 16)

Latent size formula: latent = 2 × (image_size ÷ 16)
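The divisibility rule and the latent size formula above can be checked up front with a small helper. This is a minimal sketch; `latent_dim` and `latent_size` are hypothetical names and are not taken from the PR's example code.

```rust
/// Latent length for one image dimension: 2 * (image_dim / 16).
/// Returns an error if the dimension is zero or not divisible by 16.
pub fn latent_dim(image_dim: usize) -> Result<usize, String> {
    if image_dim == 0 || image_dim % 16 != 0 {
        return Err(format!("{} is not a positive multiple of 16", image_dim));
    }
    Ok(2 * (image_dim / 16))
}

/// Latent (width, height) for a requested image size.
pub fn latent_size(width: usize, height: usize) -> Result<(usize, usize), String> {
    Ok((latent_dim(width)?, latent_dim(height)?))
}
```

For example, 1024×768 maps to a 128×96 latent, while 1920×1080 is rejected because 1080 is not divisible by 16.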

📝 Testing Status

Test	Status
`cargo check --features metal`	✅ Pass
`cargo clippy --workspace --tests --examples --benches -- -D warnings`	✅ Pass
`cargo fmt --all -- --check`	✅ Pass
Inference test (1024×768, Metal)	✅ Pass
Inference test (1024×1024, Metal)	✅ Pass

Sample Output

Metal

Cuda

Checklist

Code compiles without errors
Passes cargo clippy --workspace --tests --examples --benches -- -D warnings
Passes cargo fmt --all -- --check
Example runs successfully
README documentation added
Follows Candle architecture conventions
Weight mapping matches original implementation

References

Z-Image
Diffusers

Additional Fix: Clippy Warning in `candle-nn`

While implementing SDPA support for Z-Image, I discovered a minor clippy warning in candle-nn/src/ops.rs:1040 introduced by PR #3196. @EricLBuehler

Issue: clippy::nonminimal_bool warning

// Before
let supports_sdpa_full_mask = !self.mask.is_some() || q_seq <= k_seq;

// After
let supports_sdpa_full_mask = self.mask.is_none() || q_seq <= k_seq;

AlpineVibrations · 2025-12-25T01:22:27Z

awesome! stoked.

SpenserCai · 2025-12-26T10:08:34Z

Consistency Test

I also ran a consistency test, comparing ModelScope's online inference against the Rust example with the same prompt and CFG settings. The generated images were almost identical, which indicates that the current Candle implementation is consistent with the original diffusers implementation.

ivarflakstad

This is great! 🙌
I've verified the output on cuda and it looks great.

Most of my comments are nits or just that documentation is slightly off. Solid work.

candle-examples/examples/z_image/main.rs

candle-examples/examples/z_image/README.md

candle-transformers/src/models/z_image/mod.rs

candle-transformers/src/models/z_image/text_encoder.rs

candle-transformers/src/models/z_image/transformer.rs

SpenserCai · 2026-01-01T03:44:37Z

Thank you for your review. I will fix the relevant items later.

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

ivarflakstad

lgtm! 👌

SpenserCai added 5 commits December 24, 2025 10:12

init z-image

01de593

fixed patchify, unpatchify and latent

e7071ae

update z_image examples readme

3f2781d

fixed clippy and rustfmt

7a40c96

fixed z_image example readme links

c5db88d

SpenserCai mentioned this pull request Dec 24, 2025

Model Wishlist #1177

Open

support sdpa and flash-attn in Z-Image and fixed sdpa clippy warning

c2e9336

ivarflakstad reviewed Dec 31, 2025


SpenserCai and others added 3 commits January 2, 2026 11:50

fix some readme

fb5bf25

Update candle-transformers/src/models/z_image/transformer.rs

ad22df8

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

support --model in example

04fb939

SpenserCai requested a review from ivarflakstad January 2, 2026 04:26

ivarflakstad approved these changes Jan 2, 2026


ivarflakstad merged commit 3a0d1cb into huggingface:main Jan 2, 2026
9 checks passed

SpenserCai deleted the z_image_support branch January 4, 2026 01:44
