[Draft] Feat. Z image npu support #13041

zhangtao0408 · 2026-01-28T07:18:55Z

What does this PR do?

Support Z-Image on Ascend NPU.

Test Code

import torch
import torch_npu
from diffusers import ZImagePipeline

# Load the pipeline
pipe = ZImagePipeline.from_pretrained(
    "/tmp/weights/Z-Image",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("npu")
pipe.transformer.set_attention_backend("_native_npu")

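# Generate image
# English gloss of the Chinese prompt: Two young Asian women stand close together
# against a plain gray textured wall, possibly indoors on a carpeted floor. The woman
# on the left has long curly hair and wears a navy sweater with cream ruffle trim on
# the left sleeve over a white stand-collar shirt, with white trousers; she wears small
# gold stud earrings and has her arms crossed behind her back. The woman on the right
# has straight shoulder-length hair and wears a cream sweatshirt printed with
# "Tun the tables" above "New ideas", with white trousers; she wears small silver hoop
# earrings and has her arms crossed over her chest. Both smile and look directly at
# the camera. Photo, natural lighting, soft shadows, neutral palette dominated by navy
# and cream white, casual fashion photography, medium depth of field, face and upper
# body in sharp focus, relaxed posture, friendly expression, indoor setting, carpeted
# floor, solid-color background.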
prompt = "两名年轻亚裔女性紧密站在一起，背景为朴素的灰色纹理墙面，可能是室内地毯地面。左侧女性留着长卷发，身穿藏青色毛衣，左袖有奶油色褶皱装饰，内搭白色立领衬衫，下身白色裤子；佩戴小巧金色耳钉，双臂交叉于背后。右侧女性留直肩长发，身穿奶油色卫衣，胸前印有“Tun the tables”字样，下方为“New ideas”，搭配白色裤子；佩戴银色小环耳环，双臂交叉于胸前。两人均面带微笑直视镜头。照片，自然光照明，柔和阴影，以藏青、奶油白为主的中性色调，休闲时尚摄影，中等景深，面部和上半身对焦清晰，姿态放松，表情友好，室内环境，地毯地面，纯色背景。"
negative_prompt = ""  # Optional; useful for suppressing unwanted content

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1280,
    width=720,
    cfg_normalization=False,
    num_inference_steps=50,
    guidance_scale=4,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]

image.save("example.png")

Results

After PRs #12979 and #13017, but before this PR

Attention backends are an experimental feature and the API may be subject to change.
  0%|                                                                                                                                                                    | 0/29 [00:00<?, ?it/s][W128 06:28:14.131704648 compiler_depend.ts:335] Warning: Cannot create tensor with interal format while allow_internel_format=False, tensor will be created with base format. (function operator())
[E128 06:28:14.459774964 compiler_depend.ts:444] operator():build/CMakeFiles/torch_npu.dir/compiler_depend.ts:1042 NPU function error: call aclnnFlashAttentionScore failed, error code is 561103
[ERROR] 2026-01-28-06:28:14 (PID:24925, Device:0, RankID:-1) ERR00100 PTA call acl api failed.
EZ9999: Inner Error!
EZ9999[PID: 24925] 2026-01-28-06:28:14.989.206 (EZ9999):  get unsupported atten_mask shape, the shape is [2, 1, 1, 192]. B=[2], N=[30], Sq=[192], Skv=[192], supported atten_mask shape can be [B, N, Sq, Skv], [B, 1, Sq, Skv], [1, 1, Sq, Skv] and [Sq, Skv].[FUNC:AnalyzeOptionalInput][FILE:flash_attention_score_tiling_general.cpp][LINE:1598]
        TraceBack (most recent call last):
       fail to analyze context info.[FUNC:GetShapeAttrsInfo][FILE:flash_attention_score_tiling_general.cpp][LINE:866]
       Tiling failed
       Tiling Failed.
       Kernel Run failed. opType: 27, FlashAttentionScore
       launch failed for FlashAttentionScore, errno:561103.

Exception raised from operator() at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:1042 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0xffff85d948c0 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x68 (0xffff85d3c140 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x111d6b4 (0xfffddb50d6b4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #3: <unknown function> + 0x29f0894 (0xfffddcde0894 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #4: <unknown function> + 0x9cc700 (0xfffddadbc700 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: <unknown function> + 0x9cd2dc (0xfffddadbd2dc in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: <unknown function> + 0x9cb1f8 (0xfffddadbb1f8 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: <unknown function> + 0xd29cc (0xffff85ba29cc in /lib/aarch64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x80398 (0xffff91bf0398 in /lib/aarch64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0xe9e9c (0xffff91c59e9c in /lib/aarch64-linux-gnu/libc.so.6)
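The failure above comes down to a shape mismatch: the mask arrives as `[2, 1, 1, 192]`, while the error message lists only `[B, N, Sq, Skv]`, `[B, 1, Sq, Skv]`, `[1, 1, Sq, Skv]` and `[Sq, Skv]` as accepted. The sketch below restates that rule in plain Python and computes the `[B, 1, Sq, Skv]` shape such a mask could be broadcast to. The helper names are hypothetical and used only for illustration; this is not the code this PR adds to `attention_dispatch.py`.

```python
from typing import List, Sequence, Tuple


def supported_mask_shapes(b: int, n: int, sq: int, skv: int) -> List[Tuple[int, ...]]:
    """Shapes the FlashAttentionScore error message lists as supported."""
    return [(b, n, sq, skv), (b, 1, sq, skv), (1, 1, sq, skv), (sq, skv)]


def is_supported_mask_shape(shape: Sequence[int], b: int, n: int, sq: int, skv: int) -> bool:
    """Return True if `shape` matches one of the supported layouts exactly."""
    return tuple(shape) in supported_mask_shapes(b, n, sq, skv)


def broadcast_target_shape(shape: Sequence[int], b: int, sq: int, skv: int) -> Tuple[int, ...]:
    """Return the [B, 1, Sq, Skv] shape a 4-D mask could be broadcast to.

    Raises ValueError if the mask is not 4-D or has a dimension that is
    neither 1 nor the target size (i.e. it cannot be broadcast).
    """
    target = (b, 1, sq, skv)
    if len(shape) != 4:
        raise ValueError(f"expected a 4-D mask shape, got {tuple(shape)}")
    for have, want in zip(shape, target):
        if have not in (1, want):
            raise ValueError(f"cannot broadcast {tuple(shape)} to {target}")
    return target


# Values taken from the error message: B=2, N=30, Sq=192, Skv=192.
failing = (2, 1, 1, 192)
print(is_supported_mask_shape(failing, b=2, n=30, sq=192, skv=192))  # False
print(broadcast_target_shape(failing, b=2, sq=192, skv=192))  # (2, 1, 192, 192)
```

On a real tensor, the equivalent broadcast would presumably be something like `mask.expand(B, 1, Sq, Skv)`; whether the PR takes exactly this route is not stated in the description.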

After this PR

> python3 test_zimage.py 
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 69.52it/s]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.76it/s]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  5.62it/s]
Attention backends are an experimental feature and the API may be subject to change.
  0%|                                                                                                                                                                    | 0/50 [00:00<?, ?it/s][W128 07:21:39.857273128 compiler_depend.ts:335] Warning: Cannot create tensor with interal format while allow_internel_format=False, tensor will be created with base format. (function operator())
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:26<00:00,  1.86it/s]

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

David Mo and others added 17 commits January 15, 2026 14:31

z-image support npu

eb382fa

Merge branch 'main' into main

654c530

[Bug Fix][Qwen-Image-Edit] Fix Qwen-Image-Edit series on NPU

b103f42

Enhance NPU attention handling by converting attention mask to boolean and refining mask checks.

3ed2a75

Refine attention mask handling in NPU attention function to improve validation and conversion logic.

5005564

Clean Code

e042b0d

Update attention_dispatch.py

e4bbc6d

Refine attention mask processing in NPU attention functions to enhance performance and validation.

5c92a77

Remove item() ops on npu fa backend.

8abfddd

attention_dispatch.py backup

677c0ff

Reuse NPU attention mask by _maybe_modify_attn_mask_npu

020a232

Merge branch 'main' into fix_npu_related_error

34da336

merge RopeEmbedderNPU into RopeEmbedder

1dc7cc5

Merge branch 'main' into main

5a5c479

Apply style fixes

b7d1325

Merge branch 'pr_13017' into z-image-npu-support

5a422df

Feat. Support Z-Image attention mask for NPU

e677072

zhangtao0408 changed the title ~~Feat. Z image npu support~~ [Draft] Feat. Z image npu support Jan 28, 2026
