From 318f2bf918b5a618453520a73dc441f07fc0802d Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Mon, 26 Jan 2026 01:06:59 +0100 Subject: [PATCH 1/8] add a real quick start guide --- docs/source/en/modular_diffusers/overview.md | 2 +- .../source/en/modular_diffusers/quickstart.md | 365 +++++------------- 2 files changed, 106 insertions(+), 261 deletions(-) diff --git a/docs/source/en/modular_diffusers/overview.md b/docs/source/en/modular_diffusers/overview.md index 7d07c4b73434..1f38639a7d90 100644 --- a/docs/source/en/modular_diffusers/overview.md +++ b/docs/source/en/modular_diffusers/overview.md @@ -24,7 +24,7 @@ The Modular Diffusers docs are organized as shown below. ## Quickstart -- A [quickstart](./quickstart) demonstrating how to implement an example workflow with Modular Diffusers. +- A [quickstart](./quickstart) shows you how to run a modular pipeline, understand its structure, and customize it by modifying the blocks that compose it. ## ModularPipelineBlocks diff --git a/docs/source/en/modular_diffusers/quickstart.md b/docs/source/en/modular_diffusers/quickstart.md index 32d14d84e243..18ecf598d335 100644 --- a/docs/source/en/modular_diffusers/quickstart.md +++ b/docs/source/en/modular_diffusers/quickstart.md @@ -12,333 +12,178 @@ specific language governing permissions and limitations under the License. # Quickstart -Modular Diffusers is a framework for quickly building flexible and customizable pipelines. At the core of Modular Diffusers are [`ModularPipelineBlocks`] that can be combined with other blocks to adapt to new workflows. The blocks are converted into a [`ModularPipeline`], a friendly user-facing interface developers can use. +Modular Diffusers is a framework for quickly building flexible and customizable pipelines. At the core of Modular Diffusers are [`ModularPipelineBlocks`] that can be combined with other blocks to adapt to new workflows. The blocks are converted into a [`ModularPipeline`], a friendly user-facing interface for running generation tasks. -This doc will show you how to implement a [Differential Diffusion](https://differential-diffusion.github.io/) pipeline with the modular framework. +This guide shows you how to run a modular pipeline, understand its structure, and customize it by modifying the blocks that compose it. -## ModularPipelineBlocks +## Run a pipeline -[`ModularPipelineBlocks`] are *definitions* that specify the components, inputs, outputs, and computation logic for a single step in a pipeline. There are four types of blocks. - -- [`ModularPipelineBlocks`] is the most basic block for a single step. -- [`SequentialPipelineBlocks`] is a multi-block that composes other blocks linearly. The outputs of one block are the inputs to the next block. -- [`LoopSequentialPipelineBlocks`] is a multi-block that runs iteratively and is designed for iterative workflows. -- [`AutoPipelineBlocks`] is a collection of blocks for different workflows and it selects which block to run based on the input. It is designed to conveniently package multiple workflows into a single pipeline. - -[Differential Diffusion](https://differential-diffusion.github.io/) is an image-to-image workflow. Start with the `IMAGE2IMAGE_BLOCKS` preset, a collection of `ModularPipelineBlocks` for image-to-image generation. +[`ModularPipeline`] is the main interface for loading, running, and managing modular pipelines. 
```py -from diffusers.modular_pipelines.stable_diffusion_xl import IMAGE2IMAGE_BLOCKS -IMAGE2IMAGE_BLOCKS = InsertableDict([ - ("text_encoder", StableDiffusionXLTextEncoderStep), - ("image_encoder", StableDiffusionXLVaeEncoderStep), - ("input", StableDiffusionXLInputStep), - ("set_timesteps", StableDiffusionXLImg2ImgSetTimestepsStep), - ("prepare_latents", StableDiffusionXLImg2ImgPrepareLatentsStep), - ("prepare_add_cond", StableDiffusionXLImg2ImgPrepareAdditionalConditioningStep), - ("denoise", StableDiffusionXLDenoiseStep), - ("decode", StableDiffusionXLDecodeStep) -]) -``` - -## Pipeline and block states - -Modular Diffusers uses *state* to communicate data between blocks. There are two types of states. - -- [`PipelineState`] is a global state that can be used to track all inputs and outputs across all blocks. -- [`BlockState`] is a local view of relevant variables from [`PipelineState`] for an individual block. +import torch +from diffusers import ModularPipeline -## Customizing blocks +pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image") +pipe.load_components(torch_dtype=torch.bfloat16) +pipe.to("cuda") -[Differential Diffusion](https://differential-diffusion.github.io/) differs from standard image-to-image in its `prepare_latents` and `denoise` blocks. All the other blocks can be reused, but you'll need to modify these two. +image = pipe( + prompt="A cat astronaut floating in space", +).images[0] +image +``` -Create placeholder `ModularPipelineBlocks` for `prepare_latents` and `denoise` by copying and modifying the existing ones. +[`~ModularPipeline.from_pretrained`] uses lazy loading - it reads the configuration and knows where to load each component from, but doesn't actually load the model weights until you call [`~ModularPipeline.load_components`]. This gives you control over when and how components are loaded. -Print the `denoise` block to see that it is composed of [`LoopSequentialPipelineBlocks`] with three sub-blocks, `before_denoiser`, `denoiser`, and `after_denoiser`. Only the `before_denoiser` sub-block needs to be modified to prepare the latent input for the denoiser based on the change map. +Learn more about creating and loading pipelines in the [Creating a pipeline](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#creating-a-pipeline) and [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guides. -```py -denoise_blocks = IMAGE2IMAGE_BLOCKS["denoise"]() -print(denoise_blocks) -``` +## Understand the structure -Replace the `StableDiffusionXLLoopBeforeDenoiser` sub-block with the new `SDXLDiffDiffLoopBeforeDenoiser` block. +The pipeline you loaded from `"Qwen/Qwen-Image"` is built from a [`ModularPipelineBlocks`] called `QwenImageAutoBlocks`. Print it to see its structure. ```py -# Copy existing blocks as placeholders -class SDXLDiffDiffPrepareLatentsStep(ModularPipelineBlocks): - """Copied from StableDiffusionXLImg2ImgPrepareLatentsStep - will modify later""" - # ... same implementation as StableDiffusionXLImg2ImgPrepareLatentsStep - -class SDXLDiffDiffDenoiseStep(StableDiffusionXLDenoiseLoopWrapper): - block_classes = [SDXLDiffDiffLoopBeforeDenoiser, StableDiffusionXLLoopDenoiser, StableDiffusionXLLoopAfterDenoiser] - block_names = ["before_denoiser", "denoiser", "after_denoiser"] +print(pipe.blocks) ``` -### prepare_latents - -The `prepare_latents` block requires the following changes. 
- -- a processor to process the change map -- a new `inputs` to accept the user-provided change map, `timestep` for precomputing all the latents and `num_inference_steps` to create the mask for updating the image regions -- update the computation in the `__call__` method for processing the change map and creating the masks, and storing it in the [`BlockState`] - -```diff -class SDXLDiffDiffPrepareLatentsStep(ModularPipelineBlocks): - @property - def expected_components(self) -> List[ComponentSpec]: - return [ - ComponentSpec("vae", AutoencoderKL), - ComponentSpec("scheduler", EulerDiscreteScheduler), -+ ComponentSpec("mask_processor", VaeImageProcessor, config=FrozenDict({"do_normalize": False, "do_convert_grayscale": True})) - ] - @property - def inputs(self) -> List[Tuple[str, Any]]: - return [ - InputParam("generator"), -+ InputParam("diffdiff_map", required=True), -- InputParam("latent_timestep", required=True, type_hint=torch.Tensor), -+ InputParam("timesteps", type_hint=torch.Tensor), -+ InputParam("num_inference_steps", type_hint=int), - ] - - @property - def intermediate_outputs(self) -> List[OutputParam]: - return [ -+ OutputParam("original_latents", type_hint=torch.Tensor), -+ OutputParam("diffdiff_masks", type_hint=torch.Tensor), - ] - def __call__(self, components, state: PipelineState): - # ... existing logic ... -+ # Process change map and create masks -+ diffdiff_map = components.mask_processor.preprocess(block_state.diffdiff_map, height=latent_height, width=latent_width) -+ thresholds = torch.arange(block_state.num_inference_steps, dtype=diffdiff_map.dtype) / block_state.num_inference_steps -+ block_state.diffdiff_masks = diffdiff_map > (thresholds + (block_state.denoising_start or 0)) -+ block_state.original_latents = block_state.latents ``` +class QwenImageAutoBlocks -### denoise - -The `before_denoiser` sub-block requires the following changes. - -- a new `inputs` to accept a `denoising_start` parameter, `original_latents` and `diffdiff_masks` from the `prepare_latents` block -- update the computation in the `__call__` method for applying Differential Diffusion - -```diff -class SDXLDiffDiffLoopBeforeDenoiser(ModularPipelineBlocks): - @property - def description(self) -> str: - return ( - "Step within the denoising loop for differential diffusion that prepare the latent input for the denoiser" - ) - - @property - def inputs(self) -> List[str]: - return [ - InputParam("latents", required=True, type_hint=torch.Tensor), -+ InputParam("denoising_start"), -+ InputParam("original_latents", type_hint=torch.Tensor), -+ InputParam("diffdiff_masks", type_hint=torch.Tensor), - ] - - def __call__(self, components, block_state, i, t): -+ # Apply differential diffusion logic -+ if i == 0 and block_state.denoising_start is None: -+ block_state.latents = block_state.original_latents[:1] -+ else: -+ block_state.mask = block_state.diffdiff_masks[i].unsqueeze(0).unsqueeze(1) -+ block_state.latents = block_state.original_latents[i] * block_state.mask + block_state.latents * (1 - block_state.mask) - - # ... rest of existing logic ... -``` + Auto Modular pipeline for text-to-image, image-to-image, inpainting, and controlnet tasks. -## Assembling the blocks + Supported workflows: + - `text2image`: requires `prompt` + - `image2image`: requires `prompt`, `image` + - `inpainting`: requires `prompt`, `mask_image`, `image` + - `controlnet_text2image`: requires `prompt`, `control_image` + ... -You should have all the blocks you need at this point to create a [`ModularPipeline`]. 
+ Sub-blocks: + - text_encoder: QwenImageTextEncoderStep + - vae_encoder: QwenImageAutoVaeEncoderStep + - denoise: QwenImageAutoCoreDenoiseStep + - decode: QwenImageAutoDecodeStep +``` -Copy the existing `IMAGE2IMAGE_BLOCKS` preset and for the `set_timesteps` block, use the `set_timesteps` from the `TEXT2IMAGE_BLOCKS` because Differential Diffusion doesn't require a `strength` parameter. +From this output you can see two things: +- It supports multiple **workflows** (text2image, image2image, inpainting, etc.) +- It's composed of **sub_blocks** (text_encoder, vae_encoder, denoise, decode) -Set the `prepare_latents` and `denoise` blocks to the `SDXLDiffDiffPrepareLatentsStep` and `SDXLDiffDiffDenoiseStep` blocks you just modified. +### Workflows -Call [`SequentialPipelineBlocks.from_blocks_dict`] on the blocks to create a `SequentialPipelineBlocks`. +This pipeline supports multiple workflows and adapts its behavior based on the inputs you provide. For example, if you pass `image` to the pipeline, it runs an image-to-image workflow instead of text-to-image. ```py -DIFFDIFF_BLOCKS = IMAGE2IMAGE_BLOCKS.copy() -DIFFDIFF_BLOCKS["set_timesteps"] = TEXT2IMAGE_BLOCKS["set_timesteps"] -DIFFDIFF_BLOCKS["prepare_latents"] = SDXLDiffDiffPrepareLatentsStep -DIFFDIFF_BLOCKS["denoise"] = SDXLDiffDiffDenoiseStep +from diffusers.utils import load_image -dd_blocks = SequentialPipelineBlocks.from_blocks_dict(DIFFDIFF_BLOCKS) -print(dd_blocks) -``` +input_image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true") -## ModularPipeline +image = pipe( + prompt="A cat astronaut floating in space", + image=input_image, +).images[0] +``` -Convert the [`SequentialPipelineBlocks`] into a [`ModularPipeline`] with the [`ModularPipeline.init_pipeline`] method. This initializes the expected components to load from a `modular_model_index.json` file. Explicitly load the components by calling [`ModularPipeline.load_components`]. +Learn more about conditional blocks in the [AutoPipelineBlocks](https://huggingface.co/docs/diffusers/modular_diffusers/auto_pipeline_blocks) guide. -It is a good idea to initialize the [`ComponentManager`] with the pipeline to help manage the different components. Once you call [`~ModularPipeline.load_components`], the components are registered to the [`ComponentManager`] and can be shared between workflows. The example below uses the `collection` argument to assign the components a `"diffdiff"` label for better organization. +Use `get_workflow()` to extract the blocks for a specific workflow. ```py -from diffusers.modular_pipelines import ComponentsManager - -components = ComponentManager() - -dd_pipeline = dd_blocks.init_pipeline("YiYiXu/modular-demo-auto", components_manager=components, collection="diffdiff") -dd_pipeline.load_componenets(torch_dtype=torch.float16) -dd_pipeline.to("cuda") +img2img_blocks = pipe.blocks.get_workflow("image2image") ``` -## Adding workflows +### Sub-blocks -Other workflows can be added to the [`ModularPipeline`] to support additional features without rewriting the entire pipeline from scratch. +Blocks are the building blocks of the modular system. They are *definitions* that specify the inputs, outputs, and computation logic for a step - and they can be composed together in different ways. -This section demonstrates how to add an IP-Adapter or ControlNet. - -### IP-Adapter - -Stable Diffusion XL already has a preset IP-Adapter block that you can use and doesn't require any changes to the existing Differential Diffusion pipeline. 
+Let's take a look at the `vae_encoder` block as an example. Use the `doc` property to see the full documentation for any block, including its inputs, outputs, and components. ```py -from diffusers.modular_pipelines.stable_diffusion_xl.encoders import StableDiffusionXLAutoIPAdapterStep - -ip_adapter_block = StableDiffusionXLAutoIPAdapterStep() +vae_encoder_block = pipe.blocks.sub_blocks["vae_encoder"] +print(vae_encoder_block.doc) ``` -Use the [`sub_blocks.insert`] method to insert it into the [`ModularPipeline`]. The example below inserts the `ip_adapter_block` at position `0`. Print the pipeline to see that the `ip_adapter_block` is added and it requires an `ip_adapter_image`. This also added two components to the pipeline, the `image_encoder` and `feature_extractor`. - +Just like `QwenImageAutoBlocks`, this block can be converted to a pipeline and run on its own. ```py -dd_blocks.sub_blocks.insert("ip_adapter", ip_adapter_block, 0) -``` +vae_encoder_pipe = vae_encoder_block.init_pipeline() -Call [`~ModularPipeline.init_pipeline`] to initialize a [`ModularPipeline`] and use [`~ModularPipeline.load_components`] to load the model components. Load and set the IP-Adapter to run the pipeline. +# Reuse the VAE we already loaded, we can reuse it with update_componenets() method +vae_encoder_pipe.update_components(vae=pipe.vae) -```py -dd_pipeline = dd_blocks.init_pipeline("YiYiXu/modular-demo-auto", collection="diffdiff") -dd_pipeline.load_components(torch_dtype=torch.float16) -dd_pipeline.loader.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin") -dd_pipeline.loader.set_ip_adapter_scale(0.6) -dd_pipeline = dd_pipeline.to(device) +# Run just this block +image_latents = vae_encoder_pipe(image=input_image).image_latents +print(image_latents.shape) +``` -ip_adapter_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/diffdiff_orange.jpeg") -image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png?download=true") -mask = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask.png?download=true") +This reuses the VAE from our original pipeline instead of loading it again, keeping memory usage efficient. Learn more in the [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guide. -prompt = "a green pear" -negative_prompt = "blurry" -generator = torch.Generator(device=device).manual_seed(42) +You can also add new blocks to compose new workflows. Let's add a canny edge detection block to create a ControlNet pipeline. -image = dd_pipeline( - prompt=prompt, - negative_prompt=negative_prompt, - num_inference_steps=25, - generator=generator, - ip_adapter_image=ip_adapter_image, - diffdiff_map=mask, - image=image, - output="images" -)[0] -``` +First, load the canny block from the Hub and insert it into the controlnet workflow. If you want to learn how to create your own custom blocks and share them on the Hub, check out the [Building Custom Blocks](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks) guide. -### ControlNet +```py +from diffusers.modular_pipelines import ModularPipelineBlocks -Stable Diffusion XL already has a preset ControlNet block that can readily be used. 
+# Load a canny block from the Hub +canny_block = ModularPipelineBlocks.from_pretrained( + "diffusers-internal-dev/canny-filtering", + trust_remote_code=True, +) -```py -from diffusers.modular_pipelines.stable_diffusion_xl.modular_blocks import StableDiffusionXLAutoControlNetInputStep +# Get the controlnet workflow and insert canny at the beginning +blocks = pipe.blocks.get_workflow("controlnet_text2image") +blocks.sub_blocks.insert("canny", canny_block, 0) -control_input_block = StableDiffusionXLAutoControlNetInputStep() +# Check the updated structure - notice the pipeline now takes "image" as input +# even though it's a controlnet pipeline, because canny preprocesses it into control_image +print(blocks.doc) ``` -However, it requires modifying the `denoise` block because that's where the ControlNet injects the control information into the UNet. - -Modify the `denoise` block by replacing the `StableDiffusionXLLoopDenoiser` sub-block with the `StableDiffusionXLControlNetLoopDenoiser`. +Create a pipeline from the modified blocks and load a ControlNet model. ```py -class SDXLDiffDiffControlNetDenoiseStep(StableDiffusionXLDenoiseLoopWrapper): - block_classes = [SDXLDiffDiffLoopBeforeDenoiser, StableDiffusionXLControlNetLoopDenoiser, StableDiffusionXLDenoiseLoopAfterDenoiser] - block_names = ["before_denoiser", "denoiser", "after_denoiser"] +pipeline = blocks.init_pipeline("Qwen/Qwen-Image") +pipeline.load_components(torch_dtype=torch.bfloat16) -controlnet_denoise_block = SDXLDiffDiffControlNetDenoiseStep() +# Load the ControlNet model +controlnet_spec = pipeline.get_component_spec("controlnet") +controlnet_spec.pretrained_model_name_or_path = "InstantX/Qwen-Image-ControlNet-Union" +controlnet = controlnet_spec.load(torch_dtype=torch.bfloat16) +pipeline.update_components(controlnet=controlnet) +pipeline.to("cuda") ``` -Insert the `controlnet_input` block and replace the `denoise` block with the new `controlnet_denoise_block`. Initialize a [`ModularPipeline`] and [`~ModularPipeline.load_components`] into it. +Now run the pipeline - the canny block preprocesses the image for ControlNet. 
```py -dd_blocks.sub_blocks.insert("controlnet_input", control_input_block, 7) -dd_blocks.sub_blocks["denoise"] = controlnet_denoise_block - -dd_pipeline = dd_blocks.init_pipeline("YiYiXu/modular-demo-auto", collection="diffdiff") -dd_pipeline.load_components(torch_dtype=torch.float16) -dd_pipeline = dd_pipeline.to(device) - -control_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/diffdiff_tomato_canny.jpeg") -image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png?download=true") -mask = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask.png?download=true") +from diffusers.utils import load_image -prompt = "a green pear" -negative_prompt = "blurry" -generator = torch.Generator(device=device).manual_seed(42) +prompt = "cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney" +image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true") -image = dd_pipeline( +output = pipeline( prompt=prompt, - negative_prompt=negative_prompt, - num_inference_steps=25, - generator=generator, - control_image=control_image, - controlnet_conditioning_scale=0.5, - diffdiff_map=mask, image=image, - output="images" -)[0] -``` - -### AutoPipelineBlocks - -The Differential Diffusion, IP-Adapter, and ControlNet workflows can be bundled into a single [`ModularPipeline`] by using [`AutoPipelineBlocks`]. This allows automatically selecting which sub-blocks to run based on the inputs like `control_image` or `ip_adapter_image`. If none of these inputs are passed, then it defaults to the Differential Diffusion. - -Use `block_trigger_inputs` to only run the `SDXLDiffDiffControlNetDenoiseStep` block if a `control_image` input is provided. Otherwise, the `SDXLDiffDiffDenoiseStep` is used. - -```py -class SDXLDiffDiffAutoDenoiseStep(AutoPipelineBlocks): - block_classes = [SDXLDiffDiffControlNetDenoiseStep, SDXLDiffDiffDenoiseStep] - block_names = ["controlnet_denoise", "denoise"] - block_trigger_inputs = ["controlnet_cond", None] +).images[0] +output ``` -Add the `ip_adapter` and `controlnet_input` blocks. +## Next steps -```py -DIFFDIFF_AUTO_BLOCKS = IMAGE2IMAGE_BLOCKS.copy() -DIFFDIFF_AUTO_BLOCKS["prepare_latents"] = SDXLDiffDiffPrepareLatentsStep -DIFFDIFF_AUTO_BLOCKS["set_timesteps"] = TEXT2IMAGE_BLOCKS["set_timesteps"] -DIFFDIFF_AUTO_BLOCKS["denoise"] = SDXLDiffDiffAutoDenoiseStep -DIFFDIFF_AUTO_BLOCKS.insert("ip_adapter", StableDiffusionXLAutoIPAdapterStep, 0) -DIFFDIFF_AUTO_BLOCKS.insert("controlnet_input",StableDiffusionXLControlNetAutoInput, 7) -``` - -Call [`SequentialPipelineBlocks.from_blocks_dict`] to create a [`SequentialPipelineBlocks`] and create a [`ModularPipeline`] and load in the model components to run. - -```py -dd_auto_blocks = SequentialPipelineBlocks.from_blocks_dict(DIFFDIFF_AUTO_BLOCKS) -dd_pipeline = dd_auto_blocks.init_pipeline("YiYiXu/modular-demo-auto", collection="diffdiff") -dd_pipeline.load_components(torch_dtype=torch.float16) -``` + + -## Share +Learn how to create your own blocks with custom logic in the [Building Custom Blocks](./custom_blocks) guide. -Add your [`ModularPipeline`] to the Hub with [`~ModularPipeline.save_pretrained`] and set `push_to_hub` argument to `True`. 
+ + -```py -dd_pipeline.save_pretrained("YiYiXu/test_modular_doc", push_to_hub=True) -``` - -Other users can load the [`ModularPipeline`] with [`~ModularPipeline.from_pretrained`]. +Use [`ComponentsManager`](./components_manager) to share models across multiple pipelines and manage memory efficiently. -```py -import torch -from diffusers.modular_pipelines import ModularPipeline, ComponentsManager + + -components = ComponentsManager() +Connect modular pipelines to [Mellon](https://github.com/cubiq/Mellon), a visual node-based interface for building workflows. Custom blocks built with Modular Diffusers work out of the box with Mellon - no UI code required. Read more in [Mellon](./mellon) guide -diffdiff_pipeline = ModularPipeline.from_pretrained("YiYiXu/modular-diffdiff-0704", trust_remote_code=True, components_manager=components, collection="diffdiff") -diffdiff_pipeline.load_components(torch_dtype=torch.float16) -``` + + From 809fc36e03cb346227062e8d858c2768b6bd81fe Mon Sep 17 00:00:00 2001 From: YiYi Xu Date: Sun, 25 Jan 2026 14:10:32 -1000 Subject: [PATCH 2/8] Update docs/source/en/modular_diffusers/quickstart.md --- docs/source/en/modular_diffusers/quickstart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/en/modular_diffusers/quickstart.md b/docs/source/en/modular_diffusers/quickstart.md index 18ecf598d335..534582eac6d9 100644 --- a/docs/source/en/modular_diffusers/quickstart.md +++ b/docs/source/en/modular_diffusers/quickstart.md @@ -183,7 +183,7 @@ Use [`ComponentsManager`](./components_manager) to share models across multiple -Connect modular pipelines to [Mellon](https://github.com/cubiq/Mellon), a visual node-based interface for building workflows. Custom blocks built with Modular Diffusers work out of the box with Mellon - no UI code required. Read more in [Mellon](./mellon) guide +Connect modular pipelines to [Mellon](https://github.com/cubiq/Mellon), a visual node-based interface for building workflows. Custom blocks built with Modular Diffusers work out of the box with Mellon - no UI code required. Read more in Mellon guide From 56dd6cca1569c92427e642a5cbe9c96f3e000043 Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Mon, 26 Jan 2026 02:01:23 +0100 Subject: [PATCH 3/8] update a bit more --- .../source/en/modular_diffusers/quickstart.md | 103 ++++++++++++++---- 1 file changed, 84 insertions(+), 19 deletions(-) diff --git a/docs/source/en/modular_diffusers/quickstart.md b/docs/source/en/modular_diffusers/quickstart.md index 534582eac6d9..22bebe0791ed 100644 --- a/docs/source/en/modular_diffusers/quickstart.md +++ b/docs/source/en/modular_diffusers/quickstart.md @@ -29,7 +29,7 @@ pipe.load_components(torch_dtype=torch.bfloat16) pipe.to("cuda") image = pipe( - prompt="A cat astronaut floating in space", + prompt="cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney", ).images[0] image ``` @@ -47,22 +47,31 @@ print(pipe.blocks) ``` ``` -class QwenImageAutoBlocks - - Auto Modular pipeline for text-to-image, image-to-image, inpainting, and controlnet tasks. - - Supported workflows: - - `text2image`: requires `prompt` - - `image2image`: requires `prompt`, `image` - - `inpainting`: requires `prompt`, `mask_image`, `image` - - `controlnet_text2image`: requires `prompt`, `control_image` - ... 
- - Sub-blocks: - - text_encoder: QwenImageTextEncoderStep - - vae_encoder: QwenImageAutoVaeEncoderStep - - denoise: QwenImageAutoCoreDenoiseStep - - decode: QwenImageAutoDecodeStep +QwenImageAutoBlocks( + Class: SequentialPipelineBlocks + + Description: Auto Modular pipeline for text-to-image, image-to-image, inpainting, and controlnet tasks using QwenImage. + + Supported workflows: + - `text2image`: requires `prompt` + - `image2image`: requires `prompt`, `image` + - `inpainting`: requires `prompt`, `mask_image`, `image` + - `controlnet_text2image`: requires `prompt`, `control_image` + ... + + Components: + text_encoder (`Qwen2_5_VLForConditionalGeneration`) + vae (`AutoencoderKLQwenImage`) + transformer (`QwenImageTransformer2DModel`) + ... + + Sub-Blocks: + [0] text_encoder (QwenImageAutoTextEncoderStep) + [1] vae_encoder (QwenImageAutoVaeEncoderStep) + [2] controlnet_vae_encoder (QwenImageOptionalControlNetVaeEncoderStep) + [3] denoise (QwenImageAutoCoreDenoiseStep) + [4] decode (QwenImageAutoDecodeStep) +) ``` From this output you can see two things: @@ -79,7 +88,7 @@ from diffusers.utils import load_image input_image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true") image = pipe( - prompt="A cat astronaut floating in space", + prompt="cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney", image=input_image, ).images[0] ``` @@ -96,6 +105,8 @@ img2img_blocks = pipe.blocks.get_workflow("image2image") Blocks are the building blocks of the modular system. They are *definitions* that specify the inputs, outputs, and computation logic for a step - and they can be composed together in different ways. +`QwenImageAutoBlocks` is itself composed of smaller blocks: `text_encoder`, `vae_encoder`, `controlnet_vae_encoder`, `denoise`, and `decode`. Access them through the `sub_blocks` property. Let's take a look at the `vae_encoder` block - use the `doc` property to see its inputs, outputs, and components. + Let's take a look at the `vae_encoder` block as an example. Use the `doc` property to see the full documentation for any block, including its inputs, outputs, and components. ```py @@ -130,14 +141,68 @@ canny_block = ModularPipelineBlocks.from_pretrained( trust_remote_code=True, ) -# Get the controlnet workflow and insert canny at the beginning +print(canny_block.doc) +``` +``` +class CannyBlock + + Inputs: + image (`Union[Image, ndarray]`): + Image to compute canny filter on + low_threshold (`int`, *optional*, defaults to 50): + Low threshold for the canny filter. + high_threshold (`int`, *optional*, defaults to 200): + High threshold for the canny filter. + ... + + Outputs: + control_image (`PIL.Image`): + Canny map for input image +``` + +The canny block takes an `image` as input and outputs a `control_image` - exactly what we need for ControlNet. Now get the controlnet workflow and check its inputs. + +```py +# Get the controlnet workflow blocks = pipe.blocks.get_workflow("controlnet_text2image") +print(blocks.doc) +``` +``` +class SequentialPipelineBlocks + + Inputs: + prompt (`str`): + The prompt or prompts to guide image generation. + control_image (`Image`): + Control image for ControlNet conditioning. + ... +``` +Notice it requires control_image as input. After inserting the canny block, the pipeline will accept a regular image instead. 
+ +```py +# and insert canny at the beginning blocks.sub_blocks.insert("canny", canny_block, 0) # Check the updated structure - notice the pipeline now takes "image" as input # even though it's a controlnet pipeline, because canny preprocesses it into control_image print(blocks.doc) ``` +``` +class SequentialPipelineBlocks + + Inputs: + image (`Union[Image, ndarray]`): + Image to compute canny filter on + low_threshold (`int`, *optional*, defaults to 50): + Low threshold for the canny filter. + high_threshold (`int`, *optional*, defaults to 200): + High threshold for the canny filter. + prompt (`str`): + The prompt or prompts to guide image generation. + ... +``` + +Now the pipeline takes `image` as input - the canny block will preprocess it into `control_image` automatically. Create a pipeline from the modified blocks and load a ControlNet model. From fe4e4d7b66d9de9f29b2a6738aee6dbe3a4c8532 Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Mon, 26 Jan 2026 02:03:39 +0100 Subject: [PATCH 4/8] fix --- docs/source/en/modular_diffusers/quickstart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/en/modular_diffusers/quickstart.md b/docs/source/en/modular_diffusers/quickstart.md index 22bebe0791ed..241e46b00034 100644 --- a/docs/source/en/modular_diffusers/quickstart.md +++ b/docs/source/en/modular_diffusers/quickstart.md @@ -105,7 +105,7 @@ img2img_blocks = pipe.blocks.get_workflow("image2image") Blocks are the building blocks of the modular system. They are *definitions* that specify the inputs, outputs, and computation logic for a step - and they can be composed together in different ways. -`QwenImageAutoBlocks` is itself composed of smaller blocks: `text_encoder`, `vae_encoder`, `controlnet_vae_encoder`, `denoise`, and `decode`. Access them through the `sub_blocks` property. Let's take a look at the `vae_encoder` block - use the `doc` property to see its inputs, outputs, and components. +`QwenImageAutoBlocks` is itself composed of smaller blocks: `text_encoder`, `vae_encoder`, `controlnet_vae_encoder`, `denoise`, and `decode`. Access them through the `sub_blocks` property. Let's take a look at the `vae_encoder` block as an example. Use the `doc` property to see the full documentation for any block, including its inputs, outputs, and components. From 8483c069939d9fc665fee40aff77d0b89477edf7 Mon Sep 17 00:00:00 2001 From: YiYi Xu Date: Mon, 26 Jan 2026 08:26:06 -1000 Subject: [PATCH 5/8] Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --- docs/source/en/modular_diffusers/overview.md | 2 +- docs/source/en/modular_diffusers/quickstart.md | 18 +++++++++--------- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/source/en/modular_diffusers/overview.md b/docs/source/en/modular_diffusers/overview.md index 1f38639a7d90..8e27cad6eb91 100644 --- a/docs/source/en/modular_diffusers/overview.md +++ b/docs/source/en/modular_diffusers/overview.md @@ -24,7 +24,7 @@ The Modular Diffusers docs are organized as shown below. ## Quickstart -- A [quickstart](./quickstart) shows you how to run a modular pipeline, understand its structure, and customize it by modifying the blocks that compose it. +- The [quickstart](./quickstart) shows you how to run a modular pipeline, understand its structure, and customize it by modifying the blocks that compose it. 
 ## ModularPipelineBlocks
 
diff --git a/docs/source/en/modular_diffusers/quickstart.md b/docs/source/en/modular_diffusers/quickstart.md
index 241e46b00034..f5e904d72021 100644
--- a/docs/source/en/modular_diffusers/quickstart.md
+++ b/docs/source/en/modular_diffusers/quickstart.md
@@ -34,13 +34,13 @@ image = pipe(
 image
 ```
 
-[`~ModularPipeline.from_pretrained`] uses lazy loading - it reads the configuration and knows where to load each component from, but doesn't actually load the model weights until you call [`~ModularPipeline.load_components`]. This gives you control over when and how components are loaded.
+[`~ModularPipeline.from_pretrained`] uses lazy loading - it reads the configuration to learn where to load each component from, but doesn't actually load the model weights until you call [`~ModularPipeline.load_components`]. This gives you control over when and how components are loaded.
 
 Learn more about creating and loading pipelines in the [Creating a pipeline](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#creating-a-pipeline) and [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guides.
 
 ## Understand the structure
 
-The pipeline you loaded from `"Qwen/Qwen-Image"` is built from a [`ModularPipelineBlocks`] called `QwenImageAutoBlocks`. Print it to see its structure.
+The pipeline is built from [`ModularPipelineBlocks`] specific to the model. For example, [`QwenImage`] is built from `QwenImageAutoBlocks`. Print it to see its structure.
 
 ```py
 print(pipe.blocks)
@@ -74,9 +74,9 @@ QwenImageAutoBlocks(
 )
 ```
 
-From this output you can see two things:
-- It supports multiple **workflows** (text2image, image2image, inpainting, etc.)
-- It's composed of **sub_blocks** (text_encoder, vae_encoder, denoise, decode)
+The output shows:
+- The supported workflows (text2image, image2image, inpainting, etc.)
+- The sub-blocks it's composed of (text_encoder, vae_encoder, denoise, decode)
 
 ### Workflows
 
@@ -107,14 +107,14 @@ Blocks are the building blocks of the modular system...
 
 `QwenImageAutoBlocks` is itself composed of smaller blocks: `text_encoder`, `vae_encoder`, `controlnet_vae_encoder`, `denoise`, and `decode`. Access them through the `sub_blocks` property.
 
-Let's take a look at the `vae_encoder` block as an example. Use the `doc` property to see the full documentation for any block, including its inputs, outputs, and components.
+The `doc` property is useful for seeing the full documentation of any block, including its inputs, outputs, and components.
 
 ```py
 vae_encoder_block = pipe.blocks.sub_blocks["vae_encoder"]
 print(vae_encoder_block.doc)
 ```
 
-Just like `QwenImageAutoBlocks`, this block can be converted to a pipeline and run on its own.
+This block can be converted to a pipeline and run on its own with [`~ModularPipelineBlocks.init_pipeline`].
 
 ```py
 vae_encoder_pipe = vae_encoder_block.init_pipeline()
@@ -126,7 +126,7 @@ image_latents = vae_encoder_pipe(image=input_image).image_latents
 print(image_latents.shape)
 ```
 
-This reuses the VAE from our original pipeline instead of loading it again, keeping memory usage efficient.
+It reuses the VAE from our original pipeline instead of reloading it, keeping memory usage efficient.
Learn more in the [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guide. You can also add new blocks to compose new workflows. Let's add a canny edge detection block to create a ControlNet pipeline. @@ -177,7 +177,7 @@ class SequentialPipelineBlocks Control image for ControlNet conditioning. ... ``` -Notice it requires control_image as input. After inserting the canny block, the pipeline will accept a regular image instead. +It requires control_image as input. After inserting the canny block, the pipeline will accept a regular image instead. ```py # and insert canny at the beginning From b6d05bb69c569eecb5de1e6da9ff8ac909515b9f Mon Sep 17 00:00:00 2001 From: YiYi Xu Date: Mon, 26 Jan 2026 08:26:40 -1000 Subject: [PATCH 6/8] Update docs/source/en/modular_diffusers/quickstart.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --- docs/source/en/modular_diffusers/quickstart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/en/modular_diffusers/quickstart.md b/docs/source/en/modular_diffusers/quickstart.md index f5e904d72021..2f42ec732581 100644 --- a/docs/source/en/modular_diffusers/quickstart.md +++ b/docs/source/en/modular_diffusers/quickstart.md @@ -160,7 +160,7 @@ class CannyBlock Canny map for input image ``` -The canny block takes an `image` as input and outputs a `control_image` - exactly what we need for ControlNet. Now get the controlnet workflow and check its inputs. +Use `get_workflow` to extract the ControlNet workflow. ```py # Get the controlnet workflow From 7dc454f4662f6a2d8d94bad4fb488dad755b468a Mon Sep 17 00:00:00 2001 From: YiYi Xu Date: Mon, 26 Jan 2026 08:27:04 -1000 Subject: [PATCH 7/8] Update docs/source/en/modular_diffusers/quickstart.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --- docs/source/en/modular_diffusers/quickstart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/en/modular_diffusers/quickstart.md b/docs/source/en/modular_diffusers/quickstart.md index 2f42ec732581..5d03b72e316a 100644 --- a/docs/source/en/modular_diffusers/quickstart.md +++ b/docs/source/en/modular_diffusers/quickstart.md @@ -130,7 +130,7 @@ It reuses the VAE from our original pipeline instead of reloading it, keeping me You can also add new blocks to compose new workflows. Let's add a canny edge detection block to create a ControlNet pipeline. -First, load the canny block from the Hub and insert it into the controlnet workflow. If you want to learn how to create your own custom blocks and share them on the Hub, check out the [Building Custom Blocks](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks) guide. +1. Load the canny block from the Hub and insert it into the ControlNet workflow. If you want to learn how to create your own custom blocks and share them on the Hub, check out the [Building Custom Blocks](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks) guide. 
```py from diffusers.modular_pipelines import ModularPipelineBlocks From 077b6972cf911329520c8534483589a0030aad88 Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Mon, 26 Jan 2026 21:36:24 +0100 Subject: [PATCH 8/8] update more --- .../source/en/modular_diffusers/quickstart.md | 45 +++++++++++-------- 1 file changed, 26 insertions(+), 19 deletions(-) diff --git a/docs/source/en/modular_diffusers/quickstart.md b/docs/source/en/modular_diffusers/quickstart.md index 5d03b72e316a..857b90e53a08 100644 --- a/docs/source/en/modular_diffusers/quickstart.md +++ b/docs/source/en/modular_diffusers/quickstart.md @@ -40,8 +40,11 @@ Learn more about creating and loading pipelines in the [Creating a pipeline](htt ## Understand the structure -The pipeline is built from [`ModularPipelineBlocks`] specific to the model. For example, [`QwenImage`] is built from `QwenImageAutoBlocks`. Print it to see its structure. +A [`ModularPipeline`] has two parts: +- **State**: the loaded components (models, schedulers, processors) and configuration +- **Definition**: the [`ModularPipelineBlocks`] that specify inputs, outputs, expected components and computation logic +The blocks define *what* the pipeline does. Access them through `pipe.blocks`. ```py print(pipe.blocks) ``` @@ -80,8 +83,7 @@ The output returns: ### Workflows -This pipeline supports multiple workflows and adapts its behavior based on the inputs you provide. For example, if you pass `image` to the pipeline, it runs an image-to-image workflow instead of text-to-image. - +`QwenImageAutoBlocks` is a [`ConditionalPipelineBlocks`], so this pipeline supports multiple workflows and adapts its behavior based on the inputs you provide. For example, if you pass `image` to the pipeline, it runs an image-to-image workflow instead of text-to-image. ```py from diffusers.utils import load_image @@ -93,17 +95,14 @@ image = pipe( ).images[0] ``` -Learn more about conditional blocks in the [AutoPipelineBlocks](https://huggingface.co/docs/diffusers/modular_diffusers/auto_pipeline_blocks) guide. - Use `get_workflow()` to extract the blocks for a specific workflow. - ```py img2img_blocks = pipe.blocks.get_workflow("image2image") ``` -### Sub-blocks +Conditional blocks are convenient for users, but their conditional logic adds complexity when customizing or debugging. Extracting a workflow gives you the specific blocks relevant to your workflow, making it easier to work with. Learn more in the [AutoPipelineBlocks](https://huggingface.co/docs/diffusers/modular_diffusers/auto_pipeline_blocks) guide. -Blocks are the building blocks of the modular system. They are *definitions* that specify the inputs, outputs, and computation logic for a step - and they can be composed together in different ways. +### Sub-blocks `QwenImageAutoBlocks` is itself composed of smaller blocks: `text_encoder`, `vae_encoder`, `controlnet_vae_encoder`, `denoise`, and `decode`. Access them through the `sub_blocks` property. @@ -118,7 +117,7 @@ This block can be converted to a pipeline and run on its own with [`~ModularPipe ```py vae_encoder_pipe = vae_encoder_block.init_pipeline() -# Reuse the VAE we already loaded, we can reuse it with update_componenets() method +# Reuse the VAE we already loaded, we can reuse it with update_components() method vae_encoder_pipe.update_components(vae=pipe.vae) # Run just this block @@ -128,9 +127,11 @@ print(image_latents.shape) It reuses the VAE from our original pipeline instead of reloading it, keeping memory usage efficient. 
Learn more in the [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guide. -You can also add new blocks to compose new workflows. Let's add a canny edge detection block to create a ControlNet pipeline. +Since blocks are composable, you can modify the pipeline's definition by adding, removing, or swapping blocks to create new workflows. In the next section, we'll add a canny edge detection block to a ControlNet pipeline, so you can pass a regular image instead of a pre-processed canny edge map. -1. Load the canny block from the Hub and insert it into the ControlNet workflow. If you want to learn how to create your own custom blocks and share them on the Hub, check out the [Building Custom Blocks](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks) guide. +## Compose new workflows + +Let's add a canny edge detection block to a ControlNet pipeline. First, load a pre-built canny block from the Hub (see [Building Custom Blocks](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks) to create your own). ```py from diffusers.modular_pipelines import ModularPipelineBlocks @@ -160,10 +161,10 @@ class CannyBlock Canny map for input image ``` -Use `get_workflow` to extract the ControlNet workflow. +Use `get_workflow` to extract the ControlNet workflow from [`QwenImageAutoBlocks`]. ```py -# Get the controlnet workflow +# Get the controlnet workflow that we want to work with blocks = pipe.blocks.get_workflow("controlnet_text2image") print(blocks.doc) ``` @@ -183,7 +184,9 @@ It requires control_image as input. After inserting the canny block, the pipelin # and insert canny at the beginning blocks.sub_blocks.insert("canny", canny_block, 0) -# Check the updated structure - notice the pipeline now takes "image" as input +# Check the updated structure: CannyBlock is now listed as first sub-block +print(blocks) +# Check the updated doc: notice the pipeline now takes "image" as input # even though it's a controlnet pipeline, because canny preprocesses it into control_image print(blocks.doc) ``` @@ -204,10 +207,15 @@ class SequentialPipelineBlocks Now the pipeline takes `image` as input - the canny block will preprocess it into `control_image` automatically. -Create a pipeline from the modified blocks and load a ControlNet model. - +Create a pipeline from the modified blocks and load a ControlNet model. We use [`ComponentsManager`] to enable CPU offloading for reduced memory usage (learn more in the [ComponentsManager](./components_manager) guide). ```py -pipeline = blocks.init_pipeline("Qwen/Qwen-Image") +from diffusers import ComponentsManager + +manager = ComponentsManager() +manager.enable_auto_cpu_offload(device="cuda:0") + +pipeline = blocks.init_pipeline("Qwen/Qwen-Image", components_manager=manager) + pipeline.load_components(torch_dtype=torch.bfloat16) # Load the ControlNet model @@ -215,7 +223,6 @@ controlnet_spec = pipeline.get_component_spec("controlnet") controlnet_spec.pretrained_model_name_or_path = "InstantX/Qwen-Image-ControlNet-Union" controlnet = controlnet_spec.load(torch_dtype=torch.bfloat16) pipeline.update_components(controlnet=controlnet) -pipeline.to("cuda") ``` Now run the pipeline - the canny block preprocesses the image for ControlNet. @@ -248,7 +255,7 @@ Use [`ComponentsManager`](./components_manager) to share models across multiple -Connect modular pipelines to [Mellon](https://github.com/cubiq/Mellon), a visual node-based interface for building workflows. 
Custom blocks built with Modular Diffusers work out of the box with Mellon - no UI code required. Read more in Mellon guide
+Connect modular pipelines to [Mellon](https://github.com/cubiq/Mellon), a visual node-based interface for building workflows. Custom blocks built with Modular Diffusers work out of the box with Mellon - no UI code required. Read more in the Mellon guide.