v0.28.0: Marigold, PixArt Sigma, AnimateDiff SDXL, InstantStyle, VQGAN Training Script, and more
Diffusion models are known for their abilities in the space of generative modeling. This release of diffusers introduces the first official pipeline (Marigold) for discriminative tasks such as depth estimation and surface normals estimation!
Starting this release, we will also highlight the changes and features from the library that make it easy to integrate community checkpoints, features, and so on. Read on!
Marigold
Proposed in Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation, Marigold introduces a diffusion model and an associated fine-tuning protocol for monocular depth estimation. It can also be extended to perform surface normals estimation.
(Image taken from the official repository)
The code snippet below shows how to use this pipeline for depth estimation:
import diffusers
import torch

# Load the Marigold depth estimation pipeline in half precision.
pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)

# Visualize the predicted depth map and save it.
vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")

# Export the raw prediction as a 16-bit PNG for downstream use.
depth_16bit = pipe.image_processor.export_depth_to_16bit_png(depth.prediction)
depth_16bit[0].save("einstein_depth_16bit.png")
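Surface normals estimation follows the same pattern. Below is a minimal sketch, assuming the MarigoldNormalsPipeline and the prs-eth/marigold-normals-lcm-v0-1 checkpoint shipped alongside the depth pipeline:

import diffusers
import torch

# Assumption: the normals pipeline mirrors the depth pipeline's API.
pipe = diffusers.MarigoldNormalsPipeline.from_pretrained(
    "prs-eth/marigold-normals-lcm-v0-1", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
normals = pipe(image)

# Visualize the predicted surface normals and save them.
vis = pipe.image_processor.visualize_normals(normals.prediction)
vis[0].save("einstein_normals.png")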
Check out the API documentation here. We also have a detailed guide about the pipeline here.
Thanks to @toshas, one of the authors of Marigold, who contributed this in #7847.
from_single_file 🌀
We have further refactored from_single_file to align its logic more closely with the from_pretrained method. The biggest benefit of doing this is that it allows us to expand single-file loading support beyond Stable Diffusion-like pipelines and models. It also makes it easier to load models that are saved and shared in their original format.
Some of the changes introduced in this refactor:
- Pipeline and model configuration is now inferred from the checkpoint. For example, a Stable Diffusion 1.5-style checkpoint uses the runwayml/stable-diffusion-v1-5 repository to configure the model components and pipeline.
- To override the default configuration, use the config argument and pass in either a path to a local model repo or a repo id on the Hugging Face Hub:
pipe = StableDiffusionPipeline.from_single_file("...", config=<model repo id or local repo path>)
- We are deprecating model configuration arguments for the from_single_file method in pipelines, such as num_in_channels, scheduler_type, image_size, and upcast_attention. This is an anti-pattern that we supported in previous versions of the library, when we assumed it would only be relevant to Stable Diffusion-based models. However, given that there is demand to support other model types, we feel it is necessary for single-file loading behavior to adhere to the conventions set in our other loading methods. Configuring individual model components through a pipeline loading method is not something we support in from_pretrained, and therefore we will be deprecating support for this behavior in from_single_file as well.
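If you need a customized component, one pattern is to configure it yourself and pass it to the pipeline, as from_pretrained already allows. The snippet below is a minimal sketch; it assumes from_single_file accepts component overrides the way from_pretrained does, and the checkpoint path is illustrative:

import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

# Configure the component directly (repo and subfolder layout as in SD 1.5)...
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)

# ...and hand it to the pipeline instead of tweaking it through loading kwargs.
pipe = StableDiffusionPipeline.from_single_file(
    "path/to/checkpoint.safetensors",  # illustrative local checkpoint path
    unet=unet,
    torch_dtype=torch.float16,
)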
PixArt Sigma
PixArt Sigma is the successor to PixArt Alpha. It is capable of directly generating images at 4K resolution, and it can produce images of markedly higher fidelity and improved alignment with text prompts. It comes with a massive sequence length of 300 (for reference, PixArt Alpha has a maximum sequence length of 120)!
(Image taken from the project website.)
import torch
from diffusers import PixArtSigmaPipeline
# You can replace the checkpoint id with "PixArt-alpha/PixArt-Sigma-XL-2-512-MS" too.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
)
# Enable memory optimizations.
pipe.enable_model_cpu_offload()
prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt).images[0]
📃 Refer to the documentation here to learn more about PixArt Sigma.
Thanks to @lawrence-cj, one of the authors of PixArt Sigma, who contributed this in #7857.
AnimateDiff SDXL
@a-r-r-o-w contributed the Stable Diffusion XL (SDXL) version of AnimateDiff in #6721. Note that this is currently an experimental feature, as only a beta release of the motion adapter checkpoint is available.
import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    beta_schedule="linear",
    steps_offset=1,
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
)
# enable_model_cpu_offload() returns None, so call it on the pipeline
# instead of chaining it to the constructor.
pipe.enable_model_cpu_offload()
# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
output = pipe(
    prompt="a panda surfing in the ocean, realistic, high quality",
    negative_prompt="low quality, worst quality",
    num_inference_steps=20,
    guidance_scale=8,
    width=1024,
    height=1024,
    num_frames=16,
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
📜 Refer to the documentation to learn more.
@UmerHA contributed support for controlling the scales of different LoRA blocks in a granular manner in #7352. Depending on the LoRA checkpoint being used, this granular control can significantly impact the quality of the generated outputs. The following code block shows how this feature can be used during inference:
...
adapter_weight_scales = { "unet": { "down": 0, "mid": 1, "up": 0} }
pipe.set_adapters("pixel", adapter_weight_scales)
image = pipe(
    prompt, num_inference_steps=30, generator=torch.manual_seed(0)
).images[0]
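The scales can be nested further, down to individual transformer blocks. A minimal sketch of the general structure, reusing the "pixel" adapter from above (the exact values are illustrative):

# Scales can be given per part ("down"/"mid"/"up"), per block within a part,
# or as a list with one entry per transformer in a block. Unspecified
# components keep a scale of 1.0. Values here are illustrative.
adapter_weight_scales = {
    "text_encoder": 0.5,
    "unet": {
        "down": 0.9,                      # all blocks in the down-part
        "mid": 1.0,
        "up": {
            "block_0": 0.6,               # all transformers in up-block 0
            "block_1": [0.4, 0.8, 1.0],   # one scale per transformer
        },
    },
}
pipe.set_adapters("pixel", adapter_weight_scales)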
✍️ Refer to our documentation for more details and a full-fledged example.
InstantStyle
This more granular control of scale can be extended to IP-Adapters too. @DannHuang contributed support for InstantStyle, i.e., granular control of IP-Adapter scales, in #7668. The following code block shows how this feature can be used when performing inference with IP-Adapters:
...
scale = {
    "down": {"block_2": [0.0, 1.0]},      # layout-controlling block
    "up": {"block_0": [0.0, 1.0, 0.0]},   # style-controlling block
}
pipeline.set_ip_adapter_scale(scale)
This way, one can generate images that follow only the style or the layout of the image prompt, with significantly improved diversity. This is achieved by activating the IP-Adapter only in specific parts of the model.
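For a complete call, the image prompt is passed via ip_adapter_image. A minimal sketch, assuming pipeline is an SDXL pipeline with an IP-Adapter already loaded through load_ip_adapter, and with an illustrative style-image path:

from diffusers.utils import load_image

# Reference image whose style should be transferred (path is illustrative).
style_image = load_image("path/or/url/to/style_image.png")

image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=style_image,
    negative_prompt="lowres, bad anatomy, worst quality, low quality",
    guidance_scale=5.0,
    num_inference_steps=30,
).images[0]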
Check out the documentation here.
ControlNet-XS
ControlNet-XS was introduced in ControlNet-XS by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the original ControlNet can be made much smaller and still produce good results. ControlNet-XS generates images comparable to a regular ControlNet, but it is 20-25% faster (see the benchmark with StableDiffusion-XL) and uses ~45% less memory.
ControlNet-XS is supported for both Stable Diffusion and Stable Diffusion XL.
Thanks to @UmerHA for contributing ControlNet-XS in #5827 and #6772.
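A minimal sketch for the SDXL variant is below. The adapter checkpoint id and the canny image path are assumptions; substitute the ControlNet-XS checkpoint you want to use:

import torch
from diffusers import StableDiffusionXLControlNetXSPipeline, ControlNetXSAdapter
from diffusers.utils import load_image

# Checkpoint id is illustrative; use a ControlNet-XS adapter trained for SDXL.
controlnet = ControlNetXSAdapter.from_pretrained(
    "UmerHA/Testing-ConvNetXS-SDXL-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The conditioning image should be a preprocessed canny edge map.
canny_image = load_image("path/or/url/to/canny_edge_map.png")
image = pipe(
    "aerial view of a futuristic city, photorealistic", image=canny_image
).images[0]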
Custom Timesteps
We introduced custom timesteps support for some of our pipelines and schedulers. You can now set your scheduler with a list of arbitrary timesteps. For example, you can use the Align Your Steps (AYS) timesteps schedule to achieve very good results with only 10 denoising steps.
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from diffusers.schedulers import AysSchedules
sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"]
pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")
prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up"
image = pipe(prompt=prompt, timesteps=sampling_schedule).images[0]
Check out the documentation here.
device_map in Pipelines 🧪
We have introduced experimental support for device_map in our pipelines. This feature becomes relevant when you have multiple accelerators across which to distribute the components of a pipeline. Currently, we support only a “balanced” device_map. However, we plan to support other device-mapping strategies relevant to diffusion models in the future.
from diffusers import DiffusionPipeline
import torch
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    device_map="balanced",
)
image = pipeline("a dog").images[0]
In cases where you are limited to low-VRAM accelerators, you can still use device_map to benefit from them. Below, we simulate a situation where we have access to two GPUs, each with only 1GB of VRAM (through the max_memory argument).
from diffusers import DiffusionPipeline
import torch
max_memory = {0:"1GB", 1:"1GB"}
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
    device_map="balanced",
    max_memory=max_memory,
)
image = pipeline("a dog").images[0]
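To inspect how the components were distributed across the accelerators, you can look at the resulting device map. A minimal sketch, assuming the pipeline exposes an hf_device_map attribute like models loaded with accelerate:

# Maps each component name to the device it was assigned to,
# e.g. {"unet": 0, "text_encoder": 1, ...} (the exact layout varies).
print(pipeline.hf_device_map)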
📜 Refer to the documentation to learn more about it.
VQGAN Training Script
VQGAN, proposed in Taming Transformers for High-Resolution Image Synthesis, is a crucial component in the modern generative image modeling toolbox. Once trained, its encoder can be leveraged to compute general-purpose tokens from input images.
Thanks to @isamu-isozaki, who contributed a script and related utilities to train VQGANs in #5483. For details, refer to the official training directory.
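As a sketch of what those tokens are: the trained encoder plus quantizer map an image to discrete codebook indices. The snippet below assumes a diffusers-style VQModel checkpoint and the VectorQuantizer return signature (quantized latents, codebook loss, and a tuple whose last element holds the indices); the checkpoint path is illustrative:

import torch
from diffusers import VQModel

# Point this at your trained VQGAN checkpoint (path is illustrative).
vqgan = VQModel.from_pretrained("path/to/trained-vqgan")
images = torch.randn(1, 3, 256, 256)  # a batch of normalized images

with torch.no_grad():
    latents = vqgan.encode(images).latents
    # The quantizer returns the quantized latents, a codebook loss, and a
    # tuple whose last element holds the discrete token indices.
    quantized, _, (_, _, token_indices) = vqgan.quantize(latents)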
VideoProcessor Class
Similar to the VaeImageProcessor class, we have introduced a VideoProcessor to help make the preprocessing and postprocessing of videos easier and a little more streamlined across the pipelines. Refer to the documentation to learn more.
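A minimal sketch of the intended round trip, assuming the class mirrors the VaeImageProcessor preprocess/postprocess pattern:

from PIL import Image
from diffusers.video_processor import VideoProcessor

video = [Image.new("RGB", (512, 512)) for _ in range(16)]  # dummy frames

# Assumption: constructed with the VAE scale factor, like VaeImageProcessor.
video_processor = VideoProcessor(vae_scale_factor=8)

# List of PIL frames -> batched torch tensor in model space.
video_tensor = video_processor.preprocess_video(video)

# Model-space tensor -> frames ready for export_to_gif/export_to_video.
frames = video_processor.postprocess_video(video_tensor, output_type="pil")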
Starting with this release, we provide guides and tutorials to help users get started with some of the most frequently used tasks in image and video generation. For this release, we have a series of three guides about outpainting with different techniques.
Official Callbacks
We introduced official callbacks that you can conveniently plug into your pipeline. For example, you can turn off classifier-free guidance after a chosen fraction of the denoising steps with SDXLCFGCutoffCallback.
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.callbacks import SDXLCFGCutoffCallback
callback = SDXLCFGCutoffCallback(cutoff_step_ratio=0.4)
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
prompt = "a sports car at the road, best quality, high quality, high detail, 8k resolution"
out = pipeline(
    prompt=prompt,
    num_inference_steps=25,
    callback_on_step_end=callback,
)
Read more on our documentation 📜
from_pipe API
Starting with these release notes, we will highlight new community pipelines! More and more of our pipelines were added as community pipelines first and graduated to official pipelines once people started using them a lot. We do not require community pipelines to follow diffusers' coding style, so contributing one is the easiest way to contribute to diffusers 😊
We also introduced a from_pipe API that is very useful for community pipelines that share checkpoints with our official pipelines and improve generation quality in some way. :) You can use from_pipe(...) to load many community pipelines without additional memory requirements. With this API, you can easily switch between different pipelines to apply different techniques.
Read more about from_pipe API in our documentation 📃.
Here are four new community pipelines since our last release.
BoxDiff lets you use bounding-box coordinates for more controlled generation. Here is an example of how you can apply this technique to a Stable Diffusion pipeline you have already created (i.e., pipe_sd in the example below):
pipe_box = DiffusionPipeline.from_pipe(
    pipe_sd,
    custom_pipeline="pipeline_stable_diffusion_boxdiff",
)
pipe_box.enable_model_cpu_offload()
# The prompt should mention each of the phrases below; this one is illustrative.
prompt = "Aurora over a reindeer in a meadow, with a lake and a mountain in the distance"
phrases = ["aurora", "reindeer", "meadow", "lake", "mountain"]
boxes = [[1,3,512,202], [75,344,421,495], [1,327,508,507], [2,217,507,341], [1,135,509,242]]
boxes = [[x / 512 for x in box] for box in boxes]  # normalize box coordinates to [0, 1]
generator = torch.Generator(device="cpu").manual_seed(42)
images = pipe_box(
    prompt,
    boxdiff_phrases=phrases,
    boxdiff_boxes=boxes,
    boxdiff_kwargs={
        "attention_res": 16,
        "normalize_eot": True,
    },
    num_inference_steps=50,
    generator=generator,
).images
Check out this community pipeline here.
HD-Painter can enhance inpainting pipelines with improved prompt faithfulness and higher-resolution generation (up to 2K). You can switch from BoxDiff to HD-Painter like this:
pipe = DiffusionPipeline.from_pipe(
    pipe_box,
    custom_pipeline="hd_painter",
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
prompt = "wooden boat"
init_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/__assets__/samples/images/2.jpg")
mask_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/__assets__/samples/masks/2.png")
image = pipe(prompt, init_image, mask_image, use_rasg=True, use_painta=True, generator=torch.manual_seed(12345)).images[0]
Check out this community pipeline here.
Differential Diffusion enables customization of the amount of change per pixel or per image region. It’s very effective in inpainting and outpainting.
pipeline = DiffusionPipeline.from_pipe(
    pipe_sdxl,
    custom_pipeline="pipeline_stable_diffusion_xl_differential_img2img",
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)
prompt = "a green pear"
negative_prompt = "blurry"
# `image` (the input image) and `mask` (the per-region change map) are
# assumed to be defined earlier; see the community pipeline docs.
image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7.5,
    num_inference_steps=25,
    original_image=image,
    image=image,
    strength=1.0,
    map=mask,
).images[0]
Check out this community pipeline here.
FRESCO, introduced in FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation, enables zero-shot video-to-video translation. Learn more about it here.
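Like the other community pipelines above, it can be loaded through from_pipe. The custom_pipeline id below ("fresco_v2v") is an assumption; check the community pipelines directory for the exact name and call signature:

# Sketch only: load the FRESCO community pipeline from an existing
# Stable Diffusion pipeline (the pipeline id is an assumption).
pipe_fresco = DiffusionPipeline.from_pipe(
    pipe_sd,
    custom_pipeline="fresco_v2v",
)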
All commits
- distutils by @sayakpaul in #7455
- [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline by @standardAI in #7262
- str_to_bool definition in testing utils by @DN6 in #7461
- [Docs] Fix typos by @standardAI in #7451
- test_lora_layers_peft.py by @UmerHA in #7394
- ConsistencyDecoderVAE by @standardAI in #7290
- test_lora_fuse_nan on mps by @UmerHA in #7481
- final_sigma_zero to UniPCMultistep by @Beinsezii in #7517
- time_context) by @KimbingNg in #7268
- from_pipe method to DiffusionPipeline by @yiyixuxu in #7241
- rescale_betas_zero_snr by @Beinsezii in #7531
- test_freeu_enabled on MPS by @UmerHA in #7570
- transformer_2d forward logic into meaningful conditions. by @sayakpaul in #7489
- libsndfile1-dev and libgl1 from workflows by @sayakpaul in #7543
- device_map support to pipelines by @sayakpaul in #6857
- logger.warn with logger.warning by @Sai-Suraj-27 in #7643
- is_cosxl_edit arg in SDXL ip2p. by @sayakpaul in #7650
- optimization by @WentianZhang-ML in #7639
- ruff configuration to avoid deprecated configuration warning by @Sai-Suraj-27 in #7637
- optimization. by @WentianZhang-ML in #7698
- type annotations for compatability with python 3.8 by @Sai-Suraj-27 in #7648
- @classmethod by @Sai-Suraj-27 in #7653
- ModelMixin by @sayakpaul in #6396
- is_sequential_cpu_offload by @yiyixuxu in #7788
- resume_download deprecation by @Wauplin in #7843
- from_single_file logic with from_pretrained by @DN6 in #7496
- _optional_components in StableCascadeCombinedPipeline by @yiyixuxu in #7894
- timesteps and sigmas by @yiyixuxu in #7817
- contributing.md file by @Sai-Suraj-27 in #7638
- save_pretrained logic for compatibility by @rebel-kblee in #7821
- diffusers-cli env by @standardAI in #7403
- added_cond_kwargs when using IP-Adapter in StableDiffusionXLControlNetInpaintPipeline by @detkov in #7924
- isinstance calls by @Sai-Suraj-27 in #7710
- cross_attention_kwargs to StableDiffusionInstructPix2PixPipeline by @AlexeyZhuravlev in #7961
- docstrings according to the Google Style Guide by @Sai-Suraj-27 in #7717
- freedesktop_os_release() in diffusers cli for Python >=3.10 by @DN6 in #8235
- resume_download deprecation V2 by @Wauplin in #8267
- from_single_file docs by @DN6 in #8268
- raise messages by @standardAI in #8272

Significant community contributions
The following contributors have made significant changes to the library over the last release:
@standardAI
- [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline (#7262)
- [Docs] Fix typos (#7451)
- ConsistencyDecoderVAE (#7290)
- diffusers-cli env (#7403)
- raise messages (#8272)

@UmerHA
- test_lora_layers_peft.py (#7394)
- test_lora_fuse_nan on mps (#7481)
- test_freeu_enabled on MPS (#7570)