v0.27.0: Stable Cascade, Playground v2.5, EDM-style training, IP-Adapter image embeds, and more
We are adding support for Stable Cascade, a new text-to-image model building on Würstchen, which comes with a non-commercial license. The Stable Cascade line of pipelines differs from Stable Diffusion in that it is built upon three distinct models and allows for hierarchical compression of image latents, achieving remarkable outputs.
```python
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline
import torch

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image_emb = prior(prompt=prompt).image_embeddings[0]

decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = decoder(image_embeddings=image_emb, prompt=prompt).images[0]
image
```
📜 Check out the docs here to know more about the model.
Note: You will need torch>=2.2.0 to use the `torch.bfloat16` data type with the Stable Cascade pipeline.
PlaygroundAI released a new v2.5 model (playgroundai/playground-v2.5-1024px-aesthetic), which particularly excels at aesthetics. The model closely follows the architecture of Stable Diffusion XL, except for a few tweaks. This release comes with support for this model:
```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
image
```
Loading from the original single-file checkpoint is also supported:
```python
from diffusers import StableDiffusionXLPipeline, EDMDPMSolverMultistepScheduler
import torch

url = "https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/blob/main/playground-v2.5-1024px-aesthetic.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(url)
pipeline.to(device="cuda", dtype=torch.float16)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipeline(prompt=prompt, guidance_scale=3.0).images[0]
image.save("playground_test_image.png")
```
You can also perform LoRA DreamBooth training with the playgroundai/playground-v2.5-1024px-aesthetic checkpoint:
```bash
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="playgroundai/playground-v2.5-1024px-aesthetic" \
  --instance_data_dir="dog" \
  --output_dir="dog-playground-lora" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --use_8bit_adam \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
```
To know more, follow the instructions here.
EDM refers to the training and sampling techniques introduced in the following paper: Elucidating the Design Space of Diffusion-Based Generative Models. We have introduced support for training using the EDM formulation in our train_dreambooth_lora_sdxl.py script.
To train stabilityai/stable-diffusion-xl-base-1.0 using the EDM formulation, you just have to specify the --do_edm_style_training flag in your training command, and voila 🤗
If you’re interested in extending this formulation to other training scripts, we refer you to this PR.
To better support the Playground v2.5 model and EDM-style training in general, we are bringing support for EDMDPMSolverMultistepScheduler and EDMEulerScheduler. These support the EDM formulations of the DPMSolverMultistepScheduler and EulerDiscreteScheduler, respectively.
Trajectory Consistency Distillation (TCD) enables a model to generate higher quality and more detailed images with fewer steps. Moreover, owing to the effective error mitigation during the distillation process, TCD demonstrates superior performance even under conditions of large inference steps. It was proposed in Trajectory Consistency Distillation.
This release comes with support for a `TCDScheduler` that enables this kind of fast sampling. Much like LCM-LoRA, TCD requires an additional adapter for acceleration. The code snippet below shows example usage:
```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```

📜 Check out the docs here to know more about TCD.
Many thanks to @mhh0318 for contributing the TCDScheduler in #7174 and the guide in #7259.
All the pipelines supporting IP-Adapter accept an `ip_adapter_image_embeds` argument. If you need to run the IP-Adapter multiple times with the same image, you can encode the image once and save the embeddings to disk. This saves computation time and is especially useful when building UIs. Additionally, ComfyUI image embeddings for IP-Adapters are fully compatible with Diffusers and should work out of the box.
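The caching pattern can be sketched with a placeholder tensor standing in for the encoder output (the tensor shape below is illustrative, not the real embedding shape):

```python
import torch

# Placeholder for the embeddings a pipeline's image encoder would produce,
# e.g. via pipe.prepare_ip_adapter_image_embeds(...) on the first run.
image_embeds = [torch.randn(2, 1, 1280)]

# Encode once, persist to disk...
torch.save(image_embeds, "ip_adapter_embeds.ipadpt")

# ...then reload on later runs and pass the tensors directly,
# skipping the image encoder entirely:
loaded_embeds = torch.load("ip_adapter_embeds.ipadpt")
# image = pipe(prompt=prompt, ip_adapter_image_embeds=loaded_embeds).images[0]
```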
We have also introduced support for providing binary masks to specify which portion of the output image should be assigned to an IP-Adapter. For each input IP-Adapter image, a binary mask and an IP-Adapter must be provided. Thanks to @fabiorigano for contributing this feature through #6847.
📜 To know about the exact usage of both of the above, refer to our official guide.
We thank our community members, @fabiorigano, @asomoza, and @cubiq, for their guidance and input on these features.
Merging LoRAs can be a fun and creative way to create new and unique images. Diffusers provides merging support with the `set_adapters` method, which concatenates the weights of the LoRAs to merge.

Now, Diffusers also supports the `add_weighted_adapter` method from the PEFT library, unlocking more efficient merging methods like TIES, DARE, linear, and even combinations of these, such as dare_ties.
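As a toy illustration of what a linear merge computes (plain tensors here, not the `set_adapters`/PEFT machinery), merging is a weighted combination of the adapters' weight deltas:

```python
import torch

# Two toy "LoRA deltas" for the same layer of a model.
delta_a = torch.full((4, 4), 0.2)
delta_b = torch.full((4, 4), 0.6)

# Linear merge: a weighted sum of the deltas.
weights = [0.5, 0.5]
merged = weights[0] * delta_a + weights[1] * delta_b  # every entry ≈ 0.4
```

TIES and DARE refine this idea by trimming or randomly dropping parts of the deltas before combining, to reduce interference between adapters.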
📜 Take a look at the Merge LoRAs guide to learn more about merging in Diffusers.
We are adding support for the real-image editing technique LEDITS++: Limitless Image Editing using Text-to-Image Models, a parameter-free method that requires no fine-tuning or optimization. To edit real images, the LEDITS++ pipelines first invert the image using the DPM-solver++ scheduler, which facilitates editing with as few as 20 total diffusion steps for inversion and inference combined. LEDITS++ guidance is defined such that it reflects both the direction of the edit (whether to push away from or towards the edit concept) and the strength of the effect. The guidance also includes a masking term focused on relevant image regions, which, especially for multiple edits, ensures that the corresponding guidance terms for each concept remain mostly isolated, limiting interference.
The code snippet below shows example usage:

```python
import torch
import PIL
import requests
from io import BytesIO

from diffusers import LEditsPPPipelineStableDiffusionXL, AutoencoderKL

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
    base_model_id,
    vae=vae,
    torch_dtype=torch.float16,
).to(device)

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
image = download_image(img_url)

_ = pipe.invert(
    image=image,
    num_inversion_steps=50,
    skip=0.2,
)

edited_image = pipe(
    editing_prompt=["tennis ball", "tomato"],
    reverse_editing_direction=[True, False],
    edit_guidance_scale=[5.0, 10.0],
    edit_threshold=[0.9, 0.85],
).images[0]
```
<table>
<tr>
<td><img src="https://github.com/huggingface/diffusers/assets/22957388/9f914400-a4e4-4fc5-a27a-150b3212991f" alt="Tennis ball"></td>
<td><img src="https://github.com/huggingface/diffusers/assets/22957388/be4cb116-17b8-4293-9216-60ab6f3a819d" alt="Tomato ball"></td>
</tr>
</table>
📜 Check out the docs here to learn more about LEDITS++.
Thanks to @manuelbrack for contributing this in #6074.
* `config_file` argument to `ControlNetModel` when using `from_single_file` by @DN6 in #6959
* [PEFT / docs] Add a note about `torch.compile` by @younesbelkada in #6864
* `strength` parameter in Controlnet_img2img pipelines by @tlpss in #6951
* `torch_dtype` to `set_module_tensor_to_device` by @yiyixuxu in #6994
* `load_model_dict_into_meta` for ControlNet `from_single_file` by @DN6 in #7034
* `disable_full_determinism` from StableVideoDiffusion xformers test by @DN6 in #7039
* [Refactor] `save_model_card` function in text_to_image examples by @standardAI in #7051
* [Refactor] `StableDiffusionReferencePipeline` inheriting from `DiffusionPipeline` by @standardAI in #7071
* [PEFT / Core] Copy the state dict when passing it to `load_lora_weights` by @younesbelkada in #7058
* `uv` in the Dockerfiles by @sayakpaul in #7094
* [Docs] Fix typos by @standardAI in #7118
* `rescale_betas_zero_snr` by @Beinsezii in #7097
* [Docs] Fix typos by @standardAI in #7131
* `prepare_ip_adapter_image_embeds` and skip load image_encoder by @yiyixuxu in #7016
* `uv` version for now and a minor change in the Slack notification by @sayakpaul in #7155
* `torch.compile` by @sayakpaul in #7161
* `callback_on_step_end` for `StableDiffusionLDM3DPipeline` by @rootonchair in #7149
* `from_config` by @yiyixuxu in #7192
* `StableVideoDiffusionPipeline` by @JinayJain in #7143
* `denoising_end` parameter to ControlNetPipeline for SDXL by @UmerHA in #6175
* `depth_colored` with `color_map=None` by @qqii in #7170
* `return_dict` and minor doc updates by @a-r-r-o-w in #7105
* `export_to_video` default by @DN6 in #6990
* `logger.warning` by @sayakpaul in #7289
* `from_single_file` by @DN6 in #7282
* `UNet2DConditionModel` documentation by @alexanderbonnet in #7291

The following contributors have made significant changes to the library over the last release:
* `callback_on_step_end` for `StableDiffusionLDM3DPipeline` (#7149)
* [Refactor] `save_model_card` function in text_to_image examples (#7051)
* [Refactor] `StableDiffusionReferencePipeline` inheriting from `DiffusionPipeline` (#7071)
* [Docs] Fix typos (#7118)
* [Docs] Fix typos (#7131)
* `return_dict` and minor doc updates (#7105)