v0.19.0: SD-XL 1.0 (permissive license), AutoPipelines, Improved Kandinsky & Asymmetric VQGAN, T2I Adapter
Stable Diffusion XL (SDXL) 1.0, with the permissive CreativeML Open RAIL++-M license, was released today. We provide full compatibility with SDXL in diffusers:
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
image
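SDXL 1.0 also ships with a refiner model. As a sketch of the documented two-stage pattern (assuming the stabilityai/stable-diffusion-xl-refiner-1.0 checkpoint and the denoising_end / denoising_start handoff), the base pipeline above can hand its latents to the refiner:
from diffusers import DiffusionPipeline

refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=pipe.text_encoder_2,  # share components with the base pipeline
    vae=pipe.vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
refiner.to("cuda")

# The base pipeline denoises the first 80% of the steps and outputs latents ...
latents = pipe(prompt=prompt, denoising_end=0.8, output_type="latent").images
# ... which the refiner finishes during the remaining 20%.
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]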
Many additional features ship alongside SDXL support; refer to the documentation to know more.
When there’s a new pipeline, there ought to be new training scripts. We added support for the following training scripts that build on top of SDXL:
Shoutout to @harutatsuakiyama for contributing the training script for InstructPix2Pix in #4079.
The ControlNet and InstructPix2Pix training scripts also needed their respective pipelines. So, we added support for the following pipelines as well:
- StableDiffusionXLControlNetPipeline
- StableDiffusionXLInstructPix2PixPipeline

The ControlNet and InstructPix2Pix pipelines don't have interesting checkpoints yet. We hope that the community will be able to leverage the training scripts from this release to help produce some.
Shoutout to @harutatsuakiyama for contributing the StableDiffusionXLInstructPix2PixPipeline in #4079.
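Until trained checkpoints appear, usage will look roughly like the existing SD ControlNet API. A minimal sketch, in which both the ControlNet checkpoint name and the conditioning-image URL are placeholders:
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Placeholder checkpoint: no interesting SDXL ControlNets exist yet.
controlnet = ControlNetModel.from_pretrained("your-org/your-sdxl-controlnet", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")

conditioning = load_image("https://example.com/canny_edges.png")  # placeholder conditioning image
image = pipe(prompt="aerial view of a futuristic city", image=conditioning).images[0]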
We now support Auto APIs for the following tasks: text-to-image, image-to-image, and inpainting.
Here is how to use one:
from diffusers import AutoPipelineForText2Image
import torch
pipe_t2i = AutoPipelineForText2Image.from_pretrained(
"runwayml/stable-diffusion-v1-5", requires_safety_checker=False, torch_dtype=torch.float16
).to("cuda")
prompt = "photo a majestic sunrise in the mountains, best quality, 4k"
image = pipe_t2i(prompt).images[0]
image.save("image.png")
Without allocating any additional memory, you can then switch to image-to-image:
from diffusers import AutoPipelineForImage2Image
pipe_i2i = AutoPipelineForImage2Image.from_pipe(pipe_t2i)
image = pipe_i2i("sunrise in snowy mountains", image=image, strength=0.75).images[0]
image.save("image.png")
Supported pipelines: SDv1, SDv2, SDXL, Kandinsky, ControlNet, IF ... with more to come.
Refer to the documentation to know more.
We introduced a new “combined pipeline” for the Kandinsky series to make it easier to use the Kandinsky prior and decoder together. This eliminates the need to initialize and use multiple pipelines for Kandinsky to generate images. Here is an example:
from diffusers import AutoPipelineForText2Image
import torch
pipe = AutoPipelineForText2Image.from_pretrained(
"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()
prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"
image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image.save("image.png")
The following pipelines, which can be accessed via the "Auto" pipelines, were added:
To know more, check out the following pages:
NOW: mask_image repaints white pixels and preserves black pixels.
Kandinsky was using an incorrect mask format. Instead of using white pixels as the mask (as SD & IF do), the Kandinsky models were using black pixels. This had to be corrected so that the diffusers API is aligned; we cannot have different mask formats for different pipelines.
Important: everyone who already used Kandinsky inpainting in production / in a pipeline now needs to invert the mask:
# For PIL input
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)
# For PyTorch and Numpy input
mask = 1 - mask
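With the corrected format, a white-on-black mask passed to Kandinsky inpainting now behaves exactly like SD & IF. A minimal sketch, assuming the kandinsky-community/kandinsky-2-2-decoder-inpaint checkpoint:
import torch
import numpy as np
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

init_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png")
# White (1.0) pixels are repainted, black (0.0) pixels are preserved -- the new, aligned format.
mask = np.zeros((256, 256), dtype=np.float32)
mask[60:190, 60:190] = 1.0  # repaint a square region
image = pipe(prompt="a photo of a statue", image=init_image, mask_image=mask).images[0]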
The paper Designing a Better Asymmetric VQGAN for StableDiffusion introduced a VQGAN that is particularly well suited to inpainting tasks. This release adds support for this new VQGAN. Here is how it can be used:
from io import BytesIO
from PIL import Image
import requests
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline
def download_image(url: str) -> Image.Image:
response = requests.get(url)
return Image.open(BytesIO(response.content)).convert("RGB")
prompt = "a photo of a person"
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"
image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
pipe.to("cuda")
image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("image.jpeg")
Refer to the documentation to know more.
Thanks to @cross-attention for contributing this model in #3956.
We are committed to providing seamless interoperability with Kohya-trained checkpoints in diffusers. To that end, we improved the existing support for loading Kohya-trained checkpoints. Users can expect further improvements in upcoming releases.
Thanks to @takuma104 and @isidentical for contributing the improvements in #4147.
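Loading goes through the regular LoRA entry point. A minimal sketch, in which the local file name kohya_lora.safetensors is a placeholder for your own Kohya-trained checkpoint:
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Placeholder file name: point this at your own Kohya-trained LoRA in safetensors format.
pipe.load_lora_weights(".", weight_name="kohya_lora.safetensors")
image = pipe("masterpiece, best quality, mountain landscape").images[0]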
The new T2IAdapter and StableDiffusionAdapterPipeline let you condition Stable Diffusion on auxiliary inputs such as depth maps. The example below conditions generation on a ZoeDepth depth map; it needs matplotlib to colorize the depth output:
pip install matplotlib
from PIL import Image
import torch
import numpy as np
import matplotlib
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline
def colorize(value, vmin=None, vmax=None, cmap='gray_r', invalid_val=-99, invalid_mask=None, background_color=(128, 128, 128, 255), gamma_corrected=False, value_transform=None):
"""Converts a depth map to a color image.
Args:
value (torch.Tensor, numpy.ndarray): Input depth map. Shape: (H, W) or (1, H, W) or (1, 1, H, W). All singular dimensions are squeezed.
vmin (float, optional): vmin-valued entries are mapped to start color of cmap. If None, value.min() is used. Defaults to None.
vmax (float, optional): vmax-valued entries are mapped to end color of cmap. If None, value.max() is used. Defaults to None.
cmap (str, optional): matplotlib colormap to use. Defaults to 'gray_r'.
invalid_val (int, optional): Specifies value of invalid pixels that should be colored as 'background_color'. Defaults to -99.
invalid_mask (numpy.ndarray, optional): Boolean mask for invalid regions. Defaults to None.
background_color (tuple[int], optional): 4-tuple RGBA color to give to invalid pixels. Defaults to (128, 128, 128, 255).
gamma_corrected (bool, optional): Apply gamma correction to colored image. Defaults to False.
value_transform (Callable, optional): Apply transform function to valid pixels before coloring. Defaults to None.
Returns:
numpy.ndarray, dtype - uint8: Colored depth map. Shape: (H, W, 4)
"""
if isinstance(value, torch.Tensor):
value = value.detach().cpu().numpy()
value = value.squeeze()
if invalid_mask is None:
invalid_mask = value == invalid_val
mask = np.logical_not(invalid_mask)
# normalize
vmin = np.percentile(value[mask],2) if vmin is None else vmin
vmax = np.percentile(value[mask],85) if vmax is None else vmax
if vmin != vmax:
value = (value - vmin) / (vmax - vmin) # vmin..vmax
else:
# Avoid 0-division
value = value * 0.
# grey out the invalid values
value[invalid_mask] = np.nan
cmapper = matplotlib.cm.get_cmap(cmap)
if value_transform:
value = value_transform(value)
# value = value / value.max()
value = cmapper(value, bytes=True) # (nxmx4)
img = value[...]
img[invalid_mask] = background_color
if gamma_corrected:
img = img / 255
img = np.power(img, 2.2)
img = img * 255
img = img.astype(np.uint8)
return img
model = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
img = Image.open('./images/zoedepth_in.png')
out = model.infer_pil(img)
zoedepth_image = Image.fromarray(colorize(out)).convert('RGB')
zoedepth_image.save('images/zoedepth.png')
adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_zoedepth_sd15v1", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", adapter=adapter, safety_checker=None, torch_dtype=torch.float16, variant="fp16"
)
pipe.to('cuda')
zoedepth_image_out = pipe(prompt="motorcycle", image=zoedepth_image).images[0]
zoedepth_image_out.save('images/zoedepth_out.png')
- num_processes by @eliphatfs in #3983
- noise_sampler_seed to StableDiffusionKDiffusionPipeline.__call__ by @sunhs in #3911
- act_fn param to OutValueFunctionBlock by @SauravMaheshkar in #3994
- text_encoder on stable_diffusion_xl pipelines by @apolinario in #4156
- network_alpha when loading unet lora from old format by @Jackmin801 in #4221
- prompt embeds in sdxl by @xiaohu2015 in #4099

The following contributors have made significant changes to the library over the last release: