Würstchen is a diffusion model whose text-conditional component works in a highly compressed latent space of images, allowing for cheaper and faster inference.
Here is how to use Würstchen as a pipeline:
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS
pipeline = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16).to("cuda")
caption = "Anthropomorphic cat dressed as a firefighter"
images = pipeline(
caption,
height=1024,
width=1536,
prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
prior_guidance_scale=4.0,
num_images_per_prompt=4,
).images
To learn more about the pipeline, check out the official documentation.
This pipeline was contributed by one of the authors of Würstchen, @dome272, with help from @kashif and @patrickvonplaten.
👉 Try out the model here: https://huggingface.co/spaces/warp-ai/Wuerstchen
T2I-Adapter is an efficient plug-and-play model that provides extra guidance to pre-trained text-to-image models while keeping the original large models frozen.
In collaboration with the Tencent ARC researchers, we trained T2I Adapters on various conditions: sketch, canny, lineart, depth, and openpose.
Below is an example of how to use the StableDiffusionXLAdapterPipeline.
First, ensure that controlnet_aux is installed:
pip install -U controlnet_aux==0.0.7
Then we can initialize the pipeline:
import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (AutoencoderKL, EulerAncestralDiscreteScheduler,
StableDiffusionXLAdapterPipeline, T2IAdapter)
from diffusers.utils import load_image, make_image_grid
# load adapter
adapter = T2IAdapter.from_pretrained(
"TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, varient="fp16"
).to("cuda")
# load pipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(
model_id, subfolder="scheduler"
)
vae = AutoencoderKL.from_pretrained(
"madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
model_id,
vae=vae,
adapter=adapter,
scheduler=euler_a,
torch_dtype=torch.float16,
variant="fp16",
).to("cuda")
# load lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
We then load an image to compute the lineart conditioning:
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)
Then we generate:
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
image=image,
num_inference_steps=30,
adapter_conditioning_scale=0.8,
guidance_scale=7.5,
).images[0]
Refer to the official documentation to learn more about StableDiffusionXLAdapterPipeline.
This blog post summarizes our experiences and provides all the resources (including the pre-trained T2I Adapter checkpoints) to get started using T2I Adapters for SDXL.
We’re also releasing a training script for training your custom T2I Adapters on SDXL. Check out the documentation to learn more.
Thanks to @MC-E (one of the authors of T2I Adapters) for contributing the StableDiffusionXLAdapterPipeline in #4696.
We introduced “lazy imports” (#4829) to significantly improve the time it takes to import our modules (such as pipelines, models, and so on). Below is a comparison of the timings with and without lazy imports on import diffusers.
With lazy imports:
real 0m0.417s
user 0m0.714s
sys 0m0.499s
Without lazy imports:
real 0m5.391s
user 0m5.299s
sys 0m1.273s
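These timings can be reproduced with a quick measurement like the following (a minimal sketch; absolute numbers will vary by machine):
import time
start = time.time()
import diffusers  # the import being measured
print(f"importing diffusers took {time.time() - start:.2f} seconds")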
Previously, loading LoRA parameters with load_lora_weights() was time-consuming, as reported in #4975. To address this, we introduced a low_cpu_mem_usage argument to the load_lora_weights() method in #4994, which speeds up loading significantly. Just pass low_cpu_mem_usage=True to reap the benefits.
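For example (a minimal sketch, reusing the offset-noise LoRA checkpoint shown later in these notes):
import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
# low_cpu_mem_usage=True activates the faster, low-memory loading path
pipe.load_lora_weights(
    "stabilityai/stable-diffusion-xl-base-1.0",
    weight_name="sd_xl_offset_example-lora_1.0.safetensors",
    low_cpu_mem_usage=True,
)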
LoRA weights can now be fused into the model weights, allowing models that have loaded LoRA weights to run as fast as models without them. It also makes it possible to fuse multiple LoRAs into the same model.
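Continuing the sketch above, the fusion workflow looks roughly like this:
# fuse the already-loaded LoRA into the base weights; inference now runs at base-model speed
pipe.fuse_lora()
image = pipe("Astronaut in a jungle, cold color palette").images[0]
# undo the fusion to restore the original base weights
pipe.unfuse_lora()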
For more information, have a look at the documentation and the original PR: https://github.com/huggingface/diffusers/pull/4473.
Almost all LoRA formats out there for SDXL are now supported. For more details, please check the documentation.
- AutoencoderTiny by @Isotr0py in #4627
- DIFFUSERS_TEST_DEVICE backend list with trying device by @vvvm23 in #4673
- train_text_to_image_lora_sdxl.py by @sayakpaul in #4632
- from_pretrained when load optional components by @yiyixuxu in #4745
- isinstance() by @kashif in #4992
The following contributors have made significant changes to the library over the last release:
Stable Diffusion XL's strength default was accidentally set to 1.0 when creating the pipeline. The default should be set to 0.9999 instead. This patch release fixes that.
https://github.com/huggingface/diffusers/commit/3eb498e7b4868bca7460d41cda52d33c3ede5502#r125606630 introduced a 🐛 that broke torch.compile() support for ControlNets. This patch release fixes that.
The 🧨 diffusers team has trained two ControlNets on Stable Diffusion XL (SDXL):
You can find all the SDXL ControlNet checkpoints here, including some smaller ones (5 to 7x smaller).
To know more about how to use these ControlNets to perform inference, check out the respective model cards and the documentation. To train custom SDXL ControlNets, you can try out our training script.
This release also introduces support for combining multiple ControlNets trained on SDXL and performing inference with them. Refer to the documentation to learn more.
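A minimal sketch of multi-ControlNet inference on SDXL, assuming the canny and depth SDXL checkpoints mentioned above and pre-computed conditioning images:
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
controlnets = [
    ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16),
]
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")
canny_image = ...  # conditioning image for the canny ControlNet
depth_image = ...  # conditioning image for the depth ControlNet
# pass one conditioning image per ControlNet, in the same order
image = pipe("aerial view of a futuristic city", image=[canny_image, depth_image]).images[0]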
The GLIGEN model was developed by researchers and engineers from the University of Wisconsin-Madison, Columbia University, and Microsoft. The StableDiffusionGLIGENPipeline can generate photorealistic images conditioned on grounding inputs. Along with text and bounding boxes, if input images are given, this pipeline can insert objects described by text into the region defined by bounding boxes. Otherwise, it'll generate an image described by the caption/prompt and insert objects described by text into the region defined by bounding boxes. It's trained on the COCO2014D and COCO2014CD datasets, and the model uses a frozen CLIP ViT-L/14 text encoder to condition itself on grounding inputs.
(GIF from the official website)
Grounded inpainting
import torch
from diffusers import StableDiffusionGLIGENPipeline
from diffusers.utils import load_image
# Insert objects described by text at the region defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
"masterful/gligen-1-4-inpainting-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
input_image = load_image(
"https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/gligen/livingroom_modern.png"
)
prompt = "a birthday cake"
boxes = [[0.2676, 0.6088, 0.4773, 0.7183]]
phrases = ["a birthday cake"]
images = pipe(
prompt=prompt,
gligen_phrases=phrases,
gligen_inpaint_image=input_image,
gligen_boxes=boxes,
gligen_scheduled_sampling_beta=1,
output_type="pil",
num_inference_steps=50,
).images
images[0].save("./gligen-1-4-inpainting-text-box.jpg")
Grounded generation
import torch
from diffusers import StableDiffusionGLIGENPipeline
from diffusers.utils import load_image
# Generate an image described by the prompt and
# insert objects described by text at the region defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
"masterful/gligen-1-4-generation-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
prompt = "a waterfall and a modern high speed train running through the tunnel in a beautiful forest with fall foliage"
boxes = [[0.1387, 0.2051, 0.4277, 0.7090], [0.4980, 0.4355, 0.8516, 0.7266]]
phrases = ["a waterfall", "a modern high speed train running through the tunnel"]
images = pipe(
prompt=prompt,
gligen_phrases=phrases,
gligen_boxes=boxes,
gligen_scheduled_sampling_beta=1,
output_type="pil",
num_inference_steps=50,
).images
images[0].save("./gligen-1-4-generation-text-box.jpg")
Refer to the documentation to learn more.
Thanks to @nikhil-masterful for contributing GLIGEN in #4441.
@madebyollin trained two Autoencoders (on Stable Diffusion and Stable Diffusion XL, respectively) to dramatically cut down the image decoding time. The effects are especially pronounced when working with larger-resolution images. You can use AutoencoderTiny to take advantage of it.
Here’s the example usage for Stable Diffusion:
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cheesecake.png")
Refer to the documentation to learn more. Refer to this material to understand the implications of using this Autoencoder in terms of inference latency and memory footprint.
Stable Diffusion XL’s (SDXL) high memory requirements often seem restrictive when it comes to using it for downstream applications. Even if one uses parameter-efficient fine-tuning techniques like LoRA, fine-tuning just the UNet component of SDXL can be quite memory-intensive. So, running it on a free-tier Colab Notebook (that usually has a 16 GB T4 GPU attached) seems impossible.
Now, with better support for gradient checkpointing and other recipes like 8 Bit Adam (via bitsandbytes), it is possible to fine-tune the UNet of SDXL with DreamBooth and LoRA on a free-tier Colab Notebook.
Check out the Colab Notebook to learn more.
Thanks to @ethansmith2000 for improving the gradient checkpointing support in #4474.
push_to_hub for models, schedulers, and pipelines
Our models, schedulers, and pipelines now support a push_to_hub option in save_pretrained() and also come with a dedicated push_to_hub() method. Below are some examples of usage.
Models
from diffusers import ControlNetModel
controlnet = ControlNetModel(
block_out_channels=(32, 64),
layers_per_block=2,
in_channels=4,
down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
cross_attention_dim=32,
conditioning_embedding_out_channels=(16, 32),
)
controlnet.push_to_hub("my-controlnet-model")
# or controlnet.save_pretrained("my-controlnet-model", push_to_hub=True)
Schedulers
from diffusers import DDIMScheduler
scheduler = DDIMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
clip_sample=False,
set_alpha_to_one=False,
)
scheduler.push_to_hub("my-controlnet-scheduler")
Pipelines
from diffusers import (
UNet2DConditionModel,
AutoencoderKL,
DDIMScheduler,
StableDiffusionPipeline,
)
from transformers import CLIPTextModel, CLIPTextConfig, CLIPTokenizer
unet = UNet2DConditionModel(
block_out_channels=(32, 64),
layers_per_block=2,
sample_size=32,
in_channels=4,
out_channels=4,
down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
cross_attention_dim=32,
)
scheduler = DDIMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
clip_sample=False,
set_alpha_to_one=False,
)
vae = AutoencoderKL(
block_out_channels=[32, 64],
in_channels=3,
out_channels=3,
down_block_types=["DownEncoderBlock2D", "DownEncoderBlock2D"],
up_block_types=["UpDecoderBlock2D", "UpDecoderBlock2D"],
latent_channels=4,
)
text_encoder_config = CLIPTextConfig(
bos_token_id=0,
eos_token_id=2,
hidden_size=32,
intermediate_size=37,
layer_norm_eps=1e-05,
num_attention_heads=4,
num_hidden_layers=5,
pad_token_id=1,
vocab_size=1000,
)
text_encoder = CLIPTextModel(text_encoder_config)
tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
components = {
"unet": unet,
"scheduler": scheduler,
"vae": vae,
"text_encoder": text_encoder,
"tokenizer": tokenizer,
"safety_checker": None,
"feature_extractor": None,
}
pipeline = StableDiffusionPipeline(**components)
pipeline.push_to_hub("my-pipeline")
Refer to the documentation to know more.
Thanks to @Wauplin for his generous and constructive feedback (see #4218) on this feature.
Providing seamless support for loading Kohya-trained LoRA checkpoints in diffusers is important for us. This is why we continue to improve our load_lora_weights() method. Check out the documentation to know more about what's currently supported and the current limitations.
Thanks to @isidentical for extending their help in improving this support.
Prompt weighting provides a way to emphasize or de-emphasize certain parts of a prompt, allowing for more control over the generated image. compel provides an easy way to do prompt weighting compatible with diffusers. To this end, we have worked on an improved guide. Check it out here.
.safetensors
Starting with this release, we will default to using .safetensors as our preferred serialization method. This change is reflected in all the training examples that we officially support.
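A minimal sketch of what this looks like in practice (the repo id and output path are illustrative; safe_serialization is the relevant save_pretrained() flag):
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# .safetensors is now the default; the flag is spelled out here only for clarity
pipe.save_pretrained("my-local-sd-pipeline", safe_serialization=True)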
- prompt is None by @yiyixuxu in #4278
- make test-examples work correctly by @statelesshz in #4329
- is_safetensors_available() by @chiral-carbon in #4521
- from_single_file by @DN6 in #4571
- UnboundLocalError during LoRA loading by @slessans in #4523
The following contributors have made significant changes to the library over the last release:
0.19.3 is a patch release to make sure import diffusers works without transformers being installed.
It includes a fix of this issue.
[SDXL] Fix dummy imports incorrect naming by @patrickvonplaten in https://github.com/huggingface/diffusers/pull/4370
We still had some bugs 🐛 in 0.19.1, notably:
The official SD-XL 1.0 LoRA (Kohya-styled) is now supported thanks to https://github.com/huggingface/diffusers/pull/4287. You can try it as follows:
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.load_lora_weights("stabilityai/stable-diffusion-xl-base-1.0", weight_name="sd_xl_offset_example-lora_1.0.safetensors")
pipe.to("cuda")
prompt = "beautiful scenery nature glass bottle landscape, purple galaxy bottle"
negative_prompt = "text, watermark"
image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=25).images[0]
In addition, a couple more SDXL LoRAs are now supported:
(SDXL 0.9:)
To know more details and the known limitations, please check out the documentation.
Thanks to @isidentical for their sincere help in the PR.
@bghira found that for SDXL Img2Img batched inference led to weird artifacts. That is fixed in: https://github.com/huggingface/diffusers/pull/4327.
Under some circumstances, SD-XL 1.0 could download ONNX weights, which is corrected in https://github.com/huggingface/diffusers/pull/4338.
https://github.com/huggingface/diffusers/pull/4346 allows the user to disable the watermarker under certain circumstances to improve the usability of SDXL.
In 0.19.0, some bugs :bug: found their way into the release. We're very sorry about this :pray:
This patch release fixes all of them.
- prompt is None by @yiyixuxu in #4278
Stable Diffusion XL (SDXL) 1.0, with the permissive CreativeML Open RAIL++-M License, was released today. We provide full compatibility with SDXL in diffusers.
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
image
Many additional cool features are released:
Refer to the documentation to know more.
When there’s a new pipeline, there ought to be new training scripts. We added support for the following training scripts that build on top of SDXL:
Shoutout to @harutatsuakiyama for contributing the training script for InstructPix2Pix in #4079.
The ControlNet and InstructPix2Pix training scripts also needed their respective pipelines. So, we added support for the following pipelines as well:
- StableDiffusionXLControlNetPipeline
- StableDiffusionXLInstructPix2PixPipeline
The ControlNet and InstructPix2Pix pipelines don’t have interesting checkpoints yet. We hope that the community will be able to leverage the training scripts from this release to help produce some.
Shoutout to @harutatsuakiyama for contributing the StableDiffusionXLInstructPix2PixPipeline in #4079.
We now support Auto APIs for the following tasks: text-to-image, image-to-image, and inpainting:
Here is how to use one:
from diffusers import AutoPipelineForText2Image
import torch
pipe_t2i = AutoPipelineForText2Image.from_pretrained(
"runwayml/stable-diffusion-v1-5", requires_safety_checker=False, torch_dtype=torch.float16
).to("cuda")
prompt = "photo a majestic sunrise in the mountains, best quality, 4k"
image = pipe_t2i(prompt).images[0]
image.save("image.png")
Without any extra memory, you can then switch to image-to-image:
from diffusers import AutoPipelineForImage2Image
pipe_i2i = AutoPipelineForImage2Image.from_pipe(pipe_t2i)
image = pipe_i2i("sunrise in snowy mountains", image=image, strength=0.75).images[0]
image.save("image.png")
Supported Pipelines: SDv1, SDv2, SDXL, Kandinsky, ControlNet, IF ... with more to come.
Refer to the documentation to know more.
We introduced a new “combined pipeline” for the Kandinsky series to make it easier to use the Kandinsky prior and decoder together. This eliminates the need to initialize and use multiple pipelines for Kandinsky to generate images. Here is an example:
from diffusers import AutoPipelineForText2Image
import torch
pipe = AutoPipelineForText2Image.from_pretrained(
"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()
prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"
image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image.save("image.png")
The following pipelines, which can be accessed via the "Auto" pipelines, were added:
To know more, check out the following pages:
NOW: mask_image repaints white pixels and preserves black pixels.
Kandinsky was using an incorrect mask format. Instead of using white pixels as a mask (like SD & IF do), Kandinsky models were using black pixels. This needed to be corrected so that the diffusers API is aligned; we cannot have different mask formats for different pipelines.
Important => This means that everyone who already used Kandinsky Inpaint in production/pipelines now needs to change the mask to:
# For PIL input
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)
# For PyTorch and Numpy input
mask = 1 - mask
Designing a Better Asymmetric VQGAN for StableDiffusion introduced a VQGAN that is particularly well-suited for inpainting tasks. This release brings support for this new VQGAN. Here is how it can be used:
from io import BytesIO
from PIL import Image
import requests
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline
def download_image(url: str) -> Image.Image:
response = requests.get(url)
return Image.open(BytesIO(response.content)).convert("RGB")
prompt = "a photo of a person"
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"
image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
pipe.to("cuda")
image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("image.jpeg")
Refer to the documentation to know more.
Thanks to @cross-attention for contributing this model in #3956.
We are committed to providing seamless interoperability support of Kohya-trained checkpoints from diffusers. To that end, we improved the existing support for loading Kohya-trained checkpoints in diffusers. Users can expect further improvements in the upcoming releases.
Thanks to @takuma104 and @isidentical for contributing the improvements in #4147.
pip install matplotlib
from PIL import Image
import torch
import numpy as np
import matplotlib
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline
def colorize(value, vmin=None, vmax=None, cmap='gray_r', invalid_val=-99, invalid_mask=None, background_color=(128, 128, 128, 255), gamma_corrected=False, value_transform=None):
"""Converts a depth map to a color image.
Args:
value (torch.Tensor, numpy.ndarray): Input depth map. Shape: (H, W) or (1, H, W) or (1, 1, H, W). All singular dimensions are squeezed
vmin (float, optional): vmin-valued entries are mapped to start color of cmap. If None, value.min() is used. Defaults to None.
vmax (float, optional): vmax-valued entries are mapped to end color of cmap. If None, value.max() is used. Defaults to None.
cmap (str, optional): matplotlib colormap to use. Defaults to 'gray_r'.
invalid_val (int, optional): Specifies value of invalid pixels that should be colored as 'background_color'. Defaults to -99.
invalid_mask (numpy.ndarray, optional): Boolean mask for invalid regions. Defaults to None.
background_color (tuple[int], optional): 4-tuple RGB color to give to invalid pixels. Defaults to (128, 128, 128, 255).
gamma_corrected (bool, optional): Apply gamma correction to colored image. Defaults to False.
value_transform (Callable, optional): Apply transform function to valid pixels before coloring. Defaults to None.
Returns:
numpy.ndarray, dtype - uint8: Colored depth map. Shape: (H, W, 4)
"""
if isinstance(value, torch.Tensor):
value = value.detach().cpu().numpy()
value = value.squeeze()
if invalid_mask is None:
invalid_mask = value == invalid_val
mask = np.logical_not(invalid_mask)
# normalize
vmin = np.percentile(value[mask],2) if vmin is None else vmin
vmax = np.percentile(value[mask],85) if vmax is None else vmax
if vmin != vmax:
value = (value - vmin) / (vmax - vmin) # vmin..vmax
else:
# Avoid 0-division
value = value * 0.
# squeeze last dim if it exists
# grey out the invalid values
value[invalid_mask] = np.nan
cmapper = matplotlib.cm.get_cmap(cmap)
if value_transform:
value = value_transform(value)
# value = value / value.max()
value = cmapper(value, bytes=True) # (nxmx4)
img = value[...]
img[invalid_mask] = background_color
if gamma_corrected:
img = img / 255
img = np.power(img, 2.2)
img = img * 255
img = img.astype(np.uint8)
return img
model = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
img = Image.open('./images/zoedepth_in.png')
out = model.infer_pil(img)
zoedepth_image = Image.fromarray(colorize(out)).convert('RGB')
zoedepth_image.save('images/zoedepth.png')
adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_zoedepth_sd15v1", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", adapter=adapter, safety_checker=None, torch_dtype=torch.float16, variant="fp16"
)
pipe.to('cuda')
zoedepth_image_out = pipe(prompt="motorcycle", image=zoedepth_image).images[0]
zoedepth_image_out.save('images/zoedepth_out.png')
- num_processes. by @eliphatfs in #3983
- noise_sampler_seed to StableDiffusionKDiffusionPipeline.__call__ by @sunhs in #3911
- act_fn param to OutValueFunctionBlock by @SauravMaheshkar in #3994
- text_encoder on stable_diffusion_xl pipelines by @apolinario in #4156
- network_alpha when loading unet lora from old format by @Jackmin801 in #4221
- prompt embeds in sdxl by @xiaohu2015 in #4099
The following contributors have made significant changes to the library over the last release:
Patch release to fix:
- torch.compile for SD-XL for certain GPUs
- from_single_file for all SD models
Note:
Loading any stable diffusion safetensors or ckpt with StableDiffusionPipeline.from_single_file or StableDiffusionImg2ImgPipeline.from_single_file or StableDiffusionInpaintPipeline.from_single_file or StableDiffusionXLPipeline.from_single_file, ...
is now almost as fast as from_pretrained(...) and it's much more tested now.
All commits:
- float16 when using PyTorch 2 or xFormers by @pcuenca in #4019
- not in upscale pipeline by @pcuenca in #4020
- force_download in download utility by @Wauplin in #4036
Stable Diffusion XL 0.9 is now fully supported under the SDXL 0.9 Research License here.
Having received access to stabilityai/stable-diffusion-xl-base-0.9, you can easily use it with diffusers:
from diffusers import StableDiffusionXLPipeline
import torch
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
refiner.to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, output_type="latent" if use_refiner else "pil").images[0]
image = refiner(prompt=prompt, image=image[None, :]).images[0]
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
refiner.to("cuda")
- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()
and
- refiner.to("cuda")
+ refiner.enable_model_cpu_offload()
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
Note: If you're running the model with torch < 2.0, please make sure to run:
+ pipe.enable_xformers_memory_efficient_attention()
+ refiner.enable_xformers_memory_efficient_attention()
For more details have a look at the official docs.
- Dropout to Flax UNet by @SauravMaheshkar in #3894
Shap-E is a 3D image generation model from OpenAI introduced in Shap-E: Generating Conditional 3D Implicit Functions.
We provide support for text-to-3D and 2D-to-3D image generation in diffusers.
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif
ckpt_id = "openai/shap-e"
pipe = ShapEPipeline.from_pretrained(ckpt_id).to("cuda")
guidance_scale = 15.0
prompt = "A birthday cupcake"
images = pipe(
prompt,
guidance_scale=guidance_scale,
num_inference_steps=64,
frame_size=256,
).images
gif_path = export_to_gif(images[0], "cake_3d.gif")
import torch
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif, load_image
ckpt_id = "openai/shap-e-img2img"
pipe = ShapEImg2ImgPipeline.from_pretrained(ckpt_id).to("cuda")
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png"
image = load_image(img_url)
generator = torch.Generator(device="cuda").manual_seed(0)
batch_size = 4
guidance_scale = 3.0
images = pipe(
image,
num_images_per_prompt=batch_size,
generator=generator,
guidance_scale=guidance_scale,
num_inference_steps=64,
frame_size=256,
output_type="pil"
).images
gif_path = export_to_gif(images[0], "burger_sampled_3d.gif")
Original image
Generated
For more details, check out the official documentation.
The model was contributed by @yiyixuxu in https://github.com/huggingface/diffusers/pull/3742.
Consistency models are diffusion models supporting fast one-step or few-step image generation. They were proposed by OpenAI in Consistency Models.
import torch
from diffusers import ConsistencyModelPipeline
device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)
# Onestep Sampling
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_onestep_sample.png")
# Onestep sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_model_onestep_sample_penguin.png")
# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original Github repo.
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(timesteps=[22, 0], class_labels=145).images[0]
image.save("consistency_model_multistep_sample_penguin.png")
For more details, see the official docs.
The model was contributed by our community members @dg845 and @ayushtues in https://github.com/huggingface/diffusers/pull/3492.
Previous video generation pipelines tended to produce watermarks because those watermarks were present in their pretraining dataset. With the latest additions of the following checkpoints, we can now generate watermark-free videos:
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
# memory optimization
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()
prompt = "Darth Vader surfing a wave"
video_frames = pipe(prompt, num_frames=24).frames
video_path = export_to_video(video_frames)
For more details, check out the official docs.
It was contributed by @patrickvonplaten in https://github.com/huggingface/diffusers/pull/3900.
- StableDiffusionKDiffusionPipeline by @tripathiarpan20 in #3751
- train_text_to_image.py script by @sayakpaul in #3810
- resnet.py by @SauravMaheshkar in #3868
- timestep_spacing and steps_offset to schedulers by @pcuenca in #3947
- UNet2DConditionOutput pickle-able by @prathikr in #3857
- torch.compile() compatibility by @sayakpaul in #3949
The following contributors have made significant changes to the library over the last release:
Patch release to fix timestep for inpainting
Kandinsky 2.1 inherits best practices from DALL-E 2 and Latent Diffusion while introducing some new ideas.
pip install diffusers transformers accelerate
from diffusers import DiffusionPipeline
import torch
pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
pipe_prior.to("cuda")
t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
t2i_pipe.to("cuda")
prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"
generator = torch.Generator(device="cuda").manual_seed(12)
image_embeds, negative_image_embeds = pipe_prior(prompt, negative_prompt, guidance_scale=1.0, generator=generator).to_tuple()
image = t2i_pipe(prompt, negative_prompt=negative_prompt, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds).images[0]
image.save("cheeseburger_monster.png")
To learn more about the Kandinsky pipelines, and more details about speed and memory optimizations, please have a look at the docs.
Thanks @ayushtues, for helping with the integration of Kandinsky 2.1!
UniDiffuser introduces a multimodal diffusion process that is capable of handling different generation tasks using a single unified approach:
Below is an example of how to use UniDiffuser for text-to-image generation:
import torch
from diffusers import UniDiffuserPipeline
model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to("cuda")
# This mode can be inferred from the input provided to the `pipe`.
pipe.set_text_to_image_mode()
prompt = "an elephant under the sea"
sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0).images[0]
sample.save("elephant.png")
Check out the UniDiffuser docs to know more.
UniDiffuser was added by @dg845 in this PR.
We're happy to support A1111-formatted CivitAI LoRA checkpoints in a limited capacity.
First, download a checkpoint. We’ll use this one for demonstration purposes.
wget https://civitai.com/api/download/models/15603 -O light_and_shadow.safetensors
Next, we initialize a DiffusionPipeline:
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
pipeline = StableDiffusionPipeline.from_pretrained(
"gsdf/Counterfeit-V2.5", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
pipeline.scheduler.config, use_karras_sigmas=True
)
We then load the checkpoint downloaded from CivitAI:
pipeline.load_lora_weights(".", weight_name="light_and_shadow.safetensors")
(If you’re loading a checkpoint in the safetensors format, please ensure you have safetensors installed.)
And then it’s time for running inference:
prompt = "masterpiece, best quality, 1girl, at dusk"
negative_prompt = ("(low quality, worst quality:1.4), (bad anatomy), (inaccurate limb:1.2), "
"bad composition, inaccurate eyes, extra digit, fewer digits, (extra arms:1.2), large breasts")
images = pipeline(prompt=prompt,
negative_prompt=negative_prompt,
width=512,
height=768,
num_inference_steps=15,
num_images_per_prompt=4,
generator=torch.manual_seed(0)
).images
Below is a comparison between the LoRA and the non-LoRA results:
Check out the docs to learn more.
Thanks to @takuma104 for contributing this feature via this PR.
We introduced Torch 2.0 support for computing attention efficiently in 0.13.0. Since then, we have made a number of improvements to reduce the number of "graph breaks" in our models so that they can be compiled with torch.compile(). As a result, we are happy to report massive improvements in the inference speed of our most popular pipelines. Check out this doc to know more.
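A minimal sketch of what enabling compilation looks like (assuming PyTorch 2.0; the first call is slow because of compilation, subsequent calls are fast):
import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# compile the UNet, the most compute-heavy component
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
image = pipe("a photo of an astronaut riding a horse").images[0]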
Thanks to @Chillee for helping us with this. Thanks to @patrickvonplaten for fixing the problems stemming from "graph breaks" in this PR.
We added a VaeImageProcessor class that provides a unified API for pipelines to prepare their image inputs, as well as to post-process their outputs. It supports resizing, normalization, and conversion between PIL Images, PyTorch tensors, and NumPy arrays.
With that, all Stable Diffusion pipelines now accept image inputs as PyTorch tensors and NumPy arrays, in addition to PIL Images, and can produce outputs in these three formats. They also accept and return latents. This means you can now take generated latents from one pipeline and pass them to another as inputs, without leaving the latent space. If you work with multiple pipelines, you can pass PyTorch tensors between them without converting to PIL Images.
To learn more about the API, check out our doc here.
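For example, here is a minimal sketch of staying in latent space between two pipelines (the prompts are illustrative):
import torch
from diffusers import StableDiffusionImg2ImgPipeline, StableDiffusionPipeline
text2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# build an img2img pipeline that shares the same components (no extra memory)
img2img = StableDiffusionImg2ImgPipeline(**text2img.components)
# output_type="latent" skips decoding to PIL
latents = text2img("a fantasy landscape", output_type="latent").images
# pass the latents straight in as the img2img input
image = img2img("a fantasy landscape, oil painting", image=latents, strength=0.6).images[0]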
ControlNet is one of the most used diffusion models, and upon strong demand from the community we added ControlNet img2img and ControlNet inpaint pipelines. This makes it possible to use any ControlNet checkpoint for both the image-to-image setting and for inpainting.
:point_right: Inpaint: See the ControlNet inpaint model here
:point_right: Image-to-Image: Any ControlNet checkpoint can be used for image-to-image, e.g.:
from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
import numpy as np
import torch
import cv2
from PIL import Image
# download an image
image = load_image(
"https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
np_image = np.array(image)
# get canny image
np_image = cv2.Canny(np_image, 100, 200)
np_image = np_image[:, :, None]
np_image = np.concatenate([np_image, np_image, np_image], axis=2)
canny_image = Image.fromarray(np_image)
# load control net and stable diffusion v1-5
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
# speed up diffusion process with faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
# generate image
generator = torch.manual_seed(0)
image = pipe(
"futuristic-looking woman",
num_inference_steps=20,
generator=generator,
image=image,
control_image=canny_image,
).images[0]
This pipeline (introduced in DiffEdit: Diffusion-based semantic image editing with mask guidance) allows for image editing with natural language. Below is an end-to-end example.
First, let’s load our pipeline:
import torch
from diffusers import DDIMScheduler, DDIMInverseScheduler, StableDiffusionDiffEditPipeline
sd_model_ckpt = "stabilityai/stable-diffusion-2-1"
pipeline = StableDiffusionDiffEditPipeline.from_pretrained(
sd_model_ckpt,
torch_dtype=torch.float16,
safety_checker=None,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()
pipeline.enable_vae_slicing()
generator = torch.manual_seed(0)
Then, we load an input image to edit using our method:
from diffusers.utils import load_image
img_url = "https://github.com/Xiang-cd/DiffEdit-stable-diffusion/raw/main/assets/origin.png"
raw_image = load_image(img_url).convert("RGB").resize((768, 768))
Then, we employ the source and target prompts to generate the editing mask:
source_prompt = "a bowl of fruits"
target_prompt = "a basket of fruits"
mask_image = pipeline.generate_mask(
image=raw_image,
source_prompt=source_prompt,
target_prompt=target_prompt,
generator=generator,
)
Then, we employ the caption and the input image to get the inverted latents:
inv_latents = pipeline.invert(prompt=source_prompt, image=raw_image, generator=generator).latents
Now, generate the image with the inverted latents and semantically generated mask:
image = pipeline(
prompt=target_prompt,
mask_image=mask_image,
image_latents=inv_latents,
generator=generator,
negative_prompt=source_prompt,
).images[0]
image.save("edited_image.png")
Check out the docs to learn more about this pipeline.
Thanks to @clarencechen for contributing this pipeline in this PR.
Apart from these, we have made multiple improvements to the overall quality-of-life of our docs.
Thanks to @stevhliu for leading the charge here.
- use_Karras_sigmas to LMSDiscreteScheduler by @Isotr0py in #3351
- sigmoid beta_scheduler to docstrings of relevant Schedulers by @Laurent2916 in #3399
- AttnProcessor2_0 by @sayakpaul in #3457
- use_Karras_sigmas to DPMSolverSinglestepScheduler by @Isotr0py in #3476
- torch.compile tests in separate subprocesses by @pcuenca in #3503
- encoder_hid_dim_type=="text_proj" and allow xformers by @patrickvonplaten in #3615
The following contributors have made significant changes to the library over the last release:
IF is a pixel-based text-to-image generation model and was released in late April 2023 by DeepFloyd.
The model architecture is strongly inspired by Google's closed-source Imagen; IF is a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding:
pip install torch --upgrade # diffusers' IF is optimized for torch 2.0
pip install diffusers --upgrade
Before you can use IF, you need to accept its usage conditions. To do so:
from huggingface_hub import login
login()
and enter your Hugging Face Hub access token.
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch
# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload()
# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
"DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()
# stage 3
safety_modules = {
"feature_extractor": stage_1.feature_extractor,
"safety_checker": stage_1.safety_checker,
"watermarker": stage_1.watermarker,
}
stage_3 = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16
)
stage_3.enable_model_cpu_offload()
prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
generator = torch.manual_seed(1)
# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
# stage 1
image = stage_1(
prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_I.png")
# stage 2
image = stage_2(
image=image,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_embeds,
generator=generator,
output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")
# stage 3
image = stage_3(prompt=prompt, image=image, noise_level=100, generator=generator).images
image[0].save("./if_stage_III.png")
For more details about speed and memory optimizations, please have a look at the blog or docs below.
:point_right: The official codebase :point_right: Blog post :point_right: Space Demo :point_right: In-detail docs
Lvmin Zhang has released improved ControlNet checkpoints as well as a couple of new ones.
You can find all :firecracker: Diffusers checkpoints here. Please have a look directly at the model cards to learn how to use the checkpoints:
| Model Name | Control Image Overview | Control Image Example | Generated Image Example |
|---|---|---|---|
| lllyasviel/control_v11p_sd15_canny<br/> Trained with canny edge detection | A monochrome image with white edges on a black background. | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/image_out.png"/></a> |
| lllyasviel/control_v11p_sd15_mlsd<br/> Trained with multi-level line segment detection | An image with annotated line segments. | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/image_out.png"/></a> |
| lllyasviel/control_v11f1p_sd15_depth<br/> Trained with depth estimation | An image with depth information, usually represented as a grayscale image. | <a href="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/image_out.png"/></a> |
| lllyasviel/control_v11p_sd15_normalbae<br/> Trained with surface normal estimation | An image with surface normal information, usually represented as a color-coded image. | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/image_out.png"/></a> |
| lllyasviel/control_v11p_sd15_seg<br/> Trained with image segmentation | An image with segmented regions, usually represented as a color-coded image. | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/image_out.png"/></a> |
| lllyasviel/control_v11p_sd15_lineart<br/> Trained with line art generation | An image with line art, usually black lines on a white background. | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/image_out.png"/></a> |
| lllyasviel/control_v11p_sd15_openpose<br/> Trained with human pose estimation | An image with human poses, usually represented as a set of keypoints or skeletons. | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/image_out.png"/></a> |
| lllyasviel/control_v11p_sd15_scribble<br/> Trained with scribble-based image generation | An image with scribbles, usually random or user-drawn strokes. | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/image_out.png"/></a> |
| lllyasviel/control_v11p_sd15_softedge<br/> Trained with soft edge image generation | An image with soft edges, usually to create a more painterly or artistic effect. | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/image_out.png"/></a> |
| Model Name | Control Image Overview | Control Image Example | Generated Image Example |
|---|---|---|---|
| lllyasviel/control_v11e_sd15_ip2p<br/> Trained with pixel to pixel instruction | No condition. | <a href="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/image_out.png"/></a> |
| lllyasviel/control_v11p_sd15_inpaint<br/> Trained with image inpainting | No condition. | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/output.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/output.png"/></a> |
| lllyasviel/control_v11e_sd15_shuffle<br/> Trained with image shuffling | An image with shuffled patches or regions. | <a href="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/image_out.png"/></a> |
| lllyasviel/control_v11p_sd15s2_lineart_anime<br/> Trained with anime line art generation | An image with anime-style line art. | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/control.png"/></a> | <a href="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/image_out.png"/></a> |
- pipeline_stable_diffusion_controlnet.py by @remorses in #3118
- Transformer2DModel.forward docstring by @off99555 in #3074
- from_flax work for controlnet by @yiyixuxu in #3161
- Karras sigmas to HeunDiscreteScheduler by @youssefadr in #3160
The following contributors have made significant changes to the library over the last release:
Fixes bugs related to missing global pooling in ControlNet, an img2img processor issue with the safety checker, uneven timesteps, and better config deprecation handling.
We are very excited about this release! It brings new pipelines for video and audio to diffusers, showing that diffusion is a great choice for all sorts of generative tasks. The modular, pluggable approach of diffusers was crucial to integrate the new models intuitively and cohesively with the rest of the library. We hope you appreciate the consistency of the APIs and implementations, as our ultimate goal is to provide the best toolbox to help you solve the tasks you're interested in. Don't hesitate to get in touch if you use diffusers for other projects!
In addition to that, diffusers 0.15 includes a lot of new features and improvements. From performance and deployment improvements (faster pipeline loading) to increased flexibility for creative tasks (Karras sigmas, weight prompting, support for Automatic1111 textual inversion embeddings) to additional customization options (Multi-ControlNet) to training utilities (ControlNet, Min-SNR weighting). Read on for the details!
Text-guided video generation is not a fantasy anymore - it's as simple as spinning up a Colab and running either of the two powerful open-sourced video generation models.
Alibaba's DAMO Vision Intelligence Lab has open-sourced a first research-only video generation model that can generate some powerful video clips of up to a minute. To see Darth Vader riding a wave, simply copy-paste the following lines into your favorite Python interpreter:
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video
pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
prompt = "Spiderman is surfing"
video_frames = pipe(prompt, num_inference_steps=25).frames
video_path = export_to_video(video_frames)
For more information you can have a look at "damo-vilab/text-to-video-ms-1.7b"
Text2Video-Zero is a zero-shot text-to-video synthesis diffusion model that enables low-cost yet consistent video generation using only pre-trained text-to-image diffusion models, such as Stable Diffusion v1-5. It also naturally supports extensions of pre-trained text-to-image models such as Instruct Pix2Pix, ControlNet, and DreamBooth, on top of which we present Video Instruct Pix2Pix, Pose Conditional, Edge Conditional, and Edge Conditional + DreamBooth specialized applications.
For more information please have a look at PAIR/Text2Video-Zero
Text-guided audio generation has made great progress over the last months with many advances being based on diffusion models. The 0.15.0 release includes two powerful audio diffusion models.
Inspired by Stable Diffusion, AudioLDM is a text-to-audio latent diffusion model (LDM) that learns continuous audio representations from CLAP latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional sound effects, human speech and music.
from diffusers import AudioLDMPipeline
import torch
repo_id = "cvssp/audioldm"
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
The resulting audio output can be saved as a .wav file:
import scipy
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
For more information see cvssp/audioldm
This model from the Magenta team is a MIDI-to-audio generator. The pipeline takes a MIDI file as input and autoregressively generates 5-second spectrograms, which are concatenated together at the end and decoded to audio via a spectrogram decoder.
from diffusers import SpectrogramDiffusionPipeline, MidiProcessor
pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion")
pipe = pipe.to("cuda")
processor = MidiProcessor()
# Download MIDI from: wget http://www.piano-midi.de/midis/beethoven/beethoven_hammerklavier_2.mid
output = pipe(processor("beethoven_hammerklavier_2.mid"))
audio = output.audios[0]
Documentation is crucially important for diffusers, as it's one of the first resources where people try to understand how everything works and fix any issues they are observing. We have spent a lot of time in this release reviewing all documents, adding new ones, reorganizing sections and bringing code examples up to date with the latest APIs. This effort has been led by @stevhliu (thanks a lot! 🙌) and @yiyixuxu, but many others have chimed in and contributed.
Check it out: https://huggingface.co/docs/diffusers/index
Don't hesitate to open PRs for fixes to the documentation, they are greatly appreciated as discussed in our (revised, of course) contribution guide.
Stable UnCLIP is the best open-sourced image variation model out there. Pass an initial image and optionally a prompt to generate variations of the image:
from diffusers import DiffusionPipeline
from diffusers.utils import load_image
import torch
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-unclip-small", torch_dtype=torch.float16)
pipe.to("cuda")
# get image
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
image = load_image(url)
# run image variation
image = pipe(image).images[0]
For more information you can have a look at "stabilityai/stable-diffusion-2-1-unclip"
ControlNet was released in diffusers in version 0.14.0, but we have some exciting developments: Multi-ControlNet, a training script, an upcoming event, and a community image-to-image pipeline contributed by @mikegarts!
Thanks to community member @takuma104, it's now possible to use several ControlNet conditioning models at once! It works with the same API as before, only supplying a list of ControlNets instead of just one:
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny",
torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose",
torch_dtype=torch.float16).to("cuda")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"example/a-sd15-variant-model", torch_dtype=torch.float16,
controlnet=[controlnet_pose, controlnet_canny]
).to("cuda")
pose_image = ...
canny_image = ...
prompt = ...
image = pipe(prompt=prompt, image=[pose_image, canny_image]).images[0]
And this is an example of how this affects generation:
| Control Image1 | Control Image2 | Generated |
|---|---|---|
| <img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_pose_512x512.png"> | <img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_canny_512x512.png"> | <img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/mc_pose_and_canny_result_19.png"> |
| <img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_pose_512x512.png"> | (none) | <img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/mc_pose_only_result_19.png"> |
| <img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_canny_512x512.png"> | (none) | <img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/mc_canny_only_result_19.png"> |
We have created a training script for ControlNet, and can't wait to see what new ideas the community may come up with! In fact, we are so pumped about it that we are organizing a JAX Diffusers sprint with a special focus on ControlNet, where participant teams will be assigned TPUs v4-8 to work on their projects :exploding_head:. Those are some mean machines, so make sure you join our discord to follow the event: https://discord.com/channels/879548962464493619/897387888663232554/1092751149217615902.
Several great contributors have been working on textual inversion to get the most out of it. @isamu-isozaki made it possible to perform multi-token training, and @piEsposito & @GuiyeC created an easy way to load textual inversion embeddings. These contributors are always a pleasure to work with 🙌, we feel honored and proud of this community 🙏
Loading textual inversion embeddings is compatible with the Automatic1111 format, so you can download embeddings from other services (such as civitai), and easily apply them in diffusers. Please check the updated documentation for details.
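A minimal sketch (the embedding file and its trigger token are hypothetical placeholders):
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# load an A1111-format embedding downloaded from e.g. civitai (hypothetical file)
pipe.load_textual_inversion("./my_embedding.pt", token="<my-concept>")
image = pipe("a painting of <my-concept>").images[0]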
We conducted a thorough investigation of the pipeline loading process to make it as fast as possible. This is the before and after:
Previous: 2.27 sec
Now: 1.1 sec
Instead of performing 3 HTTP operations, we now get all we need with just one. That single call is necessary to check whether any of the components in the pipeline were updated – if that's the case, then we need to download the new files. This improvement also applies when you load individual models instead of pre-trained pipelines.
This may not sound like much, but many people use diffusers for user-facing services where models and pipelines have to be reused on demand. By minimizing latency, they can provide a better service to their users and minimize operating costs.
This can be further reduced by forcing diffusers to just use the items on disk and never check for updates. This is not recommended for most users, but can be interesting in production environments.
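A minimal sketch of that opt-in behavior (local_files_only skips the remote check and loads exclusively from the local cache):
from diffusers import DiffusionPipeline
# never hit the network; fail if the files are not already cached
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", local_files_only=True
)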
compel
Weight prompting is a popular method to increase the importance of some of the elements that appear in a text prompt, as a way to force image generation to obey those concepts. Because diffusers is used in a multitude of services and projects, we wanted to provide a very flexible way to adopt prompt weighting, so users can ultimately build the system they prefer. Our approach was to:
- compel, by @damian0815, as a higher-level library to create the weighted embeddings.
You don't have to use compel to create the embeddings, but if you do, this is an example of how it looks in practice:
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
from compel import Compel
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
prompt = "a red cat playing with a ball++"
prompt_embeds = compel_proc(prompt)
image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=20).images[0]
As you can see, we assign more weight to the ball word using a compel-specific syntax (ball++). You can use other libraries (or your own) to create appropriate embeddings to pass to the pipeline.
You can read more details in the documentation.
Some diffusers schedulers now support Karras sigmas! Thanks @nipunjindal!
See Add Karras pattern to discrete euler in #2956 for more information.
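A minimal sketch of enabling them (use_karras_sigmas is the same scheduler flag used in the CivitAI LoRA example above):
from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# enable the Karras sigma schedule on a supported scheduler
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)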
- safetensors and LoRa. by @Narsil in #2448
- xformers support to train_unconditional.py by @vvvm23 in #2520
- transformers is not released yet by @patrickvonplaten in #2623
- EMAModel by @sayakpaul in #2530
- use_safetensors argument to give more control to users by @Narsil in #2123
- optimum by @sayakpaul in #2702
- mps: remove warmup passes by @pcuenca in #2771
- mps in text-to-video tests by @pcuenca in #2792
- examples README.md to include the latest examples by @sayakpaul in #2839
- last_epoch argument to optimization.get_scheduler by @felixblanke in #2850
- image_embeds None case is handled properly in StableUnCLIPImg2ImgPipeline by @sayakpaul in #2861
- StableUnCLIPPipeline in the pipeline docs by @sayakpaul in #2897
- Karras sigmas for StableDiffusionKDiffusionPipeline by @takuma104 in #2874
- upload_folder in training scripts by @Wauplin in #2934
- AttentionProcessor.group_norm num_channels should be query_dim by @williamberman in #3046
The following contributors have made significant changes to the library over the last release: