Hugging Face / Diffusers

Sep 27, 2023
Patch Release: Fix LoRA attention processor for xformers.
Sep 18, 2023
Patch Release: CPU offloading + Lora load/Text inv load & Multi Adapter
  • [Textual inversion] Refactor textual inversion to make it cleaner by @patrickvonplaten in #5076
  • t2i Adapter community member fix by @williamberman in #5090
  • remove unused adapter weights in constructor by @williamberman in #5088
  • [LoRA] don't break offloading for incompatible lora ckpts. by @sayakpaul in #5085
Sep 14, 2023
Patch Release v0.21.1: Fix import and config loading for `from_single_file`
  • Fix model offload bug when key isn't present by @DN6 in #5030
  • [Import] Don't force transformers to be installed by @patrickvonplaten in #5035
  • allow loading of sd models from safetensors without online lookups using local config files by @vladmandic in #5019
  • [Import] Add missing settings / Correct some dummy imports by @patrickvonplaten in #5036
Sep 13, 2023
v0.21.0: Würstchen, Faster LoRA loading, Faster imports, T2I Adapters for SDXL, and more

Würstchen

Würstchen is a diffusion model whose text-conditional component works in a highly compressed latent space of images, allowing cheaper and faster inference.

Here is how to use Würstchen as a pipeline:

import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

pipeline = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16).to("cuda")

caption = "Anthropomorphic cat dressed as a firefighter"
images = pipeline(
	caption,
	height=1024,
	width=1536,
	prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
	prior_guidance_scale=4.0,
	num_images_per_prompt=4,
).images

To learn more about the pipeline, check out the official documentation.

This pipeline was contributed by one of the authors of Würstchen, @dome272, with help from @kashif and @patrickvonplaten.

👉 Try out the model here: https://huggingface.co/spaces/warp-ai/Wuerstchen

T2I Adapters for Stable Diffusion XL (SDXL)

T2I-Adapter is an efficient plug-and-play model that provides extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models.

In collaboration with the Tencent ARC researchers, we trained T2I Adapters on various conditions: sketch, canny, lineart, depth, and openpose.

Below is an example of how to use the StableDiffusionXLAdapterPipeline.

First, ensure that controlnet_aux is installed:

pip install -U controlnet_aux==0.0.7

Then we can initialize the pipeline:

import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (AutoencoderKL, EulerAncestralDiscreteScheduler,
                       StableDiffusionXLAdapterPipeline, T2IAdapter)
from diffusers.utils import load_image, make_image_grid

# load adapter
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# load pipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(
    model_id, subfolder="scheduler"
)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id,
    vae=vae,
    adapter=adapter,
    scheduler=euler_a,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# load lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")

We then load an image to compute the lineart conditioning:

url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)

Then we generate:

prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5,
).images[0]

Refer to the official documentation to learn more about StableDiffusionXLAdapterPipeline.

This blog post summarizes our experiences and provides all the resources (including the pre-trained T2I Adapter checkpoints) to get started using T2I Adapters for SDXL.

We’re also releasing a training script for training your custom T2I Adapters on SDXL. Check out the documentation to learn more.

Thanks to @MC-E (one of the authors of T2I Adapters) for contributing the StableDiffusionXLAdapterPipeline in #4696.

Faster imports

We introduced “lazy imports” (#4829) to significantly improve the time it takes to import our modules (such as pipelines, models, and so on). Below is a comparison of the timings with and without lazy imports on import diffusers.

With lazy imports:

real    0m0.417s
user    0m0.714s
sys     0m0.499s

Without lazy imports:

real    0m5.391s
user    0m5.299s
sys     0m1.273s
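The timings above can be reproduced with a small helper; the module name and the absolute numbers below are illustrative, not guaranteed:

```python
import importlib
import time

def import_time(module_name):
    """Return the wall-clock seconds a fresh import of module_name takes."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start

# Substitute "diffusers" in an environment where it is installed;
# "json" is used here only so the snippet is self-contained.
print(f"import took {import_time('json'):.4f}s")
```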

Faster LoRA loading

Previously, loading LoRA parameters with load_lora_weights() was time-consuming, as reported in #4975. To address this, we introduced a low_cpu_mem_usage argument to the load_lora_weights() method in #4994, which speeds up loading significantly. Just pass low_cpu_mem_usage=True to benefit.

LoRA fusing

LoRA weights can now be fused into the model weights, allowing models that have loaded LoRA weights to run as fast as models without them. It also enables fusing multiple LoRAs into the same model.

For more information, have a look at the documentation and the original PR: https://github.com/huggingface/diffusers/pull/4473.
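Conceptually, fusing folds the low-rank update into the base weight: W_fused = W + scale * (B @ A). Below is a minimal pure-Python sketch of that arithmetic (shapes and scale handling are simplified assumptions; diffusers operates on torch tensors):

```python
def matmul(X, Y):
    """Naive matrix multiply for small nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def fuse_lora(W, A, B, scale=1.0):
    """Fold the low-rank update scale * (B @ A) into the base weight W."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (2x2)
B = [[1.0], [0.0]]             # up-projection (2x1), rank r = 1
A = [[1.0, 0.0]]               # down-projection (1x2)
print(fuse_lora(W, A, B))      # [[2.0, 0.0], [0.0, 1.0]]
```

Because the fused matrix simply replaces W, the forward pass costs exactly one matmul, the same as the base model; unfusing subtracts the same delta back out.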

More support for LoRAs

Almost all LoRA formats out there for SDXL are now supported. For more details, please check the documentation.

All commits

  • fix: lora sdxl tests by @sayakpaul in #4652
  • Support tiled encode/decode for AutoencoderTiny by @Isotr0py in #4627
  • Add SDXL long weighted prompt pipeline (replace pr:4629) by @xhinker in #4661
  • add config_file to from_single_file by @zuojianghua in #4614
  • Add AudioLDM 2 by @sanchit-gandhi in #4549
  • [docs] Add note in UniDiffusers Doc about PyTorch 1.X numerical stability issue by @dg845 in #4703
  • [Core] enable lora for sdxl controlnets too and add slow tests. by @sayakpaul in #4666
  • [LoRA] ensure different LoRA ranks for text encoders can be properly handled by @sayakpaul in #4669
  • [LoRA] default to None when fc alphas are not available. by @sayakpaul in #4706
  • Replaces DIFFUSERS_TEST_DEVICE backend list with trying device by @vvvm23 in #4673
  • add convert diffuser pipeline of XL to original stable diffusion by @realliujiaxu in #4596
  • Add reference_attn & reference_adain support for sdxl by @zideliu in #4502
  • [Docs] Fix docs controlnet missing /Tip by @patrickvonplaten in #4717
  • rename test file to run, so that examples tests do not fail by @patrickvonplaten in #4715
  • Revert "Move controlnet load local tests to nightly" (reverts #4543) by @patrickvonplaten
  • Fix all docs by @patrickvonplaten in #4721
  • fix bad error message when transformers is missing by @patrickvonplaten in #4714
  • Fix AutoencoderTiny encoder scaling convention by @madebyollin in #4682
  • [Examples] fix checkpointing and casting bugs in train_text_to_image_lora_sdxl.py by @sayakpaul in #4632
  • [AudioLDM Docs] Fix docs for output by @sanchit-gandhi in #4737
  • [docs] add variant="fp16" flag by @realliujiaxu in #4678
  • [AudioLDM Docs] Update docstring by @sanchit-gandhi in #4744
  • fix dummy import for AudioLDM2 by @patil-suraj in #4741
  • change validation scheduler for train_dreambooth.py when training IF by @wyz894272237 in #4333
  • add a step_index counter by @yiyixuxu in #4347
  • [AudioLDM2] Doc fixes by @sanchit-gandhi in #4739
  • Bugfix for SDXL model loading in low ram system. by @Symbiomatrix in #4628
  • Clean up flaky behaviour on Slow CUDA Pytorch Push Tests by @DN6 in #4759
  • [Tests] Fix paint by example by @patrickvonplaten in #4761
  • [fix] multi t2i adapter set total_downscale_factor by @williamberman in #4621
  • [Examples] Add madebyollin VAE to SDXL LoRA example, along with an explanation by @mnslarcher in #4762
  • [LoRA] relax lora loading logic by @sayakpaul in #4610
  • [Examples] fix sdxl dreambooth lora checkpointing. by @sayakpaul in #4749
  • fix sdxl_lwp empty neg_prompt error issue by @xhinker in #4743
  • improve setup.py by @sayakpaul in #4748
  • Torch device by @patrickvonplaten in #4755
  • [AudioLDM 2] Pipeline fixes by @sanchit-gandhi in #4738
  • Convert MusicLDM by @sanchit-gandhi in #4579
  • [WIP ] Proposal to address precision issues in CI by @DN6 in #4775
  • fix a bug in from_pretrained when load optional components by @yiyixuxu in #4745
  • fix bug of progress bar in clip guided images mixing by @scnuhealthy in #4729
  • Fixed broken link of CLIP doc in evaluation doc by @mayank2 in #4760
  • instance_prompt->class_prompt by @williamberman in #4784
  • refactor prepare_mask_and_masked_image with VaeImageProcessor by @yiyixuxu in #4444
  • Allow passing a checkpoint state_dict to convert_from_ckpt (instead of just a string path) by @cmdr2 in #4653
  • [SDXL] Add docs about forcing passed embeddings to be 0 by @patrickvonplaten in #4783
  • [Core] Support negative conditions in SDXL by @sayakpaul in #4774
  • Unet fix by @canberk17 in #4769
  • [Tests] Tighten up LoRA loading relaxation by @sayakpaul in #4787
  • [docs] Fix syntax for compel by @stevhliu in #4794
  • [Torch compile] Fix torch compile for controlnet by @patrickvonplaten in #4795
  • [SDXL Lora] Fix last ben sdxl lora by @patrickvonplaten in #4797
  • [LoRA Attn Processors] Refactor LoRA Attn Processors by @patrickvonplaten in #4765
  • Update loaders.py by @chillpixelfun in #4805
  • [WIP] Add Fabric by @shauray8 in #4201
  • Fix save_path bug in textual inversion training script by @Yead in #4710
  • [Examples] Save SDXL LoRA weights with chosen precision by @mnslarcher in #4791
  • Fix Disentangle ONNX and non-ONNX pipeline by @DN6 in #4656
  • fix bug in StableDiffusionXLControlNetPipeline when use guess_mode by @yiyixuxu in #4799
  • fix auto_pipeline: pass kwargs to load_config by @yiyixuxu in #4793
  • add StableDiffusionXLControlNetImg2ImgPipeline by @yiyixuxu in #4592
  • add models for T2I-Adapter-XL by @MC-E in #4696
  • Fuse loras by @patrickvonplaten in #4473
  • Fix convert_original_stable_diffusion_to_diffusers script by @wingrime in #4817
  • Support saving multiple t2i adapter models under one checkpoint by @VitjanZ in #4798
  • fix typo by @zideliu in #4822
  • VaeImageProcessor: Allow image resizing also for torch and numpy inputs by @gajendr-nikhil in #4832
  • [Core] refactor encode_prompt by @sayakpaul in #4617
  • Add loading ckpt from file for SDXL controlNet by @antigp in #4683
  • Fix Unfuse Lora by @patrickvonplaten in #4833
  • sketch inpaint from a1111 for non-inpaint models by @noskill in #4824
  • [docs] SDXL by @stevhliu in #4428
  • [Docs] improve the LoRA doc. by @sayakpaul in #4838
  • Fix potential type mismatch errors in SDXL pipelines by @hyk1996 in #4796
  • Fix image processor inputs width by @echarlaix in #4853
  • Remove warn with deprecate by @patrickvonplaten in #4850
  • [docs] ControlNet guide by @stevhliu in #4640
  • [SDXL Inpaint] Correct strength default by @patrickvonplaten in #4858
  • fix sdxl-inpaint fast test by @yiyixuxu in #4859
  • [docs] Add inpainting example for forcing the unmasked area to remain unchanged to the docs by @dg845 in #4536
  • Add GLIGEN Text Image implementation by @tuanh123789 in #4777
  • Test Cleanup Precision issues by @DN6 in #4812
  • Fix link from API to using-diffusers by @pcuenca in #4856
  • [Docs] Korean translation update by @Snailpong in #4684
  • fix a bug in sdxl-controlnet-img2img when using MultiControlNetModel by @yiyixuxu in #4862
  • support AutoPipeline.from_pipe between a pipeline and its ControlNet pipeline counterpart by @yiyixuxu in #4861
  • [WIP] masked_latent_inputs for inpainting pipeline by @yiyixuxu in #4819
  • [docs] DiffEdit guide by @stevhliu in #4722
  • [docs] Shap-E guide by @stevhliu in #4700
  • [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL by @harutatsuakiyama in #4694
  • [Tests] Add combined pipeline tests by @patrickvonplaten in #4869
  • Retrieval Augmented Diffusion Models by @isamu-isozaki in #3297
  • check for unet_lora_layers in sdxl pipeline's save_lora_weights method by @ErwannMillon in #4821
  • Fix get_dummy_inputs for Stable Diffusion Inpaint Tests by @dg845 in #4845
  • allow passing components to connected pipelines when use the combined pipeline by @yiyixuxu in #4883
  • [Core] LoRA improvements pt. 3 by @sayakpaul in #4842
  • Add dropout parameter to UNet2DModel/UNet2DConditionModel by @dg845 in #4882
  • [Core] better support offloading when side loading is enabled. by @sayakpaul in #4855
  • Add --vae_precision option to the SDXL pix2pix script so that we have… by @bghira in #4881
  • [Test] Reduce CPU memory by @patrickvonplaten in #4897
  • fix a bug in StableDiffusionUpscalePipeline.run_safety_checker by @yiyixuxu in #4886
  • remove latent input for kandinsky prior_emb2emb pipeline by @yiyixuxu in #4887
  • [docs] Add stronger warning for SDXL height/width by @stevhliu in #4867
  • [Docs] add doc entry to explain lora fusion and use of different scales. by @sayakpaul in #4893
  • [Textual inversion] Relax loading textual inversion by @patrickvonplaten in #4903
  • [docs] Fix typo in Inpainting force unmasked area unchanged example by @dg845 in #4910
  • Würstchen model by @kashif in #3849
  • [InstructPix2Pix] Fix pipeline implementation and add docs by @sayakpaul in #4844
  • [StableDiffusionXLAdapterPipeline] add adapter_conditioning_factor by @patil-suraj in #4937
  • [StableDiffusionXLAdapterPipeline] allow negative micro conds by @patil-suraj in #4941
  • [examples] T2IAdapter training script by @patil-suraj in #4934
  • [Tests] add: tests for t2i adapter training. by @sayakpaul in #4947
  • guard save model hooks to only execute on main process by @williamberman in #4929
  • [Docs] add t2i adapter entry to overview of training scripts. by @sayakpaul in #4946
  • Temp Revert "[Core] better support offloading when side loading is enabled… by @williamberman in #4927
  • Revert revert and install accelerate main by @williamberman in #4963
  • [Docs] fix: minor formatting in the Würstchen docs by @sayakpaul in #4965
  • Lazy Import for Diffusers by @DN6 in #4829
  • [Core] Remove TF import checks by @patrickvonplaten in #4968
  • Make sure Flax pipelines can be loaded into PyTorch by @patrickvonplaten in #4971
  • Update README.md by @patrickvonplaten in #4973
  • Wuerstchen fixes by @kashif in #4942
  • Refactor model offload by @patrickvonplaten in #4514
  • [Bug Fix] Should pass the dtype instead of torch_dtype by @zhiqiang-canva in #4917
  • [Utils] Correct custom init sort by @patrickvonplaten in #4967
  • remove extra gligen in import by @DN6 in #4987
  • fix E721 Do not compare types, use isinstance() by @kashif in #4992
  • [Wuerstchen] fix combined pipeline's num_images_per_prompt by @kashif in #4989
  • fix image variation slow test by @DN6 in #4995
  • fix custom diffusion tests by @DN6 in #4996
  • [Lora] Speed up lora loading by @patrickvonplaten in #4994
  • [docs] Fix DiffusionPipeline.enable_sequential_cpu_offload docstring by @dg845 in #4952
  • Fix safety checker seq offload by @patrickvonplaten in #4998
  • Fix PR template by @stevhliu in #4984
  • examples fix t2i training by @patrickvonplaten in #5001

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @xhinker
    • Add SDXL long weighted prompt pipeline (replace pr:4629) (#4661)
    • fix sdxl_lwp empty neg_prompt error issue (#4743)
  • @zideliu
    • Add reference_attn & reference_adain support for sdxl (#4502)
    • fix typo (#4822)
  • @shauray8
    • [WIP] Add Fabric (#4201)
  • @MC-E
    • add models for T2I-Adapter-XL (#4696)
  • @tuanh123789
    • Add GLIGEN Text Image implementation (#4777)
  • @Snailpong
    • [Docs] Korean translation update (#4684)
  • @harutatsuakiyama
    • [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL (#4694)
  • @isamu-isozaki
    • Retrieval Augmented Diffusion Models (#3297)
Aug 31, 2023
Patch Release 0.20.2 - Correct SDXL Inpaint Strength Default

Stable Diffusion XL's strength default was accidentally set to 1.0 when the pipeline was created. It should instead default to 0.9999. This patch release fixes that.
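To see why the default matters: in img2img-style pipelines, strength determines how many scheduler steps actually operate on the noised input image. The sketch below mirrors the usual get_timesteps arithmetic, shown here as an assumption rather than the exact pipeline code:

```python
def steps_to_run(num_inference_steps, strength):
    """How many denoising steps operate on the noised init image."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start

# strength=1.0 denoises for all steps, i.e. starting from pure noise
# (the init image is effectively ignored); 0.9999 keeps one step of
# the schedule anchored to the input.
print(steps_to_run(50, 1.0))     # 50
print(steps_to_run(50, 0.9999))  # 49
```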

All commits

  • [SDXL Inpaint] Correct strength default by @patrickvonplaten in #4858
Aug 28, 2023
Patch Release: Fix `torch.compile()` support for ControlNets

https://github.com/huggingface/diffusers/commit/3eb498e7b4868bca7460d41cda52d33c3ede5502#r125606630 introduced a 🐛 that broke the torch.compile() support for ControlNets. This patch release fixes that.

All commits

  • [Docs] Fix docs controlnet missing /Tip by @patrickvonplaten in #4717
  • [Torch compile] Fix torch compile for controlnet by @patrickvonplaten in #4795
Aug 17, 2023
v0.20.0: SDXL ControlNets with MultiControlNet, GLIGEN, Tiny Autoencoder, SDXL DreamBooth LoRA in free-tier Colab, and more

SDXL ControlNets 🚀

The 🧨 diffusers team has trained two ControlNets on Stable Diffusion XL (SDXL):

You can find all the SDXL ControlNet checkpoints here, including some smaller ones (5 to 7x smaller).

To know more about how to use these ControlNets to perform inference, check out the respective model cards and the documentation. To train custom SDXL ControlNets, you can try out our training script.

MultiControlNet for SDXL

This release also introduces support for combining multiple ControlNets trained on SDXL and performing inference with them. Refer to the documentation to learn more.

GLIGEN

The GLIGEN model was developed by researchers and engineers from University of Wisconsin-Madison, Columbia University, and Microsoft. The StableDiffusionGLIGENPipeline can generate photorealistic images conditioned on grounding inputs. Along with text and bounding boxes, if input images are given, this pipeline can insert objects described by text at the region defined by bounding boxes. Otherwise, it’ll generate an image described by the caption/prompt and insert objects described by text at the region defined by bounding boxes. It’s trained on COCO2014D and COCO2014CD datasets, and the model uses a frozen CLIP ViT-L/14 text encoder to condition itself on grounding inputs.

(GIF from the official website)

Grounded inpainting

import torch
from diffusers import StableDiffusionGLIGENPipeline
from diffusers.utils import load_image

# Insert objects described by text at the region defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-inpainting-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

input_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/gligen/livingroom_modern.png"
)
prompt = "a birthday cake"
boxes = [[0.2676, 0.6088, 0.4773, 0.7183]]
phrases = ["a birthday cake"]

images = pipe(
    prompt=prompt,
    gligen_phrases=phrases,
    gligen_inpaint_image=input_image,
    gligen_boxes=boxes,
    gligen_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images
images[0].save("./gligen-1-4-inpainting-text-box.jpg")

Grounded generation

import torch
from diffusers import StableDiffusionGLIGENPipeline
from diffusers.utils import load_image

# Generate an image described by the prompt and
# insert objects described by text at the region defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-generation-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a waterfall and a modern high speed train running through the tunnel in a beautiful forest with fall foliage"
boxes = [[0.1387, 0.2051, 0.4277, 0.7090], [0.4980, 0.4355, 0.8516, 0.7266]]
phrases = ["a waterfall", "a modern high speed train running through the tunnel"]

images = pipe(
    prompt=prompt,
    gligen_phrases=phrases,
    gligen_boxes=boxes,
    gligen_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images
images[0].save("./gligen-1-4-generation-text-box.jpg")
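The gligen_boxes values above are normalized [x0, y0, x1, y1] coordinates. To sanity-check them against a real image, you can convert them to pixels; the 512×512 default below is an assumption and should be matched to your actual canvas size:

```python
def to_pixel_box(box, width=512, height=512):
    """Convert a normalized [x0, y0, x1, y1] box to integer pixel coordinates."""
    x0, y0, x1, y1 = box
    return [round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height)]

# e.g. the waterfall region from the grounded-generation example above
print(to_pixel_box([0.1387, 0.2051, 0.4277, 0.7090]))
```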

Refer to the documentation to learn more.

Thanks to @nikhil-masterful for contributing GLIGEN in #4441.

Tiny Autoencoder

@madebyollin trained two Autoencoders (on Stable Diffusion and Stable Diffusion XL, respectively) to dramatically cut down the image decoding time. The effects are especially pronounced when working with larger-resolution images. You can use AutoencoderTiny to take advantage of it.

Here’s the example usage for Stable Diffusion:

import torch
from diffusers import DiffusionPipeline, AutoencoderTiny

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cheesecake.png")

Refer to the documentation to learn more. Refer to this material to understand the implications of using this Autoencoder in terms of inference latency and memory footprint.

Fine-tuning Stable Diffusion XL with DreamBooth and LoRA on a free-tier Colab Notebook

The high memory requirements of Stable Diffusion XL (SDXL) often seem restrictive when it comes to using it for downstream applications. Even with parameter-efficient fine-tuning techniques like LoRA, fine-tuning just the UNet component of SDXL can be quite memory-intensive, so running it on a free-tier Colab Notebook (which usually has a 16 GB T4 GPU attached) seems impossible.

Now, with better support for gradient checkpointing and other recipes like 8-bit Adam (via bitsandbytes), it is possible to fine-tune the UNet of SDXL with DreamBooth and LoRA on a free-tier Colab Notebook.
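A sketch of such a memory-frugal invocation (flag names follow the train_dreambooth_lora_sdxl.py script at the time of this release and may differ in other versions; paths and prompts are placeholders):

```shell
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="./dog_photos" \
  --output_dir="./lora-sdxl-dog" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision="fp16" \
  --learning_rate=1e-4 \
  --max_train_steps=500
```

Gradient checkpointing trades compute for memory by recomputing activations in the backward pass, and 8-bit Adam stores optimizer state in quantized form; together they bring the UNet LoRA fine-tune under the T4's 16 GB.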

Check out the Colab Notebook to learn more.

Thanks to @ethansmith2000 for improving the gradient checkpointing support in #4474.

Support of push_to_hub for models, schedulers, and pipelines

Our models, schedulers, and pipelines now support pushing to the Hub via save_pretrained(..., push_to_hub=True) and also come with a dedicated push_to_hub() method. Below are some examples of usage.

Models

from diffusers import ControlNetModel

controlnet = ControlNetModel(
    block_out_channels=(32, 64),
    layers_per_block=2,
    in_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    cross_attention_dim=32,
    conditioning_embedding_out_channels=(16, 32),
)
controlnet.push_to_hub("my-controlnet-model")
# or controlnet.save_pretrained("my-controlnet-model", push_to_hub=True)

Schedulers

from diffusers import DDIMScheduler

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)
scheduler.push_to_hub("my-controlnet-scheduler")

Pipelines

from diffusers import (
    UNet2DConditionModel,
    AutoencoderKL,
    DDIMScheduler,
    StableDiffusionPipeline,
)
from transformers import CLIPTextModel, CLIPTextConfig, CLIPTokenizer

unet = UNet2DConditionModel(
    block_out_channels=(32, 64),
    layers_per_block=2,
    sample_size=32,
    in_channels=4,
    out_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
    cross_attention_dim=32,
)

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)

vae = AutoencoderKL(
    block_out_channels=[32, 64],
    in_channels=3,
    out_channels=3,
    down_block_types=["DownEncoderBlock2D", "DownEncoderBlock2D"],
    up_block_types=["UpDecoderBlock2D", "UpDecoderBlock2D"],
    latent_channels=4,
)

text_encoder_config = CLIPTextConfig(
    bos_token_id=0,
    eos_token_id=2,
    hidden_size=32,
    intermediate_size=37,
    layer_norm_eps=1e-05,
    num_attention_heads=4,
    num_hidden_layers=5,
    pad_token_id=1,
    vocab_size=1000,
)
text_encoder = CLIPTextModel(text_encoder_config)
tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")

components = {
    "unet": unet,
    "scheduler": scheduler,
    "vae": vae,
    "text_encoder": text_encoder,
    "tokenizer": tokenizer,
    "safety_checker": None,
    "feature_extractor": None,
}
pipeline = StableDiffusionPipeline(**components)
pipeline.push_to_hub("my-pipeline")

Refer to the documentation to learn more.

Thanks to @Wauplin for his generous and constructive feedback on this feature (see #4218).

Better support for loading Kohya-trained LoRA checkpoints

Providing seamless support for loading Kohya-trained LoRA checkpoints from diffusers is important for us. This is why we continue to improve our load_lora_weights() method. Check out the documentation to learn more about what's currently supported and the current limitations.

Thanks to @isidentical for extending their help in improving this support.

Better documentation for prompt weighting

Prompt weighting provides a way to emphasize or de-emphasize certain parts of a prompt, allowing for more control over the generated image. compel provides an easy way to do prompt weighting compatible with diffusers. To this end, we have worked on an improved guide. Check it out here.

Defaulting to serialize with .safetensors

Starting with this release, we will default to using .safetensors as our preferred serialization method. This change is reflected in all the training examples that we officially support.

All commits

  • 0.20.0dev0 by @patrickvonplaten in #4299
  • update Kandinsky doc by @yiyixuxu in #4301
  • [Torch.compile] Fixes torch compile graph break by @patrickvonplaten in #4315
  • Fix SDXL conversion from original to diffusers by @duongna21 in #4280
  • fix a bug in StableDiffusionUpscalePipeline when prompt is None by @yiyixuxu in #4278
  • [Local loading] Correct bug with local files only by @patrickvonplaten in #4318
  • Fix typo documentation by @echarlaix in #4320
  • fix validation option for dreambooth training example by @xinyangli in #4317
  • [Tests] add test for pipeline import. by @sayakpaul in #4276
  • Honor the SDXL 1.0 licensing from the training scripts. by @sayakpaul in #4319
  • Update README_sdxl.md to correct the header by @sayakpaul in #4330
  • [SDXL Refiner] Fix refiner forward pass for batched input by @patrickvonplaten in #4327
  • correct doc string for default value of guidance_scale by @Tanupriya-Singh in #4339
  • [ONNX] Don't download ONNX model by default by @patrickvonplaten in #4338
  • Fix repeat of negative prompt by @kathath in #4335
  • [SDXL] Make watermarker optional under certain circumstances to improve usability of SDXL 1.0 by @patrickvonplaten in #4346
  • [Feat] Support SDXL Kohya-style LoRA by @sayakpaul in #4287
  • fix fp type in t2i adapter docs by @williamberman in #4350
  • Update README.md to have PyPI-friendly path by @sayakpaul in #4351
  • [SDXL-IP2P] Add gif for demonstrating training processes by @harutatsuakiyama in #4342
  • [SDXL] Fix dummy imports incorrect naming by @patrickvonplaten in #4370
  • Clean up duplicate lines in encode_prompt by @avoroshilov in #4369
  • minor doc fixes. by @sayakpaul in #4380
  • Update docs of unet_1d.py by @nishant42491 in #4394
  • [AutoPipeline] Correct naming by @patrickvonplaten in #4420
  • [ldm3d] documentation fixing typos by @estelleafl in #4284
  • Cleanup pass for flaky Slow Tests for Stable diffusion by @DN6 in #4415
  • support from_single_file for SDXL inpainting by @yiyixuxu in #4408
  • fix test_float16_inference by @yiyixuxu in #4412
  • train dreambooth fix pre encode class prompt by @williamberman in #4395
  • [docs] Fix SDXL docstring by @stevhliu in #4397
  • Update documentation by @echarlaix in #4422
  • remove mentions of textual inversion from sdxl. by @sayakpaul in #4404
  • [LoRA] Fix SDXL text encoder LoRAs by @sayakpaul in #4371
  • [docs] AutoPipeline tutorial by @stevhliu in #4273
  • [Pipelines] Add community pipeline for Zero123 by @kxhit in #4295
  • [Feat] add tiny Autoencoder for (almost) instant decoding by @sayakpaul in #4384
  • can call encode_prompt with out setting a text encoder instance variable by @williamberman in #4396
  • Accept pooled_prompt_embeds in the SDXL Controlnet pipeline. Fixes an error if prompt_embeds are passed. by @cmdr2 in #4309
  • Prevent online access when desired when using download_from_original_stable_diffusion_ckpt by @w4ffl35 in #4271
  • move tests to nightly by @DN6 in #4451
  • auto type conversion by @isNeil in #4270
  • Fix typerror in pipeline handling for MultiControlNets which only contain a single ControlNet by @Georgehe4 in #4454
  • Add rank argument to train_dreambooth_lora_sdxl.py by @levi in #4343
  • [docs] Distilled SD by @stevhliu in #4442
  • Allow controlnets to be loaded (from ckpt) in a parallel thread with a SD model (ckpt), and speed it up slightly by @cmdr2 in #4298
  • fix typo to ensure make test-examples work correctly by @statelesshz in #4329
  • Fix bug caused by typo by @HeliosZhao in #4357
  • Delete the duplicate code for the contolnet img 2 img by @VV-A-VV in #4411
  • Support different strength for Stable Diffusion TensorRT Inpainting pipeline by @jinwonkim93 in #4216
  • add sdxl to prompt weighting by @patrickvonplaten in #4439
  • a few fix for kandinsky combined pipeline by @yiyixuxu in #4352
  • fix-format by @yiyixuxu in #4458
  • Cleanup Pass on flaky slow tests for Stable Diffusion by @DN6 in #4455
  • Fixed multi-token textual inversion training by @manosplitsis in #4452
  • TensorRT Inpaint pipeline: minor fixes by @asfiyab-nvidia in #4457
  • [Tests] Adds integration tests for SDXL LoRAs by @sayakpaul in #4462
  • Update README_sdxl.md by @patrickvonplaten in #4472
  • [SDXL] Allow SDXL LoRA to be run with less than 16GB of VRAM by @patrickvonplaten in #4470
  • Add a data_dir parameter to the load_dataset method. by @AisingioroHao0 in #4482
  • [Examples] Support train_text_to_image_lora_sdxl.py by @okotaku in #4365
  • Log global_step instead of epoch to tensorboard by @mrlzla in #4493
  • Update lora.md to clarify SDXL support by @sayakpaul in #4503
  • [SDXL LoRA] fix batch size lora by @patrickvonplaten in #4509
  • Make sure fp16-fix is used as default by @patrickvonplaten in #4510
  • grad checkpointing by @ethansmith2000 in #4474
  • move pipeline only when running validation by @patrickvonplaten in #4515
  • Moving certain pipelines slow tests to nightly by @DN6 in #4469
  • add pipeline_class_name argument to Stable Diffusion conversion script by @yiyixuxu in #4461
  • Fix misc typos by @Georgehe4 in #4479
  • fix indexing issue in sd reference pipeline by @DN6 in #4531
  • Copy lora functions to XLPipelines by @wooyeolBaek in #4512
  • introduce minimalistic reimplementation of SDXL on the SDXL doc by @cloneofsimo in #4532
  • Fix push_to_hub in train_text_to_image_lora_sdxl.py example by @ra100 in #4535
  • Update README_sdxl.md to include the free-tier Colab Notebook by @sayakpaul in #4540
  • Changed code that converts tensors to PIL images in the write_your_own_pipeline notebook by @jere357 in #4489
  • Move slow tests to nightly by @DN6 in #4526
  • pin ruff version for quality checks by @DN6 in #4539
  • [docs] Clean scheduler api by @stevhliu in #4204
  • Move controlnet load local tests to nightly by @DN6 in #4543
  • Revert "introduce minimalistic reimplementation of SDXL on the SDXL doc" by @patrickvonplaten in #4548
  • fix some typo error by @VV-A-VV in #4546
  • improve controlnet sdxl docs now that we have a good checkpoint. by @sayakpaul in #4556
  • [Doc] update sdxl-controlnet repo name by @yiyixuxu in #4564
  • [docs] Expand prompt weighting by @stevhliu in #4516
  • [docs] Remove attention slicing by @stevhliu in #4518
  • [docs] Add safetensors flag by @stevhliu in #4245
  • Convert Stable Diffusion ControlNet to TensorRT by @dotieuthien in #4465
  • Remove code snippets containing is_safetensors_available() by @chiral-carbon in #4521
  • Fixing repo_id regex validation error on windows platforms by @Mystfit in #4358
  • [Examples] fix: network_alpha -> network_alphas by @sayakpaul in #4572
  • [docs] Fix ControlNet SDXL docstring by @stevhliu in #4582
  • [Utility] adds an image grid utility by @sayakpaul in #4576
  • Fixed invalid pipeline_class_name parameter. by @AisingioroHao0 in #4590
  • Fix git-lfs command typo in docs by @clairefro in #4586
  • [Examples] Update InstructPix2Pix README_sdxl.md to fix mentions by @sayakpaul in #4574
  • [Pipeline utils] feat: implement push_to_hub for standalone models, schedulers as well as pipelines by @sayakpaul in #4128
  • An invalid clerical error in sdxl finetune by @XDUWQ in #4608
  • [Docs] fix links in the controlling generation doc. by @sayakpaul in #4612
  • add: pushtohubmixin to pipelines and schedulers docs overview. by @sayakpaul in #4607
  • add: train to text image with sdxl script. by @sayakpaul in #4505
  • Add GLIGEN implementation by @nikhil-masterful in #4441
  • Update text2image.md to fix the links by @sayakpaul in #4626
  • Fix unipc use_karras_sigmas exception - fixes huggingface/diffusers#4580 by @reimager in #4581
  • [research_projects] SDXL controlnet script by @patil-suraj in #4633
  • [Core] feat: MultiControlNet support for SDXL ControlNet pipeline by @sayakpaul in #4597
  • [docs] PushToHubMixin by @stevhliu in #4622
  • [docs] MultiControlNet by @stevhliu in #4635
  • fix loading custom text encoder when using from_single_file by @DN6 in #4571
  • make things clear in the controlnet sdxl doc. by @sayakpaul in #4644
  • Fix UnboundLocalError during LoRA loading by @slessans in #4523
  • Support higher dimension LoRAs by @isidentical in #4625
  • [Safetensors] Make safetensors the default way of saving weights by @patrickvonplaten in #4235

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @kxhit
    • [Pipelines] Add community pipeline for Zero123 (#4295)
  • @okotaku
    • [Examples] Support train_text_to_image_lora_sdxl.py (#4365)
  • @dotieuthien
    • Convert Stable Diffusion ControlNet to TensorRT (#4465)
  • @nikhil-masterful
    • Add GLIGEN implementation (#4441)
Jul 30, 2023
Patch release: Fix incorrect filenaming

0.19.3 is a patch release to make sure import diffusers works without transformers being installed.

It includes a fix of this issue.

All commits

[SDXL] Fix dummy imports incorrect naming by @patrickvonplaten in https://github.com/huggingface/diffusers/pull/4370

Jul 28, 2023
Patch Release: Support for SDXL Kohya-style LoRAs, Fix batched inference SDXL Img2Img, Improve watermarker

We still had some bugs 🐛 in 0.19.1, notably:

SDXL (Kohya-style) LoRA

The official SD-XL 1.0 LoRA (Kohya-styled) is now supported thanks to https://github.com/huggingface/diffusers/pull/4287. You can try it as follows:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.load_lora_weights("stabilityai/stable-diffusion-xl-base-1.0", weight_name="sd_xl_offset_example-lora_1.0.safetensors")
pipe.to("cuda")

prompt = "beautiful scenery nature glass bottle landscape, purple galaxy bottle"
negative_prompt = "text, watermark"

image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=25).images[0]
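
The LoRA strength can also be dialed down at inference time; a sketch using the cross_attention_kwargs scale argument supported by diffusers pipelines at the time:

- image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=25).images[0]
+ image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=25, cross_attention_kwargs={"scale": 0.5}).images[0]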

In addition, a couple more SDXL LoRAs are now supported:


To know more details and the known limitations, please check out the documentation.

Thanks to @isidentical for their sincere help in the PR.

Batched inference

@bghira found that for SDXL Img2Img batched inference led to weird artifacts. That is fixed in: https://github.com/huggingface/diffusers/pull/4327.

Downloads

Under some circumstances SD-XL 1.0 would download ONNX weights by default; this is corrected in https://github.com/huggingface/diffusers/pull/4338.

Improved SDXL behavior

https://github.com/huggingface/diffusers/pull/4346 allows the user to disable the watermarker under certain circumstances to improve the usability of SDXL.
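
Concretely, the watermarker can be turned off at load time (a sketch, assuming the add_watermarker flag introduced in #4346):

- pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
+ pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, add_watermarker=False)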

All commits:

  • [SDXL Refiner] Fix refiner forward pass for batched input by @patrickvonplaten in #4327
  • [ONNX] Don't download ONNX model by default by @patrickvonplaten in #4338
  • [SDXL] Make watermarker optional under certain circumstances to improve usability of SDXL 1.0 by @patrickvonplaten in #4346
  • [Feat] Support SDXL Kohya-style LoRA by @sayakpaul in #4287
Jul 27, 2023
Patch Release: Fix torch compile and local_files_only

In 0.19.0 some bugs 🐛 found their way into the release. We're very sorry about this 🙏

This patch release fixes all of them.

All commits

  • update Kandinsky doc by @yiyixuxu in #4301
  • [Torch.compile] Fixes torch compile graph break by @patrickvonplaten in #4315
  • Fix SDXL conversion from original to diffusers by @duongna21 in #4280
  • fix a bug in StableDiffusionUpscalePipeline when prompt is None by @yiyixuxu in #4278
  • [Local loading] Correct bug with local files only by @patrickvonplaten in #4318
  • Release: v0.19.1 by @patrickvonplaten (direct commit on v0.19.1-patch)
Jul 26, 2023
v0.19.0: SD-XL 1.0 (permissive license), AutoPipelines, Improved Kandinsky & Asymmetric VQGAN, T2I Adapter

SDXL 1.0

Stable Diffusion XL (SDXL) 1.0 with permissive CreativeML Open RAIL++-M License was released today. We provide full compatibility with SDXL in diffusers.

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
image

Many additional cool features are released:

  • Pipelines for
    • Img2Img
    • Inpainting
  • Torch compile support
  • Model offloading
  • Ensemble of Expert Denoisers (eDiff-I approach) - thanks to @bghira @SytanSD @Birch-san @AmericanPresidentJimmyCarter

Refer to the documentation to know more.
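
The base-to-refiner handoff in the expert-denoiser ensemble splits the denoising schedule at a fractional point (denoising_end on the base model, denoising_start on the refiner). A minimal sketch of the step arithmetic (illustrative helper, not a diffusers API):

```python
def split_steps(num_inference_steps: int, switch_at: float):
    """Split a step budget between base and refiner at a fractional switch point."""
    base_steps = int(round(num_inference_steps * switch_at))
    refiner_steps = num_inference_steps - base_steps
    return base_steps, refiner_steps

# With 40 total steps and a 0.8 switch point, the base denoises for 32 steps
# and hands a noisy latent to the refiner for the remaining 8.
base_steps, refiner_steps = split_steps(40, 0.8)
```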

New training scripts for SDXL

When there’s a new pipeline, there ought to be new training scripts. We added support for the following training scripts that build on top of SDXL:

Shoutout to @harutatsuakiyama for contributing the training script for InstructPix2Pix in #4079.

New pipelines for SDXL

The ControlNet and InstructPix2Pix training scripts also needed their respective pipelines. So, we added support for the following pipelines as well:

  • StableDiffusionXLControlNetPipeline
  • StableDiffusionXLInstructPix2PixPipeline

The ControlNet and InstructPix2Pix pipelines don’t have interesting checkpoints yet. We hope that the community will be able to leverage the training scripts from this release to help produce some.

Shoutout to @harutatsuakiyama for contributing the StableDiffusionXLInstructPix2PixPipeline in #4079.

The AutoPipeline API

We now support Auto APIs for the following tasks: text-to-image, image-to-image, and inpainting:

Here is how to use one:

from diffusers import AutoPipelineForText2Image
import torch

pipe_t2i = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", requires_safety_checker=False, torch_dtype=torch.float16
).to("cuda")

prompt = "photo a majestic sunrise in the mountains, best quality, 4k"
image = pipe_t2i(prompt).images[0]
image.save("image.png")

Without using any extra memory, you can then switch to image-to-image:

from diffusers import AutoPipelineForImage2Image

pipe_i2i = AutoPipelineForImage2Image.from_pipe(pipe_t2i)

image = pipe_i2i("sunrise in snowy mountains", image=image, strength=0.75).images[0]
image.save("image.png")

Supported Pipelines: SDv1, SDv2, SDXL, Kandinsky, ControlNet, IF ... with more to come.

Refer to the documentation to know more.
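
The reason from_pipe needs no extra memory is that it reuses the component objects (UNet, VAE, text encoders) of the source pipeline rather than copying them. A toy illustration of the idea (not the actual diffusers implementation):

```python
class ToyPipeline:
    def __init__(self, unet, vae):
        self.unet, self.vae = unet, vae

    @classmethod
    def from_pipe(cls, other):
        # Reuse the very same component objects; nothing is duplicated in memory.
        return cls(other.unet, other.vae)

t2i = ToyPipeline(unet=object(), vae=object())
i2i = ToyPipeline.from_pipe(t2i)

assert i2i.unet is t2i.unet  # shared, not copied
```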

A new “combined pipeline” for the Kandinsky series

We introduced a new “combined pipeline” for the Kandinsky series to make it easier to use the Kandinsky prior and decoder together. This eliminates the need to initialize and use multiple pipelines for Kandinsky to generate images. Here is an example:

from diffusers import AutoPipelineForTextToImage
import torch

pipe = AutoPipelineForTextToImage.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"
image = pipe(prompt=prompt, num_inference_steps=25).images[0] 
image.save("image.png")

The following pipelines, which can be accessed via the "Auto" pipelines, were added:

To know more, check out the following pages:

🚨🚨🚨 Breaking change for Kandinsky Mask Inpainting 🚨🚨🚨

NOW: mask_image repaints white pixels and preserves black pixels.

Kandinsky was using an incorrect mask format: instead of using white pixels as the mask (like SD & IF do), the Kandinsky models were using black pixels. This needed to be corrected so that the diffusers API is aligned; we cannot have different mask formats for different pipelines.

Important => This means that everyone who already uses Kandinsky inpainting in production or in a pipeline now needs to invert the mask:

# For PIL input
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)

# For PyTorch and Numpy input
mask = 1 - mask
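
To make the flip concrete, here is its effect on a toy NumPy mask (under the new convention, 1.0 = white = repaint):

```python
import numpy as np

# Old Kandinsky convention: black (0) marked the region to repaint.
old_mask = np.array([[0.0, 1.0],
                     [1.0, 0.0]])

# New, SD-aligned convention: white (1) marks the region to repaint.
new_mask = 1 - old_mask

print(new_mask)  # [[1. 0.]
                 #  [0. 1.]]
```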

Asymmetric VQGAN

Designing a Better Asymmetric VQGAN for StableDiffusion introduced a VQGAN that is particularly well-suited for inpainting tasks. This release brings the support of this new VQGAN. Here is how it can be used:

from io import BytesIO
from PIL import Image
import requests
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline

def download_image(url: str) -> Image.Image:
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")

prompt = "a photo of a person"
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"

image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))

pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
pipe.to("cuda")

image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("image.jpeg")

Refer to the documentation to know more.

Thanks to @cross-attention for contributing this model in #3956.

Improved support for loading Kohya-style LoRA checkpoints

We are committed to providing seamless interoperability support of Kohya-trained checkpoints from diffusers. To that end, we improved the existing support for loading Kohya-trained checkpoints in diffusers. Users can expect further improvements in the upcoming releases.

Thanks to @takuma104 and @isidentical for contributing the improvements in #4147.

T2I Adapter

T2I-Adapter provides lightweight adapter modules that add extra conditioning, such as depth maps, to Stable Diffusion. The example below conditions generation on a ZoeDepth depth map (requires matplotlib: pip install matplotlib):

from PIL import Image
import torch
import numpy as np
import matplotlib
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline

def colorize(value, vmin=None, vmax=None, cmap='gray_r', invalid_val=-99, invalid_mask=None, background_color=(128, 128, 128, 255), gamma_corrected=False, value_transform=None):
    """Converts a depth map to a color image.

    Args:
        value (torch.Tensor, numpy.ndarray): Input depth map. Shape: (H, W) or (1, H, W) or (1, 1, H, W). All singular dimensions are squeezed
        vmin (float, optional): vmin-valued entries are mapped to start color of cmap. If None, value.min() is used. Defaults to None.
        vmax (float, optional):  vmax-valued entries are mapped to end color of cmap. If None, value.max() is used. Defaults to None.
        cmap (str, optional): matplotlib colormap to use. Defaults to 'gray_r'.
        invalid_val (int, optional): Specifies value of invalid pixels that should be colored as 'background_color'. Defaults to -99.
        invalid_mask (numpy.ndarray, optional): Boolean mask for invalid regions. Defaults to None.
        background_color (tuple[int], optional): 4-tuple RGB color to give to invalid pixels. Defaults to (128, 128, 128, 255).
        gamma_corrected (bool, optional): Apply gamma correction to colored image. Defaults to False.
        value_transform (Callable, optional): Apply transform function to valid pixels before coloring. Defaults to None.

    Returns:
        numpy.ndarray, dtype - uint8: Colored depth map. Shape: (H, W, 4)
    """
    if isinstance(value, torch.Tensor):
        value = value.detach().cpu().numpy()

    value = value.squeeze()
    if invalid_mask is None:
        invalid_mask = value == invalid_val
    mask = np.logical_not(invalid_mask)

    # normalize
    vmin = np.percentile(value[mask],2) if vmin is None else vmin
    vmax = np.percentile(value[mask],85) if vmax is None else vmax
    if vmin != vmax:
        value = (value - vmin) / (vmax - vmin)  # vmin..vmax
    else:
        # Avoid 0-division
        value = value * 0.

    # squeeze last dim if it exists
    # grey out the invalid values

    value[invalid_mask] = np.nan
    cmapper = matplotlib.cm.get_cmap(cmap)
    if value_transform:
        value = value_transform(value)
        # value = value / value.max()
    value = cmapper(value, bytes=True)  # (nxmx4)

    img = value[...]
    img[invalid_mask] = background_color

    if gamma_corrected:
        img = img / 255
        img = np.power(img, 2.2)
        img = img * 255
        img = img.astype(np.uint8)
    return img

model = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)

img = Image.open('./images/zoedepth_in.png')

out = model.infer_pil(img)

zoedepth_image = Image.fromarray(colorize(out)).convert('RGB')

zoedepth_image.save('images/zoedepth.png')

adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_zoedepth_sd15v1", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, safety_checker=None, torch_dtype=torch.float16, variant="fp16"
)

pipe.to('cuda')
zoedepth_image_out = pipe(prompt="motorcycle", image=zoedepth_image).images[0]

zoedepth_image_out.save('images/zoedepth_out.png')
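
The heart of colorize above is the robust percentile normalization (2nd to 85th percentile) applied before the colormap; a standalone sketch of that step (illustrative helper, clipping out-of-range values that colorize instead leaves to the colormap):

```python
import numpy as np

def normalize_depth(value: np.ndarray, lo_pct: float = 2, hi_pct: float = 85) -> np.ndarray:
    """Scale depth values into [0, 1] using robust percentiles, so a few
    extreme pixels don't wash out the rest of the map."""
    vmin = np.percentile(value, lo_pct)
    vmax = np.percentile(value, hi_pct)
    if vmin == vmax:
        return np.zeros_like(value)  # degenerate map: avoid division by zero
    return np.clip((value - vmin) / (vmax - vmin), 0.0, 1.0)

depth = np.linspace(0.0, 10.0, 101)
norm = normalize_depth(depth)
```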

All commits

  • 📝 Fix broken link to models documentation by @kadirnar in #4026
  • move to 0.19.0dev by @patrickvonplaten in #4048
  • [SDXL] Partial diffusion support for Text2Img and Img2Img Pipelines by @bghira in #4015
  • Correct sdxl docs by @patrickvonplaten in #4058
  • Add circular padding for artifact-free StableDiffusionPanoramaPipeline by @EvgenyKashin in #4025
  • Update train_unconditional.py by @hjmnbnb in #3899
  • Trigger CI on ci-* branches by @Wauplin in #3635
  • Fix kandinsky remove safety by @patrickvonplaten in #4065
  • Multiply lr scheduler steps by num_processes. by @eliphatfs in #3983
  • [Community] Implementation of the IADB community pipeline by @tchambon in #3996
  • add kandinsky to readme table by @yiyixuxu in #4081
  • [From Single File] Force accelerate to be installed by @patrickvonplaten in #4078
  • fix requirement in SDXL by @killah-t-cell in #4082
  • fix: minor things in the SDXL docs. by @sayakpaul in #4070
  • [Invisible watermark] Correct version by @patrickvonplaten in #4087
  • [Feat] add: utility for unloading lora. by @sayakpaul in #4034
  • [tests] use parent class for monkey patching to not break other tests by @patrickvonplaten in #4088
  • Allow low precision vae sd xl by @patrickvonplaten in #4083
  • [SD-XL] Add inpainting by @patrickvonplaten in #4098
  • [Stable Diffusion Inpaint ]Fix dtype inpaint by @patrickvonplaten in #4113
  • [From ckpt] replace with os path join by @patrickvonplaten in #3746
  • [From single file] Make accelerate optional by @patrickvonplaten in #4132
  • add noise_sampler_seed to StableDiffusionKDiffusionPipeline.__call__ by @sunhs in #3911
  • Make setup.py compatible with pipenv by @apoorvaeternity in #4121
  • 📝 Update doc with more descriptive title and filename for "IF" section by @kadirnar in #4049
  • t2i pipeline by @williamberman in #3932
  • [Docs] Korean translation update by @Snailpong in #4022
  • [Enhance] Add rank in dreambooth by @okotaku in #4112
  • Refactor execution device & cpu offload by @patrickvonplaten in #4114
  • Add Recent Timestep Scheduling Improvements to DDIM Inverse Scheduler by @clarencechen in #3865
  • [Core] add: controlnet support for SDXL by @sayakpaul in #4038
  • Docs/bentoml integration by @larme in #4090
  • Fixed SDXL single file loading to use the correct requested pipeline class by @Mystfit in #4142
  • feat: add act_fn param to OutValueFunctionBlock by @SauravMaheshkar in #3994
  • Add controlnet and vae from single file by @patrickvonplaten in #4084
  • fix incorrect attention head dimension in AttnProcessor2_0 by @zhvng in #4154
  • Fix bug in ControlNetPipelines with MultiControlNetModel of length 1 by @greentfrapp in #4032
  • Asymmetric vqgan by @cross-attention in #3956
  • Shap-E: add support for mesh output by @yiyixuxu in #4062
  • [From single file] Make sure that controlnet stays False for from_single_file by @patrickvonplaten in #4181
  • [ControlNet Training] Remove safety from controlnet by @patrickvonplaten in #4180
  • remove bentoml doc in favor of blogpost by @williamberman in #4182
  • Fix unloading of LoRAs when xformers attention procs are in use by @isidentical in #4179
  • [Safetensors] make safetensors a required dep by @patrickvonplaten in #4177
  • make enable_sequential_cpu_offload more generic for third-party devices by @statelesshz in #4191
  • Allow passing different prompts to each text_encoder on stable_diffusion_xl pipelines by @apolinario in #4156
  • [SDXL ControlNet Training] Follow-up fixes by @sayakpaul in #4188
  • 📄 Renamed File for Better Understanding by @kadirnar in #4056
  • [docs] Clean up pipeline apis by @stevhliu in #3905
  • docs: Typo in dreambooth example README.md by @askulkarni2 in #4203
  • [fix] network_alpha when loading unet lora from old format by @Jackmin801 in #4221
  • fix no CFG for kandinsky pipelines by @yiyixuxu in #4193
  • fix a bug of prompt embeds in sdxl by @xiaohu2015 in #4099
  • Raise initial HTTPError if pipeline is not cached locally by @Wauplin in #4230
  • [SDXL] Fix sd xl encode prompt by @patrickvonplaten in #4237
  • [SD-XL] Fix sdxl controlnet inference by @patrickvonplaten in #4238
  • [docs] Changed path for ControlNet in docs by @rcmtcristian in #4215
  • Allow specifying denoising_start and denoising_end as integers representing the discrete timesteps, fixing the XL ensemble not working for many schedulers by @AmericanPresidentJimmyCarter in #4115
  • [docs] Other modalities by @stevhliu in #4205
  • docs: Add missing import statement in textual_inversion inference example by @askulkarni2 in #4227
  • [Docs] Fix from pretrained docs by @patrickvonplaten in #4240
  • [ControlNet SDXL training] fixes in the training script by @sayakpaul in #4223
  • [SDXL DreamBooth LoRA] add support for text encoder fine-tuning by @sayakpaul in #4097
  • Resolve bf16 error as mentioned in this issue by @nupurkmr9 in #4214
  • do not pass list to accelerator.init_trackers by @williamberman in #4248
  • [From Single File] Allow vae to be loaded by @patrickvonplaten in #4242
  • [SDXL] Improve docs by @patrickvonplaten in #4196
  • [draft v2] AutoPipeline by @yiyixuxu in #4138
  • Update README_sdxl.md to change the note on default hyperparameters by @sayakpaul in #4258
  • [from_single_file] Fix circular import by @patrickvonplaten in #4259
  • Model path for sdxl wrong in dreambooth README by @rrva in #4261
  • [SDXL and IP2P]: instruction pix2pix XL training and pipeline by @harutatsuakiyama in #4079
  • [docs] Fix image in SDXL docs by @stevhliu in #4267
  • [SDXL DreamBooth LoRA] multiple fixes by @sayakpaul in #4262
  • Load Kohya-ss style LoRAs with auxilary states by @isidentical in #4147
  • Fix all missing optional import statements from pipeline folders by @patrickvonplaten in #4272
  • [Kandinsky] Add combined pipelines / Fix cpu model offload / Fix inpainting by @patrickvonplaten in #4207
  • Where did this 'x' come from, Elon? by @camenduru in #4277
  • add openvino and onnx runtime SD XL documentation by @echarlaix in #4285
  • Rename by @patrickvonplaten in #4294

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @Snailpong
    • [Docs] Korean translation update (#4022)
  • @clarencechen
    • Add Recent Timestep Scheduling Improvements to DDIM Inverse Scheduler (#3865)
  • @cross-attention
    • Asymmetric vqgan (#3956)
  • @AmericanPresidentJimmyCarter
    • Allow specifying denoising_start and denoising_end as integers representing the discrete timesteps, fixing the XL ensemble not working for many schedulers (#4115)
  • @harutatsuakiyama
    • [SDXL and IP2P]: instruction pix2pix XL training and pipeline (#4079)
Jul 11, 2023
Patch Release: v0.18.2

Patch release to fix:

    1. torch.compile for SD-XL for certain GPUs
    2. from_single_file for all SD models
    3. Fix broken ONNX export
    4. Fix incorrect VAE FP16 casting
    5. Deprecate loading variants that don't exist

Note:

Loading any Stable Diffusion safetensors or ckpt file with StableDiffusionPipeline.from_single_file or StableDiffusionImg2ImgPipeline.from_single_file or StableDiffusionInpaintPipeline.from_single_file or StableDiffusionXLPipeline.from_single_file, ...

is now almost as fast as from_pretrained(...) and much better tested.

All commits:

  • Make sure torch compile doesn't access unet config by @patrickvonplaten in #4008
  • [DiffusionPipeline] Deprecate not throwing error when loading non-existant variant by @patrickvonplaten in #4011
  • Correctly keep vae in float16 when using PyTorch 2 or xFormers by @pcuenca in #4019
  • minor improvements to the SDXL doc. by @sayakpaul in #3985
  • Remove remaining not in upscale pipeline by @pcuenca in #4020
  • FIX force_download in download utility by @Wauplin in #4036
  • Improve single loading file by @patrickvonplaten in #4041
  • keep _use_default_values as a list type by @oOraph in #4040
Jul 7, 2023
Patch Release for Stable Diffusion XL 0.9

Patch release 0.18.1: Stable Diffusion XL 0.9 Research Release

Stable Diffusion XL 0.9 is now fully supported under the SDXL 0.9 Research License.

Having received access to stabilityai/stable-diffusion-xl-base-0.9, you can easily use it with diffusers:

Text-to-Image

from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]

Refining the image output

from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
refiner.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

image = pipe(prompt=prompt, output_type="latent").images[0]
image = refiner(prompt=prompt, image=image[None, :]).images[0]
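
The image[None, :] indexing simply adds a leading batch dimension so the refiner receives a batched latent; with NumPy shapes for illustration:

```python
import numpy as np

latent = np.zeros((4, 128, 128))   # (channels, height, width) from the base model
batched = latent[None, :]          # -> (1, channels, height, width)

assert batched.shape == (1, 4, 128, 128)
```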

Loading single file checkpoints / original file format

from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

pipe = StableDiffusionXLPipeline.from_single_file(
    "sd_xl_base_0.9.safetensors", torch_dtype=torch.float16  # local path to the original-format checkpoint
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_single_file(
    "sd_xl_refiner_0.9.safetensors", torch_dtype=torch.float16
)
refiner.to("cuda")

Memory optimization via model offloading

- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()

and

- refiner.to("cuda")
+ refiner.enable_model_cpu_offload()

Speed-up inference with torch.compile

+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)

Note: If you're running the model with torch < 2.0, please make sure to run:

+ pipe.enable_xformers_memory_efficient_attention()
+ refiner.enable_xformers_memory_efficient_attention()

For more details have a look at the official docs.

All commits

  • typo in safetensors (safetenstors) by @YoraiLevi in #3976
  • Fix code snippet for Audio Diffusion by @osanseviero in #3987
  • feat: add Dropout to Flax UNet by @SauravMaheshkar in #3894
  • Add 'rank' parameter to Dreambooth LoRA training script by @isidentical in #3945
  • Don't use bare prints in a library by @cmd410 in #3991
  • [Tests] Fix some slow tests by @patrickvonplaten in #3989
  • Add sdxl prompt embeddings by @patrickvonplaten in #3995
Jul 6, 2023
Shap-E, Consistency Models, Video2Video

Shap-E

Shap-E is a 3D image generation model from OpenAI introduced in Shap-E: Generating Conditional 3D Implicit Functions.

We provide support for text-to-3D and image-to-3D generation in diffusers.

Text to 3D

import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

ckpt_id = "openai/shap-e"
pipe = ShapEPipeline.from_pretrained(ckpt_id).to("cuda")

guidance_scale = 15.0
prompt = "A birthday cupcake"
images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

gif_path = export_to_gif(images[0], "cake_3d.gif")

Image to 3D

import torch
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif, load_image

ckpt_id = "openai/shap-e-img2img"
pipe = ShapEImg2ImgPipeline.from_pretrained(ckpt_id).to("cuda")

img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png"
image = load_image(img_url)

generator = torch.Generator(device="cuda").manual_seed(0)
batch_size = 4
guidance_scale = 3.0

images = pipe(
    image,
    num_images_per_prompt=batch_size,
    generator=generator,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
    output_type="pil",
).images

gif_path = export_to_gif(images[0], "burger_sampled_3d.gif")


For more details, check out the official documentation.

The model was contributed by @yiyixuxu in https://github.com/huggingface/diffusers/pull/3742.

Consistency models

Consistency models support fast one-step or few-step image generation. They were proposed by OpenAI in Consistency Models.

import torch

from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# Onestep Sampling
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_onestep_sample.png")

# Onestep sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_model_onestep_sample_penguin.png")

# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original Github repo.
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(timesteps=[22, 0], class_labels=145).images[0]
image.save("consistency_model_multistep_sample_penguin.png")

For more details, see the official docs.

The model was contributed by our community members @dg845 and @ayushtues in https://github.com/huggingface/diffusers/pull/3492.

Video-to-Video

Previous video generation pipelines tended to produce watermarks because those watermarks were present in their pretraining dataset. With the latest additions of the following checkpoints, we can now generate watermark-free videos:

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# memory optimization
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

prompt = "Darth Vader surfing a wave"
video_frames = pipe(prompt, num_frames=24).frames
video_path = export_to_video(video_frames)
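
enable_forward_chunking saves memory by running the feed-forward over slices of the frame dimension instead of all frames at once; because the op is applied independently per slice, the result is unchanged. A minimal sketch of the idea (hypothetical apply_chunked helper, not the diffusers API):

```python
import numpy as np

def apply_chunked(fn, x: np.ndarray, chunk_size: int, dim: int) -> np.ndarray:
    """Apply fn to x in chunks along `dim`, concatenating the results.
    Peak memory for fn's activations scales with chunk_size, not x.shape[dim]."""
    n = x.shape[dim]
    chunks = [
        fn(np.take(x, range(i, min(i + chunk_size, n)), axis=dim))
        for i in range(0, n, chunk_size)
    ]
    return np.concatenate(chunks, axis=dim)

x = np.random.rand(2, 24, 8)  # (batch, frames, features)
full = np.tanh(x)             # all frames at once
chunked = apply_chunked(np.tanh, x, chunk_size=1, dim=1)  # one frame at a time
```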

For more details, check out the official docs.

It was contributed by @patrickvonplaten in https://github.com/huggingface/diffusers/pull/3900.

All commits

  • remove seed by @yiyixuxu in #3734
  • Correct Token to upload docs by @patrickvonplaten in #3744
  • Correct another push token by @patrickvonplaten in #3745
  • [Stable Diffusion Inpaint & ControlNet inpaint] Correct timestep inpaint by @patrickvonplaten in #3749
  • [Documentation] Replace dead link to Flax install guide by @JeLuF in #3739
  • [documentation] grammatical fixes in installation.mdx by @LiamSwayne in #3735
  • Text2video zero refinements by @19and99 in #3733
  • [Tests] Relax tolerance of flaky failing test by @patrickvonplaten in #3755
  • [MultiControlNet] Allow save and load by @patrickvonplaten in #3747
  • Update pipeline_flax_stable_diffusion_controlnet.py by @jfozard in #3306
  • update conversion script for Kandinsky unet by @yiyixuxu in #3766
  • [docs] Fix Colab notebook cells by @stevhliu in #3777
  • [Bug Report template] modify the issue template to include core maintainers. by @sayakpaul in #3785
  • [Enhance] Update reference by @okotaku in #3723
  • Fix broken cpu-offloading in legacy inpainting SD pipeline by @cmdr2 in #3773
  • Fix some bad comment in training scripts by @patrickvonplaten in #3798
  • Added LoRA loading to StableDiffusionKDiffusionPipeline by @tripathiarpan20 in #3751
  • UnCLIP Image Interpolation -> Keep same initial noise across interpolation steps by @Abhinay1997 in #3782
  • feat: add PR template. by @sayakpaul in #3786
  • Ldm3d first PR by @estelleafl in #3668
  • Complete set_attn_processor for prior and vae by @patrickvonplaten in #3796
  • fix typo by @Isotr0py in #3800
  • manual check for checkpoints_total_limit instead of using accelerate by @williamberman in #3681
  • [train text to image] add note to loading from checkpoint by @williamberman in #3806
  • device map legacy attention block weight conversion by @williamberman in #3804
  • [docs] Zero SNR by @stevhliu in #3776
  • [ldm3d] Fixed small typo by @estelleafl in #3820
  • [Examples] Improve the model card pushed from the train_text_to_image.py script by @sayakpaul in #3810
  • [Docs] add missing pipelines from the overview pages and minor fixes by @sayakpaul in #3795
  • [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models by @AndyShih12 in #3716
  • Update control_brightness.mdx by @dqueue in #3825
  • Support ControlNet models with different number of channels in control images by @JCBrouwer in #3815
  • Add ddpm kandinsky by @yiyixuxu in #3783
  • [docs] More API stuff by @stevhliu in #3835
  • relax tol attention conversion test by @williamberman in #3842
  • fix: random module seeding by @sayakpaul in #3846
  • fix audio_diffusion tests by @teticio in #3850
  • Correct bad attn naming by @patrickvonplaten in #3797
  • [Conversion] Small fixes by @patrickvonplaten in #3848
  • Fix some audio tests by @patrickvonplaten in #3841
  • [Docs] add: contributor note in the paradigms docs. by @sayakpaul in #3852
  • Update Habana Gaudi doc by @regisss in #3863
  • Add guidance start/stop by @holwech in #3770
  • feat: rename single-letter vars in resnet.py by @SauravMaheshkar in #3868
  • Fixing the global_step key not found by @VincentNeemie in #3844
  • Support for manual CLIP loading in StableDiffusionPipeline - txt2img. by @WadRex in #3832
  • fix sde add noise typo by @UranusITS in #3839
  • [Tests] add test for checking soft dependencies. by @sayakpaul in #3847
  • [Enhance] Add LoRA rank args in train_text_to_image_lora by @okotaku in #3866
  • [docs] Model API by @stevhliu in #3562
  • fix/docs: Fix the broken doc links by @Aisuko in #3897
  • Add video img2img by @patrickvonplaten in #3900
  • fix/doc-code: Updating to the latest version parameters by @Aisuko in #3924
  • fix/doc: no import torch issue by @Aisuko in #3923
  • Correct controlnet out of list error by @patrickvonplaten in #3928
  • Adding better way to define multiple concepts and also validation capabilities. by @mauricio-repetto in #3807
  • [ldm3d] Update code to be functional with the new checkpoints by @estelleafl in #3875
  • Improve memory text to video by @patrickvonplaten in #3930
  • revert automatic chunking by @patrickvonplaten in #3934
  • avoid upcasting by assigning dtype to noise tensor by @prathikr in #3713
  • Fix failing np tests by @patrickvonplaten in #3942
  • Add timestep_spacing and steps_offset to schedulers by @pcuenca in #3947
  • Add Consistency Models Pipeline by @dg845 in #3492
  • Update consistency_models.mdx by @sayakpaul in #3961
  • Make UNet2DConditionOutput pickle-able by @prathikr in #3857
  • [Consistency Models] correct checkpoint url in the doc by @sayakpaul in #3962
  • [Text-to-video] Add torch.compile() compatibility by @sayakpaul in #3949
  • [SD-XL] Add new pipelines by @patrickvonplaten in #3859
  • Kandinsky 2.2 by @cene555 in #3903
  • Add Shap-E by @yiyixuxu in #3742
  • disable num attenion heads by @patrickvonplaten in #3969
  • Improve SD XL by @patrickvonplaten in #3968
  • fix/doc-code: import torch and fix the broken document address by @Aisuko in #3941

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @estelleafl
    • Ldm3d first PR (#3668)
    • [ldm3d] Fixed small typo (#3820)
    • [ldm3d] Update code to be functional with the new checkpoints (#3875)
  • @AndyShih12
    • [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models (#3716)
  • @dg845
    • Add Consistency Models Pipeline (#3492)
Jun 12, 2023
Patch Release: v0.17.1

Patch release to fix timestep for inpainting

  • Stable Diffusion Inpaint & ControlNet inpaint - Correct timestep inpaint in #3749 by @patrickvonplaten
Jun 8, 2023
v0.17.0 Improved LoRA, Kandinsky 2.1, Torch Compile Speed-up & More

Kandinsky 2.1

Kandinsky 2.1 inherits best practices from DALL-E 2 and Latent Diffusion while introducing some new ideas.

Installation

pip install diffusers transformers accelerate

Code example

from diffusers import DiffusionPipeline
import torch

pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
pipe_prior.to("cuda")

t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
t2i_pipe.to("cuda")

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

generator = torch.Generator(device="cuda").manual_seed(12)
image_embeds, negative_image_embeds = pipe_prior(prompt, negative_prompt, guidance_scale=1.0, generator=generator).to_tuple()

image = t2i_pipe(prompt, negative_prompt=negative_prompt, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds).images[0]
image.save("cheeseburger_monster.png")

To learn more about the Kandinsky pipelines, and for details about speed and memory optimizations, please have a look at the docs.

Thanks @ayushtues, for helping with the integration of Kandinsky 2.1!

UniDiffuser

UniDiffuser introduces a multimodal diffusion process that is capable of handling different generation tasks using a single unified approach:

  • Unconditional image and text generation
  • Joint image-text generation
  • Text-to-image generation
  • Image-to-text generation
  • Image variation
  • Text variation

Below is an example of how to use UniDiffuser for text-to-image generation:

import torch
from diffusers import UniDiffuserPipeline

model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to("cuda")

# This mode can be inferred from the input provided to the `pipe`. 
pipe.set_text_to_image_mode()

prompt = "an elephant under the sea"
sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0).images[0]
sample.save("elephant.png")

Check out the UniDiffuser docs to learn more.

UniDiffuser was added by @dg845 in this PR.

LoRA

We're happy to support A1111-formatted CivitAI LoRA checkpoints in a limited capacity.

First, download a checkpoint. We’ll use this one for demonstration purposes.

wget https://civitai.com/api/download/models/15603 -O light_and_shadow.safetensors

Next, we initialize a DiffusionPipeline:

import torch

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionPipeline.from_pretrained(
    "gsdf/Counterfeit-V2.5", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config, use_karras_sigmas=True
)

We then load the checkpoint downloaded from CivitAI:

pipeline.load_lora_weights(".", weight_name="light_and_shadow.safetensors")

(If you’re loading a checkpoint in the safetensors format, please ensure you have safetensors installed.)

Then it’s time to run inference:

prompt = "masterpiece, best quality, 1girl, at dusk"
negative_prompt = ("(low quality, worst quality:1.4), (bad anatomy), (inaccurate limb:1.2), "
                   "bad composition, inaccurate eyes, extra digit, fewer digits, (extra arms:1.2), large breasts")

images = pipeline(prompt=prompt, 
    negative_prompt=negative_prompt, 
    width=512, 
    height=768, 
    num_inference_steps=15, 
    num_images_per_prompt=4,
    generator=torch.manual_seed(0)
).images

Below is a comparison between the LoRA and the non-LoRA results:

Check out the docs to learn more.

Thanks to @takuma104 for contributing this feature via this PR.

Torch 2.0 Compile Speed-up

We introduced Torch 2.0 support for computing attention efficiently in 0.13.0. Since then, we have made a number of improvements to reduce the number of "graph breaks" in our models so that they can be compiled with torch.compile(). As a result, we are happy to report massive improvements in the inference speed of our most popular pipelines. Check out this doc to learn more.
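
As a rough sketch of the workflow, compiling a model is a one-liner. A toy module stands in for a pipeline's UNet here; in practice you would wrap `pipe.unet` in `torch.compile()` the same way. The `backend="eager"` option keeps the sketch runnable without a C++ toolchain, whereas the default (inductor) backend is what actually delivers the speed-ups.

```python
import torch

# Toy module standing in for a pipeline's UNet; in a real pipeline you would
# compile `pipe.unet` the same way.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.SiLU())

# backend="eager" keeps this sketch dependency-free; the default (inductor)
# backend is the one that produces the reported speed-ups.
compiled = torch.compile(model, backend="eager")  # requires PyTorch >= 2.0

x = torch.randn(2, 8)
with torch.no_grad():
    y = compiled(x)  # the first call triggers tracing/compilation
print(tuple(y.shape))
```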

Thanks to @Chillee for helping us with this. Thanks to @patrickvonplaten for fixing the problems stemming from "graph breaks" in this PR.

VAE pre-processing

We added a VaeImageProcessor class that provides a unified API for pipelines to prepare their image inputs and post-process their outputs. It supports resizing, normalization, and conversion between PIL images, PyTorch tensors, and NumPy arrays.

With that, all Stable Diffusion pipelines now accept image inputs as PyTorch tensors and NumPy arrays, in addition to PIL images, and can produce outputs in these three formats. They also accept and return latents, so you can take the latents generated by one pipeline and pass them to another as inputs without leaving the latent space. If you work with multiple pipelines, this lets you pass PyTorch tensors between them without converting to PIL images.

To learn more about the API, check out the docs here.
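
At its core, the pre-/post-processing is a simple [-1, 1] normalization round trip. Below is a minimal NumPy sketch of that idea; it is illustrative only, not the library's actual `VaeImageProcessor` implementation.

```python
import numpy as np

def normalize(images: np.ndarray) -> np.ndarray:
    # Map pixel values from [0, 1] to [-1, 1], the range diffusion VAEs expect.
    return 2.0 * images - 1.0

def denormalize(images: np.ndarray) -> np.ndarray:
    # Map model outputs from [-1, 1] back to displayable [0, 1] pixels.
    return np.clip(images / 2 + 0.5, 0.0, 1.0)

img = np.random.rand(1, 64, 64, 3).astype(np.float32)  # a batch in NHWC, [0, 1]
roundtrip = denormalize(normalize(img))
print(np.allclose(img, roundtrip, atol=1e-5))
```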

ControlNet Img2Img & Inpainting

ControlNet is one of the most used diffusion models, and upon strong demand from the community we added ControlNet img2img and ControlNet inpaint pipelines. This allows any ControlNet checkpoint to be used both in the image-to-image setting and for inpainting.

:point_right: Inpaint: see the ControlNet inpaint model here
:point_right: Image-to-image: any ControlNet checkpoint can be used for image-to-image, e.g.:

from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
import numpy as np
import torch

import cv2
from PIL import Image

# download an image
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
np_image = np.array(image)

# get canny image
np_image = cv2.Canny(np_image, 100, 200)
np_image = np_image[:, :, None]
np_image = np.concatenate([np_image, np_image, np_image], axis=2)
canny_image = Image.fromarray(np_image)

# load control net and stable diffusion v1-5
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

# speed up diffusion process with faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# generate image
generator = torch.manual_seed(0)
image = pipe(
    "futuristic-looking woman",
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
).images[0]

Diffedit Zero-Shot Inpainting Pipeline

This pipeline (introduced in DiffEdit: Diffusion-based semantic image editing with mask guidance) allows for image editing with natural language. Below is an end-to-end example.

First, let’s load our pipeline:

import torch
from diffusers import DDIMScheduler, DDIMInverseScheduler, StableDiffusionDiffEditPipeline

sd_model_ckpt = "stabilityai/stable-diffusion-2-1"
pipeline = StableDiffusionDiffEditPipeline.from_pretrained(
    sd_model_ckpt,
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()
pipeline.enable_vae_slicing()
generator = torch.manual_seed(0)

Then, we load an input image to edit using our method:

from diffusers.utils import load_image

img_url = "https://github.com/Xiang-cd/DiffEdit-stable-diffusion/raw/main/assets/origin.png"
raw_image = load_image(img_url).convert("RGB").resize((768, 768))

Then, we employ the source and target prompts to generate the editing mask:

source_prompt = "a bowl of fruits"
target_prompt = "a basket of fruits"
mask_image = pipeline.generate_mask(
    image=raw_image,
    source_prompt=source_prompt,
    target_prompt=target_prompt,
    generator=generator,
) 

Then, we employ the caption and the input image to get the inverted latents:

inv_latents = pipeline.invert(prompt=source_prompt, image=raw_image, generator=generator).latents

Now, generate the image with the inverted latents and semantically generated mask:

image = pipeline(
    prompt=target_prompt,
    mask_image=mask_image,
    image_latents=inv_latents,
    generator=generator,
    negative_prompt=source_prompt,
).images[0]
image.save("edited_image.png")

Check out the docs to learn more about this pipeline.

Thanks to @clarencechen for contributing this pipeline in this PR.

Docs

Apart from these, we have made multiple improvements to the overall quality-of-life of our docs.

Thanks to @stevhliu for leading the charge here.

Misc

  • xformers attention processor fix when using LoRA (PR by @takuma104)
  • Pytorch 2.0 SDPA implementation of the LoRA attention processor (PR)

All commits

  • Post release for 0.16.0 by @patrickvonplaten in #3244
  • [docs] only mention one stage by @pcuenca in #3246
  • Write model card in controlnet training script by @pcuenca in #3229
  • [2064]: Add stochastic sampler (sample_dpmpp_sde) by @nipunjindal in #3020
  • [Stochastic Sampler][Slow Test]: Cuda test fixes by @nipunjindal in #3257
  • Remove required from tracker_project_name by @pcuenca in #3260
  • adding required parameters while calling the get_up_block and get_down_block by @init-22 in #3210
  • [docs] Update interface in repaint.mdx by @ernestchu in #3119
  • Update IF name to XL by @apolinario in #3262
  • fix typo in score sde pipeline by @fecet in #3132
  • Fix typo in textual inversion JAX training script by @jairtrejo in #3123
  • AudioDiffusionPipeline - fix encode method after config changes by @teticio in #3114
  • Revert "Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline"" by @patrickvonplaten in #3265
  • Fix community pipelines by @patrickvonplaten in #3266
  • update notebook by @yiyixuxu in #3259
  • [docs] add notes for stateful model changes by @williamberman in #3252
  • [LoRA] quality of life improvements in the loading semantics and docs by @sayakpaul in #3180
  • [Community Pipelines] EDICT pipeline implementation by @Joqsan in #3153
  • [Docs]zh translated docs update by @DrDavidS in #3245
  • Update logging.mdx by @standardAI in #2863
  • Add multiple conditions to StableDiffusionControlNetInpaintPipeline by @timegate in #3125
  • Let's make sure that dreambooth always uploads to the Hub by @patrickvonplaten in #3272
  • Diffedit Zero-Shot Inpainting Pipeline by @clarencechen in #2837
  • add constant learning rate with custom rule by @jason9075 in #3133
  • Allow disabling torch 2_0 attention by @patrickvonplaten in #3273
  • [doc] add link to training script by @yiyixuxu in #3271
  • temp disable spectogram diffusion tests by @williamberman in #3278
  • Changed sample[0] to images[0] by @IliaLarchenko in #3304
  • Typo in tutorial by @IliaLarchenko in #3295
  • Torch compile graph fix by @patrickvonplaten in #3286
  • Postprocessing refactor img2img by @yiyixuxu in #3268
  • [Torch 2.0 compile] Fix more torch compile breaks by @patrickvonplaten in #3313
  • fix: scale_lr and sync example readme and docs. by @sayakpaul in #3299
  • Update stable_diffusion.mdx by @mu94-csl in #3310
  • Fix missing variable assign in DeepFloyd-IF-II by @gitmylo in #3315
  • Correct doc build for patch releases by @patrickvonplaten in #3316
  • Add Stable Diffusion RePaint to community pipelines by @Markus-Pobitzer in #3320
  • Fix multistep dpmsolver for cosine schedule (suitable for deepfloyd-if) by @LuChengTHU in #3314
  • [docs] Improve LoRA docs by @stevhliu in #3311
  • Added input pretubation by @isamu-isozaki in #3292
  • Update write_own_pipeline.mdx by @csaybar in #3323
  • update controlling generation doc with latest goodies. by @sayakpaul in #3321
  • [Quality] Make style by @patrickvonplaten in #3341
  • Fix config dpm by @patrickvonplaten in #3343
  • Add the SDE variant of DPM-Solver and DPM-Solver++ by @LuChengTHU in #3344
  • Add upsample_size to AttnUpBlock2D, AttnDownBlock2D by @will-rice in #3275
  • Rename --only_save_embeds to --save_as_full_pipeline by @arrufat in #3206
  • [AudioLDM] Generalise conversion script by @sanchit-gandhi in #3328
  • Fix TypeError when using prompt_embeds and negative_prompt by @At-sushi in #2982
  • Fix pipeline class on README by @themrzmaster in #3345
  • Inpainting: typo in docs by @LysandreJik in #3331
  • Add use_Karras_sigmas to LMSDiscreteScheduler by @Isotr0py in #3351
  • Batched load of textual inversions by @pdoane in #3277
  • [docs] Fix docstring by @stevhliu in #3334
  • if dreambooth lora by @williamberman in #3360
  • Postprocessing refactor all others by @yiyixuxu in #3337
  • [docs] Improve safetensors docstring by @stevhliu in #3368
  • add: a warning message when using xformers in a PT 2.0 env. by @sayakpaul in #3365
  • StableDiffusionInpaintingPipeline - resize image w.r.t height and width by @rupertmenneer in #3322
  • [docs] Adapt a model by @stevhliu in #3326
  • [docs] Load safetensors by @stevhliu in #3333
  • [Docs] Fix stable_diffusion.mdx typo by @sudowind in #3398
  • Support ControlNet v1.1 shuffle properly by @takuma104 in #3340
  • [Tests] better determinism by @sayakpaul in #3374
  • [docs] Add transformers to install by @stevhliu in #3388
  • [deepspeed] partial ZeRO-3 support by @stas00 in #3076
  • Add omegaconf for tests by @patrickvonplaten in #3400
  • Fix various bugs with LoRA Dreambooth and Dreambooth script by @patrickvonplaten in #3353
  • Fix docker file by @patrickvonplaten in #3402
  • fix: deepseepd_plugin retrieval from accelerate state by @sayakpaul in #3410
  • [Docs] Add sigmoid beta_scheduler to docstrings of relevant Schedulers by @Laurent2916 in #3399
  • Don't install accelerate and transformers from source by @patrickvonplaten in #3415
  • Don't install transformers and accelerate from source by @patrickvonplaten in #3414
  • Improve fast tests by @patrickvonplaten in #3416
  • attention refactor: the trilogy by @williamberman in #3387
  • [Docs] update the PT 2.0 optimization doc with latest findings by @sayakpaul in #3370
  • Fix style rendering by @pcuenca in #3433
  • unCLIP scheduler do not use note by @williamberman in #3417
  • Replace deprecated command with environment file by @jongwooo in #3409
  • fix warning message pipeline loading by @patrickvonplaten in #3446
  • add stable diffusion tensorrt img2img pipeline by @asfiyab-nvidia in #3419
  • Refactor controlnet and add img2img and inpaint by @patrickvonplaten in #3386
  • [Scheduler] DPM-Solver (++) Inverse Scheduler by @clarencechen in #3335
  • [Docs] Fix incomplete docstring for resnet.py by @Laurent2916 in #3438
  • fix tiled vae blend extent range by @superlabs-dev in #3384
  • Small update to "Next steps" section by @pcuenca in #3443
  • Allow arbitrary aspect ratio in IFSuperResolutionPipeline by @devxpy in #3298
  • Adding 'strength' parameter to StableDiffusionInpaintingPipeline by @rupertmenneer in #3424
  • [WIP] Bugfix - Pipeline.from_pretrained is broken when the pipeline is partially downloaded by @vimarshc in #3448
  • Fix gradient checkpointing bugs in freezing part of models (requires_grad=False) by @7eu7d7 in #3404
  • Make dreambooth lora more robust to orig unet by @patrickvonplaten in #3462
  • Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) by @cmdr2 in #3463
  • Add min snr to text2img lora training script by @wfng92 in #3459
  • Add inpaint lora scale support by @Glaceon-Hyy in #3460
  • [From ckpt] Fix from_ckpt by @patrickvonplaten in #3466
  • Update full dreambooth script to work with IF by @williamberman in #3425
  • Add IF dreambooth docs by @williamberman in #3470
  • parameterize pass single args through tuple by @williamberman in #3477
  • attend and excite tests disable determinism on the class level by @williamberman in #3478
  • dreambooth docs torch.compile note by @williamberman in #3471
  • add: if entry in the dreambooth training docs. by @sayakpaul in #3472
  • [docs] Textual inversion inference by @stevhliu in #3473
  • [docs] Distributed inference by @stevhliu in #3376
  • [{Up,Down}sample1d] explicit view kernel size as number elements in flattened indices by @williamberman in #3479
  • mps & onnx tests rework by @pcuenca in #3449
  • [Attention processor] Better warning message when shifting to AttnProcessor2_0 by @sayakpaul in #3457
  • [Docs] add note on local directory path. by @sayakpaul in #3397
  • Refactor full determinism by @patrickvonplaten in #3485
  • Fix DPM single by @patrickvonplaten in #3413
  • Add use_Karras_sigmas to DPMSolverSinglestepScheduler by @Isotr0py in #3476
  • Adds local_files_only bool to prevent forced online connection by @w4ffl35 in #3486
  • [Docs] Korean translation (optimization, training) by @Snailpong in #3488
  • DataLoader respecting EXIF data in Training Images by @Ambrosiussen in #3465
  • feat: allow disk offload for diffuser models by @hari10599 in #3285
  • [Community] reference only control by @okotaku in #3435
  • Support for cross-attention bias / mask by @Birch-san in #2634
  • do not scale the initial global step by gradient accumulation steps when loading from checkpoint by @williamberman in #3506
  • Fix bug in panorama pipeline when using dpmsolver scheduler by @Isotr0py in #3499
  • [Community Pipelines]Accelerate inference of stable diffusion by IPEX on CPU by @yingjie-han in #3105
  • [Community] ControlNet Reference by @okotaku in #3508
  • Allow custom pipeline loading by @patrickvonplaten in #3504
  • Make sure Diffusers works even if Hub is down by @patrickvonplaten in #3447
  • Improve README by @patrickvonplaten in #3524
  • Update README.md by @patrickvonplaten in #3525
  • Run torch.compile tests in separate subprocesses by @pcuenca in #3503
  • fix attention mask pad check by @williamberman in #3531
  • explicit broadcasts for assignments by @williamberman in #3535
  • [Examples/DreamBooth] refactor save_model_card utility in dreambooth examples by @sayakpaul in #3543
  • Fix panorama to support all schedulers by @Isotr0py in #3546
  • Add open parti prompts to docs by @patrickvonplaten in #3549
  • Add Kandinsky 2.1 by @yiyixuxu @ayushtues in #3308
  • fix broken change for vq pipeline by @yiyixuxu in #3563
  • [Stable Diffusion Inpainting] Allow standard text-to-img checkpoints to be useable for SD inpainting by @patrickvonplaten in #3533
  • Fix loaded_token reference before definition by @eminn in #3523
  • renamed variable to input_ and output_ by @vikasmech in #3507
  • Correct inpainting controlnet docs by @patrickvonplaten in #3572
  • Fix controlnet guess mode euler by @patrickvonplaten in #3571
  • [docs] Add AttnProcessor to docs by @stevhliu in #3474
  • [WIP] Add UniDiffuser model and pipeline by @dg845 in #2963
  • Fix to apply LoRAXFormersAttnProcessor instead of LoRAAttnProcessor when xFormers is enabled by @takuma104 in #3556
  • fix dreambooth attention mask by @linbo0518 in #3541
  • [IF super res] correctly normalize PIL input by @williamberman in #3536
  • [docs] Maintenance by @stevhliu in #3552
  • [docs] update the broken links by @brandonJY in #3568
  • [docs] Working with different formats by @stevhliu in #3534
  • remove print statements from attention processor. by @sayakpaul in #3592
  • Fix temb attention by @patrickvonplaten in #3607
  • [docs] update the broken links by @kadirnar in #3577
  • [UniDiffuser Tests] Fix some tests by @sayakpaul in #3609
  • #3487 Fix inpainting strength for various samplers by @rupertmenneer in #3532
  • [Community] Support StableDiffusionTilingPipeline by @kadirnar in #3586
  • [Community, Enhancement] Add reference tricks in README by @okotaku in #3589
  • [Feat] Enable State Dict For Textual Inversion Loader by @ghunkins in #3439
  • [Community] CLIP Guided Images Mixing with Stable DIffusion Pipeline by @TheDenk in #3587
  • fix tests by @patrickvonplaten in #3614
  • Make sure we also change the config when setting encoder_hid_dim_type=="text_proj" and allow xformers by @patrickvonplaten in #3615
  • goodbye frog by @williamberman in #3617
  • update code to reflect latest changes as of May 30th by @prathikr in #3616
  • update dreambooth lora to work with IF stage II by @williamberman in #3560
  • Full Dreambooth IF stage II upscaling by @williamberman in #3561
  • [Docs] include the instruction-tuning blog link in the InstructPix2Pix docs by @sayakpaul in #3644
  • [Kandinsky] Improve kandinsky API a bit by @patrickvonplaten in #3636
  • Support Kohya-ss style LoRA file format (in a limited capacity) by @takuma104 in #3437
  • Iterate over unique tokens to avoid duplicate replacements for multivector embeddings by @lachlan-nicholson in #3588
  • fixed typo in example train_text_to_image.py by @kashif in #3608
  • fix inpainting pipeline when providing initial latents by @yiyixuxu in #3641
  • [Community Doc] Updated the filename and readme file. by @kadirnar in #3634
  • add Stable Diffusion TensorRT Inpainting pipeline by @asfiyab-nvidia in #3642
  • set config from original module but set compiled module on class by @williamberman in #3650
  • dreambooth if docs - stage II, more info by @williamberman in #3628
  • linting fix by @williamberman in #3653
  • Set step_rules correctly for piecewise_constant scheduler by @0x1355 in #3605
  • Allow setting num_cycles for cosine_with_restarts lr scheduler by @0x1355 in #3606
  • [docs] Load A1111 LoRA by @stevhliu in #3629
  • dreambooth upscaling fix added latents by @williamberman in #3659
  • Correct multi gpu dreambooth by @patrickvonplaten in #3673
  • Fix from_ckpt not working properly on windows by @LyubimovVladislav in #3666
  • Update Compel documentation for textual inversions by @pdoane in #3663
  • [UniDiffuser test] fix one test so that it runs correctly on V100 by @sayakpaul in #3675
  • [docs] More API fixes by @stevhliu in #3640
  • [WIP]Vae preprocessor refactor (PR1) by @yiyixuxu in #3557
  • small tweaks for parsing thibaudz controlnet checkpoints by @williamberman in #3657
  • move activation dispatches into helper function by @williamberman in #3656
  • [docs] Fix link to loader method by @stevhliu in #3680
  • Add function to remove monkey-patch for text encoder LoRA by @takuma104 in #3649
  • [LoRA] feat: add lora attention processor for pt 2.0. by @sayakpaul in #3594
  • refactor Image processor for x4 upscaler by @yiyixuxu in #3692
  • feat: when using PT 2.0 use LoRAAttnProcessor2_0 for text enc LoRA. by @sayakpaul in #3691
  • Fix the Kandinsky docstring examples by @freespirit in #3695
  • Support views batch for panorama by @Isotr0py in #3632
  • Fix from_ckpt for Stable Diffusion 2.x by @ctrysbita in #3662
  • Add draft for lora text encoder scale by @patrickvonplaten in #3626

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @nipunjindal
    • [2064]: Add stochastic sampler (sample_dpmpp_sde) (#3020)
    • [Stochastic Sampler][Slow Test]: Cuda test fixes (#3257)
  • @clarencechen
    • Diffedit Zero-Shot Inpainting Pipeline (#2837)
    • [Scheduler] DPM-Solver (++) Inverse Scheduler (#3335)
  • @Markus-Pobitzer
    • Add Stable Diffusion RePaint to community pipelines (#3320)
  • @takuma104
    • Support ControlNet v1.1 shuffle properly (#3340)
    • Fix to apply LoRAXFormersAttnProcessor instead of LoRAAttnProcessor when xFormers is enabled (#3556)
    • Support Kohya-ss style LoRA file format (in a limited capacity) (#3437)
    • Add function to remove monkey-patch for text encoder LoRA (#3649)
  • @asfiyab-nvidia
    • add stable diffusion tensorrt img2img pipeline (#3419)
    • add Stable Diffusion TensorRT Inpainting pipeline (#3642)
  • @Snailpong
    • [Docs] Korean translation (optimization, training) (#3488)
  • @okotaku
    • [Community] reference only control (#3435)
    • [Community] ControlNet Reference (#3508)
    • [Community, Enhancement] Add reference tricks in README (#3589)
  • @Birch-san
    • Support for cross-attention bias / mask (#2634)
  • @yingjie-han
    • [Community Pipelines]Accelerate inference of stable diffusion by IPEX on CPU (#3105)
  • @dg845
    • [WIP] Add UniDiffuser model and pipeline (#2963)
  • @kadirnar
    • [docs] update the broken links (#3577)
    • [Community] Support StableDiffusionTilingPipeline (#3586)
    • [Community Doc] Updated the filename and readme file. (#3634)
  • @TheDenk
    • [Community] CLIP Guided Images Mixing with Stable DIffusion Pipeline (#3587)
  • @prathikr
    • update code to reflect latest changes as of May 30th (#3616)
Apr 28, 2023
Patch Release: v0.16.1

v0.16.1: Patch release to fix IF naming and community pipeline versioning, and to allow disabling VAE PT 2.0 attention

  • merge conflict by @apolinario (direct commit on v0.16.1-patch)
  • Fix community pipelines by @patrickvonplaten in #3266
  • Allow disabling torch 2_0 attention by @patrickvonplaten in #3273
Apr 26, 2023
v0.16.0 DeepFloyd IF & ControlNet v1.1

DeepFloyd's IF: The open-sourced Imagen


IF is a pixel-based text-to-image generation model and was released in late April 2023 by DeepFloyd.

The model architecture is strongly inspired by Google's closed-source Imagen. IF is a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding.

Installation

pip install torch --upgrade  # diffusers' IF is optimized for torch 2.0
pip install diffusers --upgrade

Accept the License

Before you can use IF, you need to accept its usage conditions. To do so:

  1. Make sure to have a Hugging Face account and be logged in
  2. Accept the license on the model card of DeepFloyd/IF-I-XL-v1.0
  3. Log in locally:
from huggingface_hub import login

login()

and enter your Hugging Face Hub access token.

Code example

from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {
    "feature_extractor": stage_1.feature_extractor,
    "safety_checker": stage_1.safety_checker,
    "watermarker": stage_1.watermarker,
}
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16
)
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
generator = torch.manual_seed(1)

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

# stage 1
image = stage_1(
    prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_I.png")
# stage 2
image = stage_2(
    image=image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")
# stage 3
image = stage_3(prompt=prompt, image=image, noise_level=100, generator=generator).images
image[0].save("./if_stage_III.png")

For more details about speed and memory optimizations, please have a look at the blog or docs below.

Useful links

  • :point_right: The official codebase
  • :point_right: Blog post
  • :point_right: Space Demo
  • :point_right: In-detail docs

ControlNet v1.1

Lvmin Zhang has released improved ControlNet checkpoints as well as a couple of new ones.

You can find all :firecracker: Diffusers checkpoints here. Please have a look directly at the model cards to learn how to use the checkpoints:

Improved checkpoints:

  • lllyasviel/control_v11p_sd15_canny: trained with canny edge detection. Control image: a monochrome image with white edges on a black background.
  • lllyasviel/control_v11p_sd15_mlsd: trained with multi-level line segment detection. Control image: an image with annotated line segments.
  • lllyasviel/control_v11f1p_sd15_depth: trained with depth estimation. Control image: an image with depth information, usually represented as a grayscale image.
  • lllyasviel/control_v11p_sd15_normalbae: trained with surface normal estimation. Control image: an image with surface normal information, usually represented as a color-coded image.
  • lllyasviel/control_v11p_sd15_seg: trained with image segmentation. Control image: an image with segmented regions, usually represented as a color-coded image.
lllyasviel/control_v11p_sd15_lineart<br/> Trained with line art generationAn image with line art, usually black lines on a white background.<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/control.png"/></a><a href="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/image_out.png"/></a>
lllyasviel/control_v11p_sd15_openpose<br/> Trained with human pose estimationAn image with human poses, usually represented as a set of keypoints or skeletons.<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png"/></a><a href="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/image_out.png"/></a>
lllyasviel/control_v11p_sd15_scribble<br/> Trained with scribble-based image generationAn image with scribbles, usually random or user-drawn strokes.<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/control.png"/></a><a href="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/image_out.png"/></a>
lllyasviel/control_v11p_sd15_softedge<br/> Trained with soft edge image generationAn image with soft edges, usually to create a more painterly or artistic effect.<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/control.png"/></a><a href="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/image_out.png"/></a>

New checkpoints:

Model NameControl Image OverviewControl Image ExampleGenerated Image Example
lllyasviel/control_v11e_sd15_ip2p<br/> Trained with pixel to pixel instructionNo condition.<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/control.png"/></a><a href="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/image_out.png"/></a>
lllyasviel/control_v11p_sd15_inpaint<br/> Trained with image inpaintingNo condition.<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/control.png"/></a><a href="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/output.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/output.png"/></a>
lllyasviel/control_v11e_sd15_shuffle<br/> Trained with image shufflingAn image with shuffled patches or regions.<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/control.png"/></a><a href="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/image_out.png"/></a>
lllyasviel/control_v11p_sd15s2_lineart_anime<br/> Trained with anime line art generationAn image with anime-style line art.<a href="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/control.png"/></a><a href="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/image_out.png"/></a>
 

All commits

  • [Tests] Speed up panorama tests by @sayakpaul in #3067
  • [Post release] v0.16.0dev by @patrickvonplaten in #3072
  • Adds profiling flags, computes train metrics average. by @andsteing in #3053
  • [Pipelines] Make sure that None functions are correctly not saved by @patrickvonplaten in #3080
  • doc string example remove from_pt by @yiyixuxu in #3083
  • [Tests] parallelize by @patrickvonplaten in #3078
  • Throw deprecation warning for return_cached_folder by @patrickvonplaten in #3092
  • Allow SD attend and excite pipeline to work with any size output images by @jcoffland in #2835
  • [docs] Update community pipeline docs by @stevhliu in #2989
  • Add to support Guess Mode for StableDiffusionControlnetPipleline by @takuma104 in #2998
  • fix default value for attend-and-excite by @yiyixuxu in #3099
  • remvoe one line as requested by gc team by @yiyixuxu in #3077
  • ddpm custom timesteps by @williamberman in #3007
  • Fix breaking change in pipeline_stable_diffusion_controlnet.py by @remorses in #3118
  • Add global pooling to controlnet by @patrickvonplaten in #3121
  • [Bug fix] Fix img2img processor with safety checker by @patrickvonplaten in #3127
  • [Bug fix] Make sure correct timesteps are chosen for img2img by @patrickvonplaten in #3128
  • Improve deprecation warnings by @patrickvonplaten in #3131
  • Fix config deprecation by @patrickvonplaten in #3129
  • feat: verfication of multi-gpu support for select examples. by @sayakpaul in #3126
  • speed up attend-and-excite fast tests by @yiyixuxu in #3079
  • Optimize log_validation in train_controlnet_flax by @cgarciae in #3110
  • make style by @patrickvonplaten (direct commit on main)
  • Correct textual inversion readme by @patrickvonplaten in #3145
  • Add unet act fn to other model components by @williamberman in #3136
  • class labels timestep embeddings projection dtype cast by @williamberman in #3137
  • [ckpt loader] Allow loading the Inpaint and Img2Img pipelines, while loading a ckpt model by @cmdr2 in #2705
  • add from_ckpt method as Mixin by @1lint in #2318
  • Add TensorRT SD/txt2img Community Pipeline to diffusers along with TensorRT utils by @asfiyab-nvidia in #2974
  • Correct Transformer2DModel.forward docstring by @off99555 in #3074
  • Update pipeline_stable_diffusion_inpaint_legacy.py by @hwuebben in #2903
  • Modified altdiffusion pipline to support altdiffusion-m18 by @superhero-7 in #2993
  • controlnet training resize inputs to multiple of 8 by @williamberman in #3135
  • adding custom diffusion training to diffusers examples by @nupurkmr9 in #3031
  • Update custom_diffusion.mdx by @mishig25 in #3165
  • Added distillation for quantization example on textual inversion. by @XinyuYe-Intel in #2760
  • make style by @patrickvonplaten (direct commit on main)
  • Merge branch 'main' of https://github.com/huggingface/diffusers by @patrickvonplaten (direct commit on main)
  • Update Noise Autocorrelation Loss Function for Pix2PixZero Pipeline by @clarencechen in #2942
  • [DreamBooth] add text encoder LoRA support in the DreamBooth training script by @sayakpaul in #3130
  • Update Habana Gaudi documentation by @regisss in #3169
  • Add model offload to x4 upscaler by @patrickvonplaten in #3187
  • [docs] Deterministic algorithms by @stevhliu in #3172
  • Update custom_diffusion.mdx to credit the author by @sayakpaul in #3163
  • Fix TensorRT community pipeline device set function by @asfiyab-nvidia in #3157
  • make from_flax work for controlnet by @yiyixuxu in #3161
  • [docs] Clarify training args by @stevhliu in #3146
  • Multi Vector Textual Inversion by @patrickvonplaten in #3144
  • Add Karras sigmas to HeunDiscreteScheduler by @youssefadr in #3160
  • [AudioLDM] Fix dtype of returned waveform by @sanchit-gandhi in #3189
  • Fix bug in train_dreambooth_lora by @crywang in #3183
  • [Community Pipelines] Update lpw_stable_diffusion pipeline by @SkyTNT in #3197
  • Make sure VAE attention works with Torch 2_0 by @patrickvonplaten in #3200
  • Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline" by @williamberman in #3201
  • [Bug fix] Fix batch size attention head size mismatch by @patrickvonplaten in #3214
  • fix mixed precision training on train_dreambooth_inpaint_lora by @themrzmaster in #3138
  • adding enable_vae_tiling and disable_vae_tiling functions by @init-22 in #3225
  • Add ControlNet v1.1 docs by @patrickvonplaten in #3226
  • Fix issue in maybe_convert_prompt by @pdoane in #3188
  • Sync cache version check from transformers by @ychfan in #3179
  • Fix docs text inversion by @patrickvonplaten in #3166
  • add model by @patrickvonplaten in #3230
  • Allow return pt x4 by @patrickvonplaten in #3236
  • Allow fp16 attn for x4 upscaler by @patrickvonplaten in #3239
  • fix fast test by @patrickvonplaten in #3241
  • Adds a document on token merging by @sayakpaul in #3208
  • [AudioLDM] Update docs to use updated ckpt by @sanchit-gandhi in #3240
  • Release: v0.16.0 by @patrickvonplaten (direct commit on main)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @1lint
    • add from_ckpt method as Mixin (#2318)
  • @asfiyab-nvidia
    • Add TensorRT SD/txt2img Community Pipeline to diffusers along with TensorRT utils (#2974)
    • Fix TensorRT community pipeline device set function (#3157)
  • @nupurkmr9
    • adding custom diffusion training to diffusers examples (#3031)
  • @XinyuYe-Intel
    • Added distillation for quantization example on textual inversion. (#2760)
  • @SkyTNT
    • [Community Pipelines] Update lpw_stable_diffusion pipeline (#3197)
Apr 17, 2023
v0.15.1: Patch Release to fix safety checker, config access and uneven scheduler

Fixes bugs related to missing global pooling in controlnet, img2img processor issue with safety checker, uneven timesteps and better config deprecation

  • [Bug fix] Add global pooling to controlnet by @patrickvonplaten in #3121
  • [Bug fix] Fix img2img processor with safety checker by @patrickvonplaten in #3127
  • [Bug fix] Make sure correct timesteps are chosen for img2img by @patrickvonplaten in #3128
  • [Bug fix] Fix config deprecation by @patrickvonplaten in #3129
Apr 12, 2023
v0.15.0 Beyond Image Generation

Taking Diffusers Beyond Image Generation

We are very excited about this release! It brings new pipelines for video and audio to diffusers, showing that diffusion is a great choice for all sorts of generative tasks. The modular, pluggable approach of diffusers was crucial to integrate the new models intuitively and cohesively with the rest of the library. We hope you appreciate the consistency of the APIs and implementations, as our ultimate goal is to provide the best toolbox to help you solve the tasks you're interested in. Don't hesitate to get in touch if you use diffusers for other projects!

In addition to that, diffusers 0.15 includes a lot of new features and improvements. From performance and deployment improvements (faster pipeline loading) to increased flexibility for creative tasks (Karras sigmas, weight prompting, support for Automatic1111 textual inversion embeddings) to additional customization options (Multi-ControlNet) to training utilities (ControlNet, Min-SNR weighting). Read on for the details!

🎬 Text-to-Video

Text-guided video generation is not a fantasy anymore - it's as simple as spinning up a Colab and running either of the two powerful open-source video generation models.

Text-to-Video

Alibaba's DAMO Vision Intelligence Lab has open-sourced a first research-only video generation model that can generate short video clips of up to a minute. To see Spiderman surfing, simply copy-paste the following lines into your favorite Python interpreter:

import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "Spiderman is surfing"
video_frames = pipe(prompt, num_inference_steps=25).frames
video_path = export_to_video(video_frames)

For more information, have a look at the "damo-vilab/text-to-video-ms-1.7b" model card.

Text-to-Video Zero

Text2Video-Zero is a zero-shot text-to-video synthesis diffusion model that enables low-cost yet consistent video generation using only pre-trained text-to-image diffusion models, such as Stable Diffusion v1-5. Text2Video-Zero also naturally supports extensions of pre-trained text-to-image models such as Instruct Pix2Pix, ControlNet and DreamBooth, on top of which we present Video Instruct Pix2Pix, Pose Conditional, Edge Conditional, and DreamBooth-Specialized applications.

For more information please have a look at PAIR/Text2Video-Zero

🔉 Audio Generation

Text-guided audio generation has made great progress over the last months with many advances being based on diffusion models. The 0.15.0 release includes two powerful audio diffusion models.

AudioLDM

Inspired by Stable Diffusion, AudioLDM is a text-to-audio latent diffusion model (LDM) that learns continuous audio representations from CLAP latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional sound effects, human speech and music.

from diffusers import AudioLDMPipeline
import torch

repo_id = "cvssp/audioldm"
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]

The resulting audio output can be saved as a .wav file:

import scipy

scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
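
If you would rather avoid the SciPy dependency, the same 16 kHz mono array can also be written with Python's built-in wave module. A minimal sketch, assuming audio is a sequence of floats in [-1.0, 1.0]:

```python
import struct
import wave

def write_wav(path, samples, rate=16000):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 16-bit samples
        f.setframerate(rate)
        # scale floats to int16, clamping to the valid range
        ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
        f.writeframes(struct.pack(f"<{len(ints)}h", *ints))

# e.g. write_wav("techno.wav", audio) with the pipeline output above
```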

For more information, see cvssp/audioldm.

Spectrogram Diffusion

This model from the Magenta team is a MIDI-to-audio generator. The pipeline takes a MIDI file as input and autoregressively generates 5-second spectrogram segments, which are concatenated at the end and decoded to audio via a spectrogram decoder.

from diffusers import SpectrogramDiffusionPipeline, MidiProcessor

pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion")
pipe = pipe.to("cuda")
processor = MidiProcessor()

# Download MIDI from: wget http://www.piano-midi.de/midis/beethoven/beethoven_hammerklavier_2.mid
output = pipe(processor("beethoven_hammerklavier_2.mid"))

audio = output.audios[0]
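
The autoregressive generate-and-concatenate loop described above can be sketched in plain Python, with a toy stand-in for the actual spectrogram model (all names here are hypothetical):

```python
def generate_track(n_segments, generate_segment):
    """Autoregressively generate fixed-length segments and concatenate them."""
    track, previous = [], None
    for _ in range(n_segments):
        segment = generate_segment(previous)  # condition on the last segment
        track.extend(segment)
        previous = segment
    return track

# toy "model": each segment just continues counting from the previous one
def dummy_segment(previous, length=4):
    start = 0 if previous is None else previous[-1] + 1
    return list(range(start, start + length))

track = generate_track(3, dummy_segment)  # -> [0, 1, ..., 11]
```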

📗 New Docs

Documentation is crucially important for diffusers, as it's one of the first resources where people try to understand how everything works and fix any issues they are observing. We have spent a lot of time in this release reviewing all documents, adding new ones, reorganizing sections and bringing code examples up to date with the latest APIs. This effort has been led by @stevhliu (thanks a lot! 🙌) and @yiyixuxu, but many others have chimed in and contributed.

Check it out: https://huggingface.co/docs/diffusers/index

Don't hesitate to open PRs for fixes to the documentation, they are greatly appreciated as discussed in our (revised, of course) contribution guide.

🪄 Stable UnCLIP

Stable UnCLIP is the best open-sourced image variation model out there. Pass an initial image and optionally a prompt to generate variations of the image:

from diffusers import DiffusionPipeline
from diffusers.utils import load_image
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-unclip-small", torch_dtype=torch.float16)
pipe.to("cuda")

# get image
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
image = load_image(url)

# run image variation
image = pipe(image).images[0]

For more information, have a look at the "stabilityai/stable-diffusion-2-1-unclip" model card.

🚀 More ControlNet

ControlNet was released in diffusers in version 0.14.0, but we have some exciting developments: Multi-ControlNet, a training script, an upcoming event, and a community image-to-image pipeline contributed by @mikegarts!

Multi-ControlNet

Thanks to community member @takuma104, it's now possible to use several ControlNet conditioning models at once! It works with the same API as before, only supplying a list of ControlNets instead of just one:

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", 
                                                   torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", 
                                                   torch_dtype=torch.float16).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
	"example/a-sd15-variant-model", torch_dtype=torch.float16,
	controlnet=[controlnet_pose, controlnet_canny]
).to("cuda")

pose_image = ...
canny_image = ...
prompt = ...

image = pipe(prompt=prompt, image=[pose_image, canny_image]).images[0]

And this is an example of how this affects generation:

Control Image1Control Image2Generated
<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_pose_512x512.png"><img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_canny_512x512.png"><img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/mc_pose_and_canny_result_19.png">
<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_pose_512x512.png">(none)<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/mc_pose_only_result_19.png">
<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/pac_canny_512x512.png">(none)<img width="200" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/multi_controlnet/mc_canny_only_result_19.png">
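
Under the hood, each ControlNet produces residual feature maps that are added to the UNet's activations; when several ControlNets are used, their (optionally scaled) residuals are summed. A toy sketch of that combination step, with plain Python lists standing in for tensors (not the actual diffusers implementation):

```python
def combine_controlnet_residuals(per_controlnet_residuals, scales=None):
    """Sum residuals from several ControlNets element-wise, each optionally scaled."""
    if scales is None:
        scales = [1.0] * len(per_controlnet_residuals)
    combined = [0.0] * len(per_controlnet_residuals[0])
    for residuals, scale in zip(per_controlnet_residuals, scales):
        combined = [c + scale * r for c, r in zip(combined, residuals)]
    return combined

pose_residuals = [0.1, 0.2, 0.3]    # dummy values from a pose ControlNet
canny_residuals = [0.5, 0.0, -0.1]  # dummy values from a canny ControlNet
combined = combine_controlnet_residuals([pose_residuals, canny_residuals])
```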

ControlNet Training

We have created a training script for ControlNet, and can't wait to see what new ideas the community may come up with! In fact, we are so pumped about it that we are organizing a JAX Diffusers sprint with a special focus on ControlNet, where participant teams will be assigned TPUs v4-8 to work on their projects :exploding_head:. Those are some mean machines, so make sure you join our discord to follow the event: https://discord.com/channels/879548962464493619/897387888663232554/1092751149217615902.

🐈‍⬛ Textual Inversion, Revisited

Several great contributors have been working on textual inversion to get the most out of it. @isamu-isozaki made it possible to perform multi-token training, and @piEsposito & @GuiyeC created an easy way to load textual inversion embeddings. These contributors are always a pleasure to work with 🙌; we feel honored and proud of this community 🙏

Loading textual inversion embeddings is compatible with the Automatic1111 format, so you can download embeddings from other services (such as civitai), and easily apply them in diffusers. Please check the updated documentation for details.
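
Conceptually, loading a textual inversion embedding just registers one new placeholder token together with its learned vector in the text encoder's embedding table. A toy sketch with a dict standing in for the embedding matrix (hypothetical names, not the diffusers API):

```python
def add_textual_inversion(embedding_table, token, learned_vector):
    """Register a learned embedding under a new placeholder token."""
    if token in embedding_table:
        raise ValueError(f"token {token!r} already exists")
    embedding_table[token] = learned_vector

def embed_prompt(embedding_table, prompt):
    """Look up each whitespace-separated token; unknown words map to zeros."""
    dim = len(next(iter(embedding_table.values())))
    return [embedding_table.get(tok, [0.0] * dim) for tok in prompt.split()]

table = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
add_textual_inversion(table, "<my-style>", [0.7, 0.3])
vectors = embed_prompt(table, "cat <my-style>")  # -> [[1.0, 0.0], [0.7, 0.3]]
```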

🏃 Faster loading of cached pipelines

We conducted a thorough investigation of the pipeline loading process to make it as fast as possible. This is the before and after:

Previous: 2.27 sec
Now: 1.1 sec

Instead of performing 3 HTTP operations, we now get all we need with just one. That single call is necessary to check whether any of the components in the pipeline were updated – if that's the case, then we need to download the new files. This improvement also applies when you load individual models instead of pre-trained pipelines.

This may not sound like much, but many people use diffusers for user-facing services where models and pipelines have to be reused on demand. By minimizing latency, they can provide a better service to their users and reduce operating costs.

This can be further reduced by forcing diffusers to just use the items on disk and never check for updates. This is not recommended for most users, but can be interesting in production environments.
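
The idea behind that single call can be sketched as follows: fetch the repo's latest commit hash once, compare it to the commit recorded for the local snapshot, and download files only on a mismatch. All names below are hypothetical, not the actual huggingface_hub API:

```python
def resolve_snapshot(repo_id, cached_commits, fetch_latest_commit, download):
    """Return a local snapshot path, downloading only when the cache is stale.

    cached_commits: dict mapping repo_id -> commit hash of the local snapshot.
    fetch_latest_commit / download: injected callables (each one HTTP call).
    """
    latest = fetch_latest_commit(repo_id)      # the single HTTP round-trip
    if cached_commits.get(repo_id) != latest:  # stale or missing: fetch files
        download(repo_id, latest)
        cached_commits[repo_id] = latest
    return f"cache/{repo_id}/{latest}"

downloads = []
path = resolve_snapshot(
    "example/model",
    {"example/model": "abc123"},               # cache already at latest commit
    fetch_latest_commit=lambda repo: "abc123",
    download=lambda repo, rev: downloads.append(rev),
)
# nothing is re-fetched: downloads stays empty
```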

🔩 Weight prompting using compel

Weight prompting is a popular method to increase the importance of some of the elements that appear in a text prompt, as a way to force image generation to obey those concepts. Because diffusers is used in a multitude of services and projects, we wanted to provide a very flexible way to adopt prompt weighting, so users can ultimately build the system they prefer. Our approach was to:

  • Make the Stable Diffusion pipelines accept raw prompt embeddings. You are free to create the embeddings however you see fit, so users can come up with new ideas to express weighting in their projects.
  • At the same time, we adopted compel, by @damian0815, as a higher-level library to create the weighted embeddings.

You don't have to use compel to create the embeddings, but if you do, this is an example of how it looks in practice:

from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
from compel import Compel

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
prompt = "a red cat playing with a ball++"
prompt_embeds = compel_proc(prompt)

image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=20).images[0]

As you can see, we assign more weight to the word ball using a compel-specific syntax (ball++). You can use other libraries (or your own code) to create appropriate embeddings to pass to the pipeline.
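
To illustrate what the weighting does, here is a toy sketch of the idea: each + appended to a word scales that word's embedding by a boost factor (compel uses roughly 1.1 per +; everything else below is simplified and hypothetical, not compel's implementation):

```python
def parse_weighted_prompt(prompt, boost=1.1):
    """Split 'word++' style tokens into (word, weight) pairs."""
    parsed = []
    for token in prompt.split():
        word = token.rstrip("+")
        weight = boost ** (len(token) - len(word))  # one factor per '+'
        parsed.append((word, weight))
    return parsed

def weighted_embeddings(parsed, embed):
    """Scale each word's embedding vector by its weight."""
    return [[weight * x for x in embed(word)] for word, weight in parsed]

parsed = parse_weighted_prompt("a red cat playing with a ball++")
# "ball" ends up with weight 1.1 ** 2, roughly 1.21; other words keep 1.0
```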

You can read more details in the documentation.

🎲 Karras Sigmas for schedulers

Some diffusers schedulers now support Karras sigmas! Thanks, @nipunjindal!

See Add Karras pattern to discrete euler in #2956 for more information.
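
For reference, the Karras et al. (2022) schedule interpolates between sigma_max and sigma_min in sigma**(1/rho) space rather than linearly, which clusters steps near the low-noise end. A minimal sketch of the formula (the sigma bounds here are chosen for illustration; the scheduler derives them from the model):

```python
def karras_sigmas(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    """Karras et al. noise schedule: interpolate in sigma**(1/rho) space."""
    max_inv = sigma_max ** (1 / rho)
    min_inv = sigma_min ** (1 / rho)
    ramp = [i / (n - 1) for i in range(n)]
    return [(max_inv + t * (min_inv - max_inv)) ** rho for t in ramp]

sigmas = karras_sigmas(10)
# strictly decreasing from sigma_max (10.0) down to sigma_min (0.1),
# with smaller steps near the low-noise end
```

In diffusers this schedule is exposed through the scheduler's use_karras_sigmas flag.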

All commits

  • Adding support for safetensors and LoRa. by @Narsil in #2448
  • [Post release] Push post release by @patrickvonplaten in #2546
  • Correct section docs by @patrickvonplaten in #2540
  • adds xformers support to train_unconditional.py by @vvvm23 in #2520
  • Bug Fix: Remove explicit message argument in deprecate by @alvanli in #2421
  • Update pipeline_stable_diffusion_inpaint_legacy.py resize to integer multiple of 8 instead of 32 for init image and mask by @Laveraaa in #2350
  • move test num_images_per_prompt to pipeline mixin by @williamberman in #2488
  • Training tutorial by @stevhliu in #2473
  • Fix regression introduced in #2448 by @Narsil in #2551
  • Fix for InstructPix2PixPipeline to allow for prompt embeds to be passed in without prompts. by @DN6 in #2456
  • [PipelineTesterMixin] Handle non-image outputs for attn slicing test by @sanchit-gandhi in #2504
  • [Community Pipeline] Unclip Image Interpolation by @Abhinay1997 in #2400
  • Fix: controlnet docs format by @vicoooo26 in #2559
  • ema step, don't empty cuda cache by @williamberman in #2563
  • Add custom vae (diffusers type) to onnx converter by @ForserX in #2325
  • add OnnxStableDiffusionUpscalePipeline pipeline by @ssube in #2158
  • Support convert LoRA safetensors into diffusers format by @haofanwang in #2403
  • [Unet1d] correct docs by @patrickvonplaten in #2565
  • [Training] Fix tensorboard typo by @patrickvonplaten in #2566
  • allow Attend-and-excite pipeline work with different image sizes by @yiyixuxu in #2476
  • Allow textual_inversion_flax script to use save_steps and revision flag by @haixinxu in #2075
  • add intermediate logging for dreambooth training script by @yiyixuxu in #2557
  • community controlnet inpainting pipelines by @williamberman in #2561
  • [docs] Move relevant code for text2image to docs by @stevhliu in #2537
  • [docs] Move DreamBooth training materials to docs by @stevhliu in #2547
  • [docs] Move text-to-image LoRA training from blog to docs by @stevhliu in #2527
  • Update quicktour by @stevhliu in #2463
  • Support revision in Flax text-to-image training by @pcuenca in #2567
  • fix the default value of doc by @xiaohu2015 in #2539
  • Added multitoken training for textual inversion. Issue 369 by @isamu-isozaki in #661
  • [Docs]Fix invalid link to Pokemons dataset by @zxypro1 in #2583
  • [Docs] Weight prompting using compel by @patrickvonplaten in #2574
  • community stablediffusion controlnet img2img pipeline by @mikegarts in #2584
  • Improve dynamic thresholding and extend to DDPM and DDIM Schedulers by @clarencechen in #2528
  • [docs] Move Textual Inversion training examples to docs by @stevhliu in #2576
  • add deps table check updated to ci by @williamberman in #2590
  • Add notebook doc img2img by @yiyixuxu in #2472
  • [docs] Build notebooks from Markdown by @stevhliu in #2570
  • [Docs] Fix link to colab by @patrickvonplaten in #2604
  • [docs] Update unconditional image generation docs by @stevhliu in #2592
  • Add OpenVINO documentation by @echarlaix in #2569
  • Support LoRA for text encoder by @haofanwang in #2588
  • fix: un-existing tmp config file in linux, avoid unnecessary disk IO by @knoopx in #2591
  • Fixed incorrect width/height assignment in StableDiffusionDepth2ImgPi… by @antoche in #2558
  • add flax pipelines to api doc + doc string examples by @yiyixuxu in #2600
  • Fix typos by @standardAI in #2608
  • Migrate blog content to docs by @stevhliu in #2477
  • Add cache_dir to docs by @patrickvonplaten in #2624
  • Make sure that DEIS, DPM and UniPC can correctly be switched in & out by @patrickvonplaten in #2595
  • Revert "[docs] Build notebooks from Markdown" by @patrickvonplaten in #2625
  • Up vesion at which we deprecate "revision='fp16'" since transformers is not released yet by @patrickvonplaten in #2623
  • [Tests] Split scheduler tests by @patrickvonplaten in #2630
  • Improve ddim scheduler and fix bug when prediction type is "sample" by @PeterL1n in #2094
  • update paint by example docs by @williamberman in #2598
  • [From pretrained] Speed-up loading from cache by @patrickvonplaten in #2515
  • add translated docs by @LolitaSian in #2587
  • [Dreambooth] Editable number of class images by @Mr-Philo in #2251
  • Update quicktour.mdx by @standardAI in #2637
  • Update basic_training.mdx by @standardAI in #2639
  • controlnet sd 2.1 checkpoint conversions by @williamberman in #2593
  • [docs] Update readme by @stevhliu in #2612
  • [Pipeline loading] Remove send_telemetry by @patrickvonplaten in #2640
  • [docs] Build Jax notebooks for real by @stevhliu in #2641
  • Update loading.mdx by @standardAI in #2642
  • Support non square image generation for StableDiffusionSAGPipeline by @AkiSakurai in #2629
  • Update schedulers.mdx by @standardAI in #2647
  • [attention] Fix attention by @patrickvonplaten in #2656
  • Add support for Multi-ControlNet to StableDiffusionControlNetPipeline by @takuma104 in #2627
  • [Tests] Adds a test suite for EMAModel by @sayakpaul in #2530
  • fix the in-place modification in unet condition when using controlnet by @andrehuang in #2586
  • image generation main process checks by @williamberman in #2631
  • [Hub] Upgrade to 0.13.2 by @patrickvonplaten in #2670
  • AutoencoderKL: clamp indices of blend_h and blend_v to input size by @kig in #2660
  • Update README.md by @qwjaskzxl in #2653
  • [Lora] correct lora saving & loading by @patrickvonplaten in #2655
  • Add ddim noise comparative analysis pipeline by @aengusng8 in #2665
  • Add support for different model prediction types in DDIMInverseScheduler by @clarencechen in #2619
  • controlnet integration tests num_inference_steps=3 by @williamberman in #2672
  • Controlnet training by @Ttl in #2545
  • [Docs] Adds a documentation page for evaluating diffusion models by @sayakpaul in #2516
  • [Tests] fix: slow serialization test by @sayakpaul in #2678
  • Update Dockerfile CUDA by @patrickvonplaten in #2682
  • T5Attention support for cross-attention by @kashif in #2654
  • Update custom_pipeline_overview.mdx by @standardAI in #2684
  • Update kerascv.mdx by @standardAI in #2685
  • Update img2img.mdx by @standardAI in #2688
  • Update conditional_image_generation.mdx by @standardAI in #2687
  • Update controlling_generation.mdx by @standardAI in #2690
  • Update unconditional_image_generation.mdx by @standardAI in #2686
  • Add image_processor by @yiyixuxu in #2617
  • [docs] Add overviews to each section by @stevhliu in #2657
  • [docs] Create better navigation on index by @stevhliu in #2658
  • [docs] Reorganize table of contents by @stevhliu in #2671
  • Rename attention by @patrickvonplaten in #2691
  • Adding use_safetensors argument to give more control to users by @Narsil in #2123
  • [docs] Add safety checker to ethical guidelines by @stevhliu in #2699
  • train_unconditional save restore unet parameters by @williamberman in #2706
  • Improve deprecation error message when using cross_attention import by @patrickvonplaten in #2710
  • fix image link in inpaint doc by @yiyixuxu in #2693
  • [docs] Update ONNX doc to use optimum by @sayakpaul in #2702
  • Enabling gradient checkpointing for VAE by @Pie31415 in #2536
  • [Tests] Correct PT2 by @patrickvonplaten in #2724
  • Update mps.mdx by @standardAI in #2749
  • Update torch2.0.mdx by @standardAI in #2748
  • Update fp16.mdx by @standardAI in #2746
  • Update dreambooth.mdx by @standardAI in #2742
  • Update philosophy.mdx by @standardAI in #2752
  • Update text_inversion.mdx by @standardAI in #2751
  • add: controlnet entry to training section in the docs. by @sayakpaul in #2677
  • Update numbers for Habana Gaudi in documentation by @regisss in #2734
  • Improve Contribution Doc by @patrickvonplaten in #2043
  • Fix typos by @apivovarov in #2715
  • [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline by @nipunjindal in #2723
  • Add guidance start/end parameters to StableDiffusionControlNetImg2ImgPipeline by @hyowon-ha in #2731
  • Fix mps tests on torch 2.0 by @pcuenca in #2766
  • Add option to set dtype in pipeline.to() method by @1lint in #2317
  • stable diffusion depth batching fix by @williamberman in #2757
  • [docs] update torch 2 benchmark by @pcuenca in #2764
  • [docs] Clarify purpose of reproducibility docs by @stevhliu in #2756
  • [MS Text To Video] Add first text to video by @patrickvonplaten in #2738
  • mps: remove warmup passes by @pcuenca in #2771
  • Support for Offset Noise in examples by @haofanwang in #2753
  • add: section on multiple controlnets. by @sayakpaul in #2762
  • [Examples] InstructPix2Pix instruct training script by @sayakpaul in #2478
  • deduplicate training section in the docs. by @sayakpaul in #2788
  • [UNet3DModel] Fix with attn processor by @patrickvonplaten in #2790
  • [doc wip] literalinclude by @mishig25 in #2718
  • Rename 'CLIPFeatureExtractor' class to 'CLIPImageProcessor' by @ainoya in #2732
  • Music Spectrogram diffusion pipeline by @kashif in #1044
  • [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline by @nipunjindal in #2779
  • [Docs] small fixes to the text to video doc. by @sayakpaul in #2787
  • Update train_text_to_image_lora.py by @haofanwang in #2767
  • Skip mps in text-to-video tests by @pcuenca in #2792
  • Flax controlnet by @yiyixuxu in #2727
  • [docs] Add Colab notebooks and Spaces by @stevhliu in #2713
  • Add AudioLDM by @sanchit-gandhi in #2232
  • Update train_text_to_image_lora.py by @haofanwang in #2795
  • Add ModelEditing pipeline by @bahjat-kawar in #2721
  • Relax DiT test by @kashif in #2808
  • Update onnxruntime package candidates by @PeixuanZuo in #2666
  • [Stable UnCLIP] Finish Stable UnCLIP by @patrickvonplaten in #2814
  • [Docs] update docs (Stable unCLIP) to reflect the updated ckpts. by @sayakpaul in #2815
  • StableDiffusionModelEditingPipeline documentation by @bahjat-kawar in #2810
  • Update examples README.md to include the latest examples by @sayakpaul in #2839
  • Ruff: apply same rules as in transformers by @pcuenca in #2827
  • [Tests] Fix slow tests by @patrickvonplaten in #2846
  • Fix StableUnCLIPImg2ImgPipeline handling of explicitly passed image embeddings by @unishift in #2845
  • Helper function to disable custom attention processors by @pcuenca in #2791
  • improve stable unclip doc. by @sayakpaul in #2823
  • add: better warning messages when handling multiple conditionings. by @sayakpaul in #2804
  • [WIP]Flax training script for controlnet by @yiyixuxu in #2818
  • Make dynamo wrapped modules work with save_pretrained by @pcuenca in #2726
  • [Init] Make sure shape mismatches are caught early by @patrickvonplaten in #2847
  • updated onnx pndm test by @kashif in #2811
  • [Stable Diffusion] Allow users to disable Safety checker if loading model from checkpoint by @Stax124 in #2768
  • fix KarrasVePipeline bug by @junhsss in #2828
  • StableDiffusionLongPromptWeightingPipeline: Do not hardcode pad token by @AkiSakurai in #2832
  • Remove suggestion to use cuDNN benchmark in docs by @d1g1t in #2793
  • Remove duplicate sentence in docstrings by @qqaatw in #2834
  • Update the legacy inpainting SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by @cmdr2 in #2842
  • Fix link to LoRA training guide in DreamBooth training guide by @ushuz in #2836
  • [WIP][Docs] Use DiffusionPipeline Instead of Child Classes when Loading Pipeline by @dg845 in #2809
  • Add last_epoch argument to optimization.get_scheduler by @felixblanke in #2850
  • [WIP] Check UNet shapes in StableDiffusionInpaintPipeline init by @dg845 in #2853
  • [2761]: Add documentation for extra_in_channels UNet1DModel by @nipunjindal in #2817
  • [Tests] Adds a test to check if image_embeds None case is handled properly in StableUnCLIPImg2ImgPipeline by @sayakpaul in #2861
  • Update evaluation.mdx by @standardAI in #2862
  • Update overview.mdx by @standardAI in #2864
  • Update alt_diffusion.mdx by @standardAI in #2865
  • Update paint_by_example.mdx by @standardAI in #2869
  • Update stable_diffusion_safe.mdx by @standardAI in #2870
  • [Docs] Correct phrasing by @patrickvonplaten in #2873
  • [Examples] Add streaming support to the ControlNet training example in JAX by @sayakpaul in #2859
  • feat: allow offset_noise in dreambooth training example by @yamanahlawat in #2826
  • [docs] Performance tutorial by @stevhliu in #2773
  • [Docs] add an example use for StableUnCLIPPipeline in the pipeline docs by @sayakpaul in #2897
  • add flax requirement by @yiyixuxu in #2894
  • Support fp16 in conversion from original ckpt by @burgalon in #2733
  • img2img.multiple.controlnets.pipeline by @mikegarts in #2833
  • add load textual inversion embeddings to stable diffusion by @piEsposito in #2009
  • [docs] add the Stable diffusion with Jax/Flax Guide into the docs by @yiyixuxu in #2487
  • Add support Karras sigmas for StableDiffusionKDiffusionPipeline by @takuma104 in #2874
  • Fix textual inversion loading by @GuiyeC in #2914
  • Fix slow tests text inv by @patrickvonplaten in #2915
  • Fix check_inputs in upscaler pipeline to allow embeds by @d1g1t in #2892
  • Modify example with intel optimization by @mengfei25 in #2896
  • [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline by @nipunjindal in #2902
  • [Tests] Speed up test by @patrickvonplaten in #2919
  • Have fix current pipeline link by @guspan-tanadi in #2910
  • Update image_variation.mdx by @standardAI in #2911
  • Update controlnet.mdx by @standardAI in #2912
  • Update pipeline_stable_diffusion_controlnet.py by @patrickvonplaten in #2917
  • Check for all different packages of opencv by @wfng92 in #2901
  • fix: norm group test for UNet3D. by @sayakpaul in #2959
  • Update euler_ancestral.mdx by @standardAI in #2932
  • Update unipc.mdx by @standardAI in #2936
  • Update score_sde_ve.mdx by @standardAI in #2937
  • Update score_sde_vp.mdx by @standardAI in #2938
  • Update ddim.mdx by @standardAI in #2926
  • Update ddpm.mdx by @standardAI in #2929
  • Removing explicit markdown extension by @guspan-tanadi in #2944
  • Ensure validation image RGB not RGBA by @ernestchu in #2945
  • Use upload_folder in training scripts by @Wauplin in #2934
  • allow use custom local dataset for controlnet training scripts by @yiyixuxu in #2928
  • fix post-processing by @yiyixuxu in #2968
  • [docs] Simplify loading guide by @stevhliu in #2694
  • update flax controlnet training script by @yiyixuxu in #2951
  • [Pipeline download] Improve pipeline download for index and passed co… by @patrickvonplaten in #2980
  • The variable name has been updated. by @kadirnar in #2970
  • Update the K-Diffusion SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by @cmdr2 in #2962
  • [Examples] Add support for Min-SNR weighting strategy for better convergence by @sayakpaul in #2899
  • [scheduler] fix some scheduler dtype error by @furry-potato-maker in #2992
  • minor fix in controlnet flax example by @yiyixuxu in #2986
  • Explain how to install test dependencies by @pcuenca in #2983
  • docs: Link Navigation Path API Pipelines by @guspan-tanadi in #2976
  • add Min-SNR loss to Controlnet flax train script by @yiyixuxu in #3016
  • dynamic threshold sampling bug fixes and docs by @williamberman in #3003
  • Initial draft of Core ML docs by @pcuenca in #2987
  • [Pipeline] Add TextToVideoZeroPipeline by @19and99 in #2954
  • Small typo correction in comments by @rogerioagjr in #3012
  • mps: skip unstable test by @pcuenca in #3037
  • Update contribution.mdx by @mishig25 in #3054
  • fix report tool by @patrickvonplaten in #3047
  • Fix config prints and save, load of pipelines by @patrickvonplaten in #2849
  • [docs] Reusing components by @stevhliu in #3000
  • Fix imports for composable_stable_diffusion pipeline by @nthh in #3002
  • config fixes by @williamberman in #3060
  • accelerate min version for ProjectConfiguration import by @williamberman in #3042
  • AttentionProcessor.group_norm num_channels should be query_dim by @williamberman in #3046
  • Update documentation by @George-Ogden in #2996
  • Fix scheduler type mismatch by @pcuenca in #3041
  • Fix invocation of some slow Flax tests by @pcuenca in #3058
  • add only cross attention to simple attention blocks by @williamberman in #3011
  • Fix typo and format BasicTransformerBlock attributes by @off99555 in #2953
  • unet time embedding activation function by @williamberman in #3048
  • Attention processor cross attention norm group norm by @williamberman in #3021
  • Attn added kv processor torch 2.0 block by @williamberman in #3023
  • [Examples] Fix type-casting issue in the ControlNet training script by @sayakpaul in #2994
  • [LoRA] Enabling limited LoRA support for text encoder by @sayakpaul in #2918
  • fix slow tests by @patrickvonplaten in #3066
  • Fix InstructPix2Pix training in multi-GPU mode by @sayakpaul in #2978
  • [Docs] update Self-Attention Guidance docs by @SusungHong in #2952
  • Flax memory efficient attention by @pcuenca in #2889
  • [WIP] implement rest of the test cases (LoRA tests) by @Pie31415 in #2824
  • fix pipeline setattr value == None by @williamberman in #3063
  • add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines by @ssube in #2597
  • [2064]: Add Karras to DPMSolverMultistepScheduler by @nipunjindal in #3001
  • Finish docs textual inversion by @patrickvonplaten in #3068
  • [Docs] refactor text-to-video zero by @sayakpaul in #3049
  • Update Flax TPU tests by @pcuenca in #3069
  • Fix a bug of pano when not doing CFG by @ernestchu in #3030
  • Text2video zero refinements by @19and99 in #3070
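
Several entries above add the Min-SNR weighting strategy to the training examples (#2899, #3016). The core idea, from the Min-SNR-γ paper, is to clip each timestep's signal-to-noise ratio at γ when computing per-sample loss weights, so that low-noise timesteps with huge SNR do not dominate training. The sketch below is a minimal NumPy illustration of that weighting for epsilon-prediction (the toy linear-beta schedule and function name are illustrative, not the training scripts' actual code):

```python
import numpy as np

def min_snr_weights(alphas_cumprod, gamma=5.0):
    """Per-timestep loss weights under Min-SNR-gamma (epsilon-prediction):
    w_t = min(SNR_t, gamma) / SNR_t, where SNR_t = alpha_bar_t / (1 - alpha_bar_t)."""
    snr = alphas_cumprod / (1.0 - alphas_cumprod)
    return np.minimum(snr, gamma) / snr

# Toy DDPM-style linear beta schedule over 1000 timesteps.
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

w = min_snr_weights(alphas_cumprod)
# Early (low-noise) timesteps have SNR >> gamma, so they get small weights;
# late (high-noise) timesteps have SNR < gamma, so their weight stays 1.
```

Multiplying the per-sample MSE loss by these weights is what the training-script option (e.g. a `--snr_gamma` flag) enables; with γ = ∞ it reduces to the standard unweighted objective.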

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @Abhinay1997
    • [Community Pipeline] Unclip Image Interpolation (#2400)
  • @ssube
    • add OnnxStableDiffusionUpscalePipeline pipeline (#2158)
    • add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines (#2597)
  • @haofanwang
    • Support convert LoRA safetensors into diffusers format (#2403)
    • Support LoRA for text encoder (#2588)
    • Support for Offset Noise in examples (#2753)
    • Update train_text_to_image_lora.py (#2767)
    • Update train_text_to_image_lora.py (#2795)
  • @isamu-isozaki
    • Added multitoken training for textual inversion. Issue 369 (#661)
  • @mikegarts
    • community stablediffusion controlnet img2img pipeline (#2584)
    • img2img.multiple.controlnets.pipeline (#2833)
  • @LolitaSian
    • add translated docs (#2587)
  • @Ttl
    • Controlnet training (#2545)
  • @nipunjindal
    • [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline (#2723)
    • [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline (#2779)
    • [2761]: Add documentation for extra_in_channels UNet1DModel (#2817)
    • [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline (#2902)
    • [2905]: Add Karras pattern to discrete euler (#2956)
    • [2064]: Add Karras to DPMSolverMultistepScheduler (#3001)
  • @bahjat-kawar
    • Add ModelEditing pipeline (#2721)
    • StableDiffusionModelEditingPipeline documentation (#2810)
  • @piEsposito
    • add load textual inversion embeddings to stable diffusion (#2009)
  • @19and99
    • [Pipeline] Add TextToVideoZeroPipeline (#2954)
    • Text2video zero refinements (#3070)
  • @MuhHanif
    • Flax memory efficient attention (#2889)