Hugging Face / Diffusers releases
Mar 3, 2023
ControlNet, 8K VAE decoding

:rocket: ControlNet comes to 🧨 Diffusers!

Thanks to an amazing collaboration with community member @takuma104 🙌, diffusers now fully supports ControlNet! All 8 control models from the paper are available for you to use: depth, scribbles, edges, and more. Best of all, you can take advantage of all the other goodies and optimizations that Diffusers provides out of the box, making this an ultra-fast implementation of ControlNet. Take it for a spin to see for yourself.

ControlNet works by training a copy of some of the layers of the original Stable Diffusion model on additional signals, such as depth maps or scribbles. After training, you can provide a depth map as a strong hint of the composition you want to achieve, and have Stable Diffusion fill in the details for you. For example:
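One detail from the ControlNet paper worth noting: the trained copy is connected to the frozen model through zero-initialised layers ("zero convolutions"), so at the start of training the control branch is a no-op and the combined model behaves exactly like the original. A toy NumPy sketch of that idea (all names and shapes are illustrative, not diffusers API):

```python
import numpy as np

def block(x, w):
    """Stand-in for a frozen UNet block."""
    return np.tanh(w @ x)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
w_copy = w.copy()             # ControlNet trains a copy of the block
zero_conv = np.zeros((8, 8))  # "zero convolution", initialised to zero

def controlled(x, hint):
    # Control branch runs the copied block on the hint (depth map, scribble, ...)
    # and injects its output through the zero-initialised layer.
    residual = zero_conv @ block(hint, w_copy)
    return block(x, w) + residual

x, hint = rng.normal(size=8), rng.normal(size=8)
# At initialisation the zero convolution silences the control branch,
# so the controlled model reproduces the original output exactly.
assert np.allclose(controlled(x, hint), block(x, w))
```

As `zero_conv` and `w_copy` are trained on pairs of hints and images, the residual gradually learns to steer the frozen model without destabilising it.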

<table> <tr style="text-align: center;"> <th>Before</th> <th>After</th> </tr> <tr> <td><img class="mx-auto" src="https://huggingface.co/datasets/YiYiXu/controlnet-testing/resolve/main/house_depth.png" width=300/></td> <td><img class="mx-auto" src="https://huggingface.co/datasets/YiYiXu/controlnet-testing/resolve/main/house_after.jpeg" width=300/></td> </tr> </table>

Currently, there are 8 published control models, all of which were trained on runwayml/stable-diffusion-v1-5 (i.e., Stable Diffusion version 1.5). This is an example that uses the scribble controlnet model:

<table> <tr style="text-align: center;"> <th>Before</th> <th>After</th> </tr> <tr> <td><img class="mx-auto" src="https://huggingface.co/datasets/YiYiXu/controlnet-testing/resolve/main/drawing_before.png" width=300/></td> <td><img class="mx-auto" src="https://huggingface.co/datasets/YiYiXu/controlnet-testing/resolve/main/drawing_after.jpeg" width=300/></td> </tr> </table>

Or you can turn a cartoon into a realistic photo with incredible coherence:

<img src="https://huggingface.co/datasets/YiYiXu/controlnet-testing/resolve/main/lofi.jpg" height="400" alt="ControlNet showing a photo generated from a cartoon frame">

How do you use ControlNet in diffusers? Just like this (example for the canny edges control model):

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

As usual, you can use all the features in the diffusers toolbox: super-fast schedulers, memory-efficient attention, model offloading, etc. We think 🧨 Diffusers is the best way to iterate on your ControlNet experiments!

Please refer to our blog post and documentation for details.

(And, coming soon, ControlNet training – stay tuned!)

:diamond_shape_with_a_dot_inside: VAE tiling for ultra-high resolution generation

Another community member, @kig, conceived, proposed, and fully implemented an amazing PR that allows generation of ultra-high-resolution images without memory blowing up 🤯. It follows a tiling approach during the image decoding phase of the process, generating one piece of the image at a time and then stitching them all together. Tiles are blended carefully to avoid visible seams between them, and the final result is amazing. This is the only additional code you need to enjoy high-resolution generations:

pipe.vae.enable_tiling()

That's it!

For a complete example, refer to the PR or the code snippet we reproduce here for your convenience:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention()
pipe.vae.enable_tiling()

prompt = "a beautiful landscape photo"
image = pipe(prompt, width=4096, height=2048, num_inference_steps=10).images[0]

image.save("4k_landscape.jpg")
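The blending idea behind the tiled decoder, linear ramps across tile overlaps so that no seam is visible, can be sketched in a few lines of NumPy (a 1-D horizontal case with illustrative names, not the actual diffusers implementation):

```python
import numpy as np

def blend_tiles(left: np.ndarray, right: np.ndarray, overlap: int) -> np.ndarray:
    """Stitch two horizontally adjacent tiles, linearly blending the overlap."""
    ramp = np.linspace(0.0, 1.0, overlap)  # weight goes 0 -> 1 across the seam
    blended = left[:, -overlap:] * (1 - ramp) + right[:, :overlap] * ramp
    return np.concatenate([left[:, :-overlap], blended, right[:, overlap:]], axis=1)

# Two tiles cut from the same image reassemble without a visible seam.
image = np.full((4, 16), 0.5)
left, right = image[:, :10], image[:, 6:]  # 4-pixel overlap
out = blend_tiles(left, right, overlap=4)
assert out.shape == image.shape
assert np.allclose(out, image)
```

The real VAE decoder does the same thing in 2-D on decoded latent tiles, which keeps peak memory bounded by the tile size rather than the full output resolution.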

All commits

  • [Docs] Add a note on SDEdit by @sayakpaul in #2433
  • small bugfix at StableDiffusionDepth2ImgPipeline call to check_inputs and batch size calculation by @mikegarts in #2423
  • add demo by @yiyixuxu in #2436
  • fix: code snippet of instruct pix2pix from the docs. by @sayakpaul in #2446
  • Update train_text_to_image_lora.py by @haofanwang in #2464
  • mps test fixes by @pcuenca in #2470
  • Fix test train_unconditional by @pcuenca in #2481
  • add MultiDiffusion to controlling generation by @omerbt in #2490
  • image_noiser -> image_normalizer comment by @williamberman in #2496
  • [Safetensors] Make sure metadata is saved by @patrickvonplaten in #2506
  • Add 4090 benchmark (PyTorch 2.0) by @pcuenca in #2503
  • [Docs] Improve safetensors by @patrickvonplaten in #2508
  • Disable ONNX tests by @patrickvonplaten in #2509
  • attend and excite batch test causing timeouts by @williamberman in #2498
  • move pipeline based test skips out of pipeline mixin by @williamberman in #2486
  • pix2pix tests no write to fs by @williamberman in #2497
  • [Docs] Include more information in the "controlling generation" doc by @sayakpaul in #2434
  • Use "hub" directory for cache instead of "diffusers" by @pcuenca in #2005
  • Sequential cpu offload: require accelerate 0.14.0 by @pcuenca in #2517
  • is_safetensors_compatible refactor by @williamberman in #2499
  • [Copyright] 2023 by @patrickvonplaten in #2524
  • Bring Flax attention naming in sync with PyTorch by @pcuenca in #2511
  • [Tests] Fix slow tests by @patrickvonplaten in #2526
  • PipelineTesterMixin parameter configuration refactor by @williamberman in #2502
  • Add a ControlNet model & pipeline by @takuma104 in #2407
  • 8k Stable Diffusion with tiled VAE by @kig in #1441
  • Textual inv make save log both steps by @isamu-isozaki in #2178
  • Fix convert SD to diffusers error by @fkunn1326 in #1979
  • Small fixes for controlnet by @patrickvonplaten in #2542
  • Fix ONNX checkpoint loading by @anton-l in #2544
  • [Model offload] Add nice warning by @patrickvonplaten in #2543

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @takuma104
    • Add a ControlNet model & pipeline (#2407)

New Contributors

Full Changelog: https://github.com/huggingface/diffusers/compare/v0.13.0...v0.14.0

Feb 20, 2023
v0.13.1: Patch Release to fix warning when loading from `revision="fp16"`
  • fix transformers naming by @patrickvonplaten in #2430
  • remove author names. by @sayakpaul in #2428
  • Fix deprecation warning by @patrickvonplaten in #2426
  • fix the get_indices function by @yiyixuxu in #2418
  • Update pipeline_utils.py by @haofanwang in #2415
Controllable Generation: Pix2Pix0, Attend and Excite, SEGA, SAG, ...

:dart: Controlling Generation

There has been much recent work on fine-grained control of diffusion networks!

Diffusers now supports:

  1. Instruct Pix2Pix
  2. Pix2Pix Zero, more details in docs
  3. Attend and excite, more details in docs
  4. Semantic guidance, more details in docs
  5. Self-attention guidance, more details in docs
  6. Depth2image
  7. MultiDiffusion panorama, more details in docs

See our doc on controlling image generation and the individual pipeline docs for more details on the individual methods.

:up: Latent Upscaler

Latent Upscaler is a diffusion model that is designed explicitly for Stable Diffusion. You can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE. Or you can take any image, encode it into the latent space, use the upscaler, and decode it. It is incredibly flexible and can work with any SD checkpoints.

Original output image | 2x upscaled output image

The model was developed by Katherine Crowson in collaboration with Stability AI.

from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
import torch

pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipeline.to("cuda")

upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained("stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16)
upscaler.to("cuda")

prompt = "a photo of an astronaut high resolution, unreal engine, ultra realistic"
generator = torch.manual_seed(33)

# we stay in latent space! Let's make sure that Stable Diffusion returns the image
# in latent space
low_res_latents = pipeline(prompt, generator=generator, output_type="latent").images

upscaled_image = upscaler(
    prompt=prompt,
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
    generator=generator,
).images[0]

# Let's save the upscaled image under "upscaled_astronaut.png"
upscaled_image.save("astronaut_1024.png")

# as a comparison: Let's also save the low-res image
with torch.no_grad():
    image = pipeline.decode_latents(low_res_latents)
image = pipeline.numpy_to_pil(image)[0]

image.save("astronaut_512.png")

:zap: Optimization

In addition to new features and an increasing number of pipelines, diffusers cares a lot about performance. This release brings a number of optimizations that you can turn on easily.

xFormers

Memory efficient attention, as implemented by xFormers, has been available in diffusers for some time. The problem was that installing xFormers could be complicated because there were no official pip wheels (or they were outdated), and you had to resort to installing from source.

From xFormers 0.0.16, official pip wheels are now published with every release, so installing and using xFormers is now as simple as these two steps:

  1. Run pip install xformers in your terminal.
  2. Call pipe.enable_xformers_memory_efficient_attention() in your code to opt in for your pipelines.

These actions will unlock dramatic memory savings, and usually faster inference too!

See more details in the documentation.

Torch 2.0

Speaking of memory-efficient attention, Accelerated PyTorch 2.0 Transformers now comes with built-in native support for it! When PyTorch 2.0 is released you'll no longer have to install xFormers or any third-party package to take advantage of it. In diffusers we are already preparing for that, and it works out of the box. So, if you happen to be using the latest "nightlies" of PyTorch 2.0 beta, then you're all set – diffusers will use Accelerated PyTorch 2.0 Transformers by default.

In our tests, the built-in PyTorch 2.0 implementation is usually as fast as xFormers', and sometimes even faster. Performance depends on the card you are using and whether you run your code in float16 or float32, so check our documentation for details.
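Both xFormers and the PyTorch 2.0 kernels compute the same math as plain scaled dot-product attention; the speed and memory wins come from fused kernels that avoid materialising the full attention-weights matrix. A naive NumPy reference of what they compute (shapes illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    weights = softmax(q @ k.swapaxes(-1, -2) / np.sqrt(d))
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 5, 8))  # (batch, tokens, head_dim)
k = rng.normal(size=(2, 5, 8))
v = rng.normal(size=(2, 5, 8))
out = attention(q, k, v)
assert out.shape == (2, 5, 8)
```

Memory-efficient implementations produce (numerically) the same `out` while never storing the full `(batch, tokens, tokens)` weights tensor at once.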

Coarse-grained CPU offload

Community member @keturn, with whom we have enjoyed thoughtful software design conversations, called our attention to the fact that enabling sequential cpu offloading via enable_sequential_cpu_offload worked great to save a lot of memory, but made inference much slower.

This is because enable_sequential_cpu_offload() is optimized for memory, and it recursively works across all the submodules contained in a model, moving them to GPU when they are needed and back to CPU when another submodule needs to run. These cpu-to-gpu-to-cpu transfers happen hundreds of times during the stable diffusion denoising loops, because the UNet runs multiple times and it consists of several PyTorch modules.

This release of diffusers introduces a coarser enable_model_cpu_offload() pipeline API, which copies whole models (not modules) to GPU and makes sure they stay there until another model needs to run. The consequences are:

  • Less memory savings than enable_sequential_cpu_offload, but:
  • Almost as fast inference as when the pipeline is used without any type of offloading.

<a name="pix2pix-zero"></a>

Pix2Pix Zero

Remember the CycleGAN days, when one would turn a horse into a zebra in an image while keeping the rest of the content almost untouched? Well, that day has arrived, but now in the context of diffusion models. Pix2Pix Zero allows users to edit a particular image (be it real or generated), targeting a source concept (horse, for example) and replacing it with a target concept (zebra, for example).

Input image | Edited image

Pix2Pix Zero was proposed in Zero-shot Image-to-Image Translation. The StableDiffusionPix2PixZeroPipeline allows you to

  1. Edit an image generated from an input prompt
  2. Provide an input image and edit it

For the latter, it uses the newly introduced DDIMInverseScheduler to first obtain the inverted noise from the input image and use that in the subsequent generation process.

Both of the use cases leverage the idea of "edit directions", used for steering the generation toward the target concept gradually from the source concept. To know more, we recommend checking out the official documentation.
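One common way to build such an edit direction (a sketch of the general idea, not the exact pipeline internals) is to embed several captions for the source and target concepts and take the difference of their means:

```python
import numpy as np

def edit_direction(source_embeds: np.ndarray, target_embeds: np.ndarray) -> np.ndarray:
    """Edit direction: mean target text embedding minus mean source embedding."""
    return target_embeds.mean(axis=0) - source_embeds.mean(axis=0)

rng = np.random.default_rng(0)
horse_embeds = rng.normal(size=(4, 16))  # embeddings of several "horse" captions
zebra_embeds = rng.normal(size=(4, 16))  # embeddings of several "zebra" captions
direction = edit_direction(horse_embeds, zebra_embeds)

# Shifting the mean source embedding along the direction lands on the target mean,
# which is what steers generation from "horse" toward "zebra".
assert np.allclose(horse_embeds.mean(axis=0) + direction, zebra_embeds.mean(axis=0))
```

The pipeline applies such a direction gradually during denoising so the edit stays coherent with the rest of the image.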

<a name="attend-excite"></a>

Attend and excite

Attend-and-Excite was proposed in Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models. It guides the generative model to modify the cross-attention values during image synthesis, producing images that more faithfully depict the input text prompt. Thanks to community contributor @evinpinar for leading the charge to add this pipeline!

  • Attend and excite 2 by @evinpinar @yiyixuxu #2369

<a name="semantic-guidance"></a>

Semantic guidance

Semantic Guidance for Diffusion Models was proposed in SEGA: Instructing Diffusion using Semantic Dimensions and provides strong semantic control over image generation. Small changes to the text prompt usually result in entirely different output images. With SEGA, however, a variety of changes to the image can be controlled easily and intuitively while staying true to the original image composition. Thanks to the lead author of SEGA, Manuel (@manuelbrack), who added the pipeline in #2223.

Here is a simple demo:

import torch
from diffusers import SemanticStableDiffusionPipeline

pipe = SemanticStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

out = pipe(
    prompt="a photo of the face of a woman",
    num_images_per_prompt=1,
    guidance_scale=7,
    editing_prompt=[
        "smiling, smile",  # Concepts to apply
        "glasses, wearing glasses",
        "curls, wavy hair, curly hair",
        "beard, full beard, mustache",
    ],
    reverse_editing_direction=[False, False, False, False],  # Direction of guidance i.e. increase all concepts
    edit_warmup_steps=[10, 10, 10, 10],  # Warmup period for each concept
    edit_guidance_scale=[4, 5, 5, 5.4],  # Guidance scale for each concept
    edit_threshold=[
        0.99,
        0.975,
        0.925,
        0.96,
    ],  # Threshold for each concept. Threshold equals the percentile of the latent space that will be discarded. I.e. threshold=0.99 uses 1% of the latent dimensions
    edit_momentum_scale=0.3,  # Momentum scale that will be added to the latent guidance
    edit_mom_beta=0.6,  # Momentum beta
    edit_weights=[1, 1, 1, 1, 1],  # Weights of the individual concepts against each other
)
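The `edit_threshold` semantics from the snippet above (a threshold of 0.99 keeps only the top 1% of latent dimensions) can be illustrated with a small NumPy sketch (names are illustrative, not the pipeline's internals):

```python
import numpy as np

def threshold_mask(guidance: np.ndarray, threshold: float) -> np.ndarray:
    """Zero out every latent dimension below the given percentile of |guidance|."""
    cutoff = np.quantile(np.abs(guidance), threshold)
    return np.where(np.abs(guidance) >= cutoff, guidance, 0.0)

rng = np.random.default_rng(0)
g = rng.normal(size=10_000)          # stand-in for a per-dimension guidance signal
masked = threshold_mask(g, 0.99)

# Roughly 1% of the dimensions survive the 0.99 threshold.
frac = np.count_nonzero(masked) / g.size
assert 0.005 < frac < 0.02
```

Concentrating the guidance on the few most-affected latent dimensions is what lets SEGA change one attribute (a smile, glasses) without disturbing the overall composition.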

<a name="self-attention-guidance"></a>

Self-attention guidance

SAG was proposed in Improving Sample Quality of Diffusion Models Using Self-Attention Guidance. SAG extracts the intermediate attention map from a diffusion model at every iteration, selects the tokens above a certain attention score, and masks and blurs them to obtain a partially blurred input. The dissimilarity between the noise predictions for the blurred and the original input is then leveraged as guidance. With this guidance, the authors observe apparent improvements in a wide range of diffusion models.

import torch
from diffusers import StableDiffusionSAGPipeline
from accelerate.utils import set_seed

pipe = StableDiffusionSAGPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

seed = 8978
prompt = "."
guidance_scale = 7.5
num_images_per_prompt = 1

sag_scale = 1.0

set_seed(seed)
images = pipe(
    prompt, num_images_per_prompt=num_images_per_prompt, guidance_scale=guidance_scale, sag_scale=sag_scale
).images
images[0].save("example.png")

SAG was contributed by @SusungHong (lead author of SAG) in https://github.com/huggingface/diffusers/pull/2193.
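The "mask and blur the salient tokens" step at the heart of SAG can be sketched in 1-D NumPy (a toy illustration, not the pipeline's implementation):

```python
import numpy as np

def box_blur(x: np.ndarray, width: int = 3) -> np.ndarray:
    """Simple 1-D box blur as a stand-in for Gaussian blurring."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

def partially_blur(signal: np.ndarray, attn_scores: np.ndarray) -> np.ndarray:
    """Blur only the positions whose attention score exceeds the mean."""
    mask = attn_scores > attn_scores.mean()
    return np.where(mask, box_blur(signal), signal)

rng = np.random.default_rng(0)
signal = rng.normal(size=32)
attn = rng.random(32)
out = partially_blur(signal, attn)

# Positions below the threshold are untouched; salient ones are smoothed.
assert np.array_equal(out[attn <= attn.mean()], signal[attn <= attn.mean()])
```

SAG then compares the model's noise prediction on this partially blurred input with the prediction on the original, and uses the difference as a guidance signal.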

<a name="panorama"></a>

MultiDiffusion panorama

MultiDiffusion was proposed in MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation. It presents a new generation process, "MultiDiffusion", based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints.

import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_ckpt = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(model_ckpt, scheduler=scheduler, torch_dtype=torch.float16)

pipe = pipe.to("cuda")

prompt = "a photo of the dolomites"
image = pipe(prompt).images[0]
image.save("dolomites.png")

The pipeline was contributed by @omerbt (lead author of MultiDiffusion Panorama) and @sayakpaul in #2393.
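The core fusion step, averaging per-window denoising predictions over a canvas wider than any single window, can be sketched in 1-D NumPy (a toy illustration with invented names):

```python
import numpy as np

def fuse_windows(width: int, window: int, stride: int, predict) -> np.ndarray:
    """Average per-window predictions over a wider latent canvas."""
    acc = np.zeros(width)
    count = np.zeros(width)
    for start in range(0, width - window + 1, stride):
        acc[start:start + window] += predict(start)  # window's denoising prediction
        count[start:start + window] += 1             # how many windows cover a pixel
    return acc / count

# If every window predicts the same value, fusion reproduces it everywhere,
# with no seams where windows overlap.
fused = fuse_windows(width=64, window=16, stride=8, predict=lambda s: np.full(16, 2.0))
assert np.allclose(fused, 2.0)
```

Because overlapping windows are averaged at every denoising step, neighbouring regions stay consistent, which is what makes seamless panoramas possible.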

Ethical Guidelines

Diffusers is no stranger to the different opinions and perspectives about the challenges that generative technologies bring. Thanks to @giadilli, we have drafted our first Diffusers' Ethical Guidelines with which we hope to initiate a fruitful conversation with the community.

Keras Integration

Many practitioners find it easy to fine-tune the Stable Diffusion models shipped by KerasCV. At the same time, diffusers provides a lot of options for inference, deployment and optimization. We have made it possible to easily import and use KerasCV Stable Diffusion checkpoints in diffusers, read more about the process in our new guide.

:clock3: UniPC scheduler

UniPC is a new fast scheduler in diffusion town! UniPC is a training-free framework designed for the fast sampling of diffusion models. It consists of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders. The original codebase can be found here. Thanks to @wl-zhao for the great work and for integrating UniPC into diffusers!

  • add the UniPC scheduler by @wl-zhao in #2373

:runner: Training: consistent EMA support

As part of 0.13.0 we improved the support for EMA in training. We added a common EMAModel in diffusers.training_utils which can be used by all scripts. The EMAModel now supports distributed training, adds new methods to easily evaluate the EMA model during training, and provides a consistent way to save and load the EMA model, similar to other models in diffusers.

  • Fix EMA for multi-gpu training in the unconditional example by @anton-l, @patil-suraj #1930
  • [Utils] Adds store() and restore() methods to EMAModel by @sayakpaul #2302
  • Use accelerate save & loading hooks to have better checkpoint structure by @patrickvonplaten #2048
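The exponential moving average itself is a one-line update; a minimal pure-Python sketch of the idea behind EMAModel (not its actual API):

```python
class SimpleEMA:
    """Minimal EMA tracker: shadow <- decay * shadow + (1 - decay) * param."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)  # shadow copy of the trainable parameters

    def step(self, params):
        self.shadow = [self.decay * s + (1 - self.decay) * p
                       for s, p in zip(self.shadow, params)]

# The shadow value smoothly converges toward a (constant) parameter value.
ema = SimpleEMA([0.0], decay=0.9)
for _ in range(100):
    ema.step([1.0])
assert 0.99 < ema.shadow[0] < 1.0
```

Evaluating with the shadow weights instead of the raw ones smooths out the noise of individual gradient steps, which is why EMA checkpoints are commonly used for diffusion model inference.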

:dog: Ruff & black

We have replaced flake8 with ruff (much faster), and updated our version of black. These tools are now in sync with the ones used in transformers, so the contributing experience is now more consistent for people using both codebases :)

All commits

  • [lora] Fix bug with training without validation by @orenwang in #2106
  • [Bump version] 0.13.0dev0 & Deprecate predict_epsilon by @patrickvonplaten in #2109
  • [dreambooth] check the low-precision guard before preparing model by @patil-suraj in #2102
  • [textual inversion] Allow validation images by @pcuenca in #2077
  • Allow UNet2DModel to use arbitrary class embeddings by @pcuenca in #2080
  • make scaling factor a config arg of vae/vqvae by @patil-suraj in #1860
  • [Import Utils] Fix naming by @patrickvonplaten in #2118
  • Fix unable to save_pretrained when using pathlib by @Cyberes in #1972
  • fuse attention mask by @williamberman in #2111
  • Fix model card of LoRA by @hysts in #2114
  • [nit] torch_dtype used twice in doc string by @williamberman in #2126
  • [LoRA] Make sure LoRA can be disabled after it's run by @patrickvonplaten in #2128
  • remove redundant allow_patterns by @williamberman in #2130
  • Allow lora from pipeline by @patrickvonplaten in #2129
  • Fix typos in loaders.py by @kuotient in #2137
  • Typo fix: torwards -> towards by @RahulBhalley in #2134
  • Don't call the Hub if local_files_only is specifiied by @patrickvonplaten in #2119
  • [from_pretrained] only load config one time by @williamberman in #2131
  • Adding some safetensors docs. by @Narsil in #2122
  • Fix typo by @pcuenca in #2138
  • fix typo in EMAModel's load_state_dict() by @dasayan05 in #2151
  • [diffusers-cli] Fix typo in accelerate and transformers versions by @pcuenca in #2154
  • [Design philosopy] Create official doc by @patrickvonplaten in #2140
  • Section on using LoRA alpha / scale by @pcuenca in #2139
  • Don't copy when unwrapping model by @pcuenca in #2166
  • Add instance prompt to model card of lora dreambooth example by @hysts in #2112
  • [Bug]: fix DDPM scheduler arbitrary infer steps count. by @dudulightricks in #2076
  • [examples] Fix CLI argument in the launch script command for text2image with LoRA by @sayakpaul in #2171
  • [Breaking change] fix legacy inpaint noise and resize mask tensor by @1lint in #2147
  • Use requests instead of wget in convert_from_ckpt.py by @Abhishek-Varma in #2168
  • [Docs] Add components to docs by @patrickvonplaten in #2175
  • [Docs] remove license by @patrickvonplaten in #2188
  • Pass LoRA rank to LoRALinearLayer by @asadm in #2191
  • add: guide on kerascv conversion tool. by @sayakpaul in #2169
  • Fix a dimension bug in Transform2d by @lmxyy in #2144
  • [Loading] Better error message on missing keys by @patrickvonplaten in #2198
  • Update xFormers docs by @pcuenca in #2208
  • add CITATION.cff by @kashif in #2211
  • Create train_dreambooth_inpaint_lora.py by @thedarkzeno in #2205
  • Docs: short section on changing the scheduler in Flax by @pcuenca in #2181
  • [Bug] scheduling_ddpm: fix variance in the case of learned_range type. by @dudulightricks in #2090
  • refactor onnxruntime integration by @prathikr in #2042
  • Fix timestep dtype in legacy inpaint by @dymil in #2120
  • [nit] negative_prompt typo by @williamberman in #2227
  • removes ~s in favor of full-fledged links. by @sayakpaul in #2229
  • [LoRA] Make sure validation works in multi GPU setup by @patrickvonplaten in #2172
  • fix: flagged_images implementation by @justinmerrell in #1947
  • Hotfix textual inv logging by @isamu-isozaki in #2183
  • Fixes LoRAXFormersCrossAttnProcessor by @jorgemcgomes in #2207
  • Fix typo in StableDiffusionInpaintPipeline by @hutec in #2197
  • [Flax DDPM] Make key optional so default pipelines don't fail by @pcuenca in #2176
  • Show error when loading safety_checker from_flax by @pcuenca in #2187
  • Fix k_dpm_2 & k_dpm_2_a on MPS by @psychedelicious in #2241
  • Fix a typo: bfloa16 -> bfloat16 by @nickkolok in #2243
  • Mention training problems with xFormers 0.0.16 by @pcuenca in #2254
  • fix distributed init twice by @Fazziekey in #2252
  • Fixes prompt input checks in StableDiffusion img2img pipeline by @jorgemcgomes in #2206
  • Create convert_vae_pt_to_diffusers.py by @chavinlo in #2215
  • Stable Diffusion Latent Upscaler by @yiyixuxu in #2059
  • [Examples] Remove datasets important that is not needed by @patrickvonplaten in #2267
  • Make center crop and random flip as args for unconditional image generation by @wfng92 in #2259
  • [Tests] Fix slow tests by @patrickvonplaten in #2271
  • Fix torchvision.transforms and transforms function naming clash by @wfng92 in #2274
  • mps cross-attention hack: don't crash on fp16 by @pcuenca in #2258
  • Use accelerate save & loading hooks to have better checkpoint structure by @patrickvonplaten in #2048
  • Replace flake8 with ruff and update black by @patrickvonplaten in #2279
  • Textual inv save log memory by @isamu-isozaki in #2184
  • EMA: fix state_dict() and load_state_dict() & add cur_decay_value by @chenguolin in #2146
  • [Examples] Test all examples on CPU by @patrickvonplaten in #2289
  • fix pix2pix docs by @patrickvonplaten in #2290
  • misc fixes by @williamberman in #2282
  • Run same number of DDPM steps in inference as training by @bencevans in #2263
  • [LoRA] Freezing the model weights by @erkams in #2245
  • Fast CPU tests should also run on main by @patrickvonplaten in #2313
  • Correct fast tests by @patrickvonplaten in #2314
  • remove ddpm test_full_inference by @williamberman in #2291
  • convert ckpt script docstring fixes by @williamberman in #2293
  • [Community Pipeline] UnCLIP Text Interpolation Pipeline by @Abhinay1997 in #2257
  • [Tests] Refactor push tests by @patrickvonplaten in #2329
  • Add ethical guidelines by @giadilli in #2330
  • Fix running LoRA with xformers by @bddppq in #2286
  • Fix typo in load_pipeline_from_original_stable_diffusion_ckpt() method by @p1atdev in #2320
  • [Docs] Fix ethical guidelines docs by @patrickvonplaten in #2333
  • [Versatile Diffusion] Fix tests by @patrickvonplaten in #2336
  • [Latent Upscaling] Remove unused noise by @patrickvonplaten in #2298
  • [Tests] Remove unnecessary tests by @patrickvonplaten in #2337
  • karlo image variation use kakaobrain upload by @williamberman in #2338
  • github issue forum link by @williamberman in #2335
  • dreambooth checkpointing tests and docs by @williamberman in #2339
  • unet check length inputs by @williamberman in #2327
  • unCLIP variant by @williamberman in #2297
  • Log Unconditional Image Generation Samples to W&B by @bencevans in #2287
  • Fix callback type hints - no optional function argument by @patrickvonplaten in #2357
  • [Docs] initial docs about KarrasDiffusionSchedulers by @kashif in #2349
  • KarrasDiffusionSchedulers type note by @williamberman in #2365
  • [Tests] Add MPS skip decorator by @patrickvonplaten in #2362
  • Funky spacing issue by @meg-huggingface in #2368
  • schedulers add glide noising schedule by @williamberman in #2347
  • add total number checkpoints to training scripts by @williamberman in #2367
  • checkpointing_steps_total_limit->checkpoints_total_limit by @williamberman in #2374
  • Fix 3-way merging with the checkpoint_merger community pipeline by @damian0815 in #2355
  • [Variant] Add "variant" as input kwarg so to have better UX when downloading no_ema or fp16 weights by @patrickvonplaten in #2305
  • [Pipelines] Adds pix2pix zero by @sayakpaul in #2334
  • Add Self-Attention-Guided (SAG) Stable Diffusion pipeline by @SusungHong in #2193
  • [SchedulingPNDM ] reset cur_model_output after each call by @patil-suraj in #2376
  • train_text_to_image EMAModel saving by @williamberman in #2341
  • [Utils] Adds store() and restore() methods to EMAModel by @sayakpaul in #2302
  • enable_model_cpu_offload by @pcuenca in #2285
  • add the UniPC scheduler by @wl-zhao in #2373
  • Replace torch.concat calls by torch.cat by @fxmarty in #2378
  • Make diffusers importable with transformers < 4.26 by @pcuenca in #2380
  • [Examples] Make sure EMA works with any device by @patrickvonplaten in #2382
  • [Dummy imports] Add missing if else statements for SD] by @patrickvonplaten in #2381
  • Attend and excite 2 by @yiyixuxu in #2369
  • [Pix2Pix0] Add utility function to get edit vector by @patrickvonplaten in #2383
  • Revert "[Pix2Pix0] Add utility function to get edit vector" by @patrickvonplaten in #2384
  • Fix stable diffusion onnx pipeline error when batch_size > 1 by @tianleiwu in #2366
  • [Docs] Fix UniPC docs by @wl-zhao in #2386
  • [Pix2Pix Zero] Fix slow tests by @sayakpaul in #2391
  • [Pix2Pix] Add utility function by @patrickvonplaten in #2385
  • Fix UniPC tests and remove some test warnings by @pcuenca in #2396
  • [Pipelines] Add a section on generating captions and embeddings for Pix2Pix Zero by @sayakpaul in #2395
  • Torch2.0 scaled_dot_product_attention processor by @patil-suraj in #2303
  • add: inversion to pix2pix zero docs. by @sayakpaul in #2398
  • Add semantic guidance pipeline by @manuelbrack in #2223
  • Add ddim inversion pix2pix by @patrickvonplaten in #2397
  • add MultiDiffusionPanorama pipeline by @omerbt in #2393
  • Fixing typos in documentation by @anagri in #2389
  • controlling generation docs by @williamberman in #2388
  • apply_forward_hook simply returns if no accelerate by @daquexian in #2387
  • Revert "Release: v0.13.0" by @williamberman in #2405
  • controlling generation doc nits by @williamberman in #2406
  • Fix typo in AttnProcessor2_0 symbol by @pcuenca in #2404
  • add index page by @yiyixuxu in #2401
  • add xformers 0.0.16 warning message by @williamberman in #2345

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @thedarkzeno
    • Create train_dreambooth_inpaint_lora.py (#2205)
  • @prathikr
    • refactor onnxruntime integration (#2042)
  • @Abhinay1997
    • [Community Pipeline] UnCLIP Text Interpolation Pipeline (#2257)
  • @SusungHong
    • Add Self-Attention-Guided (SAG) Stable Diffusion pipeline (#2193)
  • @wl-zhao
    • add the UniPC scheduler (#2373)
    • [Docs] Fix UniPC docs (#2386)
  • @manuelbrack
    • Add semantic guidance pipeline (#2223)
  • @omerbt
    • add MultiDiffusionPanorama pipeline (#2393)
Jan 27, 2023
v0.12.1: Patch Release to fix local files only

Make sure cached models can be loaded in offline mode.

  • Don't call the Hub if local_files_only is specifiied by @patrickvonplaten in #2119
Jan 25, 2023
Instruct-Pix2Pix, DiT, LoRA

🪄 Instruct-Pix2Pix

Instruct-Pix2Pix is a Stable Diffusion model fine-tuned for editing images from human instructions. Given an input image and a written instruction that tells the model what to do, the model follows these instructions to edit the image.

The model was released with the paper InstructPix2Pix: Learning to Follow Image Editing Instructions. More information about the model can be found in the paper.

pip install diffusers transformers safetensors accelerate

import PIL
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

url = "https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"
def download_image(url):
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image
image = download_image(url)

prompt = "make the mountains snowy"
edited_image = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]
edited_image.save("snowy_mountains.png")
  • Add InstructPix2Pix pipeline by @patil-suraj #2040

🤖 DiT

Diffusion Transformers (DiT) is a class-conditional latent diffusion model that replaces the commonly used U-Net backbone with a transformer operating on latent patches. The pretrained model is trained on the ImageNet-1K dataset and is able to generate class-conditional images of 256x256 or 512x512 pixels.

The model was released with the paper Scalable Diffusion Models with Transformers.

import torch
from diffusers import DiTPipeline

model_id = "facebook/DiT-XL-2-256"
pipe = DiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# pick words that exist in ImageNet
words = ["white shark", "umbrella"]
class_ids = pipe.get_label_ids(words)

output = pipe(class_labels=class_ids)
image = output.images[0]  # label 'white shark'

⚡ LoRA

LoRA is a technique for performing parameter-efficient fine-tuning for large models. LoRA works by adding so-called "update matrices" to specific blocks of a pre-trained model. During fine-tuning, only these update matrices are updated while the pre-trained model parameters are kept frozen. This allows us to achieve greater memory efficiency as well as easier portability during fine-tuning.
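To see why the resulting checkpoints are so small, here is an illustrative back-of-the-envelope parameter count for one low-rank update (plain Python, not diffusers code; the layer width and rank below are arbitrary example values):

```python
# Rough parameter count for a LoRA update (illustrative only): a dense
# d_out x d_in weight is kept frozen, and only B (d_out x r) and A (r x d_in)
# are trained, so the trainable fraction shrinks with the rank r.
def lora_trainable_params(d_out: int, d_in: int, rank: int):
    full = d_out * d_in                  # frozen pre-trained weight
    update = d_out * rank + rank * d_in  # trainable update matrices B and A
    return full, update

full, update = lora_trainable_params(768, 768, rank=4)
print(update / full)  # ~0.0104: about 1% of the original layer is trained
```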

LoRA was proposed in LoRA: Low-Rank Adaptation of Large Language Models. In the original paper, the authors investigated LoRA for fine-tuning large language models like GPT-3. cloneofsimo was the first to try out LoRA training for Stable Diffusion in the popular lora GitHub repository.

Diffusers now supports LoRA! This means you can fine-tune a model like Stable Diffusion using consumer GPUs like a Tesla T4 or an RTX 2080 Ti. LoRA support was added to UNet2DConditionModel and the DreamBooth training script by @patrickvonplaten in #1884.

By using LoRA, the fine-tuned checkpoints will be just ~3 MB in size. After fine-tuning, you can use the LoRA checkpoints like so:

from diffusers import StableDiffusionPipeline
import torch

model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")

See the LoRA documentation and training examples to learn more about using LoRA in diffusers.

📐 Customizable Cross Attention

LoRA leverages a new method to customize the cross attention layers deep in the UNet. This can be useful for other creative approaches such as Prompt-to-Prompt, and it makes it easier to apply optimizers like xFormers. This new "attention processor" abstraction was created by @patrickvonplaten in #1639 after discussing the design with the community, and we have used it to rewrite our xFormers and attention slicing implementations!
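The processor idea itself is framework-agnostic. Here is a minimal sketch in plain Python (illustrative only, not the actual diffusers API) of an attention block that delegates its computation to a swappable processor, so optimizations can plug in without editing the block:

```python
# Sketch of the "attention processor" pattern: the block owns a callable
# processor and dispatches to it, so swapping the processor changes the
# attention implementation without touching the block itself.
class DefaultProcessor:
    def __call__(self, block, hidden_states):
        return f"default-attention({hidden_states})"

class LoggingProcessor:
    def __call__(self, block, hidden_states):
        print(f"attention called on {block.name}")  # extra behavior
        return f"default-attention({hidden_states})"  # same result

class AttentionBlock:
    def __init__(self, name):
        self.name = name
        self.processor = DefaultProcessor()

    def set_processor(self, processor):  # the swap-in mechanism
        self.processor = processor

    def forward(self, hidden_states):
        return self.processor(self, hidden_states)

block = AttentionBlock("mid_block.attn1")
out1 = block.forward("x")
block.set_processor(LoggingProcessor())  # behavior changes, block unchanged
out2 = block.forward("x")
```

In diffusers the same mechanism lets xFormers, attention slicing, and LoRA all target the cross attention layers through one interface.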

🌿 Flax => PyTorch

A long requested feature, prolific community member @camenduru took up the gauntlet in #1900 and created a way to convert Flax model weights for PyTorch. This means that you can train or fine-tune models super fast using Google TPUs, and then convert the weights to PyTorch for everybody to use. Thanks @camenduru!

🌀 Flax Img2Img

Another community member, @dhruvrnaik, ported the image-to-image pipeline to Flax in #1355! Using a TPU v2-8 (available in Colab's free tier), you can generate 8 images at once in a few seconds!

🎲 DEIS Scheduler

DEIS (Diffusion Exponential Integrator Sampler) is a new fast multistep scheduler that can generate high-quality samples in fewer steps. The scheduler was introduced in the paper Fast Sampling of Diffusion Models with Exponential Integrator.

from diffusers import StableDiffusionPipeline, DEISMultistepScheduler
import torch

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(prompt, generator=generator, num_inference_steps=25).images[0]
  • feat : add log-rho deis multistep scheduler by @qsh-zh #1432

Reproducibility

One can now pass CPU generators to all pipelines even if the pipeline is on GPU. This ensures much better reproducibility across GPU hardware:

import torch
from diffusers import DDIMPipeline
import numpy as np

model_id = "google/ddpm-cifar10-32"

# load model and scheduler
ddim = DDIMPipeline.from_pretrained(model_id)
ddim.to("cuda")

# create a generator for reproducibility
generator = torch.manual_seed(0)

# run pipeline for just two steps and return numpy tensor
image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
print(np.abs(image).sum())

See: #1902 and https://huggingface.co/docs/diffusers/using-diffusers/reproducibility

Important New Guides

Important Bug Fixes

  • Don't download safetensors if library is not installed: #2057
  • Make sure that save_pretrained(...) doesn't accidentally delete files: #2038
  • Fix CPU offload docs for maximum memory gain: #1968
  • Fix conversion for exotically sorted weight names: #1959
  • Fix intermediate checkpointing for textual inversion, thanks @lstein #2072

All commits

  • update composable diffusion for an updated diffuser library by @nanlliu in #1697
  • [Tests] Fix UnCLIP cpu offload tests by @anton-l in #1769
  • Bump to 0.12.0.dev0 by @anton-l in #1771
  • [Dreambooth] flax fixes by @pcuenca in #1765
  • update train_unconditional_ort.py by @prathikr in #1775
  • Only test for xformers when enabling them #1773 by @kig in #1776
  • expose polynomial:power and cosine_with_restarts:num_cycles params by @zetyquickly in #1737
  • [Flax] Stateless schedulers, fixes and refactors by @skirsten in #1661
  • Correct hf hub download by @patrickvonplaten in #1767
  • Dreambooth docs: minor fixes by @pcuenca in #1758
  • Fix num images per prompt unclip by @patil-suraj in #1787
  • Add Flax stable diffusion img2img pipeline by @dhruvrnaik in #1355
  • Refactor cross attention and allow mechanism to tweak cross attention function by @patrickvonplaten in #1639
  • Fix OOM when using PyTorch with JAX installed. by @pcuenca in #1795
  • reorder model wrap + bug fix by @prathikr in #1799
  • Remove hardcoded names from PT scripts by @patrickvonplaten in #1778
  • [textual_inversion] unwrap_model text encoder before accessing weights by @patil-suraj in #1816
  • fix small mistake in annotation: 32 -> 64 by @Line290 in #1780
  • Make safety_checker optional in more pipelines by @pcuenca in #1796
  • Device to use (e.g. cpu, cuda:0, cuda:1, etc.) by @camenduru in #1844
  • Avoid duplicating PyTorch + safetensors downloads. by @pcuenca in #1836
  • Width was typod as weight by @Helw150 in #1800
  • fix: resize transform now preserves aspect ratio by @parlance-zz in #1804
  • Make xformers optional even if it is available by @kn in #1753
  • Allow selecting precision to make Dreambooth class images by @kabachuha in #1832
  • unCLIP image variation by @williamberman in #1781
  • [Community Pipeline] MagicMix by @daspartho in #1839
  • [Versatile Diffusion] Fix cross_attention_kwargs by @patrickvonplaten in #1849
  • [Dtype] Align dtype casting behavior with Transformers and Accelerate by @patrickvonplaten in #1725
  • [StableDiffusionInpaint] Correct test by @patrickvonplaten in #1859
  • [textual inversion] add gradient checkpointing and small fixes. by @patil-suraj in #1848
  • Flax: Fix img2img and align with other pipeline by @skirsten in #1824
  • Make repo structure consistent by @patrickvonplaten in #1862
  • [Unclip] Make sure text_embeddings & image_embeddings can directly be passed to enable interpolation tasks. by @patrickvonplaten in #1858
  • Fix ema decay by @pcuenca in #1868
  • [Docs] Improve docs by @patrickvonplaten in #1870
  • [examples] update loss computation by @patil-suraj in #1861
  • [train_text_to_image] allow using non-ema weights for training by @patil-suraj in #1834
  • [Attention] Finish refactor attention file by @patrickvonplaten in #1879
  • Fix typo in train_dreambooth_inpaint by @pcuenca in #1885
  • Update ONNX Pipelines to use np.float64 instead of np.float by @agizmo in #1789
  • [examples] misc fixes by @patil-suraj in #1886
  • Fixes to the help for report_to in training scripts by @pcuenca in #1888
  • updated doc for stable diffusion pipelines by @yiyixuxu in #1770
  • Add UnCLIPImageVariationPipeline to dummy imports by @anton-l in #1897
  • Add accelerate and xformers versions to diffusers-cli env by @anton-l in #1898
  • [addresses issue #1642] add add_noise to scheduling-sde-ve by @aengusng8 in #1827
  • Add condtional generation to AudioDiffusionPipeline by @teticio in #1826
  • Fixes in comments in SD2 D2I by @neverix in #1903
  • [Deterministic torch randn] Allow tensors to be generated on CPU by @patrickvonplaten in #1902
  • [Docs] Remove duplicated API doc string by @patrickvonplaten in #1901
  • fix: DDPMScheduler.set_timesteps() by @Joqsan in #1912
  • Fix --resume_from_checkpoint step in train_text_to_image.py by @merfnad in #1914
  • Support training SD V2 with Flax by @yasyf in #1783
  • Fix lr-scaling store_true & default=True cli argument for textual_inversion training. by @aredden in #1090
  • Various Fixes for Flax Dreambooth by @yasyf in #1782
  • Test ResnetBlock2D by @hchings in #1850
  • Init for korean docs by @seriousran in #1910
  • New Pipeline: Tiled-upscaling with depth perception to avoid blurry spots by @peterwilli in #1615
  • Improve reproduceability 2/3 by @patrickvonplaten in #1906
  • feat : add log-rho deis multistep scheduler by @qsh-zh in #1432
  • Feature/colossalai by @Fazziekey in #1793
  • [Docs] Add TRANSLATING.md file by @seriousran in #1920
  • [StableDiffusionimg2img] validating input type by @Shubhamai in #1913
  • [dreambooth] low precision guard by @williamberman in #1916
  • [Stable Diffusion Guide] 101 Stable Diffusion Guide directly into the docs by @patrickvonplaten in #1927
  • [Conversion] Make sure ema weights are extracted correctly by @patrickvonplaten in #1937
  • fix path to logo by @vvssttkk in #1939
  • Add automatic doc sorting by @patrickvonplaten in #1940
  • update to latest colossalai by @Fazziekey in #1951
  • fix typo in imagic_stable_diffusion.py by @andreemic in #1956
  • [Conversion SD] Make sure weirdly sorted keys work as well by @patrickvonplaten in #1959
  • allow loading ddpm models into ddim by @patrickvonplaten in #1932
  • [Community] Correct checkpoint merger by @patrickvonplaten in #1965
  • Update CLIPGuidedStableDiffusion.feature_extractor.size to fix TypeError by @oxidase in #1938
  • [CPU offload] correct cpu offload by @patrickvonplaten in #1968
  • [Docs] Update README.md by @haofanwang in #1960
  • Research project multi subject dreambooth by @klopsahlong in #1948
  • Example tests by @patrickvonplaten in #1982
  • Fix slow tests by @patrickvonplaten in #1983
  • Fix unused upcast_attn flag in convert_original_stable_diffusion_to_diffusers script by @kn in #1942
  • Allow converting Flax to PyTorch by adding a "from_flax" keyword by @camenduru in #1900
  • Update docstring by @Warvito in #1971
  • [SD Img2Img] resize source images to multiple of 8 instead of 32 by @vvsotnikov in #1571
  • Update README.md to include our blog post by @sayakpaul in #1998
  • Fix a couple typos in Dreambooth readme by @pcuenca in #2004
  • Add tests for 2D UNet blocks by @hchings in #1945
  • [Conversion] Support convert diffusers to safetensors by @hua1995116 in #1996
  • [Community] Fix merger by @patrickvonplaten in #2006
  • [Conversion] Improve safetensors by @patrickvonplaten in #1989
  • [Black] Update black library by @patrickvonplaten in #2007
  • Fix typos in ColossalAI example by @haofanwang in #2001
  • Use pipeline tests mixin for UnCLIP pipeline tests + unCLIP MPS fixes by @williamberman in #1908
  • Change PNDMPipeline to use PNDMScheduler by @willdalh in #2003
  • [train_unconditional] fix LR scheduler init by @patil-suraj in #2010
  • [Docs] No more autocast by @patrickvonplaten in #2021
  • [Flax] Add Flax inpainting impl by @xvjiarui in #1966
  • Check k-diffusion version is at least 0.0.12 by @pcuenca in #2022
  • DiT Pipeline by @kashif in #1806
  • fix dit doc header by @patil-suraj in #2027
  • [LoRA] Add LoRA training script by @patrickvonplaten in #1884
  • [Dit] Fix dit tests by @patrickvonplaten in #2034
  • Fix typos and minor redundancies by @Joqsan in #2029
  • [Lora] Model card by @patrickvonplaten in #2032
  • [Save Pretrained] Remove dead code lines that can accidentally remove pytorch files by @patrickvonplaten in #2038
  • Fix EMA for multi-gpu training in the unconditional example by @anton-l in #1930
  • Minor fix in the documentation of LoRA by @hysts in #2045
  • Add InstructPix2Pix pipeline by @patil-suraj in #2040
  • Create repo before cloning in examples by @Wauplin in #2047
  • Remove modelcards dependency by @Wauplin in #2050
  • Module-ise "original stable diffusion to diffusers" conversion script by @damian0815 in #2019
  • [StableDiffusionInstructPix2Pix] use cpu generator in slow tests by @patil-suraj in #2051
  • [From pretrained] Don't download .safetensors files if safetensors is… by @patrickvonplaten in #2057
  • Correct Pix2Pix example by @patrickvonplaten in #2056
  • add community pipeline: StableUnCLIPPipeline by @budui in #2037
  • [LoRA] Adds example on text2image fine-tuning with LoRA by @sayakpaul in #2031
  • Safetensors loading in "convert_diffusers_to_original_stable_diffusion" by @cafeai in #2054
  • [examples] add dataloader_num_workers argument by @patil-suraj in #2070
  • Dreambooth: reduce VRAM usage by @gleb-akhmerov in #2039
  • [Paint by example] Fix cpu offload for paint by example by @patrickvonplaten in #2062
  • [textual_inversion] Fix resuming state when using gradient checkpointing by @pcuenca in #2072
  • [lora] Log images when using tensorboard by @pcuenca in #2078
  • Fix resume epoch for all training scripts except textual_inversion by @pcuenca in #2079
  • [dreambooth] fix multi on gpu. by @patil-suraj in #2088
  • Run inference on a specific condition and fix call of manual_seed() by @shirayu in #2074
  • [Feat] checkpoint_merger works on local models as well as ones that use safetensors by @lstein in #2060
  • xFormers attention op arg by @takuma104 in #2049
  • [docs] [dreambooth] note random crop by @williamberman in #2085
  • Remove wandb from text_to_image requirements.txt by @pcuenca in #2092
  • [doc] update example for pix2pix by @patil-suraj in #2101
  • Add lora tag to the model tags by @apolinario in #2103
  • [docs] Adds a doc on LoRA support for diffusers by @sayakpaul in #2086
  • Allow directly passing text embeddings to Stable Diffusion Pipeline for prompt weighting by @patrickvonplaten in #2071
  • Improve transformers versions handling by @patrickvonplaten in #2104
  • Reproducibility 3/3 by @patrickvonplaten in #1924

🙌 Significant community contributions 🙌

The following contributors have made significant changes to the library over the last release:

  • @nanlliu
    • update composable diffusion for an updated diffuser library (#1697)
  • @skirsten
    • [Flax] Stateless schedulers, fixes and refactors (#1661)
    • Flax: Fix img2img and align with other pipeline (#1824)
  • @hchings
    • Test ResnetBlock2D (#1850)
    • Add tests for 2D UNet blocks (#1945)
  • @seriousran
    • Init for korean docs (#1910)
    • [Docs] Add TRANSLATING.md file (#1920)
  • @qsh-zh
    • feat : add log-rho deis multistep scheduler (#1432)
  • @Fazziekey
    • Feature/colossalai (#1793)
    • update to latest colossalai (#1951)
  • @klopsahlong
    • Research project multi subject dreambooth (#1948)
  • @xvjiarui
    • [Flax] Add Flax inpainting impl (#1966)
  • @damian0815
    • Module-ise "original stable diffusion to diffusers" conversion script (#2019)
  • @camenduru
    • Allow converting Flax to PyTorch by adding a "from_flax" keyword (#1900)
Dec 20, 2022
v0.11.1: Patch release

This patch release fixes a bug with num_images_per_prompt in the UnCLIPPipeline

  • Fix num images per prompt unclip by @patil-suraj in #1787
Dec 19, 2022
v0.11.0: Karlo UnCLIP, safetensors, pipeline versions

:magic_wand: Karlo UnCLIP by Kakao Brain

Karlo is a text-conditional image generation model based on OpenAI's unCLIP architecture, with an improved super-resolution module that upscales from 64px to 256px while recovering high-frequency details in a small number of denoising steps.

This alpha version of Karlo is trained on 115M image-text pairs, including COYO-100M high-quality subset, CC3M, and CC12M. For more information about the architecture, see the Karlo repository: https://github.com/kakaobrain/karlo

pip install diffusers transformers safetensors accelerate
import torch
from diffusers import UnCLIPPipeline

pipe = UnCLIPPipeline.from_pretrained("kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a high-resolution photograph of a big red frog on a green leaf."
image = pipe(prompt).images[0]

:octocat: Community pipeline versioning

The community pipelines hosted in diffusers/examples/community will now follow the installed version of the library.

E.g. if you have diffusers==0.9.0 installed, the pipelines from the v0.9.0 branch will be used: https://github.com/huggingface/diffusers/tree/v0.9.0/examples/community

If you've installed diffusers from source, e.g. with pip install git+https://github.com/huggingface/diffusers then the latest versions of the pipelines will be fetched from the main branch.

To change the custom pipeline version, set the custom_revision variable like so:

pipeline = DiffusionPipeline.from_pretrained(
     "google/ddpm-cifar10-32", custom_pipeline="one_step_unet", custom_revision="0.10.2"
)

:safety_vest: safetensors

Many of the most important checkpoints now have safetensors (https://github.com/huggingface/safetensors) weights available. After installing safetensors with:

pip install safetensors

You will see a nice speed-up when loading your model :rocket:

Some of the most important checkpoints now have safetensors weights:

Batched generation bug fixes :bug:

  • Make sure all pipelines can run with batched input by @patrickvonplaten in #1669

We fixed a lot of bugs for batched generation. All pipelines should now correctly process batches of prompts and images :hugs: Also we made it much easier to tweak images with reproducible seeds: https://huggingface.co/docs/diffusers/using-diffusers/reusing_seeds
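The idea behind reusable seeds can be sketched with stdlib RNGs (illustrative only, not torch code): give each batch element its own seeded generator, so an element's result no longer depends on which batch it was generated in:

```python
import random

# Sketch: one seeded generator per batch element makes element i
# reproducible regardless of batch size, which is the idea behind
# passing a list of generators to a pipeline for batched generation.
def generate_batch(seeds):
    generators = [random.Random(seed) for seed in seeds]
    return [g.random() for g in generators]

full_batch = generate_batch([0, 1, 2, 3])
just_third = generate_batch([2])
print(full_batch[2] == just_third[0])  # True: element 2 is identical either way
```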

:memo: Changelog

  • Remove spurious arg in training scripts by @pcuenca in #1644
  • dreambooth: fix #1566: maintain fp32 wrapper when saving a checkpoint to avoid crash when running fp16 by @timh in #1618
  • Allow k pipeline to generate > 1 images by @pcuenca in #1645
  • Remove unnecessary offset in img2img by @patrickvonplaten in #1653
  • Remove unnecessary kwargs in depth2img by @maruel in #1648
  • Add text encoder conversion by @lawfordp2017 in #1559
  • VersatileDiffusion: fix input processing by @LukasStruppek in #1568
  • tensor format ort bug fix by @prathikr in #1557
  • Deprecate init image correctly by @patrickvonplaten in #1649
  • fix bug if we don't do_classifier_free_guidance by @MKFMIKU in #1601
  • Handle missing global_step key in scripts/convert_original_stable_diffusion_to_diffusers.py by @Cyberes in #1612
  • [SD] Make sure scheduler is correct when converting by @patrickvonplaten in #1667
  • [Textual Inversion] Do not update other embeddings by @patrickvonplaten in #1665
  • Added Community pipeline for comparing Stable Diffusion v1.1-4 checkpoints by @suvadityamuk in #1584
  • Fix wrong type checking in convert_diffusers_to_original_stable_diffusion.py by @apolinario in #1681
  • [Version] Bump to 0.11.0.dev0 by @patrickvonplaten in #1682
  • Dreambooth: save / restore training state by @pcuenca in #1668
  • Disable telemetry when DISABLE_TELEMETRY is set by @w4ffl35 in #1686
  • Change one-step dummy pipeline for testing by @patrickvonplaten in #1690
  • [Community pipeline] Add github mechanism by @patrickvonplaten in #1680
  • Dreambooth: use warnings instead of logger in parse_args() by @pcuenca in #1688
  • manually update train_unconditional_ort by @prathikr in #1694
  • Remove all local telemetry by @anton-l in #1702
  • Update main docs by @patrickvonplaten in #1706
  • [Readme] Clarify package owners by @anton-l in #1707
  • Fix the bug that torch version less than 1.12 throws TypeError by @chinoll in #1671
  • RePaint fast tests and API conforming by @anton-l in #1701
  • Add state checkpointing to other training scripts by @pcuenca in #1687
  • Improve pipeline_stable_diffusion_inpaint_legacy.py by @cyber-meow in #1585
  • apply amp bf16 on textual inversion by @jiqing-feng in #1465
  • Add examples with Intel optimizations by @hshen14 in #1579
  • Added a README page for docs and a "schedulers" page by @yiyixuxu in #1710
  • Accept latents as optional input in Latent Diffusion pipeline by @daspartho in #1723
  • Fix ONNX img2img preprocessing and add fast tests coverage by @anton-l in #1727
  • Fix ldm tests on master by not running the CPU tests on GPU by @patrickvonplaten in #1729
  • Docs: recommend xformers by @pcuenca in #1724
  • Nightly integration tests by @anton-l in #1664
  • [Batched Generators] This PR adds generators that are useful to make batched generation fully reproducible by @patrickvonplaten in #1718
  • Fix ONNX img2img preprocessing by @peterto in #1736
  • Fix MPS fast test warnings by @anton-l in #1744
  • Fix/update the LDM pipeline and tests by @anton-l in #1743
  • kakaobrain unCLIP by @williamberman in #1428
  • [fix] pipeline_unclip generator by @williamberman in #1751
  • unCLIP docs by @williamberman in #1754
  • Correct help text for scheduler_type flag in scripts. by @msiedlarek in #1749
  • Add resnet_time_scale_shift to VD layers by @anton-l in #1757
  • Add attention mask to uclip by @patrickvonplaten in #1756
  • Support attn2==None for xformers by @anton-l in #1759
  • [UnCLIPPipeline] fix num_images_per_prompt by @patil-suraj in #1762
  • Add CPU offloading to UnCLIP by @anton-l in #1761
  • [Versatile] fix attention mask by @patrickvonplaten in #1763
  • [Revision] Don't recommend using revision by @patrickvonplaten in #1764
  • [Examples] Update train_unconditional.py to include logging argument for Wandb by @ash0ts in #1719
  • Transformers version req for UnCLIP by @anton-l in #1766
Dec 9, 2022
v0.10.2: Patch release

This patch removes the hard requirement for transformers>=4.25.1 in case external libraries were downgrading the library upon startup in a non-controllable way.

  • do not automatically enable xformers by @patrickvonplaten in #1640
  • Adapt to forced transformers version in some dependent libraries by @anton-l in #1638
  • Re-add xformers enable to UNet2DCondition by @patrickvonplaten in #1627

🚨🚨🚨 Note that xformers is not automatically enabled anymore 🚨🚨🚨

The reasons for this are given here: https://github.com/huggingface/diffusers/pull/1640#discussion_r1044651551:

We should not automatically enable xformers, for three reasons:

  • It's not a PyTorch-like API. PyTorch doesn't enable all the fastest options available by default.
  • We allocate GPU memory before the user even calls .to("cuda").
  • This behavior is not consistent with cases where xformers is not installed.

=> This means: if you were used to having xformers automatically enabled, please make sure to add the following now:

from diffusers.utils.import_utils import is_xformers_available

unet = ... # load unet

if is_xformers_available():
    try:
        unet.enable_xformers_memory_efficient_attention(True)
    except Exception as e:
        logger.warning(
            "Could not enable memory efficient attention. Make sure xformers is installed"
            f" correctly and a GPU is available: {e}"
        )

for the UNet (e.g. in dreambooth) or for the pipeline:

from diffusers.utils.import_utils import is_xformers_available

pipe = ... # load pipeline

if is_xformers_available():
    try:
        pipe.enable_xformers_memory_efficient_attention(True)
    except Exception as e:
        logger.warning(
            "Could not enable memory efficient attention. Make sure xformers is installed"
            f" correctly and a GPU is available: {e}"
        )
v0.10.1: Patch release

This patch returns enable_xformers_memory_efficient_attention() to UNet2DCondition to restore backward compatibility.

  • Re-add xformers enable to UNet2DCondition by @patrickvonplaten in #1627
Dec 8, 2022
v0.10.0: Depth Guidance and Safer Checkpoints

🐳 Depth-Guided Stable Diffusion and 2.1 checkpoints

The new depth-guided stable diffusion model is fully supported in this release. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis.

Installing the transformers library from source is required for the MiDaS model:

pip install --upgrade git+https://github.com/huggingface/transformers/
import torch
import requests
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
   "stabilityai/stable-diffusion-2-depth",
   torch_dtype=torch.float16,
).to("cuda")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)

prompt = "two tigers"
negative_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7).images[0]

The updated Stable Diffusion 2.1 checkpoints are also released and fully supported:

:safety_vest: Safe Tensors

We now support SafeTensors: a new simple format for storing tensors safely (as opposed to pickle) that is still fast (zero-copy).
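For intuition, the layout can be sketched in a few lines of stdlib Python (a simplified schematic, not the actual safetensors implementation): a little-endian 8-byte header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then one contiguous buffer that readers can slice in place:

```python
import json
import struct

# Simplified schematic of a safetensors-style file: fixed-size header
# length, JSON metadata, then raw tensor bytes. No pickling, and readers
# can view the tensor bytes without copying them.
data = bytes(range(8))  # pretend this is a 2x4 uint8 tensor
header = {"weight": {"dtype": "U8", "shape": [2, 4], "data_offsets": [0, 8]}}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + data

# "Loading": parse the header, then view the tensor bytes in place.
(n,) = struct.unpack_from("<Q", blob, 0)
meta = json.loads(blob[8 : 8 + n])
start, end = meta["weight"]["data_offsets"]
tensor_view = memoryview(blob)[8 + n + start : 8 + n + end]  # zero-copy slice
```

Because the metadata is plain JSON and the payload is raw bytes, loading never executes arbitrary code the way unpickling can.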

  • [Proposal] Support loading from safetensors if file is present. by @Narsil in #1357
  • [Proposal] Support saving to safetensors by @MatthieuBizien in #1494
SafeTensors compares favorably with the other common formats (pickle for PyTorch, H5 for Tensorflow, SavedModel for Tensorflow, and MsgPack for flax) across criteria such as safety, zero-copy reads, lazy loading, file size limits, layout control, and bfloat16 support.

More details about the comparison here: https://github.com/huggingface/safetensors#yet-another-format-

pip install safetensors
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.save_pretrained("./safe-stable-diffusion-2-1", safe_serialization=True)

# you can also push this checkpoint to the HF Hub and load from there
safe_pipe = StableDiffusionPipeline.from_pretrained("./safe-stable-diffusion-2-1")

New Pipelines

:paintbrush: Paint-by-example

An implementation of Paint by Example: Exemplar-based Image Editing with Diffusion Models by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen

  • Add paint by example by @patrickvonplaten in #1533

import PIL
import requests
import torch
from io import BytesIO
from diffusers import DiffusionPipeline

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/image/example_1.png"
mask_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/mask/example_1.png"
example_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/reference/example_1.jpg"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
example_image = download_image(example_url).resize((512, 512))

pipe = DiffusionPipeline.from_pretrained("Fantasy-Studio/Paint-by-Example", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe(image=init_image, mask_image=mask_image, example_image=example_image).images[0]

Audio Diffusion and Latent Audio Diffusion

Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to and from mel spectrogram images.

  • add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334 by @teticio in #1426
from IPython.display import Audio
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to("cuda")

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))

[Experimental] K-Diffusion pipeline for Stable Diffusion

This pipeline is added to support the latest schedulers from @crowsonkb's k-diffusion. The purpose of this pipeline is to compare scheduler implementations and updates, so new features from other pipelines are unlikely to be supported!

  • [K Diffusion] Add k diffusion sampler natively by @patrickvonplaten in #1603
pip install k-diffusion
from diffusers import StableDiffusionKDiffusionPipeline
import torch

pipe = StableDiffusionKDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
pipe = pipe.to("cuda")

pipe.set_scheduler("sample_heun")
image = pipe("astronaut riding horse", num_inference_steps=25).images[0]

New Schedulers

Heun scheduler inspired by Karras et al.

Algorithm 1 of Karras et al. Scheduler ported from @crowsonkb's k-diffusion.

  • Add 2nd order heun scheduler by @patrickvonplaten in #1336
from diffusers import HeunDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = HeunDiscreteScheduler.from_config(pipe.scheduler.config)
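For intuition, a generic Heun predictor-corrector step on a toy ODE looks like this (illustrative only; the diffusers scheduler applies the same second-order scheme over the Karras sigma schedule rather than a fixed time grid):

```python
# Generic 2nd-order Heun step: take an Euler "predictor" step, evaluate
# the slope at the predicted point, then average the two slopes.
def heun_step(f, t, y, dt):
    k1 = f(t, y)                     # slope at the current point
    y_pred = y + dt * k1             # predictor (plain Euler step)
    k2 = f(t + dt, y_pred)           # slope at the predicted point
    return y + dt * 0.5 * (k1 + k2)  # corrector: average the two slopes

f = lambda t, y: -y  # toy ODE dy/dt = -y, exact solution exp(-t)
y = 1.0
for i in range(10):
    y = heun_step(f, i * 0.1, y, 0.1)
print(y)  # close to exp(-1) ~ 0.3679
```

The second slope evaluation is why each Heun step costs two model calls but converges much faster per step than plain Euler.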

Single step DPM-Solver

Original paper can be found here and the improved version. The original implementation can be found here.

  • Add Singlestep DPM-Solver (singlestep high-order schedulers) by @LuChengTHU in #1442
from diffusers import DPMSolverSinglestepScheduler

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = DPMSolverSinglestepScheduler.from_config(pipe.scheduler.config)

:memo: Changelog

  • [Proposal] Support loading from safetensors if file is present. by @Narsil in #1357
  • Hotfix for AttributeErrors in OnnxStableDiffusionInpaintPipelineLegacy by @anton-l in #1448
  • Speed up test and remove kwargs from call by @patrickvonplaten in #1446
  • v-prediction training support by @patil-suraj in #1455
  • Fix Flax from_pt by @pcuenca in #1436
  • Ensure Flax pipeline always returns numpy array by @pcuenca in #1435
  • Add 2nd order heun scheduler by @patrickvonplaten in #1336
  • fix slow tests by @patrickvonplaten in #1467
  • Flax support for Stable Diffusion 2 by @pcuenca in #1423
  • Updates Image to Image Inpainting community pipeline README by @vvvm23 in #1370
  • StableDiffusion: Decode latents separately to run larger batches by @kig in #1150
  • Fix bug in half precision for DPMSolverMultistepScheduler by @rtaori in #1349
  • [Train unconditional] Unwrap model before EMA by @anton-l in #1469
  • Add ort_nightly_directml to the onnxruntime candidates by @anton-l in #1458
  • Allow saving trained betas by @patrickvonplaten in #1468
  • Fix dtype model loading by @patrickvonplaten in #1449
  • [Dreambooth] Make compatible with alt diffusion by @patrickvonplaten in #1470
  • Add better docs xformers by @patrickvonplaten in #1487
  • Remove reminder comment by @pcuenca in #1489
  • Bump to 0.10.0.dev0 + deprecations by @anton-l in #1490
  • Add doc for Stable Diffusion on Habana Gaudi by @regisss in #1496
  • Replace deprecated hub utils in train_unconditional_ort by @anton-l in #1504
  • [Deprecate] Correct stacklevel by @patrickvonplaten in #1483
  • simplyfy AttentionBlock by @patil-suraj in #1492
  • Standardize on using image argument in all pipelines by @fboulnois in #1361
  • support v prediction in other schedulers by @patil-suraj in #1505
  • Fix Flax flip_sin_to_cos by @akashgokul in #1369
  • Add an explicit --image_size to the conversion script by @anton-l in #1509
  • fix heun scheduler by @patil-suraj in #1512
  • [docs] [dreambooth training] accelerate.utils.write_basic_config by @williamberman in #1513
  • [docs] [dreambooth training] num_class_images clarification by @williamberman in #1508
  • [From pretrained] Allow returning local path by @patrickvonplaten in #1450
  • Update conversion script to correctly handle SD 2 by @patrickvonplaten in #1511
  • [refactor] Making the xformers mem-efficient attention activation recursive by @blefaudeux in #1493
  • Do not use torch.long in mps by @pcuenca in #1488
  • Fix Imagic example by @dhruvrnaik in #1520
  • Fix training docs to install datasets by @pedrogengo in #1476
  • Finalize 2nd order schedulers by @patrickvonplaten in #1503
  • Fixed mask+masked_image in sd inpaint pipeline by @antoche in #1516
  • Create train_dreambooth_inpaint.py by @thedarkzeno in #1091
  • Update FlaxLMSDiscreteScheduler by @dzlab in #1474
  • [Proposal] Support saving to safetensors by @MatthieuBizien in #1494
  • Add xformers attention to VAE by @kig in #1507
  • [CI] Add slow MPS tests by @anton-l in #1104
  • [Stable Diffusion Inpaint] Allow tensor as input image & mask by @patrickvonplaten in #1527
  • Compute embedding distances with torch.cdist by @blefaudeux in #1459
  • [Upscaling] Fix batch size by @patrickvonplaten in #1525
  • Update bug-report.yml by @patrickvonplaten in #1548
  • [Community Pipeline] Checkpoint Merger based on Automatic1111 by @Abhinay1997 in #1472
  • [textual_inversion] Add an option for only saving the embeddings by @allo- in #781
  • [examples] use from_pretrained to load scheduler by @patil-suraj in #1549
  • fix mask discrepancies in train_dreambooth_inpaint by @thedarkzeno in #1529
  • [refactor] make set_attention_slice recursive by @patil-suraj in #1532
  • Research folder by @patrickvonplaten in #1553
  • add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334 by @teticio in #1426
  • [Community download] Fix cache dir by @patrickvonplaten in #1555
  • [Docs] Correct docs by @patrickvonplaten in #1554
  • Fix typo by @pcuenca in #1558
  • [docs] [dreambooth training] default accelerate config by @williamberman in #1564
  • Mega community pipeline by @patrickvonplaten in #1561
  • [examples] add check_min_version by @patil-suraj in #1550
  • [dreambooth] make collate_fn global by @patil-suraj in #1547
  • Standardize fast pipeline tests with PipelineTestMixin by @anton-l in #1526
  • Add paint by example by @patrickvonplaten in #1533
  • [Community Pipeline] fix lpw_stable_diffusion by @SkyTNT in #1570
  • [Paint by Example] Better default for image width by @patrickvonplaten in #1587
  • Add from_pretrained telemetry by @anton-l in #1461
  • Correct order height & width in pipeline_paint_by_example.py by @Fantasy-Studio in #1589
  • Fix common tests for FP16 by @anton-l in #1588
  • [UNet2DConditionModel] add an option to upcast attention to fp32 by @patil-suraj in #1590
  • Flax: avoid recompilation when params change by @pcuenca in #1096
  • Add Singlestep DPM-Solver (singlestep high-order schedulers) by @LuChengTHU in #1442
  • fix upcast in slice attention by @patil-suraj in #1591
  • Update scheduling_repaint.py by @Randolph-zeng in #1582
  • Update RL docs for better sharing / adding models by @natolambert in #1563
  • Make cross-attention check more robust by @pcuenca in #1560
  • [ONNX] Fix flaky tests by @anton-l in #1593
  • Trivial fix for undefined symbol in train_dreambooth.py by @bcsherma in #1598
  • [K Diffusion] Add k diffusion sampler natively by @patrickvonplaten in #1603
  • [Versatile Diffusion] add upcast_attention by @patil-suraj in #1605
  • Fix PyCharm/VSCode static type checking for dummy objects by @anton-l in #1596
Nov 25, 2022
v0.9.0: Stable Diffusion 2

:art: Stable Diffusion 2 is here!

Installation

pip install diffusers[torch]==0.9 transformers

Stable Diffusion 2.0 is available in several flavors:

Stable Diffusion 2.0-V at 768x768

New stable diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. Same number of parameters in the U-Net as 1.5, but uses OpenCLIP-ViT/H as the text encoder and is trained from scratch. SD 2.0-v is a so-called v-prediction model.

import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "stabilityai/stable-diffusion-2"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, guidance_scale=9, num_inference_steps=25).images[0]
image.save("astronaut.png")

Stable Diffusion 2.0-base at 512x512

The above model is finetuned from SD 2.0-base, which was trained as a standard noise-prediction model on 512x512 images and is also made available.

import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "stabilityai/stable-diffusion-2-base"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("astronaut.png")

Stable Diffusion 2.0 for Inpainting

This model for text-guided inpainting is finetuned from SD 2.0-base. It follows the mask-generation strategy presented in LaMa, which, in combination with the latent VAE representation of the masked image, is used as additional conditioning.

import PIL
import requests
import torch
from io import BytesIO
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

repo_id = "stabilityai/stable-diffusion-2-inpainting"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=25).images[0]
image.save("yellow_cat.png")

Stable Diffusion X4 Upscaler

The model was trained on crops of size 512x512 and is a text-guided latent upscaling diffusion model. In addition to the textual input, it receives a noise_level as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule.

import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionUpscalePipeline
import torch

model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")

url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128))

prompt = "a white cat"
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upsampled_cat.png")

Saving & Loading is fixed for Versatile Diffusion

Previously there was a :bug: when saving & loading Versatile Diffusion; this is now fixed, so memory-efficient saving & loading works as expected.

  • [Versatile Diffusion] Fix remaining tests by @patrickvonplaten in #1418

:memo: Changelog

  • add v prediction by @patil-suraj in #1386
  • Adapt UNet2D for supre-resolution by @patil-suraj in #1385
  • Version 0.9.0.dev0 by @anton-l in #1394
  • Make height and width optional by @patrickvonplaten in #1401
  • [Config] Add optional arguments by @patrickvonplaten in #1395
  • Upscaling fixed by @patrickvonplaten in #1402
  • Add the new SD2 attention params to the VD text unet by @anton-l in #1400
  • Deprecate sample size by @patrickvonplaten in #1406
  • Support SD2 attention slicing by @anton-l in #1397
  • Add SD2 inpainting integration tests by @anton-l in #1412
  • Fix sample size conversion script by @patrickvonplaten in #1408
  • fix clip guided by @patrickvonplaten in #1414
  • Fix all stable diffusion by @patrickvonplaten in #1415
  • [MPS] call contiguous after permute by @kashif in #1411
  • Deprecate predict_epsilon by @pcuenca in #1393
  • Fix ONNX conversion and inference by @anton-l in #1416
  • Allow to set config params directly in init by @patrickvonplaten in #1419
  • Add tests for Stable Diffusion 2 V-prediction 768x768 by @anton-l in #1420
  • StableDiffusionUpscalePipeline by @patil-suraj in #1396
  • added initial v-pred support to DPM-solver by @kashif in #1421
  • SD2 docs by @patrickvonplaten in #1424
Nov 24, 2022
v0.8.1: Patch release

This patch release fixes an error with CLIPVisionModelWithProjection imports on a non-git transformers installation.

:warning: Please upgrade with pip install --upgrade diffusers or pip install diffusers==0.8.1

Nov 23, 2022
v0.8.0: Versatile Diffusion - Text, Images and Variations All in One Diffusion Model

🙆‍♀️ New Models

VersatileDiffusion

VersatileDiffusion, released by SHI-Labs, is a unified multi-flow multimodal diffusion model capable of multiple tasks such as text2image, image variations, dual-guided (text+image) image generation, and image2text.

  • [Versatile Diffusion] Add versatile diffusion model by @patrickvonplaten @anton-l #1283

Make sure to install transformers from "main":

pip install git+https://github.com/huggingface/transformers

Then you can run:

from diffusers import VersatileDiffusionPipeline
import torch
import requests
from io import BytesIO
from PIL import Image

pipe = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# initial image
url = "https://huggingface.co/datasets/diffusers/images/resolve/main/benz.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")

# prompt
prompt = "a red car"

# text to image
image = pipe.text_to_image(prompt).images[0]

# image variation
image = pipe.image_variation(image).images[0]

# dual-guided (text + image) generation
image = pipe.dual_guided(prompt, image).images[0]

More in-depth details can be found in the documentation.

AltDiffusion

AltDiffusion is a multilingual latent diffusion model that supports text-to-image generation for 9 different languages: English, Chinese, Spanish, French, Japanese, Korean, Arabic, Russian and Italian.

  • Add AltDiffusion by @patrickvonplaten @patil-suraj #1299

Stable Diffusion Image Variations

StableDiffusionImageVariationPipeline by @justinpinkney is a Stable Diffusion model that takes an image as input and generates variations of that image. It is conditioned on CLIP image embeddings instead of text.

  • StableDiffusionImageVariationPipeline by @patil-suraj #1365

Safe Latent Diffusion

Safe Latent Diffusion (SLD), released by the ml-research group at TU Darmstadt, is a practical and sophisticated approach to preventing unsolicited content from being generated by diffusion models. One of the paper's authors contributed their implementation to diffusers.

  • Add Safe Stable Diffusion Pipeline by @manuelbrack #1244

VQ-Diffusion with classifier-free sampling

  • vq diffusion classifier free sampling by @williamberman #1294

LDM super resolution

LDM super resolution is a latent 4x super-resolution diffusion model released by CompVis.

  • Add LDM Super Resolution pipeline by @duongna21 #1116

CycleDiffusion

CycleDiffusion is a method that uses Text-to-Image Diffusion Models for Image-to-Image Editing. It is capable of

  1. Zero-shot image-to-image translation with text-to-image diffusion models such as Stable Diffusion.
  2. Traditional unpaired image-to-image translation with diffusion models trained on two related domains.
  • Add CycleDiffusion pipeline using Stable Diffusion by @ChenWu98 #888

CLIPSeg + StableDiffusionInpainting

Uses CLIPSeg to automatically generate a segmentation mask, then applies Stable Diffusion inpainting.

K-Diffusion wrapper

The K-Diffusion Pipeline is a community pipeline that allows using any sampler from K-Diffusion with diffusers models.

  • [Community Pipelines] K-Diffusion Pipeline by @patrickvonplaten #1360

🌀New SOTA Scheduler

DPMSolverMultistepScheduler is the 🧨 diffusers implementation of DPM-Solver++, a state-of-the-art scheduler contributed by one of the authors of the paper. It can achieve great quality in as few as 20 steps, and it's a drop-in replacement for the default Stable Diffusion scheduler, so you can use it to essentially halve generation times. It works so well that we adopted it for the Stable Diffusion demo Spaces: https://huggingface.co/spaces/stabilityai/stable-diffusion, https://huggingface.co/spaces/runwayml/stable-diffusion-v1-5.

You can use it like this:

from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "runwayml/stable-diffusion-v1-5"
scheduler = DPMSolverMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler")
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)

🌐 Better scheduler API

The example above also demonstrates how to load schedulers using a new API that is coherent with model loading and therefore more natural and intuitive.

You can load a scheduler using from_pretrained, as demonstrated above, or you can instantiate one from an existing scheduler configuration. This is a way to replace the scheduler of a pipeline that was previously loaded:

from diffusers import DiffusionPipeline, EulerDiscreteScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)

Read more about these changes in the documentation. See also the community pipeline that allows using any of the K-diffusion samplers with diffusers, as mentioned above!

🎉 Performance

We work relentlessly to incorporate performance optimizations and memory reduction techniques into 🧨 diffusers. These are two of the most noteworthy additions in this release:

  • Enable memory-efficient attention by default if xFormers is installed.
  • Use batched-matmuls when possible.

🎁 Quality of Life improvements

  • Fix/Enable all schedulers for in-painting
  • Easier loading of local pipelines
  • CPU offloading: multi-GPU support

:memo: Changelog

  • Add multistep DPM-Solver discrete scheduler by @LuChengTHU in #1132
  • Remove warning about half precision on MPS by @pcuenca in #1163
  • Fix typo latens -> latents by @duongna21 in #1171
  • Fix community pipeline links by @pcuenca in #1162
  • [Docs] Add loading script by @patrickvonplaten in #1174
  • Fix dtype safety checker inpaint legacy by @patrickvonplaten in #1137
  • Community pipeline img2img inpainting by @vvvm23 in #1114
  • [Community Pipeline] Add multilingual stable diffusion to community pipelines by @juancopi81 in #1142
  • [Flax examples] Load text encoder from subfolder by @duongna21 in #1147
  • Link to Dreambooth blog post instead of W&B report by @pcuenca in #1180
  • Fix small typo by @pcuenca in #1178
  • [DDIMScheduler] fix noise device in ddim step by @patil-suraj in #1189
  • MPS schedulers: don't use float64 by @pcuenca in #1169
  • Warning for invalid options without "--with_prior_preservation" by @shirayu in #1065
  • [ONNX] Improve ONNXPipeline scheduler compatibility, fix safety_checker by @anton-l in #1173
  • Restore compatibility with deprecated StableDiffusionOnnxPipeline by @pcuenca in #1191
  • Update pr docs actions by @mishig25 in #1194
  • handle dtype xformers attention by @patil-suraj in #1196
  • [Scheduler] Move predict epsilon to init by @patrickvonplaten in #1155
  • add licenses to pipelines by @natolambert in #1201
  • Fix cpu offloading by @anton-l in #1177
  • Fix slow tests by @patrickvonplaten in #1210
  • [Flax] fix extra copy pasta 🍝 by @camenduru in #1187
  • [CLIPGuidedStableDiffusion] support DDIM scheduler by @patil-suraj in #1190
  • Fix layer names convert LDM script by @duongna21 in #1206
  • [Loading] Make sure loading edge cases work by @patrickvonplaten in #1192
  • Add LDM Super Resolution pipeline by @duongna21 in #1116
  • [Conversion] Improve conversion script by @patrickvonplaten in #1218
  • DDIM docs by @patrickvonplaten in #1219
  • apply repeat_interleave fix for mps to stable diffusion image2image pipeline by @jncasey in #1135
  • Flax tests: don't hardcode number of devices by @pcuenca in #1175
  • Improve documentation for the LPW pipeline by @exo-pla-net in #1182
  • Factor out encode text with Copied from by @patrickvonplaten in #1224
  • Match the generator device to the pipeline for DDPM and DDIM by @anton-l in #1222
  • [Tests] Fix mps+generator fast tests by @anton-l in #1230
  • [Tests] Adjust TPU test values by @anton-l in #1233
  • Add a reference to the name 'Sampler' by @apolinario in #1172
  • Fix Flax usage comments by @pcuenca in #1211
  • [Docs] improve img2img example by @ruanrz in #1193
  • [Stable Diffusion] Fix padding / truncation by @patrickvonplaten in #1226
  • Finalize stable diffusion refactor by @patrickvonplaten in #1269
  • Edited attention.py for older xformers by @Lime-Cakes in #1270
  • Fix wrong link in text2img fine-tuning documentation by @daspartho in #1282
  • [StableDiffusionInpaintPipeline] fix batch_size for mask and masked latents by @patil-suraj in #1279
  • Add UNet 1d for RL model for planning + colab by @natolambert in #105
  • Fix documentation typo for UNet2DModel and UNet2DConditionModel by @xenova in #1275
  • add source link to composable diffusion model by @nanliu1 in #1293
  • Fix incorrect link to Stable Diffusion notebook by @dhruvrnaik in #1291
  • [dreambooth] link to bitsandbytes readme for installation by @0xdevalias in #1229
  • Add Scheduler.from_pretrained and better scheduler changing by @patrickvonplaten in #1286
  • Add AltDiffusion by @patrickvonplaten in #1299
  • Better error message for transformers dummy by @patrickvonplaten in #1306
  • Revert "Update pr docs actions" by @mishig25 in #1307
  • [AltDiffusion] add tests by @patil-suraj in #1311
  • Add improved handling of pil by @patrickvonplaten in #1309
  • cpu offloading: mutli GPU support by @dblunk88 in #1143
  • vq diffusion classifier free sampling by @williamberman in #1294
  • doc string args shape fix by @kamalkraj in #1243
  • [Community Pipeline] CLIPSeg + StableDiffusionInpainting by @unography in #1250
  • Temporary local test for PIL_INTERPOLATION by @pcuenca in #1317
  • Fix gpu_id by @anton-l in #1326
  • integrate ort by @prathikr in #1110
  • [Custom pipeline] Easier loading of local pipelines by @patrickvonplaten in #1327
  • [ONNX] Support Euler schedulers by @anton-l in #1328
  • img2text Typo by @patrickvonplaten in #1329
  • add docs for multi-modal examples by @natolambert in #1227
  • [Flax] Fix loading scheduler from subfolder by @skirsten in #1319
  • Fix/Enable all schedulers for in-painting by @patrickvonplaten in #1331
  • Correct path to schedlure by @patrickvonplaten in #1322
  • Avoid nested fix-copies by @anton-l in #1332
  • Fix img2img speed with LMS-Discrete Scheduler by @NotNANtoN in #896
  • Fix the order of casts for onnx inpainting by @anton-l in #1338
  • Legacy Inpainting Pipeline for Onnx Models by @ctsims in #1237
  • Jax infer support negative prompt by @entrpn in #1337
  • Update README.md: IMAGIC example code snippet misspelling by @ki-arie in #1346
  • Update README.md: Minor change to Imagic code snippet, missing dir error by @ki-arie in #1347
  • Handle batches and Tensors in pipeline_stable_diffusion_inpaint.py:prepare_mask_and_masked_image by @vict0rsch in #1003
  • change the sample model by @shunxing1234 in #1352
  • Add bit diffusion [WIP] by @kingstut in #971
  • perf: prefer batched matmuls for attention by @Birch-san in #1203
  • [Community Pipelines] K-Diffusion Pipeline by @patrickvonplaten in #1360
  • Add Safe Stable Diffusion Pipeline by @manuelbrack in #1244
  • [examples] fix mixed_precision arg by @patil-suraj in #1359
  • use memory_efficient_attention by default by @patil-suraj in #1354
  • Replace logger.warn by logger.warning by @regisss in #1366
  • Fix using non-square images with UNet2DModel and DDIM/DDPM pipelines by @jenkspt in #1289
  • handle fp16 in UNet2DModel by @patil-suraj in #1216
  • StableDiffusionImageVariationPipeline by @patil-suraj in #1365
Nov 5, 2022
v0.7.2: Patch release

This patch release fixes a bug that broke Flax Stable Diffusion inference. Thanks a million for spotting it, @camenduru, in https://github.com/huggingface/diffusers/issues/1145, and thanks a lot to @pcuenca and @kashif for fixing it in https://github.com/huggingface/diffusers/pull/1149

  • Flax: Flip sin to cos in time embeddings #1149 by @pcuenca
Nov 4, 2022
v0.7.1: Patch release

This patch release makes accelerate a soft dependency to avoid an error when installing diffusers with pre-existing torch.

  • Move accelerate to a soft-dependency #1134 by @patrickvonplaten
Nov 3, 2022
v0.7.0: Optimized for Apple Silicon, Improved Performance, Awesome Community

:heart: PyTorch + Accelerate

:warning: The PyTorch pipelines now require accelerate for improved model loading times! Install Diffusers with pip install --upgrade diffusers[torch] to get everything in a single command.

🍎 Apple Silicon support with PyTorch 1.13

PyTorch and Apple have been working on improving mps support in PyTorch 1.13, so Apple Silicon is now a first-class citizen in diffusers 0.7.0!

Requirements

  • Mac computer with Apple silicon (M1/M2) hardware.
  • macOS 12.6 or later (13.0 or later recommended, as support is even better).
  • arm64 version of Python.
  • PyTorch 1.13.0 official release, installed from pip or the conda channels.

Memory efficient generation

Memory management is crucial to achieving fast generation speed. We recommend always using attention slicing on Apple Silicon, as it drastically reduces memory pressure and prevents paging or swapping. This is especially important for computers with less than 64 GB of unified memory, and may be the difference between generating an image in seconds rather than minutes. Use it like this:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"

# First-time "warmup" pass
_ = pipe(prompt, num_inference_steps=1)

image = pipe(prompt).images[0]
image.save("astronaut.png")

Continuous Integration

Our automated tests now include a full battery of tests on the mps device. This will help us identify issues early and ensure quality on Apple Silicon going forward.

See more details in the documentation.

💃 Dance Diffusion

diffusers goes audio! 🎵 Dance Diffusion by Harmonai is the first audio model in 🧨 Diffusers!

  • [Dance Diffusion] Add dance diffusion by @patrickvonplaten #803

Try it out to generate some random music:

from diffusers import DiffusionPipeline
import scipy

model_id = "harmonai/maestro-150k"
pipeline = DiffusionPipeline.from_pretrained(model_id)
pipeline = pipeline.to("cuda")

audio = pipeline(audio_length_in_s=4.0).audios[0]

# To save locally
scipy.io.wavfile.write("maestro_test.wav", pipeline.unet.sample_rate, audio.transpose())

🎉 Euler schedulers

These are the Euler schedulers, from the paper Elucidating the Design Space of Diffusion-Based Generative Models by Karras et al. (2022). The diffusers implementation is based on the original k-diffusion implementation by Katherine Crowson. The Euler schedulers are fast, often generating really good outputs in 20-30 steps.

  • k-diffusion-euler by @hlky #1019
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import torch

euler_scheduler = EulerDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]

from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

euler_ancestral_scheduler = EulerAncestralDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_ancestral_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]

🔥 Up to 2x faster inference with memory_efficient_attention

Even faster and more memory-efficient Stable Diffusion using the flash attention implementation from xFormers.

  • Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR #532

To leverage it just make sure you have:

  • PyTorch > 1.12
  • CUDA available
  • The xformers library installed
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    sample = pipe("a small cat")

# optional: You can disable it via
# pipe.disable_xformers_memory_efficient_attention()

🚀 Much faster loading

Thanks to accelerate, pipeline loading is much, much faster. There are two parts to it:

  • First, when a model is created, PyTorch initializes its weights by default, which takes a good amount of time. With low_cpu_mem_usage (enabled by default), no initialization is performed.
  • Optionally, you can also use device_map="auto" to automatically select the best device(s) where the pre-trained weights will be initially sent to.

In our tests, loading time was more than halved on CUDA devices, and went down from 12s to 4s on an Apple M1 computer.

As a side effect, CPU usage will be greatly reduced during loading, because no temporary copies of the weights are necessary.

This feature requires PyTorch 1.9 or better and accelerate 0.8.0 or higher.

🎨 RePaint

RePaint allows reusing any pretrained DDPM model for free-form inpainting by adding restarts to the denoising schedule. Based on the paper RePaint: Inpainting using Denoising Diffusion Probabilistic Models by Andreas Lugmayr et al.

import torch
from diffusers import RePaintPipeline, RePaintScheduler

# original_image and mask_image are 256x256 PIL images prepared beforehand

# Load the RePaint scheduler and pipeline based on a pretrained DDPM model
scheduler = RePaintScheduler.from_config("google/ddpm-ema-celebahq-256")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)
pipe = pipe.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
output = pipe(
    original_image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,
    jump_length=10,
    jump_n_sample=10,
    generator=generator,
)
inpainted_image = output.images[0]

:earth_africa: Community Pipelines

Long Prompt Weighting Stable Diffusion

This pipeline lets you input prompts without the 77-token length limit, and you can increase a word's weight with "()" or decrease it with "[]". It also bundles the main use cases of the Stable Diffusion pipeline into a single class. For a code example, see Long Prompt Weighting Stable Diffusion

  • [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907

Speech to Image

Generate an image from an audio sample using pre-trained OpenAI whisper-small and Stable Diffusion. For a code example, see Speech to Image

Wildcard Stable Diffusion

A minimal implementation that lets users add "wildcards", denoted by __wildcard__, to prompts; these serve as placeholders for values randomly sampled from either a dictionary or a .txt file. For a code example, see Wildcard Stable Diffusion

  • Wildcard stable diffusion pipeline by @shyamsn97 in #900

Composable Stable Diffusion

Use logic operators to do compositional generation. For a code example, see Composable Stable Diffusion

  • Add Composable diffusion to community pipeline examples by @MarkRich in #951

Imagic Stable Diffusion

Image editing with Stable Diffusion. For a code example, see Imagic Stable Diffusion

  • Add imagic to community pipelines by @MarkRich in #958

Seed Resizing

Allows generating a larger image while keeping the content of the original image. For a code example, see Seed Resizing

  • Add seed resizing to community pipelines by @MarkRich in #1011

:memo: Changelog

  • [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907
  • [Stable Diffusion] Add components function by @patrickvonplaten in #889
  • [PNDM Scheduler] Make sure list cannot grow forever by @patrickvonplaten in #882
  • [DiffusionPipeline.from_pretrained] add warning when passing unused k… by @patrickvonplaten in #870
  • DOC Dreambooth Add --sample_batch_size=1 to the 8 GB dreambooth example script by @leszekhanusz in #829
  • [Examples] add speech to image pipeline example by @MikailINTech in #897
  • [dreambooth] dont use safety check when generating prior images by @patil-suraj in #922
  • Dreambooth class image generation: using unique names to avoid overwriting existing image by @leszekhanusz in #847
  • fix test_components by @patil-suraj in #928
  • Fix Compatibility with Nvidia NGC Containers by @tasercake in #919
  • [Community Pipelines] Fix pad_tokens_and_weights in lpw_stable_diffusion by @SkyTNT in #925
  • Bump the version to 0.7.0.dev0 by @anton-l in #912
  • Introduce the copy mechanism by @anton-l in #924
  • [Tests] Move stable diffusion into their own files by @patrickvonplaten in #936
  • [Flax] dont warn for bf16 weights by @patil-suraj in #923
  • Support LMSDiscreteScheduler in LDMPipeline by @mkshing in #891
  • Wildcard stable diffusion pipeline by @shyamsn97 in #900
  • [MPS] fix mps failing tests by @kashif in #934
  • fix a small typo in pipeline_ddpm.py by @chenguolin in #948
  • Reorganize pipeline tests by @anton-l in #963
  • v1-5 docs updates by @apolinario in #921
  • add community pipeline docs; add minimal text to some empty doc pages by @natolambert in #930
  • Fix typo: torch_type -> torch_dtype by @pcuenca in #972
  • add num_inference_steps arg to DDPM by @tmabraham in #935
  • Add Composable diffusion to community pipeline examples by @MarkRich in #951
  • [Flax] added broadcast_to_shape_from_left helper and Scheduler tests by @kashif in #864
  • [Tests] Fix mps reproducibility issue when running with pytest-xdist by @anton-l in #976
  • mps changes for PyTorch 1.13 by @pcuenca in #926
  • [Onnx] support half-precision and fix bugs for onnx pipelines by @SkyTNT in #932
  • [Dance Diffusion] Add dance diffusion by @patrickvonplaten in #803
  • [Dance Diffusion] FP16 by @patrickvonplaten in #980
  • [Dance Diffusion] Better naming by @patrickvonplaten in #981
  • Fix typo in documentation title by @echarlaix in #975
  • Add --pretrained_model_name_revision option to train_dreambooth.py by @shirayu in #933
  • Do not use torch.float64 on the mps device by @pcuenca in #942
  • CompVis -> diffusers script - allow converting from merged checkpoint to either EMA or non-EMA by @patrickvonplaten in #991
  • fix a bug in the new version by @xiaohu2015 in #957
  • Fix typos by @shirayu in #978
  • Add missing import by @juliensimon in #979
  • minimal stable diffusion GPU memory usage with accelerate hooks by @piEsposito in #850
  • [inpaint pipeline] fix bug for multiple prompts inputs by @xiaohu2015 in #959
  • Enable multi-process DataLoader for dreambooth by @skirsten in #950
  • Small modification to enable usage by external scripts by @briancw in #956
  • [Flax] Add Textual Inversion by @duongna21 in #880
  • Continuation of #942: additional float64 failure by @pcuenca in #996
  • fix dreambooth script. by @patil-suraj in #1017
  • [Accelerate model loading] Fix meta device and super low memory usage by @patrickvonplaten in #1016
  • [Flax] Add finetune Stable Diffusion by @duongna21 in #999
  • [DreamBooth] Set train mode for text encoder by @duongna21 in #1012
  • [Flax] Add DreamBooth by @duongna21 in #1001
  • Deprecate init_git_repo, refactor train_unconditional.py by @anton-l in #1022
  • update readme for flax examples by @patil-suraj in #1026
  • Probably nicer to specify dependency on tensorboard in the training example by @lukovnikov in #998
  • Add --dataloader_num_workers to the DDPM training example by @anton-l in #1027
  • Document sequential CPU offload method on Stable Diffusion pipeline by @piEsposito in #1024
  • Support grayscale images in numpy_to_pil by @anton-l in #1025
  • [Flax SD finetune] Fix dtype by @duongna21 in #1038
  • fix F.interpolate() for large batch sizes by @NouamaneTazi in #1006
  • [Tests] Improve unet / vae tests by @patrickvonplaten in #1018
  • [Tests] Speed up slow tests by @patrickvonplaten in #1040
  • Fix some failing tests by @patrickvonplaten in #1041
  • [Tests] Better prints by @patrickvonplaten in #1043
  • [Tests] no random latents anymore by @patrickvonplaten in #1045
  • Update training and fine-tuning docs by @pcuenca in #1020
  • Fix speedup ratio in fp16.mdx by @mwbyeon in #837
  • clean incomplete pages by @natolambert in #1008
  • Add seed resizing to community pipelines by @MarkRich in #1011
  • Tests: upgrade PyTorch cuda to 11.7 to fix examples tests. by @pcuenca in #1048
  • Experimental: allow fp16 in mps by @pcuenca in #961
  • Move safety detection to model call in Flax safety checker by @jonatanklosko in #1023
  • Fix pipelines user_agent, ignore CI requests by @anton-l in #1058
  • [GitBot] Automatically close issues after inactivitiy by @patrickvonplaten in #1079
  • Allow safety_checker to be None when using CPU offload by @pcuenca in #1078
  • k-diffusion-euler by @hlky in #1019
  • [Better scheduler docs] Improve usage examples of schedulers by @patrickvonplaten in #890
  • [Tests] Fix slow tests by @patrickvonplaten in #1087
  • Remove nn sequential by @patrickvonplaten in #1086
  • Remove some unused parameter in CrossAttnUpBlock2D by @LaurentMazare in #1034
  • Add imagic to community pipelines by @MarkRich in #958
  • Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR in #532
  • [docs] add euler scheduler in docs, how to use differnet schedulers by @patil-suraj in #1089
  • Integration tests precision improvement for inpainting by @Lewington-pitsos in #1052
  • lpw_stable_diffusion: Add is_cancelled_callback by @irgolic in #1053
  • Rename latent by @patrickvonplaten in #1102
  • fix typo in examples dreambooth README.md by @jorahn in #1073
  • fix model card url in text inversion readme. by @patil-suraj in #1103
  • [CI] Framework and hardware-specific CI tests by @anton-l in #997
  • Fix a small typo of a variable name by @omihub777 in #1063
  • Fix tests for equivalence of DDIM and DDPM pipelines by @sgrigory in #1069
  • Fix padding in dreambooth by @shirayu in #1030
  • [Flax] time embedding by @kashif in #1081
  • Training to predict x0 in training example by @lukovnikov in #1031
  • [Loading] Ignore unneeded files by @patrickvonplaten in #1107
  • Fix hub-dependent tests for PRs by @anton-l in #1119
  • Allow saving None pipeline components by @anton-l in #1118
  • feat: add repaint by @Revist in #974
  • Continuation of #1035 by @pcuenca in #1120
  • VQ-diffusion by @williamberman in #658
Oct 19, 2022
v0.6.0: Finetuned Stable Diffusion inpainting

:art: Finetuned Stable Diffusion inpainting

The first official Stable Diffusion checkpoint fine-tuned for inpainting has been released.

You can try it out in the official demo here

or code it up yourself :computer: :

from io import BytesIO

import torch

import PIL.Image
import requests
from diffusers import StableDiffusionInpaintPipeline


def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"

output = pipe(prompt=prompt, image=image, mask_image=mask_image)
image = output.images[0]

gives:

<table>
<tr style="text-align: center;">
<th>image</th>
<th>mask_image</th>
<th>prompt</th>
<th>Output</th>
</tr>
<tr>
<td><img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" alt="drawing" width="200"/></td>
<td><img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" alt="drawing" width="200"/></td>
<td>Face of a yellow cat, high resolution, sitting on a park bench</td>
<td><img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/test.png" alt="drawing" width="200"/></td>
</tr>
</table>

:warning: This release deprecates the unsupervised noising-based inpainting pipeline into StableDiffusionInpaintPipelineLegacy. The new StableDiffusionInpaintPipeline is based on a Stable Diffusion model finetuned for the inpainting task: https://huggingface.co/runwayml/stable-diffusion-inpainting
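Unlike the legacy noising-based approach, the finetuned checkpoint conditions the UNet on the mask directly: its input is the noisy latent concatenated with the downsampled binary mask and the VAE-encoded masked image, 9 channels in total. A rough numpy sketch of that input assembly (shapes assumed for a 512×512 image with the 8× VAE downsampling, not copied from the implementation):

```python
import numpy as np

# Assumed latent-space shapes for a 512x512 input (VAE downsamples by 8x):
latents = np.random.randn(1, 4, 64, 64)               # noisy image latents
masked_image_latents = np.random.randn(1, 4, 64, 64)  # VAE-encoded masked image
mask = np.random.rand(1, 1, 64, 64).round()           # binary mask at latent resolution

# The inpainting UNet expects all three concatenated along the channel axis,
# giving 4 + 1 + 4 = 9 input channels instead of the usual 4.
unet_input = np.concatenate([latents, mask, masked_image_latents], axis=1)
print(unet_input.shape)  # (1, 9, 64, 64)
```

This is why the finetuned pipeline cannot be swapped in for an arbitrary text-to-image checkpoint: the UNet's first convolution must have been trained with the extra mask channels.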

Note: when loading StableDiffusionInpaintPipeline with a non-finetuned model (i.e., one saved with diffusers<=0.5.1), the pipeline will default to StableDiffusionInpaintPipelineLegacy to maintain backward compatibility :sparkles:

from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

assert pipe.__class__.__name__ == "StableDiffusionInpaintPipelineLegacy"

Context:

Why this change? When Stable Diffusion came out ~2 months ago, there were many unofficial in-painting demos using the original v1-4 checkpoint ("CompVis/stable-diffusion-v1-4"). These demos worked reasonably well, so we integrated an experimental StableDiffusionInpaintPipeline class into diffusers. Now that the official inpainting checkpoint has been released (https://github.com/runwayml/stable-diffusion), we have made it the basis of the official pipeline and moved the old, hacky one to StableDiffusionInpaintPipelineLegacy.

:rocket: ONNX pipelines for image2image and inpainting

Thanks to the contribution by @zledas (#552) this release supports OnnxStableDiffusionImg2ImgPipeline and OnnxStableDiffusionInpaintPipeline optimized for CPU inference:

from diffusers import OnnxStableDiffusionImg2ImgPipeline, OnnxStableDiffusionInpaintPipeline

img_pipeline = OnnxStableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", revision="onnx", provider="CPUExecutionProvider"
)

inpaint_pipeline = OnnxStableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", revision="onnx", provider="CPUExecutionProvider"
)

:earth_africa: Community Pipelines

Two new community pipelines have been added to diffusers :fire:

Stable Diffusion Interpolation example

Interpolate the latent space of Stable Diffusion between different prompts/seeds. For more info see stable-diffusion-videos.

For a code example, see Stable Diffusion Interpolation

  • Add Stable Diffusion Interpolation Example by @nateraw in #862
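Latent-space walks like this usually interpolate spherically rather than linearly, because Gaussian latents live near a hypersphere and straight-line interpolation passes through low-probability regions. A minimal, self-contained slerp helper in numpy (illustrative only; the community pipeline's exact implementation may differ):

```python
import numpy as np

def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical interpolation between two latent tensors at fraction t in [0, 1].

    Falls back to linear interpolation when the vectors are nearly parallel,
    where the spherical formula becomes numerically unstable.
    """
    v0_flat, v1_flat = v0.ravel(), v1.ravel()
    dot = np.dot(v0_flat, v1_flat) / (np.linalg.norm(v0_flat) * np.linalg.norm(v1_flat))
    if abs(dot) > dot_threshold:
        return (1 - t) * v0 + t * v1  # nearly parallel: plain lerp
    theta = np.arccos(dot)
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1

# Walk between two random latents in a few steps; each frame would be
# decoded by the pipeline to produce one video frame.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64, 64))
b = rng.standard_normal((4, 64, 64))
frames = [slerp(t, a, b) for t in np.linspace(0.0, 1.0, 5)]
```

The endpoints reproduce the original latents exactly, so the first and last decoded frames match the two seed images.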

Stable Diffusion Interpolation Mega

One Stable Diffusion pipeline that bundles all the functionality of Text2Image, Image2Image, and Inpainting.

For a code example, see Stable Diffusion Mega

  • All in one Stable Diffusion Pipeline by @patrickvonplaten in #821

:memo: Changelog

  • [Community] One step unet by @patrickvonplaten in #840
  • Remove unneeded use_auth_token by @osanseviero in #839
  • Bump to 0.6.0.dev0 by @anton-l in #831
  • Remove the last of ["sample"] by @anton-l in #842
  • Fix Flax pipeline: width and height are ignored #838 by @camenduru in #848
  • [DeviceMap] Make sure stable diffusion can be loaded from older trans… by @patrickvonplaten in #860
  • Fix small community pipeline import bug and finish README by @patrickvonplaten in #869
  • Fix training push_to_hub (unconditional image generation): models were not saved before pushing to hub by @pcuenca in #868
  • Fix table in community README.md by @nateraw in #879
  • Add generic inference example to community pipeline readme by @apolinario in #874
  • Rename frame filename in interpolation community example by @nateraw in #881
  • Add Apple M1 tests by @anton-l in #796
  • Fix autoencoder test by @pcuenca in #886
  • Rename StableDiffusionOnnxPipeline -> OnnxStableDiffusionPipeline by @anton-l in #887
  • Fix DDIM on Windows not using int64 for timesteps by @hafriedlander in #819
  • [dreambooth] allow fine-tuning text encoder by @patil-suraj in #883
  • Stable Diffusion image-to-image and inpaint using onnx. by @zledas in #552
  • Improve ONNX img2img numpy handling, temporarily fix the tests by @anton-l in #899
  • [Stable Diffusion Inpainting] Deprecate inpainting pipeline in favor of official one by @patrickvonplaten in #903
  • [Communit Pipeline] Make sure "mega" uses correct inpaint pipeline by @patrickvonplaten in #908
  • Stable diffusion inpainting by @patil-suraj in #904
  • ONNX supervised inpainting by @anton-l in #906
Oct 13, 2022
v0.5.1: Patch release

This patch release fixes a bug with Flax's NSFW safety checker in the pipeline.

https://github.com/huggingface/diffusers/pull/832 by @patil-suraj

v0.5.0: JAX/Flax and TPU support

:ear_of_rice: JAX/Flax integration for super fast Stable Diffusion on TPUs.

We added JAX support for Stable Diffusion! You can now run Stable Diffusion on Colab TPUs (and GPUs too!) for faster inference.

Check out this TPU-ready Colab notebook for the Stable Diffusion pipeline, and a detailed blog post on Stable Diffusion and parallelism in JAX / Flax :hugs: https://huggingface.co/blog/stable_diffusion_jax

The most commonly used models, schedulers, and pipelines have been ported to JAX/Flax, namely:

  • Models: FlaxAutoencoderKL, FlaxUNet2DConditionModel
  • Schedulers: FlaxDDIMScheduler, FlaxDDPMScheduler, FlaxPNDMScheduler
  • Pipelines: FlaxStableDiffusionPipeline

Changelog:

  • Implement FlaxModelMixin #493 by @mishig25 , @patil-suraj, @patrickvonplaten , @pcuenca
  • Karras VE, DDIM and DDPM flax schedulers #508 by @kashif
  • initial flax pndm scheduler #492 by @kashif
  • FlaxDiffusionPipeline & FlaxStableDiffusionPipeline #559 by @mishig25 , @patrickvonplaten , @pcuenca
  • Flax pipeline pndm #583 by @pcuenca
  • Add from_pt argument in .from_pretrained #527 by @younesbelkada
  • Make flax from_pretrained work with local subfolder #608 by @mishig25

:fire: DeepSpeed low-memory training

Thanks to the :hugs: accelerate integration with DeepSpeed, a few of our training examples became even more optimized in terms of VRAM and speed:

  • DreamBooth is now trainable on 8GB GPUs thanks to a contribution from @Ttl! Find out how to run it here.
  • The Text2Image finetuning example is also fully compatible with DeepSpeed.
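The low-memory recipe works through accelerate's DeepSpeed integration rather than changes to the training script itself: you enable DeepSpeed (ZeRO with CPU offload) once in `accelerate config`, then launch the unmodified example. A hedged sketch of the flow (flag values are illustrative; the exact low-memory settings are in the DreamBooth example's README):

```shell
# One-time setup: answer the prompts to enable DeepSpeed with CPU offload
accelerate config

# Then launch the example script through accelerate as usual
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="./instance_images" \
  --instance_prompt="a photo of sks dog" \
  --output_dir="./dreambooth-model" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --max_train_steps=400
```

Because accelerate reads the DeepSpeed settings from its config file, the same launch command runs on a single 8 GB GPU or a larger multi-GPU setup without script changes.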

:pencil2: Changelog

  • Revert "[v0.4.0] Temporarily remove Flax modules from the public API by @anton-l in #755)"
  • Fix push_to_hub for dreambooth and textual_inversion by @YaYaB in #748
  • Fix ONNX conversion script opset argument type by @justinchuby in #739
  • Add final latent slice checks to SD pipeline intermediate state tests by @jamestiotio in #731
  • fix(DDIM scheduler): use correct dtype for noise by @keturn in #742
  • [Tests] Fix tests by @patrickvonplaten in #774
  • debug an exception by @LowinLi in #638
  • Clean up resnet.py file by @natolambert in #780
  • add sigmoid betas by @natolambert in #777
  • [Low CPU memory] + device map by @patrickvonplaten in #772
  • Fix gradient checkpointing test by @patrickvonplaten in #797
  • fix typo docstring in unet2d by @natolambert in #798
  • DreamBooth DeepSpeed support for under 8 GB VRAM training by @Ttl in #735
  • support bf16 for stable diffusion by @patil-suraj in #792
  • stable diffusion fine-tuning by @patil-suraj in #356
  • Flax: Trickle down norm_num_groups by @akash5474 in #789
  • Eventually preserve this typo? :) by @spezialspezial in #804
  • Fix indentation in the code example by @osanseviero in #802
  • [Img2Img] Fix batch size mismatch prompts vs. init images by @patrickvonplaten in #793
  • Minor package fixes by @anton-l in #809
  • [Dummy imports] Better error message by @patrickvonplaten in #795
  • add or fix license formatting in models directory by @natolambert in #808
  • [train_text2image] Fix EMA and make it compatible with deepspeed. by @patil-suraj in #813
  • Fix fine-tuning compatibility with deepspeed by @pink-red in #816
  • Add diffusers version and pipeline class to the Hub UA by @anton-l in #814
  • [Flax] Add test by @patrickvonplaten in #824
  • update flax scheduler API by @patil-suraj in #822
  • Fix dreambooth loss type with prior_preservation and fp16 by @anton-l in #826
  • Fix type mismatch error, add tests for negative prompts by @anton-l in #823
  • Give more customizable options for safety checker by @patrickvonplaten in #815
  • Flax safety checker by @pcuenca in #825
  • Align PT and Flax API - allow loading checkpoint from PyTorch configs by @patrickvonplaten in #827
Oct 11, 2022
v0.4.2: Patch release

This patch release allows the img2img pipeline to be run on fp16 and fixes a bug with the "mps" device.
