Hugging Face / Diffusers releases
Oct 7, 2022
v0.4.1: Patch release

This patch release fixes a bug with incorrect module naming for community pipelines, and an incorrect breaking change when moving pipelines in fp16 to "cpu" or "mps".

Oct 6, 2022
v0.4.0: Better, faster, stronger!

🚗 Faster

We have thoroughly profiled our codebase and applied a number of incremental improvements that, when combined, provide a speed improvement of almost 3x.

On top of that, we now default to using the float16 format. It's much faster than float32 and, according to our tests, produces images with no discernible difference in quality. This beats the use of autocast, so the resulting code is cleaner!
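As a back-of-envelope illustration of why half precision helps (the ~860M UNet parameter count comes from the Stable Diffusion model card, not this changelog; the arithmetic below is a sketch, not pipeline code):

```python
# Back-of-envelope memory math for half precision (illustrative only):
# fp32 stores 4 bytes per parameter, fp16 stores 2.
def param_gib(n_params: int, bytes_per_param: int) -> float:
    """Size of a parameter tensor in GiB."""
    return n_params * bytes_per_param / 1024**3

UNET_PARAMS = 860_000_000  # ~860M-parameter UNet, per the model card

fp32_gib = param_gib(UNET_PARAMS, 4)  # about 3.2 GiB
fp16_gib = param_gib(UNET_PARAMS, 2)  # exactly half of that
```

Halving the per-parameter footprint also halves activation memory and memory bandwidth, which is where much of the fp16 speedup comes from.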

🔑 use_auth_token no more

The recently released version of huggingface-hub automatically uses your access token if you are logged in, so you don't need to put it everywhere in your code. All you need to do is authenticate once using huggingface-cli login in your terminal and you're all set.

- pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
+ pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

We bumped huggingface-hub version to 0.10.0 in our dependencies to achieve this.

🎈 More flexible APIs

  • Schedulers now use a simpler, unified API. This has allowed us to remove many conditionals and special cases in the rest of the code, including the pipelines. This matters for us and for the users of 🧨 diffusers: we all gain clarity and a solid abstraction for schedulers. See the description in https://github.com/huggingface/diffusers/pull/719 for more details.

Please update any custom Stable Diffusion pipelines accordingly:

- if isinstance(self.scheduler, LMSDiscreteScheduler):
-    latents = latents * self.scheduler.sigmas[0]
+ latents = latents * self.scheduler.init_noise_sigma

- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     sigma = self.scheduler.sigmas[i]
-     latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs).prev_sample
- else:
-     latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
  • Pipeline callbacks. As a community project (h/t @jamestiotio!), diffusers pipelines can now invoke a callback function during generation, providing the latents at each step of the process. This makes it easier to perform tasks such as visualization, inspection, explainability and others the community may invent.
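To make the callback mechanism concrete, here is a sketch of a step callback and how a sampler would invoke it. The three-argument signature (step index, timestep, latents) follows the callback PR referenced above; the loop below merely simulates a 5-step pipeline run, since the real invocation happens inside the pipeline:

```python
# Sketch of a per-step generation callback (illustrative; in the real
# pipeline you pass it to the pipeline call rather than invoking it yourself).
collected = []

def on_step(step: int, timestep: int, latents) -> None:
    """Record which denoising steps ran; a real callback might decode
    or visualize `latents` here."""
    collected.append((step, timestep))

# Simulate how a 5-step sampler would invoke the callback:
for step, t in enumerate(range(1000, 0, -200)):
    on_step(step, t, latents=None)
```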

🛠️ More tasks

Building on top of the previous foundations, this release incorporates several new tasks that have been adapted from research papers or community projects. These include:

  • Textual inversion. Makes it possible to quickly train a new concept or style and incorporate it into the vocabulary of Stable Diffusion. Hundreds of people have already created theirs, and they can be shared and combined together. See the training Colab to get started.
  • Dreambooth. Similar goal to textual inversion, but instead of creating a new item in the vocabulary it fine-tunes the model to make it learn a new concept. Training Colab.
  • Negative prompts. Another community effort led by @shirayu. The Stable Diffusion pipeline can now receive both a positive prompt (the one you want to create), and a negative prompt (something you want to drive the model away from). This opens up a lot of creative possibilities!
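For intuition on how a negative prompt works: in classifier-free guidance it takes the place of the empty unconditional embedding, so the guided prediction is pushed away from what the negative prompt describes. A scalar sketch (the function name and numbers are illustrative, not pipeline code):

```python
def guided_noise(pred_negative: float, pred_positive: float, guidance_scale: float) -> float:
    """Classifier-free guidance: move the noise prediction toward the
    positive prompt and away from the negative one."""
    return pred_negative + guidance_scale * (pred_positive - pred_negative)

# With a strong guidance scale, the result overshoots the positive
# prediction, i.e. it is driven away from the negative prompt's direction.
noise = guided_noise(pred_negative=1.0, pred_positive=2.0, guidance_scale=7.5)
```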

🏃‍♀️ Under the hood changes to support better fine-tuning

Gradient checkpointing and 8-bit optimizers have been successfully applied to achieve Dreambooth fine-tuning in a Colab notebook! These updates will make it easier for diffusers to support general-purpose fine-tuning (coming soon!).

⚠️ Experimental: community pipelines

This is big, but it's still an experimental feature that may change in the future.

We are constantly amazed at the amount of imagination and creativity in the diffusers community, so we've made it easy to create custom pipelines and share them with others. You can write your own pipeline code, store it in 🤗 Hub, GitHub or your local filesystem and StableDiffusionPipeline.from_pretrained will be able to load and run it. Read more in the documentation.

We can't wait to see what new tasks the community creates!

💪 Quality of life fixes

Bug fixing, improved documentation, better tests are all important to ensure diffusers is a high-quality codebase, and we always spend a lot of effort working on them. Several first-time contributors have helped here, and we are very grateful for their efforts!

🙏 Significant community contributions

The following people have made significant contributions to the library over the last release:

  • @Victarry – Add training example for DreamBooth (#554)
  • @jamestiotio – Add callback parameters for Stable Diffusion pipelines (#521)
  • @jachiam – Allow resolutions that are not multiples of 64 (#505)
  • @johnowhitaker – Adding pred_original_sample to SchedulerOutput for some samplers (#614).
  • @keturn – Interesting discussions and insights on many topics.

✏️ Change list

  • [Docs] Correct links by @patrickvonplaten in #432
  • [Black] Update black by @patrickvonplaten in #433
  • use torch.matmul instead of einsum in attnetion. by @patil-suraj in #445
  • Renamed variables from single letter to better naming by @daspartho in #449
  • Docs: fix installation typo by @daspartho in #453
  • fix table formatting for stable diffusion pipeline doc (add blank line) by @natolambert in #471
  • update expected results of slow tests by @kashif in #268
  • [Flax] Make room for more frameworks by @patrickvonplaten in #494
  • Fix disable_attention_slicing in pipelines by @pcuenca in #498
  • Rename test_scheduler_outputs_equivalence in model tests. by @pcuenca in #451
  • Scheduler docs update by @natolambert in #464
  • Fix scheduler inference steps error with power of 3 by @natolambert in #466
  • initial flax pndm schedular by @kashif in #492
  • Fix vae tests for cpu and gpu by @kashif in #480
  • [Docs] Add subfolder docs by @patrickvonplaten in #500
  • docs: bocken doc links for relative links by @jjmachan in #504
  • Removing .float() (autocast in fp16 will discard this (I think)). by @Narsil in #495
  • Fix MPS scheduler indexing when using mps by @pcuenca in #450
  • [CrossAttention] add different method for sliced attention by @patil-suraj in #446
  • Implement FlaxModelMixin by @mishig25 in #493
  • Karras VE, DDIM and DDPM flax schedulers by @kashif in #508
  • [UNet2DConditionModel, UNet2DModel] pass norm_num_groups to all the blocks by @patil-suraj in #442
  • Add init_weights method to FlaxMixin by @mishig25 in #513
  • UNet Flax with FlaxModelMixin by @pcuenca in #502
  • Stable diffusion text2img conversion script. by @patil-suraj in #154
  • [CI] Add stalebot by @anton-l in #481
  • Fix is_onnx_available by @SkyTNT in #440
  • [Tests] Test attention.py by @sidthekidder in #368
  • Finally fix the image-based SD tests by @anton-l in #509
  • Remove the usage of numpy in up/down sample_2d by @ydshieh in #503
  • Fix typos and add Typo check GitHub Action by @shirayu in #483
  • Quick fix for the img2img tests by @anton-l in #530
  • [Tests] Fix spatial transformer tests on GPU by @anton-l in #531
  • [StableDiffusionInpaintPipeline] accept tensors for init and mask image by @patil-suraj in #439
  • adding more typehints to DDIM scheduler by @vishnu-anirudh in #456
  • Revert "adding more typehints to DDIM scheduler" by @patrickvonplaten in #533
  • Add LMSDiscreteSchedulerTest by @sidthekidder in #467
  • [Download] Smart downloading by @patrickvonplaten in #512
  • [Hub] Update hub version by @patrickvonplaten in #538
  • Unify offset configuration in DDIM and PNDM schedulers by @jonatanklosko in #479
  • [Configuration] Better logging by @patrickvonplaten in #545
  • make fixup support by @younesbelkada in #546
  • FlaxUNet2DConditionOutput @flax.struct.dataclass by @mishig25 in #550
  • [Flax] fix Flax scheduler by @kashif in #564
  • JAX/Flax safety checker by @pcuenca in #558
  • Flax: ignore dtype for configuration by @pcuenca in #565
  • Remove check_tf_utils to avoid an unnecessary TF import for now by @anton-l in #566
  • Fix _upsample_2d by @ydshieh in #535
  • [Flax] Add Vae for Stable Diffusion by @patrickvonplaten in #555
  • [Flax] Solve problem with VAE by @patrickvonplaten in #574
  • [Tests] Upload custom test artifacts by @anton-l in #572
  • [Tests] Mark the ncsnpp model tests as slow by @anton-l in #575
  • [examples/community] add CLIPGuidedStableDiffusion by @patil-suraj in #561
  • Fix CrossAttention._sliced_attention by @ydshieh in #563
  • Fix typos by @shirayu in #568
  • Add from_pt argument in .from_pretrained by @younesbelkada in #527
  • [FlaxAutoencoderKL] rename weights to align with PT by @patil-suraj in #584
  • Fix BaseOutput initialization from dict by @anton-l in #570
  • Add the K-LMS scheduler to the inpainting pipeline + tests by @anton-l in #587
  • [flax safety checker] Use FlaxPreTrainedModel for saving/loading by @patil-suraj in #591
  • FlaxDiffusionPipeline & FlaxStableDiffusionPipeline by @mishig25 in #559
  • [Flax] Fix unet and ddim scheduler by @patrickvonplaten in #594
  • Fix params replication when using the dummy checker by @pcuenca in #602
  • Allow dtype to be specified in Flax pipeline by @pcuenca in #600
  • Fix flax from_pretrained pytorch weight check by @mishig25 in #603
  • Mv weights name consts to diffusers.utils by @mishig25 in #605
  • Replace dropout_prob by dropout in vae by @younesbelkada in #595
  • Add smoke tests for the training examples by @anton-l in #585
  • Add torchvision to training deps by @anton-l in #607
  • Return Flax scheduler state by @pcuenca in #601
  • [ONNX] Collate the external weights, speed up loading from the hub by @anton-l in #610
  • docs: fix Berkeley ref by @ryanrussell in #611
  • Handle the PIL.Image.Resampling deprecation by @anton-l in #588
  • Make flax from_pretrained work with local subfolder by @mishig25 in #608
  • [flax] 'dtype' should not be part of self._internal_dict by @mishig25 in #609
  • [UNet2DConditionModel] add gradient checkpointing by @patil-suraj in #461
  • docs: fix stochastic_karras_ve ref by @ryanrussell in #618
  • Adding pred_original_sample to SchedulerOutput for some samplers by @johnowhitaker in #614
  • docs: .md readability fixups by @ryanrussell in #619
  • Flax documentation by @younesbelkada in #589
  • fix docs: change sample to images by @AbdullahAlfaraj in #613
  • refactor: pipelines readability improvements by @ryanrussell in #622
  • Allow passing session_options for ORT backend by @cloudhan in #620
  • Fix breaking error: "ort is not defined" by @pcuenca in #626
  • docs: src/diffusers readability improvements by @ryanrussell in #629
  • Fix formula for noise levels in Karras scheduler and tests by @sgrigory in #627
  • [CI] Fix onnxruntime installation order by @anton-l in #633
  • Warning for too long prompts in DiffusionPipelines (Resolve #447) by @shirayu in #472
  • Fix docs link to train_unconditional.py by @AbdullahAlfaraj in #642
  • Remove deprecated torch_device kwarg by @pcuenca in #623
  • refactor: custom_init_isort readability fixups by @ryanrussell in #631
  • Remove inappropriate docstrings in LMS docstrings. by @pcuenca in #634
  • Flax pipeline pndm by @pcuenca in #583
  • Fix SpatialTransformer by @ydshieh in #578
  • Add training example for DreamBooth. by @Victarry in #554
  • [Pytorch] Pytorch only schedulers by @kashif in #534
  • [examples/dreambooth] don't pass tensor_format to scheduler. by @patil-suraj in #649
  • [dreambooth] update install section by @patil-suraj in #650
  • [DDIM, DDPM] fix add_noise by @patil-suraj in #648
  • [Pytorch] add dep. warning for pytorch schedulers by @kashif in #651
  • [CLIPGuidedStableDiffusion] remove set_format from pipeline by @patil-suraj in #653
  • Fix onnx tensor format by @anton-l in #654
  • Fix main: stable diffusion pipelines cannot be loaded by @pcuenca in #655
  • Fix the LMS pytorch regression by @anton-l in #664
  • Added script to save during textual inversion training. Issue 524 by @isamu-isozaki in #645
  • [CLIPGuidedStableDiffusion] take the correct text embeddings by @patil-suraj in #667
  • Update index.mdx by @tmabraham in #670
  • [examples] update transfomers version by @patil-suraj in #665
  • [gradient checkpointing] lower tolerance for test by @patil-suraj in #652
  • Flax from_pretrained: clean up mismatched_keys. by @pcuenca in #630
  • trained_betas ignored in some schedulers by @vishnu-anirudh in #635
  • Renamed x -> hidden_states in resnet.py by @daspartho in #676
  • Optimize Stable Diffusion by @NouamaneTazi in #371
  • Allow resolutions that are not multiples of 64 by @jachiam in #505
  • refactor: update ldm-bert config.json url closes #675 by @ryanrussell in #680
  • [docs] fix table in fp16.mdx by @NouamaneTazi in #683
  • Fix slow tests by @NouamaneTazi in #689
  • Fix BibText citation by @osanseviero in #693
  • Add callback parameters for Stable Diffusion pipelines by @jamestiotio in #521
  • [dreambooth] fix applying clip_grad_norm_ by @patil-suraj in #686
  • Flax: add shape argument to set_timesteps by @pcuenca in #690
  • Fix type annotations on StableDiffusionPipeline.call by @tasercake in #682
  • Fix import with Flax but without PyTorch by @pcuenca in #688
  • [Support PyTorch 1.8] Remove inference mode by @patrickvonplaten in #707
  • [CI] Speed up slow tests by @anton-l in #708
  • [Utils] Add deprecate function and move testing_utils under utils by @patrickvonplaten in #659
  • Checkpoint conversion script from Diffusers => Stable Diffusion (CompVis) by @jachiam in #701
  • [Docs] fix docstring for issue #709 by @kashif in #710
  • Update schedulers README.md by @tmabraham in #694
  • add accelerate to load models with smaller memory footprint by @piEsposito in #361
  • Fix typos by @shirayu in #718
  • Add an argument "negative_prompt" by @shirayu in #549
  • Fix import if PyTorch is not installed by @pcuenca in #715
  • Remove comments no longer appropriate by @pcuenca in #716
  • [train_unconditional] fix applying clip_grad_norm_ by @patil-suraj in #721
  • renamed x to meaningful variable in resnet.py by @i-am-epic in #677
  • [Tests] Add accelerate to testing by @patrickvonplaten in #729
  • [dreambooth] Using already created Path in dataset by @DrInfiniteExplorer in #681
  • Include CLIPTextModel parameters in conversion by @kanewallmann in #695
  • Avoid negative strides for tensors by @shirayu in #717
  • [Pytorch] pytorch only timesteps by @kashif in #724
  • [Scheduler design] The pragmatic approach by @anton-l in #719
  • Removing autocast for 35-25% speedup. (autocast considered harmful). by @Narsil in #511
  • No more use_auth_token=True by @patrickvonplaten in #733
  • remove use_auth_token from remaining places by @patil-suraj in #737
  • Replace messages that have empty backquotes by @pcuenca in #738
  • [Docs] Advertise fp16 instead of autocast by @patrickvonplaten in #740
  • remove use_auth_token from for TI test by @patil-suraj in #747
  • allow multiple generations per prompt by @patil-suraj in #741
  • Add back-compatibility to LMS timesteps by @anton-l in #750
  • update the clip guided PR according to the new API by @patil-suraj in #751
  • Raise an error when moving an fp16 pipeline to CPU by @anton-l in #749
  • Better steps deprecation for LMS by @anton-l in #753

Sep 8, 2022
v0.3.0: New API, Stable Diffusion pipelines, low-memory inference, MPS backend, ONNX

:books: Shiny new docs!

Thanks to the community efforts for [Docs] and [Type Hints] we've started populating the Diffusers documentation pages with lots of helpful guides, links and API references.

:memo: New API & breaking changes

New API

Pipeline, Model, and Scheduler outputs can now be dataclasses, dicts, or tuples:

image = pipe("The red cat is sitting on a chair")["sample"][0]

is now replaced by:

image = pipe("The red cat is sitting on a chair").images[0]
# or
image = pipe("The red cat is sitting on a chair")["images"][0]
# or
image = pipe("The red cat is sitting on a chair")[0]

Similarly:

sample = unet(...).sample

and

prev_sample = scheduler(...).prev_sample

are now possible!

🚨🚨🚨 Breaking change 🚨🚨🚨

This PR introduces breaking changes for the following public-facing methods:

  • VQModel.encode now returns a dict/dataclass instead of a single tensor, since it is likely that more than one tensor will need to be returned in the future. Please change latents = model.encode(...) to latents = model.encode(...)[0] or latents = model.encode(...).latents
  • VQModel.decode now returns a dict/dataclass instead of a single tensor. Please change sample = model.decode(...) to sample = model.decode(...)[0] or sample = model.decode(...).sample
  • VQModel.forward now returns a dict/dataclass instead of a single tensor. Please change sample = model(...) to sample = model(...)[0] or sample = model(...).sample
  • AutoencoderKL.encode now returns a dict/dataclass instead of a single tensor. Please change latent_dist = model.encode(...) to latent_dist = model.encode(...)[0] or latent_dist = model.encode(...).latent_dist
  • AutoencoderKL.decode now returns a dict/dataclass instead of a single tensor. Please change sample = model.decode(...) to sample = model.decode(...)[0] or sample = model.decode(...).sample
  • AutoencoderKL.forward now returns a dict/dataclass instead of a single tensor. Please change sample = model(...) to sample = model(...)[0] or sample = model(...).sample
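The mechanics of the new outputs can be sketched with a minimal stand-in (this is not the real diffusers BaseOutput, just an illustration of why attribute access, key access, and index access all reach the same field):

```python
from dataclasses import dataclass, fields

# Minimal stand-in for the new output objects (illustrative only): the
# same field is reachable by attribute, by string key, or by index.
@dataclass
class DecoderOutput:
    sample: list

    def __getitem__(self, key):
        if isinstance(key, str):
            return getattr(self, key)
        return getattr(self, fields(self)[key].name)

out = DecoderOutput(sample=[0.1, 0.2])
```

This is why the migration above offers both `model.decode(...)[0]` and `model.decode(...).sample` as equivalent spellings.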

:art: New Stable Diffusion pipelines

A couple of new pipelines have been added to Diffusers! We invite you to experiment with them, and to take them as inspiration to create your cool new tasks. These are the new pipelines:

  • Image-to-image generation. In addition to using a text prompt, this pipeline lets you include an example image to be used as the initial state of the process. 🤗 Diffuse the Rest is a cool demo about it!
  • Inpainting (experimental). You can provide an image and a mask and ask Stable Diffusion to replace the mask.

For more details about how they work, please visit our new API documentation.

This is a summary of all the Stable Diffusion tasks that can be easily used with 🤗 Diffusers:

  • pipeline_stable_diffusion.py: Text-to-Image Generation (demo: 🤗 Stable Diffusion)
  • pipeline_stable_diffusion_img2img.py: Image-to-Image Text-Guided Generation (demo: 🤗 Diffuse the Rest)
  • pipeline_stable_diffusion_inpaint.py (experimental): Text-Guided Image Inpainting (Colab coming soon)

:candy: Less memory usage for smaller GPUs

Now the diffusion models can take up significantly less VRAM (3.2 GB for Stable Diffusion) at the expense of 10% of speed thanks to the optimizations discussed in https://github.com/basujindal/stable-diffusion/pull/117.

To make use of the attention optimization, just enable it with .enable_attention_slicing() after loading the pipeline:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()

This will allow many more users to play with Stable Diffusion on their own computers! We can't wait to see what new ideas and results the community will create!
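The memory savings are easy to see with rough numbers (the figures below are illustrative assumptions: a 64x64 latent gives 4096 query/key positions, 8 attention heads, and fp16 scores at 2 bytes each; the real slicing granularity is configurable):

```python
# Why attention slicing helps: peak memory is dominated by the attention
# score matrix, so computing one head-slice at a time shrinks the peak.
def attn_scores_bytes(heads: int, seq_len: int, bytes_per_el: int = 2) -> int:
    """Bytes needed to materialize the (heads, seq_len, seq_len) score matrix."""
    return heads * seq_len * seq_len * bytes_per_el

full_pass = attn_scores_bytes(heads=8, seq_len=4096)  # all heads at once: 256 MiB
one_slice = attn_scores_bytes(heads=1, seq_len=4096)  # one head-slice at a time
```

The per-slice peak is 8x smaller here, which is the effect that lets a 3.2 GB budget suffice, at the cost of the extra loop overhead (the ~10% slowdown mentioned above).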

:black_cat: Textual Inversion

Textual Inversion lets you personalize a Stable Diffusion model on your own images with just 3-5 samples.

GitHub: https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion Training: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb Inference: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb

:apple: MPS backend for Apple Silicon

🤗 Diffusers is compatible with Apple silicon for Stable Diffusion inference, using the PyTorch mps device. You need to install PyTorch Preview (Nightly) on a Mac with M1 or M2 CPU, and then use the pipeline as usual:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
pipe = pipe.to("mps")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

We are seeing great speedups (31s vs 214s on an M1 Max), but there are still a couple of limitations. We encourage you to read the documentation for the details.
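A small device-selection helper is a common pattern here (this is a sketch, not part of diffusers; in practice you would feed it torch.cuda.is_available() and torch.backends.mps.is_available()):

```python
def pick_device(has_cuda: bool, has_mps: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then plain CPU."""
    if has_cuda:
        return "cuda"
    if has_mps:
        return "mps"
    return "cpu"

# e.g. pipe = pipe.to(pick_device(torch.cuda.is_available(),
#                                 torch.backends.mps.is_available()))
```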

:factory: Experimental ONNX exporter and pipeline for Stable Diffusion

We introduce a new (and experimental) Stable Diffusion pipeline compatible with the ONNX Runtime. This allows you to run Stable Diffusion on any hardware that supports ONNX (including a significant speedup on CPUs).

You need to use StableDiffusionOnnxPipeline instead of StableDiffusionPipeline. You also need to download the weights from the onnx branch of the repository, and indicate the runtime provider you want to use (CPU, in the following example):

from diffusers import StableDiffusionOnnxPipeline

pipe = StableDiffusionOnnxPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="onnx",
    provider="CPUExecutionProvider",
    use_auth_token=True,
)

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

:warning: Warning: the script above takes a long time to download the external ONNX weights, so it will be faster to convert the checkpoint yourself (see below).

To convert your own checkpoint, run the conversion script locally:

python scripts/convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"

After that it can be loaded from the local path:

pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="CPUExecutionProvider")

Improvements and bugfixes

  • Mark in painting experimental by @patrickvonplaten in #430
  • Add config docs by @patrickvonplaten in #429
  • [Docs] Models by @kashif in #416
  • [Docs] Using diffusers by @patrickvonplaten in #428
  • [Outputs] Improve syntax by @patrickvonplaten in #423
  • Initial ONNX doc (TODO: Installation) by @pcuenca in #426
  • [Tests] Correct image folder tests by @patrickvonplaten in #427
  • [MPS] Make sure it doesn't break torch < 1.12 by @patrickvonplaten in #425
  • [ONNX] Stable Diffusion exporter and pipeline by @anton-l in #399
  • [Tests] Make image-based SD tests reproducible with fixed datasets by @anton-l in #424
  • [Docs] Outputs.mdx by @patrickvonplaten in #422
  • [Docs] Fix scheduler docs by @patrickvonplaten in #421
  • [Docs] DiffusionPipeline by @patrickvonplaten in #418
  • Improve unconditional diffusers example by @satpalsr in #414
  • Improve latent diff example by @satpalsr in #413
  • Inference support for mps device by @pcuenca in #355
  • [Docs] Minor fixes in optimization section by @patrickvonplaten in #420
  • [Docs] Pipelines for inference by @satpalsr in #417
  • [Docs] Training docs by @patrickvonplaten in #415
  • Docs: fp16 page by @pcuenca in #404
  • Add typing to scheduling_sde_ve: init, set_timesteps, and set_sigmas function definitions by @danielpatrickhug in #412
  • Docs fix some typos by @natolambert in #408
  • [docs sprint] schedulers docs, will update by @natolambert in #376
  • Docs: fix undefined in toctree by @natolambert in #406
  • Attention slicing by @patrickvonplaten in #407
  • Rename variables from single letter to meaningful name fix by @rashmimarganiatgithub in #395
  • Docs: Stable Diffusion pipeline by @pcuenca in #386
  • Small changes to Philosophy by @pcuenca in #403
  • karras-ve docs by @kashif in #401
  • Score sde ve doc by @kashif in #400
  • [Docs] Finish Intro Section by @patrickvonplaten in #402
  • [Docs] Quicktour by @patrickvonplaten in #397
  • ddim docs by @kashif in #396
  • Docs: optimization / special hardware by @pcuenca in #390
  • added pndm docs by @kashif in #391
  • Update text_inversion.mdx by @johnowhitaker in #393
  • [Docs] Logging by @patrickvonplaten in #394
  • [Pipeline Docs] ddpm docs for sprint by @kashif in #382
  • [Pipeline Docs] Unconditional Latent Diffusion by @satpalsr in #388
  • Docs: Conceptual section by @pcuenca in #392
  • [Pipeline Docs] Latent Diffusion by @patrickvonplaten in #377
  • [textual-inversion] fix saving embeds by @patil-suraj in #387
  • [Docs] Let's go by @patrickvonplaten in #385
  • Add colab links to textual inversion by @apolinario in #375
  • Efficient Attention by @patrickvonplaten in #366
  • Use expand instead of ones to broadcast tensor by @pcuenca in #373
  • [Tests] Fix SD slow tests by @anton-l in #364
  • [Type Hint] VAE models by @daspartho in #365
  • [Type hint] scheduling lms discrete by @santiviquez in #360
  • [Type hint] scheduling karras ve by @santiviquez in #359
  • type hints: models/vae.py by @shepherd1530 in #346
  • [Type Hints] DDIM pipelines by @sidthekidder in #345
  • [ModelOutputs] Replace dict outputs with Dict/Dataclass and allow to return tuples by @patrickvonplaten in #334
  • package version on main should have .dev0 suffix by @mishig25 in #354
  • [textual_inversion] use tokenizer.add_tokens to add placeholder_token by @patil-suraj in #357
  • [Type hint] scheduling ddim by @santiviquez in #343
  • [Type Hints] VAE models by @daspartho in #344
  • [Type Hint] DDPM schedulers by @daspartho in #349
  • [Type hint] PNDM schedulers by @daspartho in #335
  • Fix typo in unet_blocks.py by @da03 in #353
  • [Commands] Add env command by @patrickvonplaten in #352
  • Add transformers and scipy to dependency table by @patrickvonplaten in #348
  • [Type Hint] Unet Models by @sidthekidder in #330
  • [Img2Img2] Re-add K LMS scheduler by @patrickvonplaten in #340
  • Use ONNX / Core ML compatible method to broadcast by @pcuenca in #310
  • [Type hint] PNDM pipeline by @daspartho in #327
  • [Type hint] Latent Diffusion Uncond pipeline by @santiviquez in #333
  • Add contributions to README and re-order a bit by @patrickvonplaten in #316
  • [CI] try to fix GPU OOMs between tests and excessive tqdm logging by @anton-l in #323
  • README: stable diffusion version v1-3 -> v1-4 by @pcuenca in #331
  • Textual inversion by @patil-suraj in #266
  • [Type hint] Score SDE VE pipeline by @santiviquez in #325
  • [CI] Cancel pending jobs for PRs on new commits by @anton-l in #324
  • [train_unconditional] fix gradient accumulation. by @patil-suraj in #308
  • Fix nondeterministic tests for GPU runs by @anton-l in #314
  • Improve README to show how to use SD without an access token by @patrickvonplaten in #315
  • Fix flake8 F401 imported but unused by @anton-l in #317
  • Allow downloading of revisions for models. by @okalldal in #303
  • Fix more links by @python273 in #312
  • Changed variable name from "h" to "hidden_states" by @JC-swEng in #285
  • Fix stable-diffusion-seeds.ipynb link by @python273 in #309
  • [Tests] Add fast pipeline tests by @patrickvonplaten in #302
  • Improve README by @patrickvonplaten in #301
  • [Refactor] Remove set_seed by @patrickvonplaten in #289
  • [Stable Diffusion] Hotfix by @patrickvonplaten in #299
  • Check dummy file by @patrickvonplaten in #297
  • Add missing auth tokens for two SD tests by @anton-l in #296
  • Fix GPU tests (token + single-process) by @anton-l in #294
  • [PNDM Scheduler] format timesteps attrs to np arrays by @NouamaneTazi in #273
  • Fix link by @python273 in #286
  • [Type hint] Karras VE pipeline by @patrickvonplaten in #288
  • Add datasets + transformers + scipy to test deps by @anton-l in #279
  • Easily understandable error if inference steps not set before using scheduler by @samedii in #263
  • [Docs] Add some guides by @patrickvonplaten in #276
  • [README] Add readme for SD by @patrickvonplaten in #274
  • Refactor Pipelines / Community pipelines and add better explanations. by @patrickvonplaten in #257
  • Refactor progress bar by @hysts in #242
  • Support K-LMS in img2img by @anton-l in #270
  • [BugFix]: Fixed add_noise in LMSDiscreteScheduler by @nicolas-dufour in #253
  • [Tests] Make sure tests are on GPU by @patrickvonplaten in #269
  • Adds missing torch imports to inpainting and image_to_image example by @PulkitMishra in #265
  • Fix typo in README.md by @webel in #260
  • Fix inpainting script by @patil-suraj in #258
  • Initialize CI for code quality and testing by @anton-l in #256
  • add inpainting example script by @nagolinc in #241
  • Update README.md with examples by @natolambert in #252
  • Reproducible images by supplying latents to pipeline by @pcuenca in #247
  • Style the scripts directory by @anton-l in #250
  • Pin black==22.3 to keep a stable --preview flag by @anton-l in #249
  • [Clean up] Clean unused code by @patrickvonplaten in #245
  • added test workflow and fixed failing test by @kashif in #237
  • split tests_modeling_utils by @kashif in #223
  • [example/image2image] raise error if strength is not in desired range by @patil-suraj in #238
  • Add image2image example script. by @patil-suraj in #231
  • Remove dead code in resnet.py by @ydshieh in #218

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @kashif
    • [Docs] Models (#416)
    • karras-ve docs (#401)
    • Score sde ve doc (#400)
    • ddim docs (#396)
    • added pndm docs (#391)
    • [Pipeline Docs] ddpm docs for sprint (#382)
    • added test workflow and fixed failing test (#237)
    • split tests_modeling_utils (#223)

Aug 22, 2022
v0.2.4: Patch release

This patch release allows the Stable Diffusion pipelines to be loaded with float16 precision:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
           "CompVis/stable-diffusion-v1-4", 
           revision="fp16", 
           torch_dtype=torch.float16, 
           use_auth_token=True
)
pipe = pipe.to("cuda")

The resulting models take up less than 6900 MiB of GPU memory.

  • [Loading] allow modules to be loaded in fp16 by @patrickvonplaten in #230

v0.2.3: Stable Diffusion public release

:art: Stable Diffusion public release

The Stable Diffusion checkpoints are now public and can be loaded by anyone! :partying_face:

Make sure to:

  1. Accept the license terms on the model page first (requires login): https://huggingface.co/CompVis/stable-diffusion-v1-4
  2. Install the required packages: pip install diffusers==0.2.3 transformers scipy
  3. Log in on your machine using the huggingface-cli login command.

from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# this will substitute the default PNDM scheduler for K-LMS  
lms = LMSDiscreteScheduler(
    beta_start=0.00085, 
    beta_end=0.012, 
    beta_schedule="scaled_linear"
)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    scheduler=lms,
    use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]  
    
image.save("astronaut_rides_horse.png")

The safety checker

Following the model authors' guidelines and code, Stable Diffusion inference results are now filtered to exclude unsafe content. Any image classified as unsafe is returned blank. To check programmatically whether the safety module was triggered, inspect the nsfw_content_detected flag like so:

outputs = pipe(prompt)
images = outputs["sample"]
if any(outputs["nsfw_content_detected"]):
    print("Potential unsafe content was detected in one or more images. Try again with a different prompt and/or seed.")
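As a sketch of how one might act on those flags downstream, the snippet below mirrors the dict layout of the pipeline output above ("sample" images plus per-image nsfw_content_detected booleans); `outputs` is mocked here and `report_flagged` is a hypothetical helper, not a diffusers API.

```python
def report_flagged(outputs):
    """Return the indices of images the safety checker flagged as unsafe."""
    return [i for i, flagged in enumerate(outputs["nsfw_content_detected"]) if flagged]

# Mocked pipeline output: three images, the second one flagged.
outputs = {
    "sample": ["img0", "img1", "img2"],
    "nsfw_content_detected": [False, True, False],
}

flagged = report_flagged(outputs)
if flagged:
    print(f"Potential unsafe content in image(s) {flagged}; try a different prompt and/or seed.")
```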

Improvements and bugfixes

  • add add_noise method in LMSDiscreteScheduler, PNDMScheduler by @patil-suraj in #227
  • hotfix for pdnm test by @natolambert in #220
  • Restore is_modelcards_available in .utils by @pcuenca in #224
  • Update README for 0.2.3 release by @pcuenca in #225
  • Pipeline to device by @pcuenca in #210
  • fix safety check by @patil-suraj in #217
  • Add safety module by @patil-suraj in #213
  • Support one-string prompts and custom image size in LDM by @anton-l in #212
  • Add is_torch_available, is_flax_available by @anton-l in #204
  • Revive make quality by @anton-l in #203
  • [StableDiffusionPipeline] use default params in call by @patil-suraj in #196
  • fix test_from_pretrained_hub_pass_model by @patil-suraj in #194
  • Match params with official Stable Diffusion lib by @apolinario in #192

Full Changelog: https://github.com/huggingface/diffusers/compare/v0.2.2...v0.2.3

Aug 16, 2022
v0.2.2: Patch release

This patch release fixes an import needed by the StableDiffusionPipeline

  • [K-LMS Scheduler] fix import by @patrickvonplaten in #191

v0.2.1: Patch release

This patch release fixes a small bug in the StableDiffusionPipeline

  • [Stable diffusion] Hot fix by @patrickvonplaten in 50a9ae
v0.2.0: Stable Diffusion early access, K-LMS sampling

Stable Diffusion

Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from CompVis, Stability AI and LAION. It's trained on 512x512 images from a subset of the LAION-5B database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the model card for more information.

The Stable Diffusion weights are currently only available to universities, academics, research institutions and independent researchers. Please request access by filling out this form: https://stability.ai/academia-access-form

from torch import autocast
from diffusers import StableDiffusionPipeline

# make sure you're logged in with `huggingface-cli login`
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)  

prompt = "a photograph of an astronaut riding a horse"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7)["sample"][0]  # image here is in PIL format
    
image.save(f"astronaut_rides_horse.png")

K-LMS sampling

The new LMSDiscreteScheduler is a port of k-lms from k-diffusion by Katherine Crowson. The scheduler can be easily swapped into existing pipelines like so:

from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-3-diffusers"
# Use the K-LMS scheduler here instead
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, use_auth_token=True)
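The beta_start, beta_end and beta_schedule parameters above define the noise schedule the scheduler uses. As a rough sketch (assuming, as in diffusers at the time, that "scaled_linear" interpolates linearly in the square root of beta), the betas could be generated like this; the function name is hypothetical:

```python
import math

def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Sketch of a "scaled_linear" noise schedule: linear in sqrt(beta)."""
    s0, s1 = math.sqrt(beta_start), math.sqrt(beta_end)
    return [
        (s0 + (s1 - s0) * t / (num_train_timesteps - 1)) ** 2
        for t in range(num_train_timesteps)
    ]

betas = scaled_linear_betas()
# betas[0] equals beta_start, betas[-1] equals beta_end,
# and the values increase monotonically in between.
```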

Integration test with text-to-image script of Stable-Diffusion

#182 and #186 make sure that the DDIM and PNDM/PLMS schedulers yield exactly the same results as the original Stable Diffusion implementation. Try it out yourself:

In Stable-Diffusion:

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code --plms

or

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code

In diffusers:

from diffusers import StableDiffusionPipeline, DDIMScheduler
from time import time
from PIL import Image
from einops import rearrange
import numpy as np
import torch
from torch import autocast
from torchvision.utils import make_grid

torch.manual_seed(42)

prompt = "a photograph of an astronaut riding a horse"
#prompt = "a photograph of the eiffel tower on the moon"
#prompt = "an oil painting of a futuristic forest gives"

# uncomment to use DDIM
# scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True, scheduler=scheduler)  # make sure you're logged in with `huggingface-cli login`

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)  # make sure you're logged in with `huggingface-cli login`

all_images = []
num_rows = 1
num_columns = 4
for _ in range(num_rows):
    with autocast("cuda"):
        images = pipe(num_columns * [prompt], guidance_scale=7.5, output_type="np")["sample"]  # images are returned as a NumPy array because of output_type="np"
        all_images.append(torch.from_numpy(images))

# additionally, save as grid
grid = torch.stack(all_images, 0)
grid = rearrange(grid, 'n b h w c -> (n b) h w c')
grid = rearrange(grid, 'n h w c -> n c h w')
grid = make_grid(grid, nrow=num_rows)

# to image
grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()
image = Image.fromarray(grid.astype(np.uint8))

image.save(f"./images/diffusers/{'_'.join(prompt.split())}_{round(time())}.png")
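The two rearrange patterns in the snippet above first merge the row and batch axes and then move channels in front of height and width. A shape-only sketch with hypothetical helpers (real code should simply call einops.rearrange):

```python
def merge_axes(shape):
    """'n b h w c -> (n b) h w c': fold the first two axes together."""
    n, b, h, w, c = shape
    return (n * b, h, w, c)

def channels_first(shape):
    """'n h w c -> n c h w': move channels before height and width."""
    n, h, w, c = shape
    return (n, c, h, w)

shape = (1, 4, 512, 512, 3)               # num_rows, num_columns, H, W, C
print(channels_first(merge_axes(shape)))  # (4, 3, 512, 512)
```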

Improvements and bugfixes

  • Allow passing non-default modules to pipeline by @pcuenca in #188
  • Add K-LMS scheduler from k-diffusion by @anton-l in #185
  • [Naming] correct config naming of DDIM pipeline by @patrickvonplaten in #187
  • [PNDM] Stable diffusion by @patrickvonplaten in #186
  • [Half precision] Make sure half-precision is correct by @patrickvonplaten in #182
  • allow custom height, width in StableDiffusionPipeline by @patil-suraj in #179
  • add tests for stable diffusion pipeline by @patil-suraj in #178
  • Stable diffusion pipeline by @patil-suraj in #168
  • [LDM pipeline] fix eta condition. by @patil-suraj in #171
  • [PNDM in LDM pipeline] use inspect in pipeline instead of unused kwargs by @patil-suraj in #167
  • allow pndm scheduler to be used with ldm pipeline by @patil-suraj in #165
  • add scaled_linear schedule in PNDM and DDPM by @patil-suraj in #164
  • add attention up/down blocks for VAE by @patil-suraj in #161
  • Add an alternative Karras et al. stochastic scheduler for VE models by @anton-l in #160
  • [LDMTextToImagePipeline] make text model generic by @patil-suraj in #162
  • Minor typos by @pcuenca in #159
  • Fix arg key for dataset_name in create_model_card by @pcuenca in #158
  • [VAE] fix the downsample block in Encoder. by @patil-suraj in #156
  • [UNet2DConditionModel] add cross_attention_dim as an argument by @patil-suraj in #155
  • Added diffusers to conda-forge and updated README for installation instruction by @sugatoray in #129
  • Add issue templates for feature requests and bug reports by @osanseviero in #153
  • Support training with a local image folder by @anton-l in #152
  • Allow DDPM scheduler to use model's predicated variance by @eyalmazuz in #132

Full Changelog: https://github.com/huggingface/diffusers/compare/0.1.3...v0.2.0

Jul 28, 2022
0.1.3 Patch release

This patch release refactors the model architecture of VQModel and AutoencoderKL, including the weight naming. The official weights of the CompVis organization have therefore been re-uploaded.

Corresponding PR: https://github.com/huggingface/diffusers/pull/137

Please make sure to upgrade diffusers so that these models run correctly: pip install --upgrade diffusers


Jul 21, 2022
Initial release of 🧨 Diffusers

These are the release notes of the 🧨 Diffusers library

Introducing Hugging Face's new library for diffusion models.

Diffusion models have proven very effective at generative synthesis, even beating GANs for images. Because of that, they have gained traction in the machine learning community and play an important role in systems such as DALL-E 2 and Imagen, which generate photorealistic images from text prompts.

While the most prominent successes of diffusion models have been in computer vision, these models have also achieved remarkable results in other domains.

Goals

The goals of diffusers are:

  • to centralize the research of diffusion models from independent repositories to a clear and maintained project,
  • to reproduce high impact machine learning systems such as DALLE and Imagen in a manner that is accessible for the public, and
  • to create an easy to use API that enables one to train their own models or re-use checkpoints from other repositories for inference.

Release overview


Diffusers aims to be a modular toolbox for diffusion techniques, with a focus the following categories:

:bullettrain_side: Inference pipelines

Inference pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box. The goal is for them to stick as closely as possible to their original implementations, and they can include components from other libraries (such as text encoders).

The original release ships with an initial set of such pipelines.

We are currently working on enabling pipelines for other modalities; these are expected to land in a subsequent release.

:alarm_clock: Schedulers

  • Schedulers are the algorithms to use diffusion models in inference as well as for training. They include the noise schedules and define algorithm-specific diffusion steps.
  • Schedulers can be used interchangeably between diffusion models in inference to find the preferred trade-off between speed and generation quality.
  • Schedulers are available in NumPy, but can easily be transformed into PyTorch.

The goal is for each scheduler to provide one or more step() functions that are called iteratively to unroll the diffusion loop during the forward pass. Schedulers are framework-agnostic, but offer conversion methods for easy use with PyTorch utilities.
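The step()-based contract described above can be illustrated with a toy, made-up scheduler (not a real diffusers class): the pipeline owns the loop, while the scheduler owns a single update step.

```python
# Toy illustration of the scheduler contract: step() is called once per
# timestep to unroll the diffusion loop. The update rule here is a made-up
# exponential decay toward the model's prediction, purely for illustration.

class ToyScheduler:
    def __init__(self, num_inference_steps=10, decay=0.5):
        self.timesteps = list(range(num_inference_steps))
        self.decay = decay

    def step(self, model_output, t, sample):
        # Move the sample a fixed fraction toward the model's prediction.
        return sample + self.decay * (model_output - sample)

def denoise(scheduler, model, sample):
    """The "pipeline" side: iterate the scheduler's timesteps, calling step()."""
    for t in scheduler.timesteps:
        model_output = model(sample, t)
        sample = scheduler.step(model_output, t, sample)
    return sample

# A "model" that always predicts 0.0 — the loop shrinks the sample toward it.
result = denoise(ToyScheduler(), lambda x, t: 0.0, sample=1.0)
```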

The initial release contains a first set of such schedulers.

:factory: Models

Models are hosted in the src/diffusers/models folder.

For the initial release, you'll get to see a few building blocks, as well as some resulting models:

  • UNet2DModel is a version of the UNet architecture shown in recent papers. It is the unconditional variant, in contrast to the conditional UNet2DConditionModel described below.
  • UNet2DConditionModel is similar to the UNet2DModel, but is conditional: it uses the cross-attention mechanism in order to have skip connections in its downsample and upsample layers. These cross-attentions can be fed by other models. An example of a pipeline using a conditional UNet model is the latent diffusion pipeline.
  • AutoencoderKL and VQModel are still experimental models that are prone to breaking changes in the near future. However, they can already be used as part of the Latent Diffusion pipelines.

:page_with_curl: Training example

The first release contains a dataset-agnostic unconditional training example and a training notebook.

Credits

This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations, which have helped us in our development and without which the API would not be as polished as it is today:

  • @CompVis' latent diffusion models library, available here
  • @hojonathanho original DDPM implementation, available here as well as the extremely useful translation into PyTorch by @pesser, available here
  • @ermongroup's DDIM implementation, available here.
  • @yang-song's Score-VE and Score-VP implementations, available here

We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available here.
