This patch release fixes a bug with incorrect module naming for community pipelines and an incorrect breaking change when moving pipelines in fp16 to "cpu" or "mps".
We have thoroughly profiled our codebase and applied a number of incremental improvements that, when combined, provide a speed improvement of almost 3x.
On top of that, we now default to using the float16 format. It's much faster than float32 and, according to our tests, produces images with no discernible difference in quality. This beats the use of autocast, so the resulting code is cleaner!
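In practice this just means passing torch_dtype=torch.float16 when loading the pipeline (for Stable Diffusion, the half-precision weights live on the fp16 branch), with no autocast wrapper around the call. A minimal sketch, following the same loading pattern shown later in these notes:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",            # fetch the half-precision weights
    torch_dtype=torch.float16,  # keep the model in float16
)
pipe = pipe.to("cuda")

# no `with autocast("cuda"):` block needed anymore
image = pipe("a photo of an astronaut riding a horse on mars").images[0]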
use_auth_token no more

The recently released version of huggingface-hub automatically uses your access token if you are logged in, so you don't need to put it everywhere in your code. All you need to do is authenticate once using huggingface-cli login in your terminal and you're all set.
- pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
+ pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
We bumped the huggingface-hub version to 0.10.0 in our dependencies to achieve this.
Please update any custom Stable Diffusion pipelines accordingly:
- if isinstance(self.scheduler, LMSDiscreteScheduler):
- latents = latents * self.scheduler.sigmas[0]
+ latents = latents * self.scheduler.init_noise_sigma
- if isinstance(self.scheduler, LMSDiscreteScheduler):
- sigma = self.scheduler.sigmas[i]
- latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
- if isinstance(self.scheduler, LMSDiscreteScheduler):
- latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs).prev_sample
- else:
- latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
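Putting the three changes together, the core of an updated custom denoising loop looks roughly like this. It is a simplified sketch of the loop inside a pipeline's __call__; names such as text_embeddings, guidance_scale and extra_step_kwargs follow the official Stable Diffusion pipeline and are illustrative:

# scale the initial noise by the scheduler's expected standard deviation
latents = latents * self.scheduler.init_noise_sigma

for t in self.scheduler.timesteps:
    # duplicate the latents for classifier-free guidance
    latent_model_input = torch.cat([latents] * 2)
    # the scheduler now applies any per-step input scaling itself (e.g. the K-LMS sigma scaling)
    latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

    # a single step() call now works for every scheduler
    latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample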
diffusers pipelines can now invoke a callback function during generation, providing the latents at each step of the process. This makes it easier to perform tasks such as visualization, inspection, explainability and others the community may invent; a minimal sketch appears further below.

Building on top of the previous foundations, this release incorporates several new tasks that have been adapted from research papers or community projects. These include:
Gradient checkpointing and 8-bit optimizers have been successfully applied to achieve Dreambooth fine-tuning in a Colab notebook! These updates will make it easier for diffusers to support general-purpose fine-tuning (coming soon!).
This is big, but it's still an experimental feature that may change in the future.
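For your own training loops, the two memory savers amount to a couple of lines. A rough sketch, assuming bitsandbytes is installed and using the enable_gradient_checkpointing() hook that the training examples rely on:

import bitsandbytes as bnb
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")

# recompute activations during the backward pass instead of storing them
unet.enable_gradient_checkpointing()

# 8-bit Adam keeps the optimizer state in 8 bits, cutting its memory footprint substantially
optimizer = bnb.optim.AdamW8bit(unet.parameters(), lr=5e-6)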
We are constantly amazed at the amount of imagination and creativity in the diffusers community, so we've made it easy to create custom pipelines and share them with others. You can write your own pipeline code, store it on the 🤗 Hub, GitHub or your local filesystem, and DiffusionPipeline.from_pretrained will be able to load and run it. Read more in the documentation.
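For instance, a custom pipeline hosted on the Hub can be loaded by passing its repo id through the custom_pipeline argument. The repo id below is a placeholder, and the referenced repository is expected to contain the pipeline code; see the documentation for the exact semantics:

from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="my-username/my-community-pipeline",  # placeholder: a Hub repo (or local folder) containing the pipeline code
)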
We can't wait to see what new tasks the community creates!
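Here is the callback sketch promised above. The callback receives the step index, the timestep and the current latents, and callback_steps controls how often it is invoked; treat the example as illustrative and check the API docs for the exact signature:

import torch
from diffusers import StableDiffusionPipeline

def log_latents(step: int, timestep: int, latents: torch.FloatTensor):
    # inspect or visualize the intermediate latents here
    print(f"step {step} (t={timestep}): latents {tuple(latents.shape)}")

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
image = pipe("The red cat is sitting on a chair", callback=log_latents, callback_steps=5).images[0]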
Bug fixes, improved documentation and better tests are all important to ensure diffusers is a high-quality codebase, and we always spend a lot of effort working on them. Several first-time contributors have helped here, and we are very grateful for their efforts!
The following people have made significant contributions to the library over the last release:
- disable_attention_slicing in pipelines by @pcuenca in #498
- .float() (autocast in fp16 will discard this (I think)). by @Narsil in #495
- mps by @pcuenca in #450
- FlaxModelMixin by @mishig25 in #493
- init_weights method to FlaxMixin by @mishig25 in #513
- make fixup support by @younesbelkada in #546
- _upsample_2d by @ydshieh in #535
- CrossAttention._sliced_attention by @ydshieh in #563
- from_pt argument in .from_pretrained by @younesbelkada in #527
- FlaxPreTrainedModel for saving/loading by @patil-suraj in #591
- dropout_prob by dropout in vae by @younesbelkada in #595
- Berkeley ref by @ryanrussell in #611
- stochastic_karras_ve ref by @ryanrussell in #618
- .md readability fixups by @ryanrussell in #619
- src/diffusers readability improvements by @ryanrussell in #629
- torch_device kwarg by @pcuenca in #623
- custom_init_isort readability fixups by @ryanrussell in #631
- SpatialTransformer by @ydshieh in #578
- main: stable diffusion pipelines cannot be loaded by @pcuenca in #655
- from_pretrained: clean up mismatched_keys. by @pcuenca in #630
- trained_betas ignored in some schedulers by @vishnu-anirudh in #635
- config.json url closes #675 by @ryanrussell in #680
- set_timesteps by @pcuenca in #690
- Path in dataset by @DrInfiniteExplorer in #681
- autocast for 35-25% speedup. (autocast considered harmful). by @Narsil in #511

Thanks to the community efforts for [Docs] and [Type Hints] we've started populating the Diffusers documentation pages with lots of helpful guides, links and API references.
Pipeline, Model, and Scheduler outputs are now dataclasses that can also be accessed as dicts or tuples:
image = pipe("The red cat is sitting on a chair")["sample"][0]
is now replaced by:
image = pipe("The red cat is sitting on a chair").images[0]
# or
image = pipe("The red cat is sitting on a chair")["image"][0]
# or
image = pipe("The red cat is sitting on a chair")[0]
Similarly:
sample = unet(...).sample
and
prev_sample = scheduler(...).prev_sample
are now possible!
This PR introduces breaking changes for the following public-facing methods:
- VQModel.encode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change latents = model.encode(...) to latents = model.encode(...)[0] or latents = model.encode(...).latents
- VQModel.decode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model.decode(...) to sample = model.decode(...)[0] or sample = model.decode(...).sample
- VQModel.forward -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model(...) to sample = model(...)[0] or sample = model(...).sample
- AutoencoderKL.encode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change latent_dist = model.encode(...) to latent_dist = model.encode(...)[0] or latent_dist = model.encode(...).latent_dist
- AutoencoderKL.decode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model.decode(...) to sample = model.decode(...)[0] or sample = model.decode(...).sample
- AutoencoderKL.forward -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model(...) to sample = model(...)[0] or sample = model(...).sample

A couple of new pipelines have been added to Diffusers! We invite you to experiment with them, and to take them as inspiration to create your own cool new tasks. These are the new pipelines:
For more details about how they work, please visit our new API documentation.
This is a summary of all the Stable Diffusion tasks that can be easily used with 🤗 Diffusers:
| Pipeline | Tasks | Colab | Demo |
|---|---|---|---|
| pipeline_stable_diffusion.py | Text-to-Image Generation | 🤗 Stable Diffusion | |
| pipeline_stable_diffusion_img2img.py | Image-to-Image Text-Guided Generation | 🤗 Diffuse the Rest | |
| pipeline_stable_diffusion_inpaint.py | Experimental – Text-Guided Image Inpainting | | Coming soon |
Now the diffusion models can take up significantly less VRAM (3.2 GB for Stable Diffusion) at the expense of about 10% slower inference, thanks to the optimizations discussed in https://github.com/basujindal/stable-diffusion/pull/117.
To make use of the attention optimization, just enable it with .enable_attention_slicing() after loading the pipeline:
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
revision="fp16",
torch_dtype=torch.float16,
use_auth_token=True
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()
This will allow many more users to play with Stable Diffusion on their own computers! We can't wait to see what new ideas and results will be created by the community!
Textual Inversion lets you personalize a Stable Diffusion model on your own images with just 3-5 samples.
GitHub: https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion Training: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb Inference: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb
🤗 Diffusers is compatible with Apple silicon for Stable Diffusion inference, using the PyTorch mps device. You need to install PyTorch Preview (Nightly) on a Mac with M1 or M2 hardware, and then use the pipeline as usual:
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
pipe = pipe.to("mps")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
We are seeing great speedups (31s vs 214s on an M1 Max), but there are still a couple of limitations. We encourage you to read the documentation for the details.
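One of those limitations is worth calling out here: the first inference pass on mps behaves differently from subsequent ones, so the documentation recommends a one-time warmup call before generating for real, along these lines:

# one-time "warmup" pass (the first mps inference gives slightly different results and timings)
_ = pipe(prompt, num_inference_steps=1)

# subsequent calls behave as expected
image = pipe(prompt).images[0]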
We introduce a new (and experimental) Stable Diffusion pipeline compatible with the ONNX Runtime. This allows you to run Stable Diffusion on any hardware that supports ONNX (including a significant speedup on CPUs).
You need to use StableDiffusionOnnxPipeline instead of StableDiffusionPipeline. You also need to download the weights from the onnx branch of the repository, and indicate the runtime provider you want to use (CPU, in the following example):
from diffusers import StableDiffusionOnnxPipeline
pipe = StableDiffusionOnnxPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
revision="onnx",
provider="CPUExecutionProvider",
use_auth_token=True,
)
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
:warning: Warning: the script above takes a long time to download the external ONNX weights, so it will be faster to convert the checkpoint yourself (see below).
To convert your own checkpoint, run the conversion script locally:
python scripts/convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"
After that it can be loaded from the local path:
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="CPUExecutionProvider")
- mps device by @pcuenca in #355
- expand instead of ones to broadcast tensor by @pcuenca in #373
- version on main should have .dev0 suffix by @mishig25 in #354
- scripts directory by @anton-l in #250
- resnet.py by @ydshieh in #218

The following contributors have made significant changes to the library over the last release:
This patch release allows the Stable Diffusion pipelines to be loaded with float16 precision:
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
revision="fp16",
torch_dtype=torch.float16,
use_auth_token=True
)
pipe = pipe.to("cuda")
The resulting models take up less than 6900 MiB of GPU memory.
The Stable Diffusion checkpoints are now public and can be loaded by anyone! :partying_face:
Make sure to accept the license terms on the model page first (requires login): https://huggingface.co/CompVis/stable-diffusion-v1-4
Install the required packages: pip install diffusers==0.2.3 transformers scipy
And log in on your machine using the huggingface-cli login command.
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler
# this will replace the default PNDM scheduler with K-LMS
lms = LMSDiscreteScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear"
)
pipe = StableDiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
scheduler=lms,
use_auth_token=True
).to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
image = pipe(prompt)["sample"][0]
image.save("astronaut_rides_horse.png")
Following the model authors' guidelines and code, the Stable Diffusion inference results will now be filtered to exclude unsafe content. Any images classified as unsafe will be returned as blank. To check whether the safety module was triggered programmatically, check the nsfw_content_detected flag like so:
outputs = pipe(prompt)
image = outputs["sample"][0]
if any(outputs["nsfw_content_detected"]):
print("Potential unsafe content was detected in one or more images. Try again with a different prompt and/or seed.")
- is_modelcards_available in .utils by @pcuenca in #224
- is_torch_available, is_flax_available by @anton-l in #204
- make quality by @anton-l in #203

Full Changelog: https://github.com/huggingface/diffusers/compare/v0.2.2...v0.2.3
This patch release fixes an import in the StableDiffusionPipeline.
[K-LMS Scheduler] fix import by @patrickvonplaten in #191
This patch release fixes a small bug in the StableDiffusionPipeline.
Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from CompVis, Stability AI and LAION. It's trained on 512x512 images from a subset of the LAION-5B database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the model card for more information.
The Stable Diffusion weights are currently only available to universities, academics, research institutions and independent researchers. Please request access by applying via this form: https://stability.ai/academia-access-form
from torch import autocast
from diffusers import StableDiffusionPipeline
# make sure you're logged in with `huggingface-cli login`
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)
prompt = "a photograph of an astronaut riding a horse"
with autocast("cuda"):
image = pipe(prompt, guidance_scale=7)["sample"][0] # image here is in PIL format
image.save(f"astronaut_rides_horse.png")
The new LMSDiscreteScheduler is a port of k-lms from k-diffusion by Katherine Crowson.
The scheduler can be easily swapped into existing pipelines like so:
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler
model_id = "CompVis/stable-diffusion-v1-3-diffusers"
# Use the K-LMS scheduler here instead
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, use_auth_token=True)
#182 and #186 make sure that the DDIM and PNDM/PLMS schedulers yield 1-to-1 the same results as the original Stable Diffusion implementation. Try it out yourself:
In Stable-Diffusion:
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code --plms
or
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code
In diffusers:
from diffusers import StableDiffusionPipeline, DDIMScheduler
from time import time
from PIL import Image
from einops import rearrange
import numpy as np
import torch
from torch import autocast
from torchvision.utils import make_grid
torch.manual_seed(42)
prompt = "a photograph of an astronaut riding a horse"
#prompt = "a photograph of the eiffel tower on the moon"
#prompt = "an oil painting of a futuristic forest gives"
# uncomment to use DDIM
# scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True, scheduler=scheduler) # make sure you're logged in with `huggingface-cli login`
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True) # make sure you're logged in with `huggingface-cli login`
all_images = []
num_rows = 1
num_columns = 4
for _ in range(num_rows):
with autocast("cuda"):
        images = pipe(num_columns * [prompt], guidance_scale=7.5, output_type="np")["sample"]  # a NumPy array, since output_type="np"
all_images.append(torch.from_numpy(images))
# additionally, save as grid
grid = torch.stack(all_images, 0)
grid = rearrange(grid, 'n b h w c -> (n b) h w c')
grid = rearrange(grid, 'n h w c -> n c h w')
grid = make_grid(grid, nrow=num_rows)
# to image
grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()
image = Image.fromarray(grid.astype(np.uint8))
image.save(f"./images/diffusers/{'_'.join(prompt.split())}_{round(time())}.png")
- dataset_name in create_model_card by @pcuenca in #158
- diffusers to conda-forge and updated README for installation instruction by @sugatoray in #129

Full Changelog: https://github.com/huggingface/diffusers/compare/0.1.3...v0.2.0
This patch release refactors the model architecture of VQModel and AutoencoderKL, including the weight naming. The official weights of the CompVis organization have therefore been re-uploaded, see:
Corresponding PR: https://github.com/huggingface/diffusers/pull/137
Please make sure to upgrade diffusers to have those models running correctly: pip install --upgrade diffusers
FileNotFoundError: 'model_card_template.md' https://github.com/huggingface/diffusers/pull/136

These are the release notes of the 🧨 Diffusers library
Introducing Hugging Face's new library for diffusion models.
Diffusion models have proven very effective at artificial synthesis, even beating GANs for images. Because of that, they have gained traction in the machine learning community and play an important role in systems like DALL-E 2 and Imagen, which generate photorealistic images from text prompts.
While the most prolific successes of diffusion models have been in the computer vision community, these models have also achieved remarkable results in other domains, such as:
and more.
The goals of diffusers are:
Quickstart:
Diffusers aims to be a modular toolbox for diffusion techniques, with a focus on the following categories:
Inference pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box. The goal is for them to stick as close as possible to their original implementation, and they can include components of other libraries (such as text encoders).
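As a concrete taste, one of the unconditional pipelines from this release can be run in a handful of lines. The checkpoint name is just one of the DDPM checkpoints on the Hub, and the exact output indexing has changed a bit across library versions, so treat it as indicative:

from diffusers import DDPMPipeline

# load a pretrained unconditional DDPM pipeline from the Hub
ddpm = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")

# run the full denoising loop; in early releases the result is a dict of generated images
image = ddpm()["sample"][0]
image.save("ddpm_generated_image.png")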
The original release contains the following pipelines:
We are currently working on enabling other pipelines for different modalities. The following pipelines are expected to land in a subsequent release:
The goal is for each scheduler to provide one or more step() functions that should be called iteratively to unroll the diffusion loop during the forward pass. They are framework agnostic, but offer conversion methods which should allow easy conversion to PyTorch utilities.
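In practice the denoising loop boils down to alternating a model forward pass with scheduler.step(). A minimal sketch using the components of a pretrained unconditional pipeline; the checkpoint name is illustrative, and the attribute names follow the output-class convention described earlier in these notes, so they may differ slightly between versions:

import torch
from diffusers import DDPMPipeline

# reuse the unet and scheduler from a pretrained pipeline
pipe = DDPMPipeline.from_pretrained("google/ddpm-cat-256")
unet, scheduler = pipe.unet, pipe.scheduler

scheduler.set_timesteps(1000)
sample = torch.randn(1, 3, 256, 256)  # start from pure Gaussian noise (256x256 RGB checkpoint)

for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = unet(sample, t).sample                     # predict the noise residual
    sample = scheduler.step(noise_pred, t, sample).prev_sample  # one reverse-diffusion step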
The initial release contains the following schedulers:
Models are hosted in the src/diffusers/models folder.
For the initial release, you'll get to see a few building blocks, as well as some resulting models:
- UNet2DModel can be seen as a version of the UNet architecture used in recent diffusion papers. It is the unconditional version of the UNet model, as opposed to the conditional version described below.
- UNet2DConditionModel is similar to UNet2DModel, but is conditional: it uses the cross-attention mechanism in order to have skip connections in its downsample and upsample layers. These cross-attentions can be fed by other models. An example of a pipeline using a conditional UNet model is the latent diffusion pipeline.
- AutoencoderKL and VQModel are still experimental models that are prone to breaking changes in the near future. However, they can already be used as part of the Latent Diffusion pipelines.

The first release contains a dataset-agnostic unconditional example and a training notebook:
- train_unconditional.py example, which trains a DDPM UNet model on a dataset of your choice.

This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today:
We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available here.