Initial release of 🧨 Diffusers
Introducing Hugging Face's new library for diffusion models.
Diffusion models have proved very effective at generative synthesis, even beating GANs on image quality. As a result, they have gained traction in the machine learning community and play an important role in systems like DALL-E 2 or Imagen, which generate photorealistic images from text prompts.
While the most prominent successes of diffusion models have been in the computer vision community, these models have also achieved remarkable results in other domains.
The goals of 🧨 Diffusers are outlined in the sections below.
Quickstart:
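As a minimal sketch of the quickstart (the model identifier below is illustrative, and the exact output format of the pipeline call has changed across versions), after running `pip install diffusers` you can load a pretrained system from the Hub and sample from it:

```python
# Hedged sketch of library usage; model id "google/ddpm-celebahq-256" is an example.
from diffusers import DiffusionPipeline

# Download a pretrained end-to-end diffusion system from the Hugging Face Hub.
ddpm = DiffusionPipeline.from_pretrained("google/ddpm-celebahq-256")

# Calling the pipeline runs the full denoising loop and returns generated images.
output = ddpm()
```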
Diffusers aims to be a modular toolbox for diffusion techniques, with a focus on the following categories:
Inference pipelines are a collection of end-to-end diffusion systems that can be used out of the box. The goal is for them to stay as close as possible to their original implementations, and they can include components from other libraries (such as text encoders).
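To make the pipeline abstraction concrete, here is a toy, framework-free sketch (not the library's actual code) of how a pipeline composes a denoising model with a scheduler into one end-to-end system:

```python
class ToyScheduler:
    """Toy scheduler: exposes the timesteps and a step() update rule."""
    def __init__(self, num_steps):
        self.timesteps = list(range(num_steps - 1, -1, -1))

    def step(self, model_output, sample):
        # Nudge the sample a small step using the model's prediction.
        return sample - 0.1 * model_output

class ToyPipeline:
    """An end-to-end system: a model plus a scheduler, usable out of the box."""
    def __init__(self, model, scheduler):
        self.model = model
        self.scheduler = scheduler

    def __call__(self, sample):
        # Unroll the diffusion loop: one model call + one scheduler step per timestep.
        for t in self.scheduler.timesteps:
            model_output = self.model(sample, t)
            sample = self.scheduler.step(model_output, sample)
        return sample

# A stand-in "model" whose prediction is just the sample itself.
pipeline = ToyPipeline(model=lambda x, t: x, scheduler=ToyScheduler(num_steps=10))
result = pipeline(1.0)  # the sample decays toward 0 over 10 steps
```

The point of the sketch is the separation of concerns: the model predicts, the scheduler decides how to use that prediction, and the pipeline only wires the loop together.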
The original release contains the following pipelines:
We are currently working on enabling other pipelines for different modalities. The following pipelines are expected to land in a subsequent release:
The goal is for each scheduler to provide one or more step() functions that are called iteratively to unroll the diffusion loop during the forward pass. The schedulers are framework-agnostic, but offer conversion methods for easy use with PyTorch utilities.
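To illustrate what a step() function computes, here is a simplified, noise-free sketch of a DDPM-style reverse step in plain Python (the math follows the DDPM posterior mean; this is not the library's implementation, and the variance schedule below is made up for the example):

```python
import math

def ddpm_step(x_t, eps_pred, t, betas):
    """One noise-free reverse-diffusion step of a DDPM-style sampler.

    x_t:      current noisy sample (a float here; a tensor in practice)
    eps_pred: the model's noise prediction at timestep t
    t:        current timestep index
    betas:    the forward-process variance schedule
    """
    alpha_t = 1.0 - betas[t]
    # alpha-bar: cumulative product of alphas up to and including t
    alpha_bar_t = math.prod(1.0 - b for b in betas[: t + 1])
    # posterior mean of x_{t-1} given x_t and the predicted noise
    coeff = betas[t] / math.sqrt(1.0 - alpha_bar_t)
    return (x_t - coeff * eps_pred) / math.sqrt(alpha_t)

# Unrolling the diffusion loop: call step() iteratively from t = T-1 down to 0.
betas = [0.01 * (i + 1) for i in range(10)]  # toy linear schedule
x = 1.0
for t in reversed(range(10)):
    x = ddpm_step(x, eps_pred=0.0, t=t, betas=betas)
```

In the real samplers, step() also adds scaled noise at each timestep and the model's eps_pred comes from a trained UNet; the loop structure, however, is exactly this iterative unrolling.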
The initial release contains the following schedulers:
Models are hosted in the src/diffusers/models folder.
For the initial release, you'll get to see a few building blocks, as well as some resulting models:
UNet2DModel is a version of the UNet architecture used in recent diffusion papers. It is the unconditional version of the UNet model, in contrast to the conditional version that follows below.

UNet2DConditionModel is similar to UNet2DModel, but is conditional: it adds cross-attention blocks to its downsample and upsample layers, and these cross-attention blocks can be fed by other models. An example of a pipeline using a conditional UNet model is the latent diffusion pipeline.

AutoencoderKL and VQModel are still experimental models that are prone to breaking changes in the near future. However, they can already be used as part of the latent diffusion pipelines.

The first release contains a dataset-agnostic unconditional example and a training notebook:

The train_unconditional.py example trains a DDPM UNet model on a dataset of your choice.

This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations, which have helped us in our development and without which the API could not have been as polished as it is today:
We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available here.