ControlNet, 8K VAE decoding
Thanks to an amazing collaboration with community member @takuma104, diffusers fully supports ControlNet! All 8 control models from the paper are available for you to use: depth, scribbles, edges, and more. Best of all, you can take advantage of all the other goodies and optimizations that Diffusers provides out of the box, making this an ultra fast implementation of ControlNet. Take it for a spin to see for yourself.
ControlNet works by training a copy of some of the layers of the original Stable Diffusion model on additional signals, such as depth maps or scribbles. After training, you can provide a depth map as a strong hint of the composition you want to achieve, and have Stable Diffusion fill in the details for you. For example:
<table> <tr style="text-align: center;"> <th>Before</th> <th>After</th> </tr> <tr> <td><img class="mx-auto" src="https://huggingface.co/datasets/YiYiXu/controlnet-testing/resolve/main/house_depth.png" width=300/></td> <td><img class="mx-auto" src="https://huggingface.co/datasets/YiYiXu/controlnet-testing/resolve/main/house_after.jpeg" width=300/></td> </tr> </table>

Currently, there are 8 published control models, all of which were trained on runwayml/stable-diffusion-v1-5 (i.e., Stable Diffusion version 1.5). This is an example that uses the scribble controlnet model:
Or you can turn a cartoon into a realistic photo with incredible coherence:
<img src="https://huggingface.co/datasets/YiYiXu/controlnet-testing/resolve/main/lofi.jpg" height="400" alt="ControlNet showing a photo generated from a cartoon frame">

How do you use ControlNet in diffusers? Just like this (example for the canny edges control model):
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
As usual, you can use all the features in the diffusers toolbox: super-fast schedulers, memory-efficient attention, model offloading, etc. We think 🧨 Diffusers is the best way to iterate on your ControlNet experiments!
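The snippet above stops once the pipeline is loaded. As a rough sketch of the remaining steps, the code below prepares a control image and runs one generation with a faster scheduler and model offloading. It assumes you already have a single-channel edge map as a NumPy array (for example, produced by an edge detector such as OpenCV's Canny, which is not shown here). The helper names `to_control_image` and `generate`, the choice of `UniPCMultistepScheduler`, and the step count are illustrative choices, not something these release notes prescribe.

```python
import numpy as np
from PIL import Image


def to_control_image(edges: np.ndarray) -> Image.Image:
    """Turn a single-channel uint8 edge map of shape (H, W) into an RGB PIL image."""
    if edges.ndim != 2:
        raise ValueError("expected a 2-D edge map")
    # Repeat the edge channel three times so the result is a regular RGB image.
    rgb = np.stack([edges] * 3, axis=-1).astype(np.uint8)
    return Image.fromarray(rgb)


def generate(control_image: Image.Image, prompt: str) -> Image.Image:
    """Hypothetical helper: load the canny ControlNet pipeline and run one generation."""
    # Heavy imports stay local so the helper above can be used without a GPU setup.
    import torch
    from diffusers import (
        ControlNetModel,
        StableDiffusionControlNetPipeline,
        UniPCMultistepScheduler,
    )

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    )
    # Swap in a fast scheduler (an illustrative choice) and offload idle
    # submodules to the CPU; offloading requires the accelerate package.
    pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
    pipe.enable_model_cpu_offload()
    # The control image is passed through the `image` argument.
    return pipe(prompt, image=control_image, num_inference_steps=20).images[0]
```

With an edge map in hand, `generate(to_control_image(edges), "a photo of a house")` would return a PIL image; the prompt here is only a placeholder.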
Please refer to our blog post and documentation for details.
(And, coming soon, ControlNet training. Stay tuned!)
Another community member, @kig, conceived, proposed and fully implemented an amazing PR that allows generation of ultra-high resolution images without memory blowing up 🤯. They follow a tiling approach during the image decoding phase of the process, generating a piece of the image at a time and then stitching them all together. Tiles are blended carefully to avoid visible seams between them, and the final result is amazing. This is the additional code you need to use to enjoy high-resolution generations:
pipe.vae.enable_tiling()
That's it!
For a complete example, refer to the PR or the code snippet we reproduce here for your convenience:
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention()
pipe.vae.enable_tiling()
prompt = "a beautiful landscape photo"
image = pipe(prompt, width=4096, height=2048, num_inference_steps=10).images[0]
image.save("4k_landscape.jpg")
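To give an intuition for how overlapping tiles can be blended without visible seams, here is a toy one-dimensional sketch in plain Python. It is not the library's implementation: the function names `blend_overlap` and `stitch_tiles` are hypothetical, tiles are plain lists of numbers rather than image tensors, and a simple linear cross-fade over the overlap is assumed.

```python
def blend_overlap(prev_tile, tile, overlap):
    """Cross-fade the last `overlap` samples of prev_tile into the start of tile."""
    blended = list(tile)
    for i in range(overlap):
        weight = i / overlap  # 0.0 at the seam start, approaching 1.0 at its end
        prev_value = prev_tile[len(prev_tile) - overlap + i]
        blended[i] = prev_value * (1 - weight) + tile[i] * weight
    return blended


def stitch_tiles(tiles, overlap):
    """Join equally sized tiles whose neighbours share `overlap` samples."""
    stitched = []
    for index, tile in enumerate(tiles):
        if index > 0:
            tile = blend_overlap(tiles[index - 1], tile, overlap)
        is_last = index == len(tiles) - 1
        stride = len(tile) - overlap
        # Every tile but the last contributes only its non-overlapping stride;
        # the overlapping tail is covered by the next tile's blended start.
        stitched.extend(tile if is_last else tile[:stride])
    return stitched


# Two tiles with very different levels: the seam becomes a smooth ramp.
print(stitch_tiles([[0.0] * 8, [1.0] * 8], overlap=4))
# → [0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0]
```

The same idea extends to two dimensions by blending each tile with its neighbour above and its neighbour to the left. The sketch assumes the overlap is no larger than half the tile length, so the blended region is never cropped away.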
mps test fixes by @pcuenca in #2470
train_unconditional by @pcuenca in #2481

The following contributors have made significant changes to the library over the last release:
Full Changelog: https://github.com/huggingface/diffusers/compare/v0.13.0...v0.14.0