v0.26.0: New video pipelines, single-file checkpoint revamp, multi IP-Adapter inference with multiple images
This new release comes with two new video pipelines, a more unified and consistent experience for single-file checkpoint loading, support for inference with multiple IP-Adapters and multiple reference images, and more.
I2VGenXL is an image-to-video pipeline, proposed in I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.
```python
import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

repo_id = "ali-vilab/i2vgen-xl"
pipeline = I2VGenXLPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
# Model offloading moves components to the GPU on demand, so no prior .to("cuda") is needed.
pipeline.enable_model_cpu_offload()

image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0001.jpg"
image = load_image(image_url).convert("RGB")

prompt = "A green frog floats on the surface of the water on green lotus leaves, with several pink lotus flowers, in a Chinese painting style."
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)

frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    generator=generator,
).frames
export_to_gif(frames[0], "i2v.gif")
```
<table>
<tr>
<td><center>
<img src="https://github.com/huggingface/diffusers/assets/22957388/7ef7b2b5-b37a-41a7-8397-f6c4c0f567e4"
alt="library"
style="width: 300px;" />
</center></td>
</tr>
</table>
📜 Check out the docs here.
PIA is a Personalized Image Animator that aligns with condition images, controls motion via text, and is compatible with various T2I models without model-specific tuning. PIA uses a base T2I model with temporal alignment layers for image animation. A key component of PIA is the condition module, which transfers appearance information for individual frame synthesis in the latent space, allowing a stronger focus on motion alignment. PIA was introduced in PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.
```python
import torch
from diffusers import (
    EulerDiscreteScheduler,
    MotionAdapter,
    PIAPipeline,
)
from diffusers.utils import export_to_gif, load_image

# Load the PIA condition adapter and plug it into a Realistic Vision base model.
adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))
prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches, worst quality, low quality"

generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-animation.gif")
```
<table>
<tr>
<td><center>
cat in a field
<br>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pia-default-output.gif"
alt="cat in a field"
style="width: 300px;" />
</center></td>
</tr>
</table>
📜 Check out the docs here.
IP-Adapters are becoming quite popular, so we have added support for performing inference with multiple IP-Adapters and multiple reference images! Thanks to @asomoza for their help. Get started with the code below:
```python
import torch
from transformers import CLIPVisionModelWithProjection
from diffusers import AutoPipelineForText2Image, DDIMScheduler
from diffusers.utils import load_image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

# Load two IP-Adapters (style and face) and weight each one separately.
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"],
)
pipeline.set_ip_adapter_scale([0.7, 0.3])
pipeline.enable_model_cpu_offload()

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")
style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

generator = torch.Generator(device="cpu").manual_seed(0)
# Pass one image (or one list of images) per loaded IP-Adapter, in loading order.
image = pipeline(
    prompt="wonderwoman",
    ip_adapter_image=[style_images, face_image],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images[0]
```
Reference style images:

<img src="https://github.com/huggingface/diffusers/assets/12631849/84f15215-7ac2-40ef-a552-3bdad3fdfba0" width="700"/>
<table>
<tr>
<th><strong>Reference face image</strong></th>
<th><strong>Output image</strong></th>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png" width="500"></td>
<td><img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_multi_out.png" width="500"></td>
</tr>
</table>

📜 Check out the docs here.
The from_single_file() utility has been refactored for better readability and to follow semantics similar to from_pretrained(). Support for loading single-file checkpoints and configs from URLs has also been added.
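As a minimal sketch of the new loading path (the checkpoint URL below is only an example; any single-file `.safetensors` checkpoint on the Hub should work):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Example URL only: point this at any single-file SDXL checkpoint.
ckpt_url = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors"

# from_single_file() now mirrors from_pretrained() semantics and accepts URLs directly.
pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_url, torch_dtype=torch.float16)
```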
We also introduced a fix for the DPM schedulers, so you can now use them with SDXL to generate high-quality images in fewer steps than with the Euler scheduler.
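Here is a minimal sketch of pairing SDXL with a DPM scheduler (the DPMSolverMultistepScheduler configuration shown, including use_karras_sigmas, is one common choice rather than the only option):

```python
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Swap in a DPM-Solver++ scheduler; Karras sigmas often help at low step counts.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

# Fewer steps than a typical Euler run, thanks to the scheduler fix.
image = pipe("an astronaut riding a horse on the moon", num_inference_steps=25).images[0]
```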
Apart from these, we have done a myriad of refactoring to improve the library design and will continue to do so in the coming days.
- use_karras_sigmas option by @yiyixuxu in #6477
- unets module 🦋 by @sayakpaul in #6630
- from_single_file() by @sayakpaul in #6638
- transformers modules by @sayakpaul in #6747
- tensor_to_vid function in video pipelines by @DN6 in #6715
- alpha_cumprod to device to avoid redundant data movement by @woshiyyya in #6704
- is_flaky to test_model_cpu_offload_forward_pass by @sayakpaul in #6762

The following contributors have made significant changes to the library over the last release: