
v0.26.0: New video pipelines, single-file checkpoint revamp, multi IP-Adapter inference with multiple images


This release comes with two new video pipelines, a more unified and consistent single-file checkpoint loading experience, support for inference with multiple IP-Adapters and multiple reference images, and more.

I2VGenXL

I2VGenXL is an image-to-video pipeline, proposed in I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.

import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

repo_id = "ali-vilab/i2vgen-xl"
pipeline = I2VGenXLPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
# Model CPU offload handles device placement on demand, so an explicit .to("cuda") is not needed.
pipeline.enable_model_cpu_offload()

image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0001.jpg"
image = load_image(image_url).convert("RGB")
prompt = "A green frog floats on the surface of the water on green lotus leaves, with several pink lotus flowers, in a Chinese painting style."
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)

frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    generator=generator,
).frames
export_to_gif(frames[0], "i2v.gif")
<table> <tr> <td><center> A green frog floats on the surface of the water on green lotus leaves, in a Chinese painting style. <br> <img src="https://github.com/huggingface/diffusers/assets/22957388/7ef7b2b5-b37a-41a7-8397-f6c4c0f567e4" alt="Frog floating on lotus leaves" style="width: 300px;" /> </center></td> </tr> </table>

📜 Check out the docs here.

PIA

PIA is a Personalized Image Animator that aligns with condition images, controls motion via text prompts, and is compatible with various T2I models without model-specific tuning. PIA uses a base T2I model with temporal alignment layers for image animation. A key component of PIA is the condition module, which transfers appearance information for individual frame synthesis in the latent space, allowing a stronger focus on motion alignment. PIA was introduced in PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.

import torch
from diffusers import (
    EulerDiscreteScheduler,
    MotionAdapter,
    PIAPipeline,
)
from diffusers.utils import export_to_gif, load_image

adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))
prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches,worst quality,low quality"

generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-animation.gif")
<table> <tr> <td><center> cat in a field. <br> <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pia-default-output.gif" alt="cat in a field" style="width: 300px;" /> </center></td> </tr> </table>

📜 Check out the docs here.

Multiple IP-Adapters + Multiple reference images support (“Instant LoRA” Feature)

IP-Adapters are becoming quite popular, so we have added support for performing inference with multiple IP-Adapters and multiple reference images! Thanks to @asomoza for their help. Get started with the code below:

import torch
from diffusers import AutoPipelineForText2Image, DDIMScheduler
from transformers import CLIPVisionModelWithProjection
from diffusers.utils import load_image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", 
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"])
pipeline.set_ip_adapter_scale([0.7, 0.3])

pipeline.enable_model_cpu_offload()

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")

style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

generator = torch.Generator(device="cpu").manual_seed(0)

image = pipeline(
    prompt="wonderwoman",
    ip_adapter_image=[style_images, face_image],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images[0]

Reference style images: <img src="https://github.com/huggingface/diffusers/assets/12631849/84f15215-7ac2-40ef-a552-3bdad3fdfba0" width="700"/>

<table> <tr> <th><strong>Reference face Image</strong></th> <th><strong>Output Image</strong></th> </tr> <tr> <td><img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png" width=500></td> <td><img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_multi_out.png" width=500></td> </tr> </table>

📜 Check out the docs here.

Single-file checkpoint loading

The from_single_file() utility has been refactored for better readability and to follow semantics similar to those of from_pretrained(). Support for loading single-file checkpoints and configs from URLs has also been added.
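
As a quick illustration, here is a minimal sketch of the new URL loading, using the official SDXL base single-file checkpoint as the example URL (any compatible .safetensors checkpoint URL works):

import torch
from diffusers import StableDiffusionXLPipeline

# Load a pipeline directly from a single-file checkpoint URL on the Hub.
# The URL below (SDXL base weights) is only an example.
ckpt_url = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_url, torch_dtype=torch.float16)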

DPM scheduler fix

We introduced a fix for the DPM schedulers, so you can now use them with SDXL to generate high-quality images in fewer steps than with the Euler scheduler.
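
For example, here is a minimal sketch pairing SDXL with DPMSolverMultistepScheduler and the use_karras_sigmas option touched by the fix; the prompt and step count are illustrative:

import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
# Swap in DPM++ multistep with Karras sigmas; it typically reaches good quality
# in fewer steps than the Euler scheduler.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
image = pipe("an astronaut riding a horse on the moon", num_inference_steps=25).images[0]
image.save("astronaut.png")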

Apart from these, we have done a myriad of refactors to improve the library design and will continue to do so in the coming days.

All commits

  • [docs] Fix missing API function by @stevhliu in #6604
  • Fix failing tests due to Posix Path by @DN6 in #6627
  • Update convert_from_ckpt.py / read checkpoint config yaml contents by @spezialspezial in #6633
  • [Community] Experimental AnimateDiff Image to Video (open to improvements) by @a-r-r-o-w in #6509
  • refactor: extract init/forward function in UNet2DConditionModel by @ultranity in #6478
  • Modularize InstructPix2Pix SDXL inferencing during and after training in examples by @sang-k in #6569
  • Fixed the bug related to saving DeepSpeed models. by @HelloWorldBeginner in #6628
  • fix DPM Scheduler with use_karras_sigmas option by @yiyixuxu in #6477
  • fix SDXL-kdiffusion tests by @yiyixuxu in #6647
  • add padding_mask_crop to all inpaint pipelines by @rootonchair in #6360
  • add Sa-Solver by @lawrence-cj in #5975
  • Add tearDown method to LoRA tests. by @DN6 in #6660
  • [Diffusion DPO] apply fixes from #6547 by @sayakpaul in #6668
  • Update README by @standardAI in #6669
  • [Big refactor] move unets to unets module 🦋 by @sayakpaul in #6630
  • Standardise outputs for video pipelines by @DN6 in #6626
  • fix dpm related slow test failure by @yiyixuxu in #6680
  • [Tests] Test for passing local config file to from_single_file() by @sayakpaul in #6638
  • [Refactor] Update from single file by @DN6 in #6428
  • [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow by @ayushtues in #6057
  • Add InstantID Pipeline by @haofanwang in #6673
  • [Docs] update: tutorials ja | AUTOPIPELINE.md by @YasunaCoffee in #6629
  • [Fix bugs] pipeline_controlnet_sd_xl.py by @haofanwang in #6653
  • SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) by @brandostrong in #6449
  • AnimateDiff Video to Video by @a-r-r-o-w in #6328
  • [docs] UViT2D by @stevhliu in #6643
  • Correct sigmas cpu settings by @patrickvonplaten in #6708
  • [docs] AnimateDiff Video-to-Video by @a-r-r-o-w in #6712
  • fix community README by @a-r-r-o-w in #6645
  • fix custom diffusion training with concept list by @AIshutin in #6710
  • Add IP Adapters to slow tests by @DN6 in #6714
  • Move tests for SD inference variant pipelines into their own modules by @DN6 in #6707
  • Add Community Example Consistency Training Script by @dg845 in #6717
  • Add UFOGenScheduler to Community Examples by @dg845 in #6650
  • [Hub] feat: explicitly tag to diffusers when using push_to_hub by @sayakpaul in #6678
  • Correct SNR weighted loss in v-prediction case by only adding 1 to SNR on the denominator by @thuliu-yt16 in #6307
  • changed to posix unet by @gzguevara in #6719
  • Change os.path to pathlib Path by @Stepheni12 in #6737
  • correct hflip arg by @sayakpaul in #6743
  • Add unload_textual_inversion method by @fabiorigano in #6656
  • [Core] move transformer scripts to transformers modules by @sayakpaul in #6747
  • Update lora.md with a more accurate description of rank by @xhedit in #6724
  • Fix mixed precision fine-tuning for text-to-image-lora-sdxl example. by @sajadn in #6751
  • udpate ip-adapter slow tests by @yiyixuxu in #6760
  • Update export to video to support new tensor_to_vid function in video pipelines by @DN6 in #6715
  • [DDPMScheduler] Load alpha_cumprod to device to avoid redundant data movement. by @woshiyyya in #6704
  • Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten by @dg845 in #6736
  • add note about serialization by @sayakpaul in #6764
  • Update train_diffusion_dpo.py by @viettmab in #6754
  • Pin torch < 2.2.0 in test runners by @DN6 in #6780
  • [Kandinsky tests] add is_flaky to test_model_cpu_offload_forward_pass by @sayakpaul in #6762
  • add ipo, hinge and cpo loss to dpo trainer by @kashif in #6788
  • Fix setting scaling factor in VAE config by @DN6 in #6779
  • Add PIA Model/Pipeline by @DN6 in #6698
  • [docs] Add missing parameter by @stevhliu in #6775
  • [IP-Adapter] Support multiple IP-Adapters by @yiyixuxu in #6573
  • [sdxl k-diffusion pipeline]move sigma to device by @yiyixuxu in #6757
  • [Feat] add I2VGenXL for image-to-video generation by @sayakpaul in #6665
  • Release: v0.26.0 (direct commit on v0.26.0-release)
  • fix torchvision import by @patrickvonplaten in #6796

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @a-r-r-o-w
    • [Community] Experimental AnimateDiff Image to Video (open to improvements) (#6509)
    • AnimateDiff Video to Video (#6328)
    • [docs] AnimateDiff Video-to-Video (#6712)
    • fix community README (#6645)
  • @ultranity
    • refactor: extract init/forward function in UNet2DConditionModel (#6478)
  • @lawrence-cj
    • add Sa-Solver (#5975)
  • @ayushtues
    • [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow (#6057)
  • @haofanwang
    • Add InstantID Pipeline (#6673)
    • [Fix bugs] pipeline_controlnet_sd_xl.py (#6653)
  • @brandostrong
    • SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) (#6449)
  • @dg845
    • Add Community Example Consistency Training Script (#6717)
    • Add UFOGenScheduler to Community Examples (#6650)
    • Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten (#6736)
