v0.26.0: New video pipelines, single-file checkpoint revamp, multi IP-Adapter inference with multiple images
This new release comes with two new video pipelines, a more unified and consistent experience for single-file checkpoint loading, support for inference with multiple IP-Adapters and multiple reference images, and more.
I2VGenXL is an image-to-video pipeline, proposed in I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.
```python
import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

repo_id = "ali-vilab/i2vgen-xl"
pipeline = I2VGenXLPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
# Model offloading moves components to the GPU on demand, so no prior .to("cuda") is needed.
pipeline.enable_model_cpu_offload()

image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0001.jpg"
image = load_image(image_url).convert("RGB")

prompt = "A green frog floats on the surface of the water on green lotus leaves, with several pink lotus flowers, in a Chinese painting style."
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)

frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    generator=generator,
).frames
export_to_gif(frames[0], "i2v.gif")
```
<table>
<tr>
<td><center>
<img src="https://github.com/huggingface/diffusers/assets/22957388/7ef7b2b5-b37a-41a7-8397-f6c4c0f567e4"
alt="library"
style="width: 300px;" />
</center></td>
</tr>
</table>
📜 Check out the docs here.
PIA is a Personalized Image Animator that aligns with condition images, controls motion via text, and is compatible with various T2I models without model-specific tuning. PIA uses a base T2I model with temporal alignment layers for image animation. A key component of PIA is the condition module, which transfers appearance information for individual frame synthesis in the latent space, allowing a stronger focus on motion alignment. PIA was introduced in PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.
```python
import torch
from diffusers import (
    EulerDiscreteScheduler,
    MotionAdapter,
    PIAPipeline,
)
from diffusers.utils import export_to_gif, load_image

# Load the PIA condition adapter and plug it into a Realistic Vision base model.
adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))
prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches, worst quality, low quality"

generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-animation.gif")
```
<table>
<tr>
<td><center>
cat in a field
<br>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pia-default-output.gif"
alt="cat in a field"
style="width: 300px;" />
</center></td>
</tr>
</table>
📜 Check out the docs here.
IP-Adapters are becoming quite popular, so we have added support for performing inference with multiple IP-Adapters and multiple reference images! Thanks to @asomoza for their help. Get started with the code below:
```python
import torch
from transformers import CLIPVisionModelWithProjection
from diffusers import AutoPipelineForText2Image, DDIMScheduler
from diffusers.utils import load_image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

# Load two IP-Adapters (style and face) and weight each one separately.
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"],
)
pipeline.set_ip_adapter_scale([0.7, 0.3])
pipeline.enable_model_cpu_offload()

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")
style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

generator = torch.Generator(device="cpu").manual_seed(0)
# Pass one image (or one list of images) per loaded IP-Adapter, in loading order.
image = pipeline(
    prompt="wonderwoman",
    ip_adapter_image=[style_images, face_image],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images[0]
```
Reference style images:

<img src="https://github.com/huggingface/diffusers/assets/12631849/84f15215-7ac2-40ef-a552-3bdad3fdfba0" width="700"/>
<table>
<tr>
<th><strong>Reference face image</strong></th>
<th><strong>Output image</strong></th>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png" width="500"></td>
<td><img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_multi_out.png" width="500"></td>
</tr>
</table>

📜 Check out the docs here.
The from_single_file() utility has been refactored for better readability and to follow semantics similar to from_pretrained(). Support for loading single-file checkpoints and configs from URLs has also been added.
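As a minimal sketch of the new loading path (the checkpoint URL below is only an example; any single-file `.safetensors` checkpoint on the Hub should work):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Example URL only: point this at any single-file SDXL checkpoint.
ckpt_url = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors"

# from_single_file() now mirrors from_pretrained() semantics and accepts URLs directly.
pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_url, torch_dtype=torch.float16)
```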
We also introduced a fix for the DPM schedulers, so you can now use them with SDXL to generate high-quality images in fewer steps than with the Euler scheduler.
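Here is a minimal sketch of pairing SDXL with a DPM scheduler (the DPMSolverMultistepScheduler configuration shown, including use_karras_sigmas, is one common choice rather than the only option):

```python
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Swap in a DPM-Solver++ scheduler; Karras sigmas often help at low step counts.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

# Fewer steps than a typical Euler run, thanks to the scheduler fix.
image = pipe("an astronaut riding a horse on the moon", num_inference_steps=25).images[0]
```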
Apart from these, we have done a myriad of refactoring to improve the library design and will continue to do so in the coming days.
- use_karras_sigmas option by @yiyixuxu in #6477
- unets module 🦋 by @sayakpaul in #6630
- from_single_file() by @sayakpaul in #6638
- transformers modules by @sayakpaul in #6747
- tensor_to_vid function in video pipelines by @DN6 in #6715
- alpha_cumprod to device to avoid redundant data movement by @woshiyyya in #6704
- is_flaky to test_model_cpu_offload_forward_pass by @sayakpaul in #6762

The following contributors have made significant changes to the library over the last release: