Hugging Face / Diffusers — Releases (v0.37.0 → v0.37.1)
Mar 25, 2026
Fixes for AutoModel type hints in Modular Pipelines and Flux Klein LoRA loading
  • Fix for loading ModularPipelines with AutoModel type hints in their modular_model_index.json #13271
  • Fix Flux Klein LoRA loading #13313
  • Fix unguarded torchvision import in Cosmos Predict 2.5 #13321
Mar 5, 2026
Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥

Modular Diffusers

Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can now mix and match building blocks to create custom workflows tailored to your specific needs! This complements the existing DiffusionPipeline class, providing a more flexible way to create custom diffusion pipelines.

Find more details on how to get started with Modular Diffusers in the documentation, and check out the announcement post.
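The composition idea can be illustrated with a toy sketch in plain Python. Note that this is a conceptual illustration only: the class and block names below are invented for the example and are not the actual Modular Diffusers API, which is described in the documentation.

```python
# Toy sketch of block composition: each block reads from and writes to a shared
# state dict, and a sequential container chains them. All names here are
# illustrative stand-ins, not real diffusers classes.

class SequentialBlocks:
    def __init__(self, *blocks):
        self.blocks = blocks

    def __call__(self, state: dict) -> dict:
        for block in self.blocks:
            state = block(state)
        return state

def encode_prompt(state):
    # Stand-in for a text-encoder block.
    state["embeds"] = [ord(c) % 7 for c in state["prompt"]]
    return state

def denoise(state):
    # Stand-in for a denoising-loop block.
    state["latents"] = sum(state["embeds"])
    return state

def decode(state):
    # Stand-in for a VAE-decode block.
    state["image"] = f"image<{state['latents']}>"
    return state

# Mix and match blocks into a custom workflow.
pipeline = SequentialBlocks(encode_prompt, denoise, decode)
out = pipeline({"prompt": "a cat"})
```

Swapping, removing, or inserting a block changes the workflow without touching the other blocks, which is the flexibility the Modular Diffusers design is after.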

New Pipelines and Models

Image 🌆

  • Z Image Omni Base: Z-Image is the foundation model of the Z-Image family, engineered for high output quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom. Thanks to @RuoyiDu for contributing this in #12857.
  • Flux2 Klein: FLUX.2 [Klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. It is built for applications that require real-time image generation without sacrificing quality, and runs on consumer hardware with as little as 13 GB of VRAM.
  • Qwen Image Layered: Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content. Thanks to @naykun for contributing this in #12853.
  • FIBO Edit: FIBO Edit is an 8B-parameter image-to-image model that introduces a new paradigm of structured control, operating on JSON inputs paired with source images to enable deterministic and repeatable editing workflows. Featuring native masking for granular precision, it moves beyond simple prompt-based diffusion to offer explicit, interpretable control optimized for production environments. Its lightweight architecture is designed for deep customization, empowering researchers to build specialized “Edit” models for domain-specific tasks while delivering top-tier aesthetic quality. Thanks to @galbria for contributing it in #12930.
  • Cosmos Predict2.5: Cosmos-Predict2.5 is the latest version of the Cosmos World Foundation Model (WFM) family, specialized for simulating and predicting the future state of the world. Thanks to @miguelmartin75 for contributing it in #12852.
  • Cosmos Transfer2.5: Cosmos-Transfer2.5 is a conditional world generation model with adaptive multimodal control that produces high-quality world simulations conditioned on multiple control inputs. These inputs can span several modalities, including edges, blurred video, segmentation maps, and depth maps. Thanks to @miguelmartin75 for contributing it in #13066.
  • GLM-Image: GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained detail. In general image generation quality it aligns with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios. Thanks to @zRzRzRzRzRzRzR for contributing it in #12973.
  • RAE: Representation Autoencoders (RAEs) are an exciting alternative to the traditional VAEs typically used in latent-space diffusion models for image generation. RAEs leverage pre-trained vision encoders and train only lightweight decoders for the reconstruction task.
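The RAE recipe (freeze a pre-trained encoder, train only a lightweight decoder for reconstruction) can be sketched with a tiny NumPy example. The "encoder" here is an arbitrary fixed linear map standing in for a pre-trained vision encoder, and the decoder is fit in closed form; shapes and names are illustrative, not the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" encoder: a frozen linear map (stand-in for a vision encoder).
d, k, n = 4, 4, 64                 # data dim, latent dim, number of samples
E = rng.normal(size=(k, d))        # frozen weights; never trained here

X = rng.normal(size=(d, n))        # toy "images"
Z = E @ X                          # latents from the frozen encoder

# Lightweight decoder trained for reconstruction only. With a linear decoder
# the least-squares fit has a closed form via the pseudoinverse.
D = X @ np.linalg.pinv(Z)

# Relative reconstruction error; near zero when the frozen encoder
# preserves the information needed to reconstruct the data.
recon_error = np.linalg.norm(D @ Z - X) / np.linalg.norm(X)
```

The point of the sketch is the division of labor: all representational power comes from the frozen encoder, and only the small decoder is optimized.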

Video + audio 🎥 🎼

  • LTX-2: LTX-2 is an audio-conditioned text-to-video generation model that can generate videos with synced audio. Full and distilled model inference, as well as two-stage inference with spatial sampling, is supported. We also support a conditioning pipeline that allows for passing different conditions (such as images, series of images, etc.). Check out the docs to learn more!
  • Helios: Helios is a 14B video generation model that runs at 17 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching a strong baseline in quality. Thanks to @SHYuanBest for contributing this in #13208.

Improvements to Core Library

New caching methods

New context-parallelism (CP) backends

Misc

  • Mambo-G Guidance: New guider implementation (#12862)
  • Laplace Scheduler for DDPM (#11320)
  • Custom Sigmas in UniPCMultistepScheduler (#12109)
  • MultiControlNet support for SD3 Inpainting (#11251)
  • Context parallel in native flash attention (#12829)
  • NPU Ulysses Attention Support (#12919)
  • Fix Wan 2.1 I2V Context Parallel Inference (#12909)
  • Fix Qwen-Image Context Parallel Inference (#12970)
  • Introduction of the @apply_lora_scale decorator for simplifying model definitions (#12994)
  • Introduction of pipeline-level “cpu” device_map (#12811)
  • Enable CP for kernels-based attention backends (#12812)
  • Diffusers is fully functional with Transformers v5 (#12976)
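As one concrete example from the list above, the custom-sigmas support means a scheduler can consume a user-supplied noise schedule. A common choice is the Karras et al. (2022) schedule; the sketch below computes such a sigma array in NumPy. This shows only the schedule itself, not the scheduler call signature, which is documented in the diffusers API reference.

```python
import numpy as np

def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    # Karras et al. (2022) schedule: interpolate linearly in sigma**(1/rho)
    # space, which concentrates steps near the low-noise end.
    ramp = np.linspace(0, 1, n)
    inv_max = sigma_max ** (1 / rho)
    inv_min = sigma_min ** (1 / rho)
    return (inv_max + ramp * (inv_min - inv_max)) ** rho

# A strictly decreasing schedule from sigma_max down to sigma_min; an array
# like this is what a scheduler accepting custom sigmas would consume.
sigmas = karras_sigmas(10)
```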

A lot of the above features/improvements came as part of the MVP program we have been running. Immense thanks to the contributors!

Bug Fixes

  • Fix QwenImageEditPlus on NPU (#13017)
  • Fix MT5Tokenizer → use T5Tokenizer for Transformers v5.0+ compatibility (#12877)
  • Fix Wan/WanI2V patchification (#13038)
  • Fix LTX-2 inference with num_videos_per_prompt > 1 and CFG (#13121)
  • Fix Flux2 img2img prediction (#12855)
  • Fix QwenImage txt_seq_lens handling (#12702)
  • Fix prefix_token_len bug (#12845)
  • Fix ftfy imports in Wan and SkyReels-V2 (#12314, #13113)
  • Fix is_fsdp determination (#12960)
  • Fix GLM-Image get_image_features API (#13052)
  • Fix Wan 2.2 when either transformer isn't present (#13055)
  • Fix guider issue (#13147)
  • Fix torchao quantizer for new versions (#12901)
  • Fix GGUF for unquantized types with unquantize kernels (#12498)
  • Make Qwen hidden states contiguous for torchao (#13081)
  • Make Flux hidden states contiguous (#13068)
  • Fix Kandinsky 5 hardcoded CUDA autocast (#12814)
  • Fix aiter availability check (#13059)
  • Fix attention mask check for unsupported backends (#12892)
  • Allow prompt and prior_token_ids simultaneously in GlmImagePipeline (#13092)
  • GLM-Image batch support (#13007)
  • Cosmos 2.5 Video2World frame extraction fix (#13018)
  • ResNet: only use contiguous in training mode (#12977)

All commits

  • [PRX] Improve model compilation by @WaterKnight1998 in #12787
  • Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py by @delmalih in #12798
  • [Modular]z-image by @yiyixuxu in #12808
  • Fix Qwen Edit Plus modular for multi-image input by @sayakpaul in #12601
  • [WIP] Add Flux2 modular by @DN6 in #12763
  • [docs] improve distributed inference cp docs. by @sayakpaul in #12810
  • post release 0.36.0 by @sayakpaul in #12804
  • Update distributed_inference.md to correct syntax by @sayakpaul in #12827
  • [lora] Remove lora docs unneeded and add " # Copied from ..." by @sayakpaul in #12824
  • support CP in native flash attention by @sywangyi in #12829
  • [qwen-image] edit 2511 support by @naykun in #12839
  • fix pytest tests/pipelines/pixart_sigma/test_pixart.py::PixArtSigmaPi… by @sywangyi in #12842
  • Support for control-lora by @lavinal712 in #10686
  • Add support for LongCat-Image by @junqiangwu in #12828
  • fix the prefix_token_len bug by @junqiangwu in #12845
  • extend TorchAoTest::test_model_memory_usage to other platform by @sywangyi in #12768
  • Qwen Image Layered Support by @naykun in #12853
  • Z-Image-Turbo ControlNet by @hlky in #12792
  • Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion by @miguelmartin75 in #12852
  • more update in modular by @yiyixuxu in #12560
  • Feature: Add Mambo-G Guidance as Guider by @MatrixTeam-AI in #12862
  • Add OvisImagePipeline in AUTO_TEXT2IMAGE_PIPELINES_MAPPING by @alvarobartt in #12876
  • Cosmos Predict2.5 14b Conversion by @miguelmartin75 in #12863
  • Use T5Tokenizer instead of MT5Tokenizer (removed in Transformers v5.0+) by @alvarobartt in #12877
  • Add z-image-omni-base implementation by @RuoyiDu in #12857
  • fix torchao quantizer for new torchao versions by @vkuzo in #12901
  • fix Qwen Image Transformer single file loading mapping function to be consistent with other loader APIs by @mbalabanski in #12894
  • Z-Image-Turbo from_single_file fix by @hlky in #12888
  • chore: fix dev version in setup.py by @DefTruth in #12904
  • Community Pipeline: Add z-image differential img2img by @r4inm4ker in #12882
  • Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py by @miguelmartin75 in #12914
  • Fix wan 2.1 i2v context parallel by @DefTruth in #12909
  • fix the use of device_map in CP docs by @sayakpaul in #12902
  • [core] remove unneeded autoencoder methods when subclassing from AutoencoderMixin by @sayakpaul in #12873
  • Detect 2.0 vs 2.1 ZImageControlNetModel by @hlky in #12861
  • Refactor environment variable assignments in workflow by @paulinebm in #12916
  • Add codeQL workflow by @paulinebm in #12917
  • Delete .github/workflows/codeql.yml by @paulinebm (direct commit on v0.37.0-release)
  • CodeQL workflow for security analysis by @paulinebm (direct commit on v0.37.0-release)
  • Check for attention mask in backends that don't support it by @dxqb in #12892
  • [Flux.1] improve pos embed for ascend npu by computing on npu by @zhangtao0408 in #12897
  • LTX Video 0.9.8 long multi prompt by @yaoqih in #12614
  • Add FSDP option for Flux2 by @leisuzz in #12860
  • Add transformer cache context for SkyReels-V2 pipelines & Update docs by @tolgacangoz in #12837
  • [docs] fix torchao typo. by @sayakpaul in #12883
  • Update wan.md to remove unneeded hfoptions by @sayakpaul in #12890
  • Improve docstrings and type hints in scheduling_edm_euler.py by @delmalih in #12871
  • [Modular] Video for Mellon by @asomoza in #12924
  • Add LTX 2.0 Video Pipelines by @dg845 in #12915
  • Add environment variables to checkout step by @paulinebm in #12927
  • Improve docstrings and type hints in scheduling_consistency_decoder.py by @delmalih in #12928
  • Fix: Remove hardcoded CUDA autocast in Kandinsky 5 to fix import warning by @adi776borate in #12814
  • Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #12865
  • fix the warning torch_dtype is deprecated by @msdsm in #12841
  • [NPU] npu attention enable ulysses by @TmacAaron in #12919
  • Torchao floatx version guard by @howardzhang-cv in #12923
  • Bugfix for dreambooth flux2 img2img2 by @leisuzz in #12825
  • [Modular] qwen refactor by @yiyixuxu in #12872
  • [modular] Tests for custom blocks in modular diffusers by @sayakpaul in #12557
  • [chore] remove controlnet implementations outside controlnet module. by @sayakpaul in #12152
  • [core] Handle progress bar and logging in distributed environments by @sayakpaul in #12806
  • Improve docstrings and type hints in scheduling_consistency_models.py by @delmalih in #12931
  • [Feature] MultiControlNet support for SD3Impainting by @ishan-modi in #11251
  • Laplace Scheduler for DDPM by @gapatron in #11320
  • Store vae.config.scaling_factor to prevent missing attr reference (sdxl advanced dreambooth training script) by @Teriks in #12346
  • Add thread-safe wrappers for components in pipeline (examples/server-async/utils/requestscopedpipeline.py) by @FredyRivera-dev in #12515
  • [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL by @kashif in #11573
  • Change timestep device to cpu for xla by @bhavya01 in #11501
  • [LoRA] add lora_alpha to sana README by @linoytsaban in #11780
  • Fix wrong param types, docs, and handles noise=None in scale_noise of FlowMatching schedulers by @Promisery in #11669
  • [docs] Remote inference by @stevhliu in #12372
  • Align HunyuanVideoConditionEmbedding with CombinedTimestepGuidanceTextProjEmbeddings by @samutamm in #12316
  • [Fix] syntax in QwenImageEditPlusPipeline by @SahilCarterr in #12371
  • Fix ftfy name error in Wan pipeline by @dsocek in #12314
  • [modular] error early in enable_auto_cpu_offload by @sayakpaul in #12578
  • [ChronoEdit] support multiple loras by @zhangjiewu in #12679
  • fix how is_fsdp is determined by @sayakpaul in #12960
  • [LoRA] add LoRA support to LTX-2 by @sayakpaul in #12933
  • Fix: typo in autoencoder_dc.py by @tvelovraf in #12687
  • [Modular] better docstring by @yiyixuxu in #12932
  • [docs] polish caching docs. by @sayakpaul in #12684
  • Fix typos by @omahs in #12705
  • Fix link to diffedit implementation reference by @JuanFKurucz in #12708
  • Fix QwenImage txt_seq_lens handling by @kashif in #12702
  • Bugfix for flux2 img2img2 prediction by @leisuzz in #12855
  • Add Flag to PeftLoraLoaderMixinTests to Enable/Disable Text Encoder LoRA Tests by @dg845 in #12962
  • Add Unified Sequence Parallel attention by @Bissmella in #12693
  • [Modular] Changes for using WAN I2V by @asomoza in #12959
  • Z rz rz rz rz rz rz r cogview by @sayakpaul in #12973
  • Update distributed_inference.md to reposition sections by @sayakpaul in #12971
  • [chore] make transformers version check stricter for glm image. by @sayakpaul in #12974
  • Remove 8bit device restriction by @SunMarc in #12972
  • disable_mmap in pipeline from_pretrained by @hlky in #12854
  • [Modular] mellon utils by @yiyixuxu in #12978
  • LongCat Image pipeline: Allow offloading/quantization of text_encoder component by @Yahweasel in #12963
  • Add ChromaInpaintPipeline by @hameerabbasi in #12848
  • fix Qwen-Image series context parallel by @DefTruth in #12970
  • Flux2 klein by @yiyixuxu in #12982
  • [modular] fix a bug in mellon param & improve docstrings by @yiyixuxu in #12980
  • add klein docs. by @sayakpaul in #12984
  • LTX 2 Single File Support by @dg845 in #12983
  • [core] gracefully error out when attn-backend x cp combo isn't supported. by @sayakpaul in #12832
  • Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py by @delmalih in #12936
  • [Docs] Replace root CONTRIBUTING.md with symlink to source docs by @delmalih in #12986
  • make style && make quality by @sayakpaul (direct commit on v0.37.0-release)
  • Revert "make style && make quality" by @sayakpaul (direct commit on v0.37.0-release)
  • [chore] make style to push new changes. by @sayakpaul in #12998
  • Fibo edit pipeline by @galbria in #12930
  • Fix variable name in docstring for PeftAdapterMixin.set_adapters by @geekuillaume in #13003
  • Improve docstrings and type hints in scheduling_ddim_cogvideox.py by @delmalih in #12992
  • [scheduler] Support custom sigmas in UniPCMultistepScheduler by @a-r-r-o-w in #12109
  • feat: accelerate longcat-image with regional compile by @lgyStoic in #13019
  • Improve docstrings and type hints in scheduling_ddim_flax.py by @delmalih in #13010
  • Improve docstrings and type hints in scheduling_ddim_inverse.py by @delmalih in #13020
  • fix Dockerfiles for cuda and xformers. by @sayakpaul in #13022
  • Resnet only use contiguous in training mode. by @jiqing-feng in #12977
  • feat: add qkv projection fuse for longcat transformers by @lgyStoic in #13021
  • Improve docstrings and type hints in scheduling_ddim_parallel.py by @delmalih in #13023
  • Improve docstrings and type hints in scheduling_ddpm_flax.py by @delmalih in #13024
  • Improve docstrings and type hints in scheduling_ddpm_parallel.py by @delmalih in #13027
  • Remove *pooled_* mentions from Chroma inpaint by @hameerabbasi in #13026
  • Flag Flax schedulers as deprecated by @delmalih in #13031
  • [modular] add auto_docstring & more doc related refactors by @yiyixuxu in #12958
  • Upgrade GitHub Actions to latest versions by @salmanmkc in #12866
  • [From Single File] support from_single_file method for WanAnimateTransformer3DModel by @samadwar in #12691
  • Fix: Cosmos2.5 Video2World frame extraction and add default negative prompt by @adi776borate in #13018
  • [GLM-Image] Add batch support for GlmImagePipeline by @JaredforReal in #13007
  • [Qwen] avoid creating attention masks when there is no padding by @kashif in #12987
  • [modular]support klein by @yiyixuxu in #13002
  • [QwenImage] fix prompt isolation tests by @sayakpaul in #13042
  • fast tok update by @itazap in #13036
  • change to CUDA 12.9. by @sayakpaul in #13045
  • remove torchao autoquant from diffusers docs by @vkuzo in #13048
  • docs: improve docstring scheduling_dpm_cogvideox.py by @delmalih in #13044
  • Fix Wan/WanI2V patchification by @Jayce-Ping in #13038
  • LTX2 distilled checkpoint support by @rootonchair in #12934
  • [wan] fix layerwise upcasting tests on CPU by @sayakpaul in #13039
  • [ci] uniform run times and wheels for pytorch cuda. by @sayakpaul in #13047
  • docs: fix grammar in fp16_safetensors CLI warning by @Olexandr88 in #13040
  • [wan] fix wan 2.2 when either of the transformers isn't present. by @sayakpaul in #13055
  • [bug fix] GLM-Image fit new get_image_features API by @JaredforReal in #13052
  • Fix aiter availability check by @lauri9 in #13059
  • [Modular]add a real quick start guide by @yiyixuxu in #13029
  • feat: support Ulysses Anything Attention by @DefTruth in #12996
  • Refactor Model Tests by @DN6 in #12822
  • [Flux2] Fix LoRA loading for Flux2 Klein by adaptively enumerating transformer blocks by @songkey in #13030
  • [Modular] loader related by @yiyixuxu in #13025
  • [Modular] mellon doc etc by @yiyixuxu in #13051
  • [modular] change the template modular pipeline card by @sayakpaul in #13072
  • Add support for Magcache by @AlanPonnachan in #12744
  • [docs] Fix syntax error in quantization configuration by @sayakpaul in #13076
  • docs: improve docstring scheduling_dpmsolver_multistep_inverse.py by @delmalih in #13083
  • [core] make flux hidden states contiguous by @sayakpaul in #13068
  • [core] make qwen hidden states contiguous to make torchao happy. by @sayakpaul in #13081
  • Feature/zimage inpaint pipeline by @CalamitousFelicitousness in #13006
  • GGUF fix for unquantized types when using unquantize kernels by @dxqb in #12498
  • docs: improve docstring scheduling_dpmsolver_multistep_inverse.py by @delmalih in #13085
  • [modular]simplify components manager doc by @yiyixuxu in #13088
  • ZImageControlNet cfg by @hlky in #13080
  • [Modular] refactor Wan: modular pipelines by task etc by @yiyixuxu in #13063
  • [Modular] guard ModularPipeline.blocks attribute by @yiyixuxu in #13014
  • LTX 2 Improve encode_video by Accepting More Input Types by @dg845 in #13057
  • Z image lora training by @linoytsaban in #13056
  • [modular] add modular tests for Z-Image and Wan by @sayakpaul in #13078
  • [Docs] Add guide for AutoModel with custom code by @DN6 in #13099
  • [SkyReelsV2] Fix ftfy import by @asomoza in #13113
  • [lora] fix non-diffusers lora key handling for flux2 by @sayakpaul in #13119
  • [CI] Refactor Wan Model Tests by @DN6 in #13082
  • docs: improve docstring scheduling_edm_dpmsolver_multistep.py by @delmalih in #13122
  • [Fix]Allow prompt and prior_token_ids to be provided simultaneously in GlmImagePipeline by @JaredforReal in #13092
  • docs: improve docstring scheduling_flow_match_euler_discrete.py by @delmalih in #13127
  • Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} by @miguelmartin75 in #13066
  • [modular] add tests for robust model loading. by @sayakpaul in #13120
  • Fix LTX-2 Inference when num_videos_per_prompt > 1 and CFG is Enabled by @dg845 in #13121
  • [CI] Fix setuptools pkg_resources Errors by @dg845 in #13129
  • docs: improve docstring scheduling_flow_match_heun_discrete.py by @delmalih in #13130
  • [CI] Fix setuptools pkg_resources Bug for PR GPU Tests by @dg845 in #13132
  • fix cosmos transformer typing. by @sayakpaul in #13134
  • Sunset Python 3.8 & get rid of explicit typing exports where possible by @sayakpaul in #12524
  • feat: implement apply_lora_scale to remove boilerplate. by @sayakpaul in #12994
  • [docs] fix ltx2 i2v docstring. by @sayakpaul in #13135
  • [Modular] add different pipeine blocks to init by @yiyixuxu in #13145
  • fix MT5Tokenizer by @yiyixuxu in #13146
  • fix guider by @yiyixuxu in #13147
  • [Modular] update doc for ModularPipeline by @yiyixuxu in #13100
  • [Modular] add explicit workflow support by @yiyixuxu in #13028
  • [LTX2] Fix wrong lora mixin by @asomoza in #13144
  • [Pipelines] Remove k-diffusion by @DN6 in #13152
  • [tests] accept recompile_limit from the user in tests by @sayakpaul in #13150
  • [core] support device type device_maps to work with offloading. by @sayakpaul in #12811
  • [Bug] Fix QwenImageEditPlus Series on NPU by @zhangtao0408 in #13017
  • [CI] Add ftfy as a test dependency by @DN6 in #13155
  • docs: improve docstring scheduling_flow_match_lcm.py by @delmalih in #13160
  • [docs] add docs for qwenimagelayered by @stevhliu in #13158
  • Flux2: Tensor tuples can cause issues for checkpointing by @dxqb in #12777
  • [CI] Revert setuptools CI Fix as the Failing Pipelines are Deprecated by @dg845 in #13149
  • Fix ftfy import for PRX Pipeline by @dg845 in #13154
  • [core] Enable CP for kernels-based attention backends by @sayakpaul in #12812
  • remove deps related to test from ci by @sayakpaul in #13164
  • [CI] Fix new LoRAHotswap tests by @DN6 in #13163
  • [gguf][torch.compile time] Convert to plain tensor earlier in dequantize_gguf_tensor by @anijain2305 in #13166
  • Support Flux Klein peft (fal) lora format by @asomoza in #13169
  • Fix T5GemmaEncoder loading for transformers 5.x composite T5GemmaConfig by @DavidBert in #13143
  • Allow Automodel to use from_config with custom code. by @DN6 in #13123
  • Fix AutoModel typing Import Error by @dg845 in #13178
  • migrate to transformers v5 by @sayakpaul in #12976
  • fix: graceful fallback when attention backends fail to import by @sym-bot in #13060
  • [docs] Fix torchrun command argument order in docs by @sayakpaul in #13181
  • [attention backends] use dedicated wrappers from fa3 for cp. by @sayakpaul in #13165
  • Cosmos Transfer2.5 Auto-Regressive Inference Pipeline by @miguelmartin75 in #13114
  • Fix wrong do_classifier_free_guidance threshold in ZImagePipeline by @kirillsst in #13183
  • Fix Flash Attention 3 interface for new FA3 return format by @veeceey in #13173
  • Fix LTX-2 image-to-video generation failure in two stages generation by @Songrui625 in #13187
  • Fixing Kohya loras loading: Flux.1-dev loras with TE ("lora_te1_" prefix) by @christopher5106 in #13188
  • [Modular] update the auto pipeline blocks doc by @yiyixuxu in #13148
  • [tests] consistency tests for modular index by @sayakpaul in #13192
  • [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline by @yiyixuxu in #13193
  • [chore] updates in the pypi publication workflow. by @sayakpaul in #12805
  • [tests] enable cpu offload test in torchao without compilation. by @sayakpaul in #12704
  • remove db utils from benchmarking by @sayakpaul in #13199
  • [AutoModel] Fix bug with subfolders and local model paths when loading custom code by @DN6 in #13197
  • [AutoModel] Allow registering auto_map to model config by @DN6 in #13186
  • [Modular] Save Modular Pipeline weights to Hub by @DN6 in #13168
  • docs: improve docstring scheduling_ipndm.py by @delmalih in #13198
  • Clean up accidental files by @DN6 in #13202
  • [modular]Update model card to include workflow by @yiyixuxu in #13195
  • [modular] not pass trust_remote_code to external repos by @yiyixuxu in #13204
  • [Modular] implement requirements validation for custom blocks by @sayakpaul in #12196
  • cogvideo example: Distribute VAE video encoding across processes in CogVideoX LoRA training by @jiqing-feng in #13207
  • Fix group-offloading bug by @SHYuanBest in #13211
  • Add Helios-14B Video Generation Pipelines by @dg845 in #13208
  • [Z-Image] Fix more do_classifier_free_guidance thresholds by @asomoza in #13212
  • [lora] fix zimage lora conversion to support for more lora. by @sayakpaul in #13209
  • adding lora support to z-image controlnet pipelines by @christopher5106 in #13200
  • Add LTX2 Condition Pipeline by @dg845 in #13058
  • Fix Helios paper link in documentation by @SHYuanBest in #13213
  • [attention backends] change to updated repo and version. by @sayakpaul in #13161
  • feat: implement rae autoencoder. by @Ando233 in #13046
  • Release: v0.37.0-release by @sayakpaul (direct commit on v0.37.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @delmalih
    • Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py (#12798)
    • Improve docstrings and type hints in scheduling_edm_euler.py (#12871)
    • Improve docstrings and type hints in scheduling_consistency_decoder.py (#12928)
    • Improve docstrings and type hints in scheduling_consistency_models.py (#12931)
    • Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py (#12936)
    • [Docs] Replace root CONTRIBUTING.md with symlink to source docs (#12986)
    • Improve docstrings and type hints in scheduling_ddim_cogvideox.py (#12992)
    • Improve docstrings and type hints in scheduling_ddim_flax.py (#13010)
    • Improve docstrings and type hints in scheduling_ddim_inverse.py (#13020)
    • Improve docstrings and type hints in scheduling_ddim_parallel.py (#13023)
    • Improve docstrings and type hints in scheduling_ddpm_flax.py (#13024)
    • Improve docstrings and type hints in scheduling_ddpm_parallel.py (#13027)
    • Flag Flax schedulers as deprecated (#13031)
    • docs: improve docstring scheduling_dpm_cogvideox.py (#13044)
    • docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13083)
    • docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13085)
    • docs: improve docstring scheduling_edm_dpmsolver_multistep.py (#13122)
    • docs: improve docstring scheduling_flow_match_euler_discrete.py (#13127)
    • docs: improve docstring scheduling_flow_match_heun_discrete.py (#13130)
    • docs: improve docstring scheduling_flow_match_lcm.py (#13160)
    • docs: improve docstring scheduling_ipndm.py (#13198)
  • @yiyixuxu
    • [Modular]z-image (#12808)
    • more update in modular (#12560)
    • [Modular] qwen refactor (#12872)
    • [Modular] better docstring (#12932)
    • [Modular] mellon utils (#12978)
    • Flux2 klein (#12982)
    • [modular] fix a bug in mellon param & improve docstrings (#12980)
    • [modular] add auto_docstring & more doc related refactors (#12958)
    • [modular]support klein (#13002)
    • [Modular]add a real quick start guide (#13029)
    • [Modular] loader related (#13025)
    • [Modular] mellon doc etc (#13051)
    • [modular]simplify components manager doc (#13088)
    • [Modular] refactor Wan: modular pipelines by task etc (#13063)
    • [Modular] guard ModularPipeline.blocks attribute (#13014)
    • [Modular] add different pipeine blocks to init (#13145)
    • fix MT5Tokenizer (#13146)
    • fix guider (#13147)
    • [Modular] update doc for ModularPipeline (#13100)
    • [Modular] add explicit workflow support (#13028)
    • [Modular] update the auto pipeline blocks doc (#13148)
    • [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline (#13193)
    • [modular]Update model card to include workflow (#13195)
    • [modular] not pass trust_remote_code to external repos (#13204)
  • @sayakpaul
    • Fix Qwen Edit Plus modular for multi-image input (#12601)
    • [docs] improve distributed inference cp docs. (#12810)
    • post release 0.36.0 (#12804)
    • Update distributed_inference.md to correct syntax (#12827)
    • [lora] Remove lora docs unneeded and add " # Copied from ..." (#12824)
    • fix the use of device_map in CP docs (#12902)
    • [core] remove unneeded autoencoder methods when subclassing from AutoencoderMixin (#12873)
    • [docs] fix torchao typo. (#12883)
    • Update wan.md to remove unneeded hfoptions (#12890)
    • [modular] Tests for custom blocks in modular diffusers (#12557)
    • [chore] remove controlnet implementations outside controlnet module. (#12152)
    • [core] Handle progress bar and logging in distributed environments (#12806)
    • [modular] error early in enable_auto_cpu_offload (#12578)
    • fix how is_fsdp is determined (#12960)
    • [LoRA] add LoRA support to LTX-2 (#12933)
    • [docs] polish caching docs. (#12684)
    • Z rz rz rz rz rz rz r cogview (#12973)
    • Update distributed_inference.md to reposition sections (#12971)
    • [chore] make transformers version check stricter for glm image. (#12974)
    • add klein docs. (#12984)
    • [core] gracefully error out when attn-backend x cp combo isn't supported. (#12832)
    • make style && make quality
    • Revert "make style && make quality"
    • [chore] make style to push new changes. (#12998)
    • fix Dockerfiles for cuda and xformers. (#13022)
    • [QwenImage] fix prompt isolation tests (#13042)
    • change to CUDA 12.9. (#13045)
    • [wan] fix layerwise upcasting tests on CPU (#13039)
    • [ci] uniform run times and wheels for pytorch cuda. (#13047)
    • [wan] fix wan 2.2 when either of the transformers isn't present. (#13055)
    • [modular] change the template modular pipeline card (#13072)
    • [docs] Fix syntax error in quantization configuration (#13076)
    • [core] make flux hidden states contiguous (#13068)
    • [core] make qwen hidden states contiguous to make torchao happy. (#13081)
    • [modular] add modular tests for Z-Image and Wan (#13078)
    • [lora] fix non-diffusers lora key handling for flux2 (#13119)
    • [modular] add tests for robust model loading. (#13120)
    • fix cosmos transformer typing. (#13134)
    • Sunset Python 3.8 & get rid of explicit typing exports where possible (#12524)
    • feat: implement apply_lora_scale to remove boilerplate. (#12994)
    • [docs] fix ltx2 i2v docstring. (#13135)
    • [tests] accept recompile_limit from the user in tests (#13150)
    • [core] support device type device_maps to work with offloading. (#12811)
    • [core] Enable CP for kernels-based attention backends (#12812)
    • remove deps related to test from ci (#13164)
    • migrate to transformers v5 (#12976)
    • [docs] Fix torchrun command argument order in docs (#13181)
    • [attention backends] use dedicated wrappers from fa3 for cp. (#13165)
    • [tests] consistency tests for modular index (#13192)
    • [chore] updates in the pypi publication workflow. (#12805)
    • [tests] enable cpu offload test in torchao without compilation. (#12704)
    • remove db utils from benchmarking (#13199)
    • [Modular] implement requirements validation for custom blocks (#12196)
    • [lora] fix zimage lora conversion to support for more lora. (#13209)
    • [attention backends] change to updated repo and version. (#13161)
    • Release: v0.37.0-release
  • @DN6
    • [WIP] Add Flux2 modular (#12763)
    • Refactor Model Tests (#12822)
    • [Docs] Add guide for AutoModel with custom code (#13099)
    • [CI] Refactor Wan Model Tests (#13082)
    • [Pipelines] Remove k-diffusion (#13152)
    • [CI] Add ftfy as a test dependency (#13155)
    • [CI] Fix new LoRAHotswap tests (#13163)
    • Allow Automodel to use from_config with custom code. (#13123)
    • [AutoModel] Fix bug with subfolders and local model paths when loading custom code (#13197)
    • [AutoModel] Allow registering auto_map to model config (#13186)
    • [Modular] Save Modular Pipeline weights to Hub (#13168)
    • Clean up accidental files (#13202)
  • @naykun
    • [qwen-image] edit 2511 support (#12839)
    • Qwen Image Layered Support (#12853)
  • @junqiangwu
    • Add support for LongCat-Image (#12828)
    • fix the prefix_token_len bug (#12845)
  • @hlky
    • Z-Image-Turbo ControlNet (#12792)
    • Z-Image-Turbo from_single_file fix (#12888)
    • Detect 2.0 vs 2.1 ZImageControlNetModel (#12861)
    • disable_mmap in pipeline from_pretrained (#12854)
    • ZImageControlNet cfg (#13080)
  • @miguelmartin75
    • Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion (#12852)
    • Cosmos Predict2.5 14b Conversion (#12863)
    • Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py (#12914)
    • Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} (#13066)
    • Cosmos Transfer2.5 Auto-Regressive Inference Pipeline (#13114)
  • @RuoyiDu
    • Add z-image-omni-base implementation (#12857)
  • @r4inm4ker
    • Community Pipeline: Add z-image differential img2img (#12882)
  • @yaoqih
    • LTX Video 0.9.8 long multi prompt (#12614)
  • @dg845
    • Add LTX 2.0 Video Pipelines (#12915)
    • Add Flag to PeftLoraLoaderMixinTests to Enable/Disable Text Encoder LoRA Tests (#12962)
    • LTX 2 Single File Support (#12983)
    • LTX 2 Improve encode_video by Accepting More Input Types (#13057)
    • Fix LTX-2 Inference when num_videos_per_prompt > 1 and CFG is Enabled (#13121)
    • [CI] Fix setuptools pkg_resources Errors (#13129)
    • [CI] Fix setuptools pkg_resources Bug for PR GPU Tests (#13132)
    • [CI] Revert setuptools CI Fix as the Failing Pipelines are Deprecated (#13149)
    • Fix ftfy import for PRX Pipeline (#13154)
    • Fix AutoModel typing Import Error (#13178)
    • Add Helios-14B Video Generation Pipelines (#13208)
    • Add LTX2 Condition Pipeline (#13058)
  • @kashif
    • [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL (#11573)
    • Fix QwenImage txt_seq_lens handling (#12702)
    • [Qwen] avoid creating attention masks when there is no padding (#12987)
  • @bhavya01
    • Change timestep device to cpu for xla (#11501)
  • @linoytsaban
    • [LoRA] add lora_alpha to sana README (#11780)
    • Z image lora training (#13056)
  • @stevhliu
    • [docs] Remote inference (#12372)
    • [docs] add docs for qwenimagelayered (#13158)
  • @hameerabbasi
    • Add ChromaInpaintPipeline (#12848)
    • Remove *pooled_* mentions from Chroma inpaint (#13026)
  • @galbria
    • Fibo edit pipeline (#12930)
  • @JaredforReal
    • [GLM-Image] Add batch support for GlmImagePipeline (#13007)
    • [bug fix] GLM-Image fit new get_image_features API (#13052)
    • [Fix]Allow prompt and prior_token_ids to be provided simultaneously in GlmImagePipeline (#13092)
  • @rootonchair
    • LTX2 distilled checkpoint support (#12934)
  • @AlanPonnachan
    • Add support for Magcache (#12744)
  • @CalamitousFelicitousness
    • Feature/zimage inpaint pipeline (#13006)
  • @Ando233
    • feat: implement rae autoencoder. (#13046)
Dec 8, 2025
Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄

The release features a number of new image and video pipelines, a new caching method, a new training script, new kernels-powered attention backends, and more. It's a packed release, so make sure you read the notes in full 🚀

New image pipelines

New video pipelines

  • Sana-Video: Sana-Video is a fast and efficient video generation model, equipped to handle long video sequences, thanks to its incorporation of linear attention. Thanks to @lawrence-cj for contributing this in https://github.com/huggingface/diffusers/pull/12634.
  • Kandinsky 5: Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger models and offers the best understanding of Russian concepts in the open-source ecosystem. Thanks to @leffff for contributing this in https://github.com/huggingface/diffusers/pull/12478.
  • Hunyuan 1.5: HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs.
  • Wan Animate: Wan-Animate is a state-of-the-art character animation and replacement video model based on Wan2.1. Given a reference character image and a driving motion video, it can either animate the character with the motion from the driving video, or replace the character in that video with the reference character.

New kernels-powered attention backends

The kernels library helps you save a lot of time by providing pre-built kernel interfaces for various environments and accelerators. This release features three new kernels-powered attention backends:

  • Flash Attention 3 (+ its varlen variant)
  • Flash Attention 2 (+ its varlen variant)
  • SAGE

This means that if any of the above backends is supported in your development environment, you can skip the manual process of building the corresponding kernels and simply use:

# Make sure you have `kernels` installed: `pip install kernels`.
# You can choose `flash_hub` or `sage_hub`, too.
pipe.transformer.set_attention_backend("_flash_3_hub")

For more details, check out the documentation.

TaylorSeer cache

TaylorSeer is now supported in Diffusers, delivering up to 3x speedups with negligible quality loss. Thanks to @toilaluan for contributing this in https://github.com/huggingface/diffusers/pull/12648. Check out the documentation here.
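The core idea behind TaylorSeer-style caching can be sketched in a few lines (this is an illustrative toy, not the Diffusers API): instead of recomputing expensive features at every denoising step, cache recent real outputs and extrapolate the next ones with a first-order Taylor expansion (finite difference), recomputing only periodically. The `expensive_features` function and `TaylorSeerCache` class below are made up for the demo.

```python
def expensive_features(t):
    # Stand-in for an expensive transformer block output at timestep t.
    return 0.5 * t * t + 2.0 * t + 1.0

class TaylorSeerCache:
    """Toy cache: recompute every `refresh_interval` steps, extrapolate otherwise."""
    def __init__(self, refresh_interval=4):
        self.refresh_interval = refresh_interval
        self.history = []               # last two (t, value) real computations
        self.steps_since_refresh = 0

    def __call__(self, t):
        if len(self.history) < 2 or self.steps_since_refresh >= self.refresh_interval:
            value = expensive_features(t)                 # real computation
            self.history = (self.history + [(t, value)])[-2:]
            self.steps_since_refresh = 0
            return value
        # First-order Taylor extrapolation from the two cached points.
        (t0, v0), (t1, v1) = self.history
        derivative = (v1 - v0) / (t1 - t0)
        self.steps_since_refresh += 1
        return v1 + derivative * (t - t1)

cache = TaylorSeerCache(refresh_interval=3)
approx = [cache(t) for t in range(8)]
exact = [expensive_features(t) for t in range(8)]
```

The real method tracks higher-order differences per cached feature tensor; the trade-off between `refresh_interval` and approximation error is the same knob the actual cache config exposes.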

New training script

Our Flux.2 integration features a LoRA fine-tuning script that you can check out here. We provide a number of optimizations to help make it run on consumer GPUs.

Misc

All commits

  • remove unneeded checkpoint imports. by @sayakpaul in #12488
  • [tests] fix clapconfig for text backbone in audioldm2 by @sayakpaul in #12490
  • ltx0.9.8 (without IC lora, autoregressive sampling) by @yiyixuxu in #12493
  • [docs] Attention checks by @stevhliu in #12486
  • [CI] Check links by @stevhliu in #12491
  • [ci] xfail more incorrect transformer imports. by @sayakpaul in #12455
  • [tests] introduce VAETesterMixin to consolidate tests for slicing and tiling by @sayakpaul in #12374
  • docs: cleanup of runway model by @EazyAl in #12503
  • Kandinsky 5 is finally in Diffusers! by @leffff in #12478
  • Remove Qwen Image Redundant RoPE Cache by @dg845 in #12452
  • Raise warning instead of error when imports are missing for custom code by @DN6 in #12513
  • Fix: Use incorrect temporary variable key when replacing adapter name… by @FeiXie8 in #12502
  • [docs] Organize toctree by modality by @stevhliu in #12514
  • styling issues. by @sayakpaul in #12522
  • Add Photon model and pipeline support by @DavidBert in #12456
  • purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet by @Vaibhavs10 in #12497
  • Prx by @DavidBert in #12525
  • [core] AutoencoderMixin to abstract common methods by @sayakpaul in #12473
  • Kandinsky5 No cfg fix by @asomoza in #12527
  • Fix: Add _skip_keys for AutoencoderKLWan by @yiyixuxu in #12523
  • [CI] xfail the test_wuerstchen_prior test by @sayakpaul in #12530
  • [tests] Test attention backends by @sayakpaul in #12388
  • fix CI bug for kandinsky3_img2img case by @kaixuanliu in #12474
  • Fix MPS compatibility in get_1d_sincos_pos_embed_from_grid #12432 by @Aishwarya0811 in #12449
  • Handle deprecated transformer classes by @DN6 in #12517
  • fix constants.py to user upper() by @sayakpaul in #12479
  • HunyuanImage21 by @yiyixuxu in #12333
  • Loose the criteria tolerance appropriately for Intel XPU devices by @kaixuanliu in #12460
  • Deprecate Stable Cascade by @DN6 in #12537
  • [chore] Move guiders experimental warning by @sayakpaul in #12543
  • Fix Chroma attention padding order and update docs to use lodestones/Chroma1-HD by @josephrocca in #12508
  • Add AITER attention backend by @lauri9 in #12549
  • Fix small inconsistency in output dimension of "_get_t5_prompt_embeds" function in sd3 pipeline by @alirezafarashah in #12531
  • Kandinsky 5 10 sec (NABLA suport) by @leffff in #12520
  • Improve pos embed for Flux.1 inference on Ascend NPU by @gameofdimension in #12534
  • support latest few-step wan LoRA. by @sayakpaul in #12541
  • [Pipelines] Enable Wan VACE to run since single transformer by @DN6 in #12428
  • fix crash if tiling mode is enabled by @sywangyi in #12521
  • Fix typos in kandinsky5 docs by @Meatfucker in #12552
  • [ci] don't run sana layerwise casting tests in CI. by @sayakpaul in #12551
  • Bria fibo by @galbria in #12545
  • Avoiding graph break by changing the way we infer dtype in vae.decoder by @ppadjinTT in #12512
  • [Modular] Fix for custom block kwargs by @DN6 in #12561
  • [Modular] Allow custom blocks to be saved to local_dir by @DN6 in #12381
  • Fix Stable Diffusion 3.x pooled prompt embedding with multiple images by @friedrich in #12306
  • Fix custom code loading in Automodel by @DN6 in #12571
  • [modular] better warn message by @yiyixuxu in #12573
  • [tests] add tests for flux modular (t2i, i2i, kontext) by @sayakpaul in #12566
  • [modular]pass hub_kwargs to load_config by @yiyixuxu in #12577
  • ulysses enabling in native attention path by @sywangyi in #12563
  • Kandinsky 5.0 Docs fixes by @leffff in #12582
  • [docs] sort doc by @sayakpaul in #12586
  • [LoRA] add support for more Qwen LoRAs by @linoytsaban in #12581
  • [Modular] Allow ModularPipeline to load from revisions by @DN6 in #12592
  • Add optional precision-preserving preprocessing for examples/unconditional_image_generation/train_unconditional.py by @turian in #12596
  • [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference by @lawrence-cj in #12584
  • Fix overflow and dtype handling in rgblike_to_depthmap (NumPy + PyTorch) by @MohammadSadeghSalehi in #12546
  • [Modular] Some clean up for Modular tests by @DN6 in #12579
  • feat: enable attention dispatch for huanyuan video by @DefTruth in #12591
  • fix the crash in Wan-AI/Wan2.2-TI2V-5B-Diffusers if CP is enabled by @sywangyi in #12562
  • [CI] Push test fix by @DN6 in #12617
  • add ChronoEdit by @zhangjiewu in #12593
  • [modular] wan! by @yiyixuxu in #12611
  • [CI] Fix typo in uv install by @DN6 in #12618
  • fix: correct import path for load_model_dict_into_meta in conversion scripts by @yashwantbezawada in #12616
  • Fix Context Parallel validation checks by @DN6 in #12446
  • [Modular] Clean up docs by @DN6 in #12604
  • Fix: update type hints for Tuple parameters across multiple files to support variable-length tuples by @cesaryuan in #12544
  • [CI] Remove unittest dependency from testing_utils.py by @DN6 in #12621
  • Fix rotary positional embedding dimension mismatch in Wan and SkyReels V2 transformers by @charchit7 in #12594
  • fix copies by @yiyixuxu in #12637
  • Add MLU Support. by @a120092009 in #12629
  • fix dispatch_attention_fn check by @yiyixuxu in #12636
  • [modular] add tests for qwen modular by @sayakpaul in #12585
  • ArXiv -> HF Papers by @qgallouedec in #12583
  • [docs] Update install instructions by @stevhliu in #12626
  • [modular] add a check by @yiyixuxu in #12628
  • Improve docstrings and type hints in scheduling_amused.py by @delmalih in #12623
  • [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) by @dg845 in #12526
  • adjust unit tests for test_save_load_float16 by @kaixuanliu in #12500
  • skip autoencoderdl layerwise casting memory by @sayakpaul in #12647
  • [utils] Update check_doc_toc by @stevhliu in #12642
  • [docs] AutoModel by @stevhliu in #12644
  • Improve docstrings and type hints in scheduling_ddim.py by @delmalih in #12622
  • Improve docstrings and type hints in scheduling_ddpm.py by @delmalih in #12651
  • [Modular] Add Custom Blocks guide to doc by @DN6 in #12339
  • Improve docstrings and type hints in scheduling_euler_discrete.py by @delmalih in #12654
  • Update Wan Animate Docs by @dg845 in #12658
  • Rope in float32 for mps or npu compatibility by @DavidBert in #12665
  • [PRX pipeline]: add 1024 resolution ratio bins by @DavidBert in #12670
  • SANA-Video Image to Video pipeline SanaImageToVideoPipeline support by @lawrence-cj in #12634
  • [CI] Make CI logs less verbose by @DN6 in #12674
  • Revert AutoencoderKLWan's dim_mult default value back to list by @dg845 in #12640
  • [CI] Temporarily pin transformers by @DN6 in #12677
  • [core] Refactor hub attn kernels by @sayakpaul in #12475
  • [CI] Fix indentation issue in workflow files by @DN6 in #12685
  • [CI] Fix failing Pipeline CPU tests by @DN6 in #12681
  • Improve docstrings and type hints in scheduling_pndm.py by @delmalih in #12676
  • Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet by @pratim4dasude in #12649
  • Improve docstrings and type hints in scheduling_lms_discrete.py by @delmalih in #12678
  • Add FluxLoraLoaderMixin to Fibo pipeline by @SwayStar123 in #12688
  • bugfix: fix chrono-edit context parallel by @DefTruth in #12660
  • [core] support sage attention + FA2 through kernels by @sayakpaul in #12439
  • [i8n-pt] Fix grammar and expand Portuguese documentation by @cdutr in #12598
  • Fix variable naming typos in community FluxControlNetFillInpaintPipeline by @sqhuang in #12701
  • fix typo in docs by @lawrence-cj in #12675
  • Add Support for Z-Image Series by @JerryWu-code in #12703
  • let's go Flux2 🚀 by @sayakpaul in #12711
  • Update script names in README for Flux2 training by @anvilarth in #12713
  • [lora]: Fix Flux2 LoRA NaN test by @sayakpaul in #12714
  • [docs] Correct flux2 links by @sayakpaul in #12716
  • [docs] put autopipeline after overview and hunyuanimage in images by @sayakpaul in #12548
  • Improve docstrings and type hints in scheduling_dpmsolver_multistep.py by @delmalih in #12710
  • Support unittest for Z-image ⚡️ by @JerryWu-code in #12715
  • [chore] remove torch.save from remnant code. by @sayakpaul in #12717
  • Enable regional compilation on z-image transformer model by @sayakpaul in #12736
  • Fix examples not loading LoRA adapter weights from checkpoint by @SurAyush in #12690
  • [Modular] Add single file support to Modular by @DN6 in #12383
  • fix type-check for z-image transformer by @DefTruth in #12739
  • Hunyuanvideo15 by @yiyixuxu in #12696
  • [Docs] Update Imagen Video paper link in schedulers by @delmalih in #12724
  • Improve docstrings and type hints in scheduling_heun_discrete.py by @delmalih in #12726
  • Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py by @delmalih in #12766
  • fix FLUX.2 context parallel by @DefTruth in #12737
  • Rename BriaPipeline to BriaFiboPipeline in documentation by @galbria in #12758
  • Update bria_fibo.md with minor fixes by @sayakpaul in #12731
  • [feat]: implement "local" caption upsampling for Flux.2 by @sayakpaul in #12718
  • Add ZImage LoRA support and integrate into ZImagePipeline by @CalamitousFelicitousness in #12750
  • Add support for Ovis-Image by @DoctorKey in #12740
  • Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim. by @JerryWu-code in #12770
  • Fixes #12673. record_stream in group offloading is not working properly by @KimbingNg in #12721
  • [core] start varlen variants for attn backend kernels. by @sayakpaul in #12765
  • [core] reuse AttentionMixin for compatible classes by @sayakpaul in #12463
  • Deprecate upcast_vae in SDXL based pipelines by @DN6 in #12619
  • Kandinsky 5.0 Video Pro and Image Lite by @leffff in #12664
  • Fix: leaf_level offloading breaks after delete_adapters by @adi776borate in #12639
  • [tests] fix hunuyanvideo 1.5 offloading tests. by @sayakpaul in #12782
  • [Z-Image] various small changes, Z-Image transformer tests, etc. by @sayakpaul in #12741
  • Z-Image-Turbo from_single_file by @hlky in #12756
  • Update attention_backends.md to format kernels by @sayakpaul in #12757
  • Improve docstrings and type hints in scheduling_unipc_multistep.py by @delmalih in #12767
  • fix spatial compression ratio error for AutoEncoderKLWan doing tiled encode by @jerry2102 in #12753
  • [lora] support more ZImage LoRAs by @sayakpaul in #12790
  • PRX Set downscale_freq_shift to 0 for consistency with internal implementation by @DavidBert in #12791
  • Fix broken group offloading with block_level for models with standalone layers by @rycerzes in #12692
  • [Docs] Add Z-Image docs by @asomoza in #12775
  • move kandisnky docs. by @sayakpaul (direct commit on v0.36.0-release)
  • [docs] minor fixes to kandinsky docs by @sayakpaul in #12797
  • Improve docstrings and type hints in scheduling_deis_multistep.py by @delmalih in #12796
  • [Feat] TaylorSeer Cache by @toilaluan in #12648
  • Update the TensorRT-ModelOPT to Nvidia-ModelOPT by @jingyu-ml in #12793
  • add post init for safty checker by @jiqing-feng in #12794
  • [HunyuanVideo1.5] support step-distilled by @yiyixuxu in #12802
  • Add ZImageImg2ImgPipeline by @CalamitousFelicitousness in #12751
  • Release: v0.36.0-release by @sayakpaul (direct commit on v0.36.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @yiyixuxu
    • ltx0.9.8 (without IC lora, autoregressive sampling) (#12493)
    • Fix: Add _skip_keys for AutoencoderKLWan (#12523)
    • HunyuanImage21 (#12333)
    • [modular] better warn message (#12573)
    • [modular]pass hub_kwargs to load_config (#12577)
    • [modular] wan! (#12611)
    • fix copies (#12637)
    • fix dispatch_attention_fn check (#12636)
    • [modular] add a check (#12628)
    • Hunyuanvideo15 (#12696)
    • [HunyuanVideo1.5] support step-distilled (#12802)
  • @leffff
    • Kandinsky 5 is finally in Diffusers! (#12478)
    • Kandinsky 5 10 sec (NABLA suport) (#12520)
    • Kandinsky 5.0 Docs fixes (#12582)
    • Kandinsky 5.0 Video Pro and Image Lite (#12664)
  • @dg845
    • Remove Qwen Image Redundant RoPE Cache (#12452)
    • [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) (#12526)
    • Update Wan Animate Docs (#12658)
    • Revert AutoencoderKLWan's dim_mult default value back to list (#12640)
  • @DN6
    • Raise warning instead of error when imports are missing for custom code (#12513)
    • Handle deprecated transformer classes (#12517)
    • Deprecate Stable Cascade (#12537)
    • [Pipelines] Enable Wan VACE to run since single transformer (#12428)
    • [Modular] Fix for custom block kwargs (#12561)
    • [Modular] Allow custom blocks to be saved to local_dir (#12381)
    • Fix custom code loading in Automodel (#12571)
    • [Modular] Allow ModularPipeline to load from revisions (#12592)
    • [Modular] Some clean up for Modular tests (#12579)
    • [CI] Push test fix (#12617)
    • [CI] Fix typo in uv install (#12618)
    • Fix Context Parallel validation checks (#12446)
    • [Modular] Clean up docs (#12604)
    • [CI] Remove unittest dependency from testing_utils.py (#12621)
    • [Modular] Add Custom Blocks guide to doc (#12339)
    • [CI] Make CI logs less verbose (#12674)
    • [CI] Temporarily pin transformers (#12677)
    • [CI] Fix indentation issue in workflow files (#12685)
    • [CI] Fix failing Pipeline CPU tests (#12681)
    • [Modular] Add single file support to Modular (#12383)
    • Deprecate upcast_vae in SDXL based pipelines (#12619)
  • @DavidBert
    • Add Photon model and pipeline support (#12456)
    • Prx (#12525)
    • Rope in float32 for mps or npu compatibility (#12665)
    • [PRX pipeline]: add 1024 resolution ratio bins (#12670)
    • PRX Set downscale_freq_shift to 0 for consistency with internal implementation (#12791)
  • @galbria
    • Bria fibo (#12545)
    • Rename BriaPipeline to BriaFiboPipeline in documentation (#12758)
  • @lawrence-cj
    • [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference (#12584)
    • SANA-Video Image to Video pipeline SanaImageToVideoPipeline support (#12634)
    • fix typo in docs (#12675)
  • @zhangjiewu
    • add ChronoEdit (#12593)
  • @delmalih
    • Improve docstrings and type hints in scheduling_amused.py (#12623)
    • Improve docstrings and type hints in scheduling_ddim.py (#12622)
    • Improve docstrings and type hints in scheduling_ddpm.py (#12651)
    • Improve docstrings and type hints in scheduling_euler_discrete.py (#12654)
    • Improve docstrings and type hints in scheduling_pndm.py (#12676)
    • Improve docstrings and type hints in scheduling_lms_discrete.py (#12678)
    • Improve docstrings and type hints in scheduling_dpmsolver_multistep.py (#12710)
    • [Docs] Update Imagen Video paper link in schedulers (#12724)
    • Improve docstrings and type hints in scheduling_heun_discrete.py (#12726)
    • Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py (#12766)
    • Improve docstrings and type hints in scheduling_unipc_multistep.py (#12767)
    • Improve docstrings and type hints in scheduling_deis_multistep.py (#12796)
  • @pratim4dasude
    • Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet (#12649)
  • @JerryWu-code
    • Add Support for Z-Image Series (#12703)
    • Support unittest for Z-image ⚡️ (#12715)
    • Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim. (#12770)
  • @CalamitousFelicitousness
    • Add ZImage LoRA support and integrate into ZImagePipeline (#12750)
    • Add ZImageImg2ImgPipeline (#12751)
  • @DoctorKey
    • Add support for Ovis-Image (#12740)
Oct 15, 2025
🐞 Fixes for `transformers` models and imports

All commits

  • Release: v0.35.1-patch by @sayakpaul (direct commit on v0.35.2-patch)
  • handle offload_state_dict when initing transformers models by @sayakpaul in #12438
  • [CI] Fix TRANSFORMERS_FLAX_WEIGHTS_NAME import issue by @DN6 in #12354
  • Fix PyTorch 2.3.1 compatibility: add version guard for torch.library.… by @Aishwarya0811 in #12206
  • fix scale_shift_factor being on cpu for wan and ltx by @vladmandic in #12347
  • Release: v0.35.2-patch by @sayakpaul (direct commit on v0.35.2-patch)
Aug 20, 2025
v0.35.1 for improvements in Qwen-Image Edit

Thanks to @naykun for the following PRs that improve Qwen-Image Edit:

Aug 19, 2025
Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more

This release comes packed with new image generation and editing pipelines, a new video pipeline, new training scripts, quality-of-life improvements, and much more. Read the release notes in full so you don't miss out on the fun stuff.

New pipelines 🧨

We welcomed new pipelines in this release:

  • Wan 2.2
  • Flux-Kontext
  • Qwen-Image
  • Qwen-Image-Edit

Wan 2.2 📹

This update to Wan provides significant improvements in video fidelity, prompt adherence, and style. Please check out the official doc to learn more.

Flux-Kontext 🎇

Flux-Kontext is a 12-billion-parameter rectified flow transformer capable of editing images based on text instructions. Please check out the official doc to learn more about it.

Qwen-Image 🌅

After a successful run of delivering language models and vision-language models, the Qwen team is back with an image generation model, which is Apache-2.0 licensed! It achieves significant advances in complex text rendering and precise image editing. To learn more about this powerful model, refer to our docs.

Thanks to @naykun for contributing both Qwen-Image and Qwen-Image-Edit via this PR and this PR.

New training scripts 🎛️

Make these newly added models your own with our training scripts:

Single-file modeling implementations

Following the 🤗 Transformers’ philosophy of single-file modeling implementations, we have started implementing modeling code in single and self-contained files. The Flux Transformer code is one example of this.

Attention refactor

We have massively refactored how we do attention in the models. This allows us to provide support for different attention backends (such as PyTorch native scaled_dot_product_attention, Flash Attention 3, SAGE attention, etc.) in the library seamlessly.

Having attention supported this way also allows us to integrate different parallelization mechanisms, which we’re actively working on. Follow this PR if you’re interested.

Users shouldn’t be affected at all by these changes. Please open an issue if you face any problems.
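The dispatcher pattern underlying this refactor can be sketched in plain Python (names here are illustrative, not the Diffusers internals): attention implementations register themselves under a backend name, and model code calls a single dispatch function that routes to whichever backend is active.

```python
# Toy registry/dispatch sketch of a pluggable attention backend system.
_ATTENTION_BACKENDS = {}
_active_backend = "native"

def register_backend(name):
    def decorator(fn):
        _ATTENTION_BACKENDS[name] = fn
        return fn
    return decorator

def set_attention_backend(name):
    global _active_backend
    if name not in _ATTENTION_BACKENDS:
        raise ValueError(f"unknown backend {name!r}")
    _active_backend = name

def dispatch_attention(q, k, v):
    # Models call this one entry point; the backend is swapped out underneath.
    return _ATTENTION_BACKENDS[_active_backend](q, k, v)

@register_backend("native")
def native_attention(q, k, v):
    return ("native", q, k, v)

@register_backend("flash")
def flash_attention(q, k, v):
    return ("flash", q, k, v)

default_out = dispatch_attention(1, 2, 3)
set_attention_backend("flash")
flash_out = dispatch_attention(1, 2, 3)
```

Because models only ever call the dispatch function, new backends (and parallelism hooks) can be added without touching any model code, which is why users are unaffected by the refactor.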

Regional compilation

Regional compilation trims cold-start latency by compiling only the small, frequently repeated block(s) of a model (typically a transformer layer) and reusing the compiled artifact for every subsequent occurrence. For many diffusion architectures, this delivers the same runtime speedups as full-graph compilation while reducing compile time by 8–10x. Refer to this doc to learn more.

Thanks to @anijain2305 for contributing this feature in this PR.

We have also authored a number of posts that center around the use of torch.compile. You can check them out at the links below:

Faster pipeline loading ⚡️

Users can now load pipelines directly onto an accelerator device, leading to significantly faster load times. This is particularly evident when loading large pipelines like Wan and Qwen-Image.

from diffusers import DiffusionPipeline
import torch 

ckpt_id = "Qwen/Qwen-Image"
pipe = DiffusionPipeline.from_pretrained(
-    ckpt_id, torch_dtype=torch.bfloat16
- ).to("cuda")
+    ckpt_id, torch_dtype=torch.bfloat16, device_map="cuda"
+ )

You can speed up loading even more by enabling parallelized loading of state dict shards. This is particularly helpful when you’re working with large models like Wan and Qwen-Image, where the model state dicts are typically sharded across multiple files.

import os
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"

# rest of the loading code
...
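Roughly speaking, parallelized loading reads the checkpoint's shard files concurrently and merges them into one state dict, rather than reading them one after another. The sketch below illustrates the idea with stdlib tools and made-up JSON "shards" (real checkpoints use safetensors shards, and the actual implementation lives inside `from_pretrained`):

```python
# Toy sketch of parallel state-dict shard loading.
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def load_shard(path):
    # Stand-in for reading one checkpoint shard from disk.
    return json.loads(Path(path).read_text())

def load_state_dict_parallel(shard_paths, max_workers=4):
    state_dict = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves shard order while reading files concurrently.
        for shard in pool.map(load_shard, shard_paths):
            state_dict.update(shard)
    return state_dict

# Write two fake shards and load them in parallel.
tmp = Path(tempfile.mkdtemp())
(tmp / "shard-00001.json").write_text(json.dumps({"layer.0.weight": [1, 2]}))
(tmp / "shard-00002.json").write_text(json.dumps({"layer.1.weight": [3, 4]}))

merged = load_state_dict_parallel(sorted(tmp.glob("shard-*.json")))
```

Since shard reads are I/O-bound, a thread pool is enough to overlap them; the benefit grows with the number of shards, which is why large models like Wan and Qwen-Image gain the most.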

Better GGUF integration

@Isotr0py contributed support for native GGUF CUDA kernels in this PR. This should provide an approximately 10% improvement in inference speed.

We have also worked on a tool for converting regular checkpoints to GGUF, letting the community easily share their GGUF checkpoints. Learn more here.

We now support loading of Diffusers format GGUF checkpoints.

You can learn more about all of this in our GGUF official docs.

Modular Diffusers (Experimental)

Modular Diffusers is a system for building diffusion pipelines from individual pipeline blocks. It is highly customizable: blocks can be mixed and matched to adapt an existing pipeline or to create a new one for a specific workflow or set of workflows.

The API is currently in active development and is being released as an experimental feature. Learn more in our docs.
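The composable-blocks idea can be sketched in a few lines of plain Python (a toy illustration, not the real `ModularPipeline` API): each block reads from and writes to a shared state, and a pipeline is just an ordered composition of blocks.

```python
# Toy sketch of a blocks-over-shared-state pipeline.
class PipelineState(dict):
    """Shared mutable state passed through every block."""

class TextEncoderBlock:
    def __call__(self, state):
        state["embeds"] = f"embeds({state['prompt']})"
        return state

class DenoiseBlock:
    def __call__(self, state):
        state["latents"] = f"denoised({state['embeds']})"
        return state

class DecodeBlock:
    def __call__(self, state):
        state["image"] = f"decoded({state['latents']})"
        return state

class ToyModularPipeline:
    def __init__(self, blocks):
        self.blocks = blocks
    def run(self, **inputs):
        state = PipelineState(inputs)
        for block in self.blocks:
            state = block(state)
        return state

# Blocks can be swapped, reordered, or reused across workflows.
pipe = ToyModularPipeline([TextEncoderBlock(), DenoiseBlock(), DecodeBlock()])
out = pipe.run(prompt="a cat")
```

Because each block declares what it needs from the state rather than from a fixed pipeline signature, replacing one block (say, a different denoising strategy) leaves the others untouched, which is the customizability the experimental API is built around.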

All commits

  • [tests] skip instead of returning. by @sayakpaul in #11793
  • adjust to get CI test cases passed on XPU by @kaixuanliu in #11759
  • fix deprecation in lora after 0.34.0 release by @sayakpaul in #11802
  • [chore] post release v0.34.0 by @sayakpaul in #11800
  • Follow up for Group Offload to Disk by @DN6 in #11760
  • [rfc][compile] compile method for DiffusionPipeline by @anijain2305 in #11705
  • [tests] add a test on torch compile for varied resolutions by @sayakpaul in #11776
  • adjust tolerance criteria for test_float16_inference in unit test by @kaixuanliu in #11809
  • Flux Kontext by @a-r-r-o-w in #11812
  • Kontext training by @sayakpaul in #11813
  • Kontext fixes by @a-r-r-o-w in #11815
  • remove syncs before denoising in Kontext by @sayakpaul in #11818
  • [CI] disable onnx, mps, flax from the CI by @sayakpaul in #11803
  • TorchAO compile + offloading tests by @a-r-r-o-w in #11697
  • Support dynamically loading/unloading loras with group offloading by @a-r-r-o-w in #11804
  • [lora] fix: lora unloading behvaiour by @sayakpaul in #11822
  • [lora]feat: use exclude modules to loraconfig. by @sayakpaul in #11806
  • ENH: Improve speed of function expanding LoRA scales by @BenjaminBossan in #11834
  • Remove print statement in SCM Scheduler by @a-r-r-o-w in #11836
  • [tests] add test for hotswapping + compilation on resolution changes by @sayakpaul in #11825
  • reset deterministic in tearDownClass by @jiqing-feng in #11785
  • [tests] Fix failing float16 cuda tests by @a-r-r-o-w in #11835
  • [single file] Cosmos by @a-r-r-o-w in #11801
  • [docs] fix single_file example. by @sayakpaul in #11847
  • Use real-valued instead of complex tensors in Wan2.1 RoPE by @mjkvaak-amd in #11649
  • [docs] Batch generation by @stevhliu in #11841
  • [docs] Deprecated pipelines by @stevhliu in #11838
  • fix norm not training in train_control_lora_flux.py by @Luo-Yihang in #11832
  • [From Single File] support from_single_file method for WanVACE3DTransformer by @J4BEZ in #11807
  • [lora] tests for exclude_modules with Wan VACE by @sayakpaul in #11843
  • update: FluxKontextInpaintPipeline support by @vuongminh1907 in #11820
  • [Flux Kontext] Support Fal Kontext LoRA by @linoytsaban in #11823
  • [docs] Add a note of _keep_in_fp32_modules by @a-r-r-o-w in #11851
  • [benchmarks] overhaul benchmarks by @sayakpaul in #11565
  • FIX set_lora_device when target layers differ by @BenjaminBossan in #11844
  • Fix Wan AccVideo/CausVid fuse_lora by @a-r-r-o-w in #11856
  • [chore] deprecate blip controlnet pipeline. by @sayakpaul in #11877
  • [docs] fix references in flux pipelines. by @sayakpaul in #11857
  • [tests] remove tests for deprecated pipelines. by @sayakpaul in #11879
  • [docs] LoRA metadata by @stevhliu in #11848
  • [training ] add Kontext i2i training by @sayakpaul in #11858
  • [CI] Fix big GPU test marker by @DN6 in #11786
  • First Block Cache by @a-r-r-o-w in #11180
  • [tests] annotate compilation test classes with bnb by @sayakpaul in #11715
  • Update chroma.md by @shm4r7 in #11891
  • [CI] Speed up GPU PR Tests by @DN6 in #11887
  • Pin k-diffusion for CI by @sayakpaul in #11894
  • [Docker] update doc builder dockerfile to include quant libs. by @sayakpaul in #11728
  • [tests] Remove more deprecated tests by @sayakpaul in #11895
  • [tests] mark the wanvace lora tester flaky by @sayakpaul in #11883
  • [tests] add compile + offload tests for GGUF. by @sayakpaul in #11740
  • feat: add multiple input image support in Flux Kontext by @Net-Mist in #11880
  • Fix unique memory address when doing group-offloading with disk by @sayakpaul in #11767
  • [SD3] CFG Cutoff fix and official callback by @asomoza in #11890
  • The Modular Diffusers by @yiyixuxu in #9672
  • [quant] QoL improvements for pipeline-level quant config by @sayakpaul in #11876
  • Bump torch from 2.4.1 to 2.7.0 in /examples/server by @dependabot[bot] in #11429
  • [LoRA] fix: disabling hooks when loading loras. by @sayakpaul in #11896
  • [utils] account for MPS when available in get_device(). by @sayakpaul in #11905
  • [ControlnetUnion] Multiple Fixes by @asomoza in #11888
  • Avoid creating tensor in CosmosAttnProcessor2_0 by @chenxiao111222 in #11761
  • [tests] Unify compilation + offloading tests in quantization by @sayakpaul in #11910
  • Speedup model loading by 4-5x ⚡ by @a-r-r-o-w in #11904
  • [docs] torch.compile blog post by @stevhliu in #11837
  • Flux: pass joint_attention_kwargs when using gradient_checkpointing by @piercus in #11814
  • Fix: Align VAE processing in ControlNet SD3 training with inference by @Henry-Bi in #11909
  • Bump aiohttp from 3.10.10 to 3.12.14 in /examples/server by @dependabot[bot] in #11924
  • [tests] Improve Flux tests by @a-r-r-o-w in #11919
  • Remove device synchronization when loading weights by @a-r-r-o-w in #11927
  • Remove forced float64 from onnx stable diffusion pipelines by @lostdisc in #11054
  • Fixed bug: Uncontrolled recursive calls that caused an infinite loop when loading certain pipelines containing Transformer2DModel by @lengmo1996 in #11923
  • [ControlnetUnion] Propagate #11888 to img2img by @asomoza in #11929
  • enable flux pipeline compatible with unipc and dpm-solver by @gameofdimension in #11908
  • [training] add an offload utility that can be used as a context manager. by @sayakpaul in #11775
  • Add SkyReels V2: Infinite-Length Film Generative Model by @tolgacangoz in #11518
  • [refactor] Flux/Chroma single file implementation + Attention Dispatcher by @a-r-r-o-w in #11916
  • [docs] clarify the mapping between Transformer2DModel and finegrained variants. by @sayakpaul in #11947
  • [Modular] Updates for Custom Pipeline Blocks by @DN6 in #11940
  • [docs] Update toctree by @stevhliu in #11936
  • [docs] include bp link. by @sayakpaul in #11952
  • Fix kontext finetune issue when batch size >1 by @mymusise in #11921
  • [tests] Add test slices for Hunyuan Video by @a-r-r-o-w in #11954
  • [tests] Add test slices for Cosmos by @a-r-r-o-w in #11955
  • [tests] Add fast test slices for HiDream-Image by @a-r-r-o-w in #11953
  • [Modular] update the collection behavior by @yiyixuxu in #11963
  • fix "Expected all tensors to be on the same device, but found at least two devices" error by @yao-matrix in #11690
  • Remove logger warnings for attention backends and hard error during runtime instead by @a-r-r-o-w in #11967
  • [Examples] Uniform notations in train_flux_lora by @tomguluson92 in #10011
  • fix style by @yiyixuxu in #11975
  • [tests] Add test slices for Wan by @a-r-r-o-w in #11920
  • [docs] update guidance_scale docstring for guidance_distilled models. by @sayakpaul in #11935
  • [tests] enforce torch version in the compilation tests. by @sayakpaul in #11979
  • [modular diffusers] Wan by @a-r-r-o-w in #11913
  • [compile] logger statements create unnecessary guards during dynamo tracing by @a-r-r-o-w in #11987
  • enable quantcompile test on xpu by @yao-matrix in #11988
  • [WIP] Wan2.2 by @yiyixuxu in #12004
  • [refactor] some shared parts between hooks + docs by @a-r-r-o-w in #11968
  • [refactor] Wan single file implementation by @a-r-r-o-w in #11918
  • Fix huggingface-hub failing tests by @asomoza in #11994
  • feat: add flux kontext by @jlonge4 in #11985
  • [modular] add Modular flux for text-to-image by @sayakpaul in #11995
  • [docs] include lora fast post. by @sayakpaul in #11993
  • [docs] quant_kwargs by @stevhliu in #11712
  • [docs] Fix link by @stevhliu in #12018
  • [wan2.2] add 5b i2v by @yiyixuxu in #12006
  • wan2.2 i2v FirstBlockCache fix by @okaris in #12013
  • [core] support attention backends for LTX by @sayakpaul in #12021
  • [docs] Update index by @stevhliu in #12020
  • [Fix] huggingface-cli to hf missed files by @asomoza in #12008
  • [training-scripts] Make pytorch examples UV-compatible by @sayakpaul in #12000
  • [wan2.2] fix vae patches by @yiyixuxu in #12041
  • Allow SD pipeline to use newer schedulers, eg: FlowMatch by @ppbrown in #12015
  • [LoRA] support lightx2v lora in wan by @sayakpaul in #12040
  • Fix type of force_upcast to bool by @BerndDoser in #12046
  • Update autoencoder_kl_cosmos.py by @tanuj-rai in #12045
  • Qwen-Image by @naykun in #12055
  • [wan2.2] follow-up by @yiyixuxu in #12024
  • tests + minor refactor for QwenImage by @a-r-r-o-w in #12057
  • Cross attention module to Wan Attention by @samuelt0 in #12058
  • fix(qwen-image): update vae license by @naykun in #12063
  • CI fixing by @paulinebm in #12059
  • enable all gpus when running ci. by @sayakpaul in #12062
  • fix the rest for all GPUs in CI by @sayakpaul in #12064
  • [docs] Install by @stevhliu in #12026
  • [wip] feat: support lora in qwen image and training script by @sayakpaul in #12056
  • [docs] small corrections to the example in the Qwen docs by @sayakpaul in #12068
  • [tests] Fix Qwen test_inference slices by @a-r-r-o-w in #12070
  • [tests] deal with the failing AudioLDM2 tests by @sayakpaul in #12069
  • optimize QwenImagePipeline to reduce unnecessary CUDA synchronization by @chengzeyi in #12072
  • Add cuda kernel support for GGUF inference by @Isotr0py in #11869
  • fix input shape for WanGGUFTexttoVideoSingleFileTests by @jiqing-feng in #12081
  • [refactor] condense group offloading by @a-r-r-o-w in #11990
  • Fix group offloading synchronization bug for parameter-only GroupModule's by @a-r-r-o-w in #12077
  • Helper functions to return skip-layer compatible layers by @a-r-r-o-w in #12048
  • Make prompt_2 optional in Flux Pipelines by @DN6 in #12073
  • [tests] tighten compilation tests for quantization by @sayakpaul in #12002
  • Implement Frequency-Decoupled Guidance (FDG) as a Guider by @dg845 in #11976
  • fix flux type hint by @DefTruth in #12089
  • [qwen] device typo by @yiyixuxu in #12099
  • [lora] adapt new LoRA config injection method by @sayakpaul in #11999
  • lora_conversion_utils: replace lora up/down with a/b even if transformer. in key by @Beinsezii in #12101
  • [tests] device placement for non-denoiser components in group offloading LoRA tests by @sayakpaul in #12103
  • [Modular] Fast Tests by @yiyixuxu in #11937
  • [GGUF] feat: support loading diffusers format gguf checkpoints. by @sayakpaul in #11684
  • [docs] diffusers gguf checkpoints by @sayakpaul in #12092
  • [core] add modular support for Flux I2I by @sayakpaul in #12086
  • [lora] support loading loras from lightx2v/Qwen-Image-Lightning by @sayakpaul in #12119
  • [Modular] More Updates for Custom Code Loading by @DN6 in #11969
  • enable compilation in qwen image. by @sayakpaul in #12061
  • [tests] Add inference test slices for SD3 and remove unnecessary tests by @a-r-r-o-w in #12106
  • [chore] complete the licensing statement. by @sayakpaul in #12001
  • [docs] Cache link by @stevhliu in #12105
  • [Modular] Add experimental feature warning for Modular Diffusers by @DN6 in #12127
  • Add low_cpu_mem_usage option to from_single_file to align with from_pretrained by @IrisRainbowNeko in #12114
  • [docs] Modular diffusers by @stevhliu in #11931
  • [Bugfix] typo fix in NPU FA by @leisuzz in #12129
  • Add QwenImage Inpainting and Img2Img pipeline by @Trgtuan10 in #12117
  • [core] parallel loading of shards by @sayakpaul in #12028
  • try to use deepseek with an agent to auto i18n to zh by @SamYuan1990 in #12032
  • [docs] Refresh effective and efficient doc by @stevhliu in #12134
  • Fix bf15/fp16 for pipeline_wan_vace.py by @SlimRG in #12143
  • make parallel loading flag a part of constants. by @sayakpaul in #12137
  • [docs] Parallel loading of shards by @stevhliu in #12135
  • feat: cuda device_map for pipelines. by @sayakpaul in #12122
  • [core] respect local_files_only=True when using sharded checkpoints by @sayakpaul in #12005
  • support hf_quantizer in cache warmup. by @sayakpaul in #12043
  • make test_gguf all pass on xpu by @yao-matrix in #12158
  • [docs] Quickstart by @stevhliu in #12128
  • Qwen Image Edit Support by @naykun in #12164
  • remove silu for CogView4 by @lambertwjh in #12150
  • [qwen] Qwen image edit followups by @sayakpaul in #12166
  • Minor modification to support DC-AE-turbo by @chenjy2003 in #12169
  • [Docs] typo error in qwen image by @leisuzz in #12144
  • fix: caching allocator behaviour for quantization. by @sayakpaul in #12172
  • fix(training_utils): wrap device in list for DiffusionPipeline by @MengAiDev in #12178
  • [docs] Clarify guidance scale in Qwen pipelines by @sayakpaul in #12181
  • [LoRA] feat: support more Qwen LoRAs from the community. by @sayakpaul in #12170
  • Update README.md by @Taechai in #12182
  • [chore] add lora button to qwenimage docs by @sayakpaul in #12183
  • [Wan 2.2 LoRA] add support for 2nd transformer lora loading + wan 2.2 lightx2v lora by @linoytsaban in #12074
  • Release: v0.35.0 by @sayakpaul (direct commit on v0.35.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @vuongminh1907
    • update: FluxKontextInpaintPipeline support (#11820)
  • @Net-Mist
    • feat: add multiple input image support in Flux Kontext (#11880)
  • @tolgacangoz
    • Add SkyReels V2: Infinite-Length Film Generative Model (#11518)
  • @naykun
    • Qwen-Image (#12055)
    • fix(qwen-image): update vae license (#12063)
    • Qwen Image Edit Support (#12164)
  • @Trgtuan10
    • Add QwenImage Inpainting and Img2Img pipeline (#12117)
  • @SamYuan1990
    • try to use deepseek with an agent to auto i18n to zh (#12032)
Jun 24, 2025
Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more

📹 New video generation pipelines

Wan VACE

Wan VACE supports various generation techniques which achieve controllable video generation. It comes in two variants: a 1.3B model for fast iteration & prototyping, and a 14B for high quality generation. Some of the capabilities include:

  • Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Boundary Box, etc.). Recommended library for preprocessing videos to obtain control videos: huggingface/controlnet_aux
  • Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)
  • Inpainting and Outpainting
  • Subject to Video (faces, object, characters, etc.)
  • Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)

The code snippets available in this pull request demonstrate some examples of how videos can be generated with controllability signals.

Check out the docs to learn more.

Cosmos Predict2 Video2World

Cosmos-Predict2 is a key branch of the Cosmos World Foundation Models (WFMs) ecosystem for Physical AI, specializing in future state prediction through advanced world modeling. It offers two powerful capabilities: text-to-image generation for creating high-quality images from text descriptions, and video-to-world generation for producing visual simulations from video inputs.

The Video2World model comes in 2B and 14B variants. Check out the docs to learn more.

LTX 0.9.7 and Distilled

LTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks.

Check out the docs to learn more.

Hunyuan Video Framepack and F1

Framepack is a novel method for enabling long video generation. There are two released variants of Hunyuan Video trained using this technique. Check out the docs to learn more.

FusionX

The FusionX family of models and LoRAs, built on top of Wan2.1-14B, is supported out of the box. To load the model, use from_single_file():

import torch
from diffusers import WanTransformer3DModel

transformer = WanTransformer3DModel.from_single_file(
    "https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/blob/main/Wan14Bi2vFusioniX_fp16.safetensors",
    torch_dtype=torch.bfloat16
)

To load the LoRAs, use load_lora_weights():

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers",
    torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "vrgamedevgirl84/Wan14BT2VFusioniX", weight_name="FusionX_LoRa/Wan2.1_T2V_14B_FusionX_LoRA.safetensors"
)

AccVideo and CausVid (only LoRAs)

AccVideo and CausVid are two novel distillation techniques that speed up the generation time of video diffusion models while preserving quality. Diffusers supports loading their extracted LoRAs with their respective models.

🌠 New image generation pipelines

Cosmos Predict2 Text2Image

Text-to-image models from the Cosmos-Predict2 release. The models come in 2B and 14B variants. Check out the docs to learn more.

Chroma

Chroma is an 8.9B-parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it. Check out the docs to learn more.

Thanks to @Ednaordinary for contributing it in this PR!

VisualCloze

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning is a universal image generation framework built on in-context learning that offers key capabilities:

  1. Support for various in-domain tasks
  2. Generalization to unseen tasks through in-context learning
  3. Unification of multiple tasks into one step, generating both the target image and intermediate results
  4. Support for reverse-engineering conditions from target images

Check out the docs to learn more. Thanks to @lzyhha for contributing this in this PR!

Better torch.compile support

We have worked with the PyTorch team to improve how we provide torch.compile() compatibility throughout the library. More specifically, we now test widely used models like Flux for recompilation and graph-break issues, which can get in the way of fully realizing the torch.compile() benefits. Refer to the following links to learn more:

Additionally, users can combine offloading with compilation to get a better speed-memory trade-off. Below is an example:

<details> <summary>Code</summary>
import torch
from diffusers import DiffusionPipeline
torch._dynamo.config.cache_size_limit = 10000

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()
# Compile.
pipeline.transformer.compile()

image = pipeline(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
</details>

This is compatible with group offloading, too. Interested readers can check out the relevant PRs below:

You can substantially reduce memory requirements by combining quantization with offloading and then improving speed with torch.compile(). Below is an example:

<details> <summary>Code</summary>
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from diffusers import AutoModel, FluxPipeline
from transformers import T5EncoderModel

torch._dynamo.config.recompile_limit = 1000
torch_dtype = torch.bfloat16

quant_kwargs = {"load_in_4bit": True, "bnb_4bit_compute_dtype": torch_dtype, "bnb_4bit_quant_type": "nf4"}
text_encoder_2_quant_config = TransformersBitsAndBytesConfig(**quant_kwargs)
dit_quant_config = DiffusersBitsAndBytesConfig(**quant_kwargs)

ckpt_id = "black-forest-labs/FLUX.1-dev"
text_encoder_2 = T5EncoderModel.from_pretrained(
    ckpt_id,
    subfolder="text_encoder_2",
    quantization_config=text_encoder_2_quant_config,
    torch_dtype=torch_dtype,
)
transformer = AutoModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=dit_quant_config,
    torch_dtype=torch_dtype,
)
pipe = FluxPipeline.from_pretrained(
    ckpt_id,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch_dtype,
)
pipe.enable_model_cpu_offload()
pipe.transformer.compile()

image = pipe(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=28,
    max_sequence_length=512,
).images[0]
</details>

Starting from bitsandbytes==0.46.0, bnb-quantized models should be fully compatible with torch.compile() without graph breaks. This means that when compiling a bnb-quantized model, users can do model.compile(fullgraph=True). This can significantly improve speed while still providing memory benefits. The figure below provides a comparison with Flux.1-Dev. Refer to this benchmarking script to learn more.

Note that for 4bit bnb models, it’s currently needed to install PyTorch nightly if fullgraph=True is specified during compilation.

Huge shoutout to @anijain2305 and @StrongerXi from the PyTorch team for the incredible support.

PipelineQuantizationConfig

Users can now provide a quantization config while initializing a pipeline:

import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
    components_to_quantize=["transformer", "text_encoder_2"],
)
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cute dog").images[0]

This lowers the barrier to entry for users who want to use quantization without having to write much code. Refer to the documentation to learn more about the different configurations allowed through PipelineQuantizationConfig.

Group offloading with disk

In the previous release, we shipped “group offloading” which lets you offload blocks/nodes within a model, optimizing its memory consumption. It also lets you overlap this offloading with computation, providing a good speed-memory trade-off, especially in low VRAM environments.

However, offloading still requires a considerable amount of system RAM to work effectively, so environments with both low VRAM and low RAM were left out.

Starting this release, users will additionally have the option to offload to disk instead of RAM, further lowering memory consumption. Set the offload_to_disk_path to enable this feature.

pipeline.transformer.enable_group_offload(
    onload_device="cuda", 
    offload_device="cpu", 
    offload_type="leaf_level", 
    offload_to_disk_path="path/to/disk"
)

Refer to these two tables to compare the speed and memory trade-offs.

LoRA metadata parsing

It is beneficial to include the LoraConfig that was used to train a LoRA in its state dict. In its absence, users were restricted to using the same LoRA alpha as the LoRA rank. We have modified the most popular training scripts to allow passing a custom lora_alpha through the CLI. Refer to this thread for more updates. Refer to this comment for some extended clarifications.
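
To see why the alpha matters: in standard LoRA, the learned update is applied as W + (alpha / rank) * (B @ A), so the effective scale of the adapter can only be recovered at load time if alpha is stored alongside the weights. A minimal sketch with made-up shapes (not the diffusers implementation):

```python
import torch

rank, alpha = 4, 8  # alpha != rank: only recoverable from saved metadata
A = torch.randn(rank, 16)
B = torch.randn(16, rank)

# The LoRA update added to the base weight, scaled by alpha / rank.
delta_w = (alpha / rank) * (B @ A)
print(tuple(delta_w.shape), alpha / rank)
```

Without the metadata, a loader would have to assume alpha == rank (scale 1.0) and apply the adapter at half strength in this example.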

New training scripts

  • We now have a capable training script for training robust timestep-distilled models through the SANA Sprint framework. Check out this resource for more details. Thanks to @scxue and @lawrence-cj for contributing it in this PR.
  • HiDream LoRA DreamBooth training script (docs). The script supports training with quantization. HiDream is an MIT-licensed model. So, make it yours with this training script.

Updates on educational materials on quantization

We have worked on a two-part series discussing quantization support in Diffusers. Check them out:

All commits

  • [LoRA] support musubi wan loras. by @sayakpaul in #11243
  • fix test_vanilla_funetuning failure on XPU and A100 by @yao-matrix in #11263
  • make test_stable_diffusion_inpaint_fp16 pass on XPU by @yao-matrix in #11264
  • make test_dict_tuple_outputs_equivalent pass on XPU by @yao-matrix in #11265
  • add onnxruntime-qnn & onnxruntime-cann by @xieofxie in #11269
  • make test_instant_style_multiple_masks pass on XPU by @yao-matrix in #11266
  • [BUG] Fix convert_vae_pt_to_diffusers bug by @lavinal712 in #11078
  • Fix LTX 0.9.5 single file by @hlky in #11271
  • [Tests] Cleanup lora tests utils by @sayakpaul in #11276
  • [CI] relax tolerance for unclip further by @sayakpaul in #11268
  • do not use DIFFUSERS_REQUEST_TIMEOUT for notification bot by @sayakpaul in #11273
  • Fix incorrect tile_latent_min_width calculation in AutoencoderKLMochi by @kuantuna in #11294
  • HiDream Image by @hlky in #11231
  • flow matching lcm scheduler by @quickjkee in #11170
  • Update autoencoderkl_allegro.md by @Forbu in #11303
  • Hidream refactoring follow ups by @a-r-r-o-w in #11299
  • Fix incorrect tile_latent_min_width calculations by @kuantuna in #11305
  • [ControlNet] Adds controlnet for SanaTransformer by @ishan-modi in #11040
  • make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU by @yao-matrix in #11308
  • make test_stable_diffusion_karras_sigmas pass on XPU by @yao-matrix in #11310
  • make KolorsPipelineFastTests::test_inference_batch_single_identical pass on XPU by @faaany in #11313
  • [LoRA] support more SDXL loras. by @sayakpaul in #11292
  • [HiDream] code example by @linoytsaban in #11317
  • import for FlowMatchLCMScheduler by @asomoza in #11318
  • Use float32 on mps or npu in transformer_hidream_image's rope by @hlky in #11316
  • Add skrample section to community_projects.md by @Beinsezii in #11319
  • [docs] Promote AutoModel usage by @sayakpaul in #11300
  • [LoRA] Add LoRA support to AuraFlow by @hameerabbasi in #10216
  • Fix vae.Decoder prev_output_channel by @hlky in #11280
  • fix CPU offloading related fail cases on XPU by @yao-matrix in #11288
  • [docs] fix hidream docstrings. by @sayakpaul in #11325
  • Rewrite AuraFlowPatchEmbed.pe_selection_index_based_on_dim to be torch.compile compatible by @AstraliteHeart in #11297
  • post release 0.33.0 by @sayakpaul in #11255
  • another fix for FlowMatchLCMScheduler forgotten import by @asomoza in #11330
  • Fix Hunyuan I2V for transformers>4.47.1 by @DN6 in #11293
  • unpin torch versions for onnx Dockerfile by @sayakpaul in #11290
  • [single file] enable telemetry for single file loading when using GGUF. by @sayakpaul in #11284
  • [docs] add a snippet for compilation in the auraflow docs. by @sayakpaul in #11327
  • Hunyuan I2V fast tests fix by @DN6 in #11341
  • [BUG] fixed _toctree.yml alphabetical ordering by @ishan-modi in #11277
  • Fix wrong dtype argument name as torch_dtype by @nPeppon in #11346
  • [chore] fix lora docs utils by @sayakpaul in #11338
  • [docs] add note about use_duck_shape in auraflow docs. by @sayakpaul in #11348
  • [LoRA] Propagate hotswap better by @sayakpaul in #11333
  • [Hi Dream] follow-up by @yiyixuxu in #11296
  • [bitsandbytes] improve dtype mismatch handling for bnb + lora. by @sayakpaul in #11270
  • Update controlnet_flux.py by @haofanwang in #11350
  • enable 2 test cases on XPU by @yao-matrix in #11332
  • [BNB] Fix test_moving_to_cpu_throws_warning by @SunMarc in #11356
  • support Wan-FLF2V by @yiyixuxu in #11353
  • Fix: StableDiffusionXLControlNetAdapterInpaintPipeline incorrectly inherited StableDiffusionLoraLoaderMixin by @Kazuki-Yoda in #11357
  • update output for Hidream transformer by @yiyixuxu in #11366
  • [Wan2.1-FLF2V] update conversion script by @yiyixuxu in #11365
  • [Flux LoRAs] fix lr scheduler bug in distributed scenarios by @linoytsaban in #11242
  • [train_dreambooth_lora_sdxl.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11240
  • fix issue that training flux controlnet was unstable and validation r… by @PromeAIpro in #11373
  • Fix Wan I2V prepare_latents dtype by @a-r-r-o-w in #11371
  • [BUG] fixes in kadinsky pipeline by @ishan-modi in #11080
  • Add Serialized Type Name kwarg in Model Output by @anzr299 in #10502
  • [cogview4][feat] Support attention mechanism with variable-length support and batch packing by @OleehyO in #11349
  • Support different-length pos/neg prompts for FLUX.1-schnell variants like Chroma by @josephrocca in #11120
  • [Refactor] Minor Improvement for import utils by @ishan-modi in #11161
  • Add stochastic sampling to FlowMatchEulerDiscreteScheduler by @apolinario in #11369
  • [LoRA] add LoRA support to HiDream and fine-tuning script by @linoytsaban in #11281
  • Update modeling imports by @a-r-r-o-w in #11129
  • [HiDream] move deprecation to 0.35.0 by @yiyixuxu in #11384
  • Update README_hidream.md by @AMEERAZAM08 in #11386
  • Fix group offloading with block_level and use_stream=True by @a-r-r-o-w in #11375
  • [train_dreambooth_flux] Add LANCZOS as the default interpolation mode for image resizing by @ishandutta0098 in #11395
  • [Feature] Added Xlab Controlnet support by @ishan-modi in #11249
  • Kolors additional pipelines, community contrib by @Teriks in #11372
  • [HiDream LoRA] optimizations + small updates by @linoytsaban in #11381
  • Fix Flux IP adapter argument in the pipeline example by @AeroDEmi in #11402
  • [BUG] fixed WAN docstring by @ishan-modi in #11226
  • Fix typos in strings and comments by @co63oc in #11407
  • [train_dreambooth_lora.py] Set LANCZOS as default interpolation mode for resizing by @merterbak in #11421
  • [tests] add tests to check for graph breaks, recompilation, cuda syncs in pipelines during torch.compile() by @sayakpaul in #11085
  • enable group_offload cases and quanto cases on XPU by @yao-matrix in #11405
  • enable test_layerwise_casting_memory cases on XPU by @yao-matrix in #11406
  • [tests] fix import. by @sayakpaul in #11434
  • [train_text_to_image] Better image interpolation in training scripts follow up by @tongyu0924 in #11426
  • [train_text_to_image_lora] Better image interpolation in training scripts follow up by @tongyu0924 in #11427
  • enable 28 GGUF test cases on XPU by @yao-matrix in #11404
  • [Hi-Dream LoRA] fix bug in validation by @linoytsaban in #11439
  • Fixing missing provider options argument by @urpetkov-amd in #11397
  • Set LANCZOS as the default interpolation for image resizing in ControlNet training by @YoulunPeng in #11449
  • Raise warning instead of error for block offloading with streams by @a-r-r-o-w in #11425
  • enable marigold_intrinsics cases on XPU by @yao-matrix in #11445
  • torch.compile fullgraph compatibility for Hunyuan Video by @a-r-r-o-w in #11457
  • enable consistency test cases on XPU, all passed by @yao-matrix in #11446
  • enable unidiffuser test cases on xpu by @yao-matrix in #11444
  • Add generic support for Intel Gaudi accelerator (hpu device) by @dsocek in #11328
  • Add StableDiffusion3InstructPix2PixPipeline by @xduzhangjiayu in #11378
  • make safe diffusion test cases pass on XPU and A100 by @yao-matrix in #11458
  • [test_models_transformer_hunyuan_video] help us test torch.compile() for impactful models by @tongyu0924 in #11431
  • Add LANCZOS as default interplotation mode. by @Va16hav07 in #11463
  • make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu by @yao-matrix in #11461
  • [WAN] fix recompilation issues by @sayakpaul in #11475
  • Fix typos in docs and comments by @co63oc in #11416
  • [tests] xfail recent pipeline tests for specific methods. by @sayakpaul in #11469
  • cache packages_distributions by @vladmandic in #11453
  • [docs] Memory optims by @stevhliu in #11385
  • [docs] Adapters by @stevhliu in #11331
  • [train_dreambooth_lora_sdxl_advanced] Add LANCZOS as the default interpolation mode for image resizing by @yuanjua in #11471
  • [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default interpolation mode for image resizing by @ysurs in #11472
  • enable semantic diffusion and stable diffusion panorama cases on XPU by @yao-matrix in #11459
  • [Feature] Implement tiled VAE encoding/decoding for Wan model. by @c8ef in #11414
  • [train_text_to_image_sdxl]Add LANCZOS as default interpolation mode for image resizing by @ParagEkbote in #11455
  • [train_dreambooth_lora_sdxl] Add --image_interpolation_mode option for image resizing (default to lanczos) by @MinJu-Ha in #11490
  • [train_dreambooth_lora_lumina2] Add LANCZOS as the default interpolation mode for image resizing by @cjfghk5697 in #11491
  • [training] feat: enable quantization for hidream lora training. by @sayakpaul in #11494
  • Set LANCZOS as the default interpolation method for image resizing. by @yijun-lee in #11492
  • Update training script for txt to img sdxl with lora supp with new interpolation. by @RogerSinghChugh in #11496
  • Fix torchao docs typo for fp8 granular quantization by @a-r-r-o-w in #11473
  • Update setup.py to pin min version of peft by @sayakpaul in #11502
  • update dep table. by @sayakpaul in #11504
  • [LoRA] use removeprefix to preserve sanity. by @sayakpaul in #11493
  • Hunyuan Video Framepack by @a-r-r-o-w in #11428
  • enable lora cases on XPU by @yao-matrix in #11506
  • [lora_conversion] Enhance key handling for OneTrainer components in LORA conversion utility by @iamwavecut in #11441
  • [docs] minor updates to bitsandbytes docs. by @sayakpaul in #11509
  • Cosmos by @a-r-r-o-w in #10660
  • clean up the Init for stable_diffusion by @yiyixuxu in #11500
  • fix audioldm by @sayakpaul (direct commit on v0.34.0-release)
  • Revert "fix audioldm" by @sayakpaul (direct commit on v0.34.0-release)
  • [LoRA] make lora alpha and dropout configurable by @linoytsaban in #11467
  • Add cross attention type for Sana-Sprint training in diffusers. by @scxue in #11514
  • Conditionally import torchvision in Cosmos transformer by @a-r-r-o-w in #11524
  • [tests] fix audioldm2 for transformers main. by @sayakpaul in #11522
  • feat: pipeline-level quantization config by @sayakpaul in #11130
  • [Tests] Enable more general testing for torch.compile() with LoRA hotswapping by @sayakpaul in #11322
  • [LoRA] support non-diffusers hidream loras by @sayakpaul in #11532
  • enable 7 cases on XPU by @yao-matrix in #11503
  • [LTXPipeline] Update latents dtype to match VAE dtype by @james-p-xu in #11533
  • enable dit integration cases on xpu by @yao-matrix in #11523
  • enable print_env on xpu by @yao-matrix in #11507
  • Change Framepack transformer layer initialization order by @a-r-r-o-w in #11535
  • [tests] add tests for framepack transformer model. by @sayakpaul in #11520
  • Hunyuan Video Framepack F1 by @a-r-r-o-w in #11534
  • enable several pipeline integration tests on XPU by @yao-matrix in #11526
  • [test_models_transformer_ltx.py] help us test torch.compile() for impactful models by @cjfghk5697 in #11512
  • Add VisualCloze by @lzyhha in #11377
  • Fix typo in train_diffusion_orpo_sdxl_lora_wds.py by @Meeex2 in #11541
  • fix: remove torch_dtype="auto" option from docstrings by @johannaSommer in #11513
  • [train_dreambooth.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11239
  • [LoRA] small change to support Hunyuan LoRA Loading for FramePack by @linoytsaban in #11546
  • LTX Video 0.9.7 by @a-r-r-o-w in #11516
  • [tests] Enable testing for HiDream transformer by @sayakpaul in #11478
  • Update pipeline_flux_img2img.py to add missing vae_slicing and vae_tiling calls. by @Meatfucker in #11545
  • Fix deprecation warnings in test_ltx_image2video.py by @AChowdhury1211 in #11538
  • [tests] Add torch.compile test for UNet2DConditionModel by @olccihyeon in #11537
  • [Single File] GGUF/Single File Support for HiDream by @DN6 in #11550
  • [gguf] Refactor torch_function to avoid unnecessary computation by @anijain2305 in #11551
  • [tests] add tests for combining layerwise upcasting and groupoffloading. by @sayakpaul in #11558
  • [docs] Regional compilation docs by @sayakpaul in #11556
  • enhance value guard of _device_agnostic_dispatch by @yao-matrix in #11553
  • Doc update by @Player256 in #11531
  • Revert error to warning when loading LoRA from repo with multiple weights by @apolinario in #11568
  • [docs] tip for group offloding + quantization by @sayakpaul in #11576
  • [LoRA] support non-diffusers LTX-Video loras by @linoytsaban in #11572
  • [WIP][LoRA] start supporting kijai wan lora. by @sayakpaul in #11579
  • [Single File] Fix loading for LTX 0.9.7 transformer by @DN6 in #11578
  • Use HF Papers by @qgallouedec in #11567
  • LTX 0.9.7-distilled; documentation improvements by @a-r-r-o-w in #11571
  • [LoRA] kijai wan lora support for I2V by @linoytsaban in #11588
  • docs: fix invalid links by @osrm in #11505
  • [docs] Remove fast diffusion tutorial by @stevhliu in #11583
  • RegionalPrompting: Inherit from Stable Diffusion by @b-sai in #11525
  • [chore] allow string device to be passed to randn_tensor. by @sayakpaul in #11559
  • Type annotation fix by @DN6 in #11597
  • [LoRA] minor fix for load_lora_weights() for Flux and a test by @sayakpaul in #11595
  • Update Intel Gaudi doc by @regisss in #11479
  • enable pipeline test cases on xpu by @yao-matrix in #11527
  • [Feature] AutoModel can load components using model_index.json by @ishan-modi in #11401
  • [docs] Pipeline-level quantization by @stevhliu in #11604
  • Fix bug when variant and safetensor file does not match by @kaixuanliu in #11587
  • [tests] Changes to the torch.compile() CI and tests by @sayakpaul in #11508
  • Fix mixed variant downloading by @DN6 in #11611
  • fix security issue in build docker ci by @sayakpaul in #11614
  • Make group offloading compatible with torch.compile() by @sayakpaul in #11605
  • [training docs] smol update to README files by @linoytsaban in #11616
  • Adding NPU for get device function by @leisuzz in #11617
  • [LoRA] improve LoRA fusion tests by @sayakpaul in #11274
  • [Sana Sprint] add image-to-image pipeline by @linoytsaban in #11602
  • [CI] fix the filename for displaying failures in lora ci. by @sayakpaul in #11600
  • [docs] PyTorch 2.0 by @stevhliu in #11618
  • [textual_inversion_sdxl.py] fix lr scheduler steps count by @yuanjua in #11557
  • Fix wrong indent for examples of controlnet script by @Justin900429 in #11632
  • removing unnecessary else statement by @YanivDorGalron in #11624
  • enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed by @yao-matrix in #11620
  • Bug: Fixed Image 2 Image example by @vltmedia in #11619
  • typo fix in pipeline_flux.py by @YanivDorGalron in #11623
  • Fix typos in strings and comments by @co63oc in #11476
  • [docs] update torchao doc link by @sayakpaul in #11634
  • Use float32 RoPE freqs in Wan with MPS backends by @hvaara in #11643
  • [chore] misc changes in the bnb tests for consistency. by @sayakpaul in #11355
  • [tests] chore: rename lora model-level tests. by @sayakpaul in #11481
  • [docs] Caching methods by @stevhliu in #11625
  • [docs] Model cards by @stevhliu in #11112
  • [CI] Some improvements to Nightly reports summaries by @DN6 in #11166
  • [chore] bring PipelineQuantizationConfig at the top of the import chain. by @sayakpaul in #11656
  • [examples] flux-control: use num_training_steps_for_scheduler by @Markus-Pobitzer in #11662
  • use deterministic to get stable result by @jiqing-feng in #11663
  • [tests] add test for torch.compile + group offloading by @sayakpaul in #11670
  • Wan VACE by @a-r-r-o-w in #11582
  • fixed axes_dims_rope init (huggingface#11641) by @sofinvalery in #11678
  • [tests] Fix how compiler mixin classes are used by @sayakpaul in #11680
  • Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process by @DN6 in #11596
  • Add community class StableDiffusionXL_T5Pipeline by @ppbrown in #11626
  • Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area by @Meatfucker in #11658
  • Allow remote code repo names to contain "." by @akasharidas in #11652
  • [LoRA] support Flux Control LoRA with bnb 8bit. by @sayakpaul in #11655
  • [Wan] Fix VAE sampling mode in WanVideoToVideoPipeline by @tolgacangoz in #11639
  • enable torchao test cases on XPU and switch to device agnostic APIs for test cases by @yao-matrix in #11654
  • [tests] tests for compilation + quantization (bnb) by @sayakpaul in #11672
  • [tests] model-level device_map clarifications by @sayakpaul in #11681
  • Improve Wan docstrings by @a-r-r-o-w in #11689
  • Set _torch_version to N/A if torch is disabled. by @rasmi in #11645
  • Avoid DtoH sync from access of nonzero() item in scheduler by @jbschlosser in #11696
  • Apply Occam's Razor in position embedding calculation by @tolgacangoz in #11562
  • [docs] add compilation bits to the bitsandbytes docs. by @sayakpaul in #11693
  • swap out token for style bot. by @sayakpaul in #11701
  • [docs] mention fp8 benefits on supported hardware. by @sayakpaul in #11699
  • Support Wan AccVideo lora by @a-r-r-o-w in #11704
  • [LoRA] parse metadata from LoRA and save metadata by @sayakpaul in #11324
  • Cosmos Predict2 by @a-r-r-o-w in #11695
  • Chroma Pipeline by @Ednaordinary in #11698
  • [LoRA ]fix flux lora loader when return_metadata is true for non-diffusers by @sayakpaul in #11716
  • [training] show how metadata stuff should be incorporated in training scripts. by @sayakpaul in #11707
  • Fix misleading comment by @carlthome in #11722
  • Add Pruna optimization framework documentation by @davidberenstein1957 in #11688
  • Support more Wan loras (VACE) by @a-r-r-o-w in #11726
  • [LoRA training] update metadata use for lora alpha + README by @linoytsaban in #11723
  • ⚡️ Speed up method AutoencoderKLWan.clear_cache by 886% by @misrasaurabh1 in #11665
  • [training] add ds support to lora hidream by @leisuzz in #11737
  • [tests] device_map tests for all models. by @sayakpaul in #11708
  • [chore] change to 2025 licensing for remaining by @sayakpaul in #11741
  • Chroma Follow Up by @DN6 in #11725
  • [Quantizers] add is_compileable property to quantizers. by @sayakpaul in #11736
  • Update more licenses to 2025 by @a-r-r-o-w in #11746
  • Add missing HiDream license by @a-r-r-o-w in #11747
  • Bump urllib3 from 2.2.3 to 2.5.0 in /examples/server by @dependabot[bot] in #11748
  • [LoRA] refactor lora loading at the model-level by @sayakpaul in #11719
  • [CI] Fix WAN VACE tests by @DN6 in #11757
  • [CI] Fix SANA tests by @DN6 in #11756
  • Fix HiDream pipeline test module by @DN6 in #11754
  • make group offloading work with disk/nvme transfers by @sayakpaul in #11682
  • Update Chroma Docs by @DN6 in #11753
  • fix invalid component handling behaviour in PipelineQuantizationConfig by @sayakpaul in #11750
  • Fix failing cpu offload test for LTX Latent Upscale by @DN6 in #11755
  • [docs] Quantization + torch.compile + offloading by @stevhliu in #11703
  • [docs] device_map by @stevhliu in #11711
  • [docs] LoRA scale scheduling by @stevhliu in #11727
  • Fix dimensionalities in apply_rotary_emb functions' comments by @tolgacangoz in #11717
  • enable deterministic in bnb 4 bit tests by @jiqing-feng in #11738
  • enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU by @yao-matrix in #11671
  • [tests] properly skip tests instead of return by @sayakpaul in #11771
  • [CI] Skip ONNX Upscale tests by @DN6 in #11774
  • [Wan] Fix mask padding in Wan VACE pipeline. by @bennyguo in #11778
  • Add --lora_alpha and metadata handling to train_dreambooth_lora_sana.py by @imbr92 in #11744
  • [docs] minor cleanups in the lora docs. by @sayakpaul in #11770
  • [lora] only remove hooks that we add back by @yiyixuxu in #11768
  • [tests] Fix HunyuanVideo Framepack device tests by @a-r-r-o-w in #11789
  • [chore] raise as early as possible in group offloading by @sayakpaul in #11792
  • [tests] Fix group offloading and layerwise casting test interaction by @a-r-r-o-w in #11796
  • guard omnigen processor. by @sayakpaul in #11799
  • Release: v0.34.0 by @sayakpaul (direct commit on v0.34.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @yao-matrix
    • fix test_vanilla_funetuning failure on XPU and A100 (#11263)
    • make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
    • make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
    • make test_instant_style_multiple_masks pass on XPU (#11266)
    • make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
    • make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
    • fix CPU offloading related fail cases on XPU (#11288)
    • enable 2 test cases on XPU (#11332)
    • enable group_offload cases and quanto cases on XPU (#11405)
    • enable test_layerwise_casting_memory cases on XPU (#11406)
    • enable 28 GGUF test cases on XPU (#11404)
    • enable marigold_intrinsics cases on XPU (#11445)
    • enable consistency test cases on XPU, all passed (#11446)
    • enable unidiffuser test cases on xpu (#11444)
    • make safe diffusion test cases pass on XPU and A100 (#11458)
    • make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu (#11461)
    • enable semantic diffusion and stable diffusion panorama cases on XPU (#11459)
    • enable lora cases on XPU (#11506)
    • enable 7 cases on XPU (#11503)
    • enable dit integration cases on xpu (#11523)
    • enable print_env on xpu (#11507)
    • enable several pipeline integration tests on XPU (#11526)
    • enhance value guard of _device_agnostic_dispatch (#11553)
    • enable pipeline test cases on xpu (#11527)
    • enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed (#11620)
    • enable torchao test cases on XPU and switch to device agnostic APIs for test cases (#11654)
    • enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU (#11671)
  • @hlky
    • Fix LTX 0.9.5 single file (#11271)
    • HiDream Image (#11231)
    • Use float32 on mps or npu in transformer_hidream_image's rope (#11316)
    • Fix vae.Decoder prev_output_channel (#11280)
  • @quickjkee
    • flow matching lcm scheduler (#11170)
  • @ishan-modi
    • [ControlNet] Adds controlnet for SanaTransformer (#11040)
    • [BUG] fixed _toctree.yml alphabetical ordering (#11277)
    • [BUG] fixes in kadinsky pipeline (#11080)
    • [Refactor] Minor Improvement for import utils (#11161)
    • [Feature] Added Xlab Controlnet support (#11249)
    • [BUG] fixed WAN docstring (#11226)
    • [Feature] AutoModel can load components using model_index.json (#11401)
  • @linoytsaban
    • [HiDream] code example (#11317)
    • [Flux LoRAs] fix lr scheduler bug in distributed scenarios (#11242)
    • [LoRA] add LoRA support to HiDream and fine-tuning script (#11281)
    • [HiDream LoRA] optimizations + small updates (#11381)
    • [Hi-Dream LoRA] fix bug in validation (#11439)
    • [LoRA] make lora alpha and dropout configurable (#11467)
    • [LoRA] small change to support Hunyuan LoRA Loading for FramePack (#11546)
    • [LoRA] support non-diffusers LTX-Video loras (#11572)
    • [LoRA] kijai wan lora support for I2V (#11588)
    • [training docs] smol update to README files (#11616)
    • [Sana Sprint] add image-to-image pipeline (#11602)
    • [LoRA training] update metadata use for lora alpha + README (#11723)
  • @hameerabbasi
    • [LoRA] Add LoRA support to AuraFlow (#10216)
  • @DN6
    • Fix Hunyuan I2V for transformers>4.47.1 (#11293)
    • Hunyuan I2V fast tests fix (#11341)
    • [Single File] GGUF/Single File Support for HiDream (#11550)
    • [Single File] Fix loading for LTX 0.9.7 transformer (#11578)
    • Type annotation fix (#11597)
    • Fix mixed variant downloading (#11611)
    • [CI] Some improvements to Nightly reports summaries (#11166)
    • Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process (#11596)
    • Chroma Follow Up (#11725)
    • [CI] Fix WAN VACE tests (#11757)
    • [CI] Fix SANA tests (#11756)
    • Fix HiDream pipeline test module (#11754)
    • Update Chroma Docs (#11753)
    • Fix failing cpu offload test for LTX Latent Upscale (#11755)
    • [CI] Skip ONNX Upscale tests (#11774)
  • @yiyixuxu
    • [Hi Dream] follow-up (#11296)
    • support Wan-FLF2V (#11353)
    • update output for Hidream transformer (#11366)
    • [Wan2.1-FLF2V] update conversion script (#11365)
    • [HiDream] move deprecation to 0.35.0 (#11384)
    • clean up the Init for stable_diffusion (#11500)
    • [lora] only remove hooks that we add back (#11768)
  • @Teriks
    • Kolors additional pipelines, community contrib (#11372)
  • @co63oc
    • Fix typos in strings and comments (#11407)
    • Fix typos in docs and comments (#11416)
    • Fix typos in strings and comments (#11476)
  • @xduzhangjiayu
    • Add StableDiffusion3InstructPix2PixPipeline (#11378)
  • @scxue
    • Add cross attention type for Sana-Sprint training in diffusers. (#11514)
  • @lzyhha
    • Add VisualCloze (#11377)
  • @b-sai
    • RegionalPrompting: Inherit from Stable Diffusion (#11525)
  • @Ednaordinary
    • Chroma Pipeline (#11698)
Apr 10, 2025
v0.33.1: fix ftfy import

All commits

  • fix ftfy import for wan pipelines by @yiyixuxu in #11262
Apr 9, 2025
Diffusers 0.33.0: New Image and Video Models, Memory Optimizations, Caching Methods, Remote VAEs, New Training Scripts, and more

New Pipelines for Video Generation

Wan 2.1

Wan2.1 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. The release includes four model variants and three pipelines: text-to-video, image-to-video, and video-to-video.

  • Wan-AI/Wan2.1-T2V-1.3B-Diffusers
  • Wan-AI/Wan2.1-T2V-14B-Diffusers
  • Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
  • Wan-AI/Wan2.1-I2V-14B-720P-Diffusers

Check out the docs here to learn more.

LTX Video 0.9.5

LTX Video 0.9.5 is the updated version of the super-fast LTX Video model series. The latest model introduces additional conditioning options, such as keyframe-based animation and video extension (both forward and backward).

To support these additional conditioning inputs, we’ve introduced the LTXConditionPipeline and LTXVideoCondition object.

To learn more about the usage, check out the docs here.

Hunyuan Image to Video

Hunyuan utilizes a pre-trained Multimodal Large Language Model (MLLM) with a Decoder-Only architecture as the text encoder. The input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data and seamlessly integrating information from both the image and its associated caption.

To learn more, check out the docs here.

Others

New Pipelines for Image Generation

Sana-Sprint

SANA-Sprint is an efficient diffusion model for ultra-fast text-to-image generation. SANA-Sprint is built on a pre-trained foundation model and augmented with hybrid distillation, dramatically reducing inference steps from 20 to 1-4, rivaling the quality of models like Flux.

Shoutout to @lawrence-cj for their help and guidance on this PR.

Check out the pipeline docs of SANA-Sprint to learn more.

Lumina2

Lumina-Image-2.0 is a 2B parameter flow-based diffusion transformer for text-to-image generation released under the Apache 2.0 license.

Check out the docs to learn more. Thanks to @zhuole1025 for contributing this through this PR.

One can also LoRA fine-tune Lumina2, taking advantage of its Apache 2.0 licensing. Check out the guide for more details.

Omnigen

OmniGen is a unified image generation model that can handle multiple tasks, including text-to-image, image editing, subject-driven generation, and various computer vision tasks, within a single framework. The model consists of a VAE and a single Phi-3-based transformer that handles text and image encoding as well as the diffusion process.

Check out the docs to learn more about OmniGen. Thanks to @staoxiao for contributing OmniGen in this PR.

Others

  • CogView4 (thanks to @zRzRzRzRzRzRzR for contributing CogView4 in this PR)

New Memory Optimizations

Layerwise Casting

PyTorch supports torch.float8_e4m3fn and torch.float8_e5m2 as weight storage dtypes, but they can’t be used for computation on many devices due to unimplemented kernel support.

However, you can still use these dtypes to store model weights in FP8 precision and upcast them to a widely supported dtype such as torch.float16 or torch.bfloat16 on-the-fly when the layers are used in the forward pass. This is known as layerwise weight-casting. This can potentially cut down the VRAM requirements of a model by 50%.  

<details> <summary>Code</summary>
import torch
from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video

model_id = "THUDM/CogVideoX-5b"

# Load the model in bfloat16 and enable layerwise casting
transformer = CogVideoXTransformer3DModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
transformer.enable_layerwise_casting(storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16)

# Load the pipeline
pipe = CogVideoXPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=8)
</details>

Group Offloading

Group offloading is the middle ground between sequential and model offloading. It works by offloading groups of internal layers (either torch.nn.ModuleList or torch.nn.Sequential), which uses less memory than model-level offloading. It is also faster than sequential-level offloading because the number of device synchronizations is reduced.

On CUDA devices, you can additionally enable layer prefetching with CUDA streams: the next layer to be executed is loaded onto the accelerator while the current layer runs, which makes inference substantially faster while keeping VRAM requirements very low. This overlaps computation with data transfer.

One thing to note is that using CUDA streams can cause a considerable spike in CPU RAM usage. Please ensure that the available CPU RAM is 2 times the size of the model if you choose to set use_stream=True. You can reduce CPU RAM usage by setting low_cpu_mem_usage=True. This should limit the CPU RAM used to be roughly the same as the size of the model, but will introduce slight latency in the inference process.

You can also use record_stream=True when using use_stream=True to obtain more speedups at the expense of slightly increased memory usage.

<details> <summary>Code</summary>
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# We can utilize the enable_group_offload method for Diffusers model implementations
pipe.transformer.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True,
)

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
# This run used about 14.79 GB. Memory can be reduced further by enabling tiling
# and applying leaf_level offloading throughout the pipeline.
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
export_to_video(video, "output.mp4", fps=8)
</details>

Group offloading can also be applied to non-Diffusers models such as text encoders from the transformers library.

<details> <summary>Code</summary>
import torch
from diffusers import CogVideoXPipeline
from diffusers.hooks import apply_group_offloading
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# For any other model implementations, the apply_group_offloading function can be used
apply_group_offloading(pipe.text_encoder, onload_device=onload_device, offload_device=offload_device, offload_type="block_level", num_blocks_per_group=2)
</details>

Remote Components

Remote components are an experimental feature designed to offload memory-intensive steps of the inference pipeline to remote endpoints. The initial implementation focuses primarily on VAE decoding operations. Below are the currently supported model endpoints:

| Model | Endpoint | Model repository |
| --- | --- | --- |
| Stable Diffusion v1 | https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud | stabilityai/sd-vae-ft-mse |
| Stable Diffusion XL | https://x2dmsqunjd6k9prw.us-east-1.aws.endpoints.huggingface.cloud | madebyollin/sdxl-vae-fp16-fix |
| Flux | https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud | black-forest-labs/FLUX.1-schnell |
| HunyuanVideo | https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud | hunyuanvideo-community/HunyuanVideo |

This is an example of using remote decoding with the Hunyuan Video pipeline:

<details> <summary>Code</summary>
import torch

from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils.remote_utils import remote_decode

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, vae=None, torch_dtype=torch.float16
).to("cuda")

latent = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
    output_type="latent",
).frames

video = remote_decode(
    endpoint="https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    output_type="mp4",
)

if isinstance(video, bytes):
    with open("video.mp4", "wb") as f:
        f.write(video) 
</details>

Check out the docs to learn more.

Introducing Cached Inference for DiTs

Cached Inference for Diffusion Transformer models is a performance optimization that significantly accelerates the denoising process by caching intermediate values. This technique reduces redundant computations across timesteps, resulting in faster generation with a slight dip in output quality.

Check out the docs to learn more about the available caching methods.

Pyramid Attention Broadcast

Pyramid Attention Broadcast (PAB) speeds up inference by reusing attention outputs across nearby timesteps, exploiting the fact that attention states change only gradually during denoising.

<details> <summary>Code</summary>
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
</details>

FasterCache

<details> <summary>Code</summary>
import torch
from diffusers import CogVideoXPipeline, FasterCacheConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = FasterCacheConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(-1, 901),
    unconditional_batch_skip_range=2,
    attention_weight_callback=lambda _: 0.5,
    is_guidance_distilled=True,
)
pipe.transformer.enable_cache(config)
</details>

Quantization

Quanto Backend

Diffusers now supports the Quanto quantization backend, which provides float8, int8, int4, and int2 quantization dtypes.

import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
      model_id,
      subfolder="transformer",
      quantization_config=quantization_config,
      torch_dtype=torch.bfloat16,
)

Quanto int8 models are also compatible with torch.compile:

<details> <summary>Code</summary>
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="int8")  # int8 is the compile-compatible dtype
transformer = FluxTransformer2DModel.from_pretrained(
      model_id,
      subfolder="transformer",
      quantization_config=quantization_config,
      torch_dtype=torch.bfloat16,
)
transformer.compile()
</details>

Improved loading for uintx TorchAO checkpoints with torch>=2.6

TorchAO checkpoints currently have to be serialized using pickle. For some quantization dtypes using the uintx format, such as uint4wo, this involves saving subclassed TorchAO tensor objects in the model file. This made loading the models directly with Diffusers tricky, since we do not allow deserializing arbitrary Python objects from pickle files.

Torch 2.6 allows adding expected tensor subclasses to torch's safe globals, which lets us directly load TorchAO checkpoints containing these objects.

- state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
- with init_empty_weights():
-     transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
- transformer.load_state_dict(state_dict, strict=True, assign=True)
+ transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_uint4wo/")  

LoRAs

We have shipped a couple of improvements on the LoRA front in this release.

🚨 Improved coverage for loading non-diffusers LoRA checkpoints for Flux

Take note of the breaking change introduced in this PR 🚨. We suggest upgrading your peft installation to the latest version (pip install -U peft), especially when dealing with Flux LoRAs.

torch.compile() support when hotswapping LoRAs without triggering recompilation

A common use case when serving multiple adapters is to load one adapter first, generate images, load another adapter, generate more images, load another adapter, etc. This workflow normally requires calling load_lora_weights(), set_adapters(), and possibly delete_adapters() to save memory. Moreover, if the model is compiled using torch.compile, performing these steps requires recompilation, which takes time.

To better support this common workflow, you can “hotswap” a LoRA adapter to avoid accumulating memory and, in some cases, recompilation. Hotswapping requires an adapter to already be loaded; the new adapter's weights are then swapped in place for the existing adapter.

Check out the docs to learn more about this feature.
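The mechanism can be illustrated with a tiny PyTorch module. This is a conceptual sketch, not the diffusers API: the point is that copying new adapter weights into existing parameters in place leaves the module structure unchanged, so a compiled graph remains valid.

```python
import torch

# Conceptual sketch of hotswapping (not the diffusers API): instead of
# rebuilding modules when switching adapters, copy the new adapter's
# weights into the existing parameters in place. The module structure
# never changes, so a torch.compile'd graph does not need to recompile.
torch.manual_seed(0)
base = torch.nn.Linear(4, 4, bias=False)
delta_1 = torch.randn(4, 4) * 0.01  # pre-merged weight delta for adapter 1 (illustrative)
delta_2 = torch.randn(4, 4) * 0.01  # pre-merged weight delta for adapter 2

compiled = torch.compile(base, backend="eager")  # compile once
x = torch.randn(1, 4)

with torch.no_grad():
    base.weight += delta_1            # "load" adapter 1
    out_1 = compiled(x)
    base.weight += delta_2 - delta_1  # hotswap to adapter 2 in place
    out_2 = compiled(x)               # same graph, no recompilation
```

In diffusers itself, the swap is handled for you when hotswapping is enabled and a new LoRA is loaded over an existing adapter.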

The other major change is support for loading LoRAs into quantized model checkpoints.

dtype Maps for Pipelines

Since various pipelines require their components to run in different compute dtypes, we now support passing a dtype map when initializing a pipeline:

from diffusers import HunyuanVideoPipeline
import torch

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # torch.bfloat16 torch.float16

AutoModel

This release includes an AutoModel object similar to the one found in transformers that automatically fetches the appropriate model class for the provided repo.

from diffusers import AutoModel

unet = AutoModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")

All commits

  • [Sana 4K] Add vae tiling option to avoid OOM by @leisuzz in #10583
  • IP-Adapter for StableDiffusion3Img2ImgPipeline by @guiyrt in #10589
  • [DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 by @chenjy2003 in #10595
  • Move buffers to device by @hlky in #10523
  • [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint by @guiyrt in #10597
  • Scheduling fixes on MPS by @hlky in #10549
  • [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo by @chengzeyi in #10544
  • NPU adaption for RMSNorm by @leisuzz in #10534
  • implementing flux on TPUs with ptxla by @entrpn in #10515
  • [core] ConsisID by @SHYuanBest in #10140
  • [training] set rest of the blocks with requires_grad False. by @sayakpaul in #10607
  • chore: remove redundant words by @sunxunle in #10609
  • bugfix for npu not support float64 by @baymax591 in #10123
  • [chore] change licensing to 2025 from 2024. by @sayakpaul in #10615
  • Enable dreambooth lora finetune example on other devices by @jiqing-feng in #10602
  • Remove the FP32 Wrapper when evaluating by @lmxyy in #10617
  • [tests] make tests device-agnostic (part 3) by @faaany in #10437
  • fix offload gpu tests etc by @yiyixuxu in #10366
  • Remove cache migration script by @Wauplin in #10619
  • [core] Layerwise Upcasting by @a-r-r-o-w in #10347
  • Improve TorchAO error message by @a-r-r-o-w in #10627
  • [CI] Update HF_TOKEN in all workflows by @DN6 in #10613
  • add onnxruntime-migraphx as part of check for onnxruntime in import_utils.py by @kahmed10 in #10624
  • [Tests] modify the test slices for the failing flax test by @sayakpaul in #10630
  • [docs] fix image path in para attention docs by @sayakpaul in #10632
  • [docs] uv installation by @stevhliu in #10622
  • width and height are mixed-up by @raulc0399 in #10629
  • Add IP-Adapter example to Flux docs by @hlky in #10633
  • removing redundant requires_grad = False by @YanivDorGalron in #10628
  • [chore] add a script to extract loras from full fine-tuned models by @sayakpaul in #10631
  • Add pipeline_stable_diffusion_xl_attentive_eraser by @Anonym0u3 in #10579
  • NPU Adaption for Sanna by @leisuzz in #10409
  • Add sigmoid scheduler in scheduling_ddpm.py docs by @JacobHelwig in #10648
  • create a script to train autoencoderkl by @lavinal712 in #10605
  • Add community pipeline for semantic guidance for FLUX by @Marlon154 in #10610
  • ControlNet Union controlnet_conditioning_scale for multiple control inputs by @hlky in #10666
  • [training] Convert to ImageFolder script by @hlky in #10664
  • Add provider_options to OnnxRuntimeModel by @hlky in #10661
  • fix check_inputs func in LuminaText2ImgPipeline by @victolee0 in #10651
  • SDXL ControlNet Union pipelines, make control_image argument immutible by @Teriks in #10663
  • Revert RePaint scheduler 'fix' by @GiusCat in #10644
  • [core] Pyramid Attention Broadcast by @a-r-r-o-w in #9562
  • [fix] refer use_framewise_encoding on AutoencoderKLHunyuanVideo._encode by @hanchchch in #10600
  • Refactor gradient checkpointing by @a-r-r-o-w in #10611
  • [Tests] conditionally check fp8_e4m3_bf16_max_memory < fp8_e4m3_fp32_max_memory by @sayakpaul in #10669
  • Fix pipeline dtype unexpected change when using SDXL reference community pipelines in float16 mode by @dimitribarbot in #10670
  • [tests] update llamatokenizer in hunyuanvideo tests by @sayakpaul in #10681
  • support StableDiffusionAdapterPipeline.from_single_file by @Teriks in #10552
  • fix(hunyuan-video): typo in height and width input check by @badayvedat in #10684
  • [FIX] check_inputs function in Auraflow Pipeline by @SahilCarterr in #10678
  • Fix enable memory efficient attention on ROCm by @tenpercent in #10564
  • Fix inconsistent random transform in instruct pix2pix by @Luvata in #10698
  • feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling by @badayvedat in #10699
  • Fixed grammar in "write_own_pipeline" readme by @N0-Flux-given in #10706
  • Fix Documentation about Image-to-Image Pipeline by @ParagEkbote in #10704
  • [bitsandbytes] Simplify bnb int8 dequant by @sayakpaul in #10401
  • Fix train_text_to_image.py --help by @nkthiebaut in #10711
  • Notebooks for Community Scripts-6 by @ParagEkbote in #10713
  • [Fix] Type Hint in from_pretrained() to Ensure Correct Type Inference by @SahilCarterr in #10714
  • add provider_options in from_pretrained by @xieofxie in #10719
  • [Community] Enhanced Model Search by @suzukimain in #10417
  • [bugfix] NPU Adaption for Sana by @leisuzz in #10724
  • Quantized Flux with IP-Adapter by @hlky in #10728
  • EDMEulerScheduler accept sigmas, add final_sigmas_type by @hlky in #10734
  • [LoRA] fix peft state dict parsing by @sayakpaul in #10532
  • Add Self type hint to ModelMixin's from_pretrained by @hlky in #10742
  • [Tests] Test layerwise casting with training by @sayakpaul in #10765
  • speedup hunyuan encoder causal mask generation by @dabeschte in #10764
  • [CI] Fix Truffle Hog failure by @DN6 in #10769
  • Add OmniGen by @staoxiao in #10148
  • feat: new community mixture_tiling_sdxl pipeline for SDXL by @elismasilva in #10759
  • Add support for lumina2 by @zhuole1025 in #10642
  • Refactor OmniGen by @a-r-r-o-w in #10771
  • Faster set_adapters by @Luvata in #10777
  • [Single File] Add Single File support for Lumina Image 2.0 Transformer by @DN6 in #10781
  • Fix use_lu_lambdas and use_karras_sigmas with beta_schedule=squaredcos_cap_v2 in DPMSolverMultistepScheduler by @hlky in #10740
  • MultiControlNetUnionModel on SDXL by @guiyrt in #10747
  • fix: [Community pipeline] Fix flattened elements on image by @elismasilva in #10774
  • make tensors contiguous before passing to safetensors by @faaany in #10761
  • Disable PEFT input autocast when using fp8 layerwise casting by @a-r-r-o-w in #10685
  • Update FlowMatch docstrings to mention correct output classes by @a-r-r-o-w in #10788
  • Refactor CogVideoX transformer forward by @a-r-r-o-w in #10789
  • Module Group Offloading by @a-r-r-o-w in #10503
  • Update Custom Diffusion Documentation for Multiple Concept Inference to resolve issue #10791 by @puhuk in #10792
  • [FIX] check_inputs function in lumina2 by @SahilCarterr in #10784
  • follow-up refactor on lumina2 by @yiyixuxu in #10776
  • CogView4 (supports different length c and uc) by @zRzRzRzRzRzRzR in #10649
  • typo fix by @YanivDorGalron in #10802
  • Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines by @ParagEkbote in #10746
  • [chore] update notes generation spaces by @sayakpaul in #10592
  • [LoRA] improve lora support for flux. by @sayakpaul in #10810
  • Fix max_shift value in flux and related functions to 1.15 (issue #10675) by @puhuk in #10807
  • [docs] add missing entries to the lora docs. by @sayakpaul in #10819
  • DiffusionPipeline mixin to+FromOriginalModelMixin/FromSingleFileMixin from_single_file type hint by @hlky in #10811
  • [LoRA] make set_adapters() robust on silent failures. by @sayakpaul in #9618
  • [FEAT] Model loading refactor by @SunMarc in #10604
  • [misc] feat: introduce a style bot. by @sayakpaul in #10274
  • Remove print statements by @a-r-r-o-w in #10836
  • [tests] use proper gemma class and config in lumina2 tests. by @sayakpaul in #10828
  • [LoRA] add LoRA support to Lumina2 and fine-tuning script by @sayakpaul in #10818
  • [Utils] add utilities for checking if certain utilities are properly documented by @sayakpaul in #7763
  • Add missing isinstance for arg checks in GGUFParameter by @AstraliteHeart in #10834
  • [tests] test encode_prompt() in isolation by @sayakpaul in #10438
  • store activation cls instead of function by @SunMarc in #10832
  • fix: support transformer models' generation_config in pipeline by @JeffersonQin in #10779
  • Notebooks for Community Scripts-7 by @ParagEkbote in #10846
  • [CI] install accelerate transformers from main by @sayakpaul in #10289
  • [CI] run fast gpu tests conditionally on pull requests. by @sayakpaul in #10310
  • SD3 IP-Adapter runtime checkpoint conversion by @guiyrt in #10718
  • Some consistency-related fixes for HunyuanVideo by @a-r-r-o-w in #10835
  • SkyReels Hunyuan T2V & I2V by @a-r-r-o-w in #10837
  • fix: run tests from a pr workflow. by @sayakpaul in #9696
  • [chore] template for remote vae. by @sayakpaul in #10849
  • fix remote vae template by @sayakpaul in #10852
  • [CI] Fix incorrectly named test module for Hunyuan DiT by @DN6 in #10854
  • [CI] Update always test Pipelines list in Pipeline fetcher by @DN6 in #10856
  • device_map in load_model_dict_into_meta by @hlky in #10851
  • [Fix] Docs overview.md by @SahilCarterr in #10858
  • remove format check for safetensors file by @SunMarc in #10864
  • [docs] LoRA support by @stevhliu in #10844
  • Comprehensive type checking for from_pretrained kwargs by @guiyrt in #10758
  • Fix torch_dtype in Kolors text encoder with transformers v4.49 by @hlky in #10816
  • [LoRA] restrict certain keys to be checked for peft config update. by @sayakpaul in #10808
  • Add SD3 ControlNet to AutoPipeline by @hlky in #10888
  • [docs] Update prompt weighting docs by @stevhliu in #10843
  • [docs] Flux group offload by @stevhliu in #10847
  • [Fix] fp16 unscaling in train_dreambooth_lora_sdxl by @SahilCarterr in #10889
  • [docs] Add CogVideoX Schedulers by @a-r-r-o-w in #10885
  • [chore] correct qk norm list. by @sayakpaul in #10876
  • [Docs] Fix toctree sorting by @DN6 in #10894
  • [refactor] SD3 docs & remove additional code by @a-r-r-o-w in #10882
  • [refactor] Remove additional Flux code by @a-r-r-o-w in #10881
  • [CI] Improvements to conditional GPU PR tests by @DN6 in #10859
  • Multi IP-Adapter for Flux pipelines by @guiyrt in #10867
  • Fix Callback Tensor Inputs of the SDXL Controlnet Inpaint and Img2img Pipelines are missing "controlnet_image". by @CyberVy in #10880
  • Security fix by @ydshieh in #10905
  • Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation by @toshas in #10884
  • [Tests] fix: lumina2 lora fuse_nan test by @sayakpaul in #10911
  • Fix Callback Tensor Inputs of the SD Controlnet Pipelines are missing some elements. by @CyberVy in #10907
  • [CI] Fix Fast GPU tests on PR by @DN6 in #10912
  • [CI] Fix for failing IP Adapter test in Fast GPU PR tests by @DN6 in #10915
  • Experimental per control type scale for ControlNet Union by @hlky in #10723
  • [style bot] improve security for the stylebot. by @sayakpaul in #10908
  • [CI] Update Stylebot Permissions by @DN6 in #10931
  • [Alibaba Wan Team] continue on #10921 Wan2.1 by @yiyixuxu in #10922
  • Support IPAdapter for more Flux pipelines by @hlky in #10708
  • Add remote_decode to remote_utils by @hlky in #10898
  • Update VAE Decode endpoints by @hlky in #10939
  • [chore] fix-copies to flux pipelines by @sayakpaul in #10941
  • [Tests] Remove more encode prompts tests by @sayakpaul in #10942
  • Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model by @bubbliiiing in #10626
  • Fix SD2.X clip single file load projection_dim by @Teriks in #10770
  • add from_single_file to animatediff in #10924
  • Add Example of IPAdapterScaleCutoffCallback to Docs by @ParagEkbote in #10934
  • Update pipeline_cogview4.py by @zRzRzRzRzRzRzR in #10944
  • Fix redundant prev_output_channel assignment in UNet2DModel by @ahmedbelgacem in #10945
  • Improve load_ip_adapter RAM Usage by @CyberVy in #10948
  • [tests] make tests device-agnostic (part 4) by @faaany in #10508
  • Update evaluation.md by @sayakpaul in #10938
  • [LoRA] feat: support non-diffusers lumina2 LoRAs. by @sayakpaul in #10909
  • [Quantization] support pass MappingType for TorchAoConfig by @a120092009 in #10927
  • Fix the missing parentheses when calling is_torchao_available in quantization_config.py. by @CyberVy in #10961
  • [LoRA] Support Wan by @a-r-r-o-w in #10943
  • Fix incorrect seed initialization when args.seed is 0 by @azolotenkov in #10964
  • feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL by @elismasilva in #10951
  • [Docs] CogView4 comment fix by @zRzRzRzRzRzRzR in #10957
  • update check_input for cogview4 by @yiyixuxu in #10966
  • Add VAE Decode endpoint slow test by @hlky in #10946
  • [flux lora training] fix t5 training bug by @linoytsaban in #10845
  • use style bot GH Action from huggingface_hub by @hanouticelina in #10970
  • [train_dreambooth_lora.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @flyxiv in #10973
  • [tests] fix tests for save load components by @sayakpaul in #10977
  • Fix loading OneTrainer Flux LoRA by @hlky in #10978
  • fix default values of Flux guidance_scale in docstrings by @catwell in #10982
  • [CI] remove synchornized. by @sayakpaul in #10980
  • Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/realfill by @dependabot[bot] in #10984
  • Fix Flux Controlnet Pipeline _callback_tensor_inputs Missing Some Elements by @CyberVy in #10974
  • [Single File] Add user agent to SF download requests. by @DN6 in #10979
  • Add CogVideoX DDIM Inversion to Community Pipelines by @LittleNyima in #10956
  • fix wan i2v pipeline bugs by @yupeng1111 in #10975
  • Hunyuan I2V by @a-r-r-o-w in #10983
  • Fix Graph Breaks When Compiling CogView4 by @chengzeyi in #10959
  • Wan VAE move scaling to pipeline by @hlky in #10998
  • [LoRA] remove full key prefix from peft. by @sayakpaul in #11004
  • [Single File] Add single file support for Wan T2V/I2V by @DN6 in #10991
  • Add STG to community pipelines by @kinam0252 in #10960
  • [LoRA] Improve copied from comments in the LoRA loader classes by @sayakpaul in #10995
  • Fix for fetching variants only by @DN6 in #10646
  • [Quantization] Add Quanto backend by @DN6 in #10756
  • [Single File] Add single file loading for SANA Transformer by @ishan-modi in #10947
  • [LoRA] Improve warning messages when LoRA loading becomes a no-op by @sayakpaul in #10187
  • [LoRA] CogView4 by @a-r-r-o-w in #10981
  • [Tests] improve quantization tests by additionally measuring the inference memory savings by @sayakpaul in #11021
  • [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing by @tolgacangoz in #8998
  • [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 by @DN6 in #11018
  • fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings by @elismasilva in #11012
  • [LoRA] support wan i2v loras from the world. by @sayakpaul in #11025
  • Fix SD3 IPAdapter feature extractor by @hlky in #11027
  • chore: fix help messages in advanced diffusion examples by @wonderfan in #10923
  • Fix missing **kwargs in lora_pipeline.py by @CyberVy in #11011
  • Fix for multi-GPU WAN inference by @AmericanPresidentJimmyCarter in #10997
  • [Refactor] Clean up import utils boilerplate by @DN6 in #11026
  • Use output_size in repeat_interleave by @hlky in #11030
  • [hybrid inference 🍯🐝] Add VAE encode by @hlky in #11017
  • Wan Pipeline scaling fix, type hint warning, multi generator fix by @hlky in #11007
  • [LoRA] change to warning from info when notifying the users about a LoRA no-op by @sayakpaul in #11044
  • Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline by @hlky in #10827
  • making formatted_images initialization compact by @YanivDorGalron in #10801
  • Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed by @ZhengKai91 in #10820
  • [Tests] restrict memory tests for quanto for certain schemes. by @sayakpaul in #11052
  • [LoRA] feat: support non-diffusers wan t2v loras. by @sayakpaul in #11059
  • [examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch by @andjoer in #11051
  • reverts accidental change that removes attn_mask in attn. Improves fl… by @entrpn in #11065
  • Fix deterministic issue when getting pipeline dtype and device by @dimitribarbot in #10696
  • [Tests] add requires peft decorator. by @sayakpaul in #11037
  • CogView4 Control Block by @zRzRzRzRzRzRzR in #10809
  • [CI] pin transformers version for benchmarking. by @sayakpaul in #11067
  • Fix Wan I2V Quality by @chengzeyi in #11087
  • LTX 0.9.5 by @a-r-r-o-w in #10968
  • make PR GPU tests conditioned on styling. by @sayakpaul in #11099
  • Group offloading improvements by @a-r-r-o-w in #11094
  • Fix pipeline_flux_controlnet.py by @co63oc in #11095
  • update readme instructions. by @entrpn in #11096
  • Resolve stride mismatch in UNet's ResNet to support Torch DDP by @jinc7461 in #11098
  • Fix Group offloading behaviour when using streams by @a-r-r-o-w in #11097
  • Quality options in export_to_video by @hlky in #11090
  • [CI] uninstall deps properly from pr gpu tests. by @sayakpaul in #11102
  • [BUG] Fix Autoencoderkl train script by @lavinal712 in #11113
  • [Wan LoRAs] make T2V LoRAs compatible with Wan I2V by @linoytsaban in #11107
  • [tests] enable bnb tests on xpu by @faaany in #11001
  • [fix bug] PixArt inference_steps=1 by @lawrence-cj in #11079
  • Flux with Remote Encode by @hlky in #11091
  • [tests] make cuda only tests device-agnostic by @faaany in #11058
  • Provide option to reduce CPU RAM usage in Group Offload by @DN6 in #11106
  • remove F.rms_norm for now by @yiyixuxu in #11126
  • Notebooks for Community Scripts-8 by @ParagEkbote in #11128
  • fix _callback_tensor_inputs of sd controlnet inpaint pipeline missing some elements by @CyberVy in #11073
  • [core] FasterCache by @a-r-r-o-w in #10163
  • add sana-sprint by @yiyixuxu in #11074
  • Don't override torch_dtype and don't use when quantization_config is set by @hlky in #11039
  • Update README and example code for AnyText usage by @tolgacangoz in #11028
  • Modify the implementation of retrieve_timesteps in CogView4-Control. by @zRzRzRzRzRzRzR in #11125
  • [fix SANA-Sprint] by @lawrence-cj in #11142
  • New HunyuanVideo-I2V by @a-r-r-o-w in #11066
  • [doc] Fix Korean Controlnet Train doc by @flyxiv in #11141
  • Improve information about group offloading and layerwise casting by @a-r-r-o-w in #11101
  • add a timestep scale for sana-sprint teacher model by @lawrence-cj in #11150
  • [Quantization] dtype fix for GGUF + fix BnB tests by @DN6 in #11159
  • Set self._hf_peft_config_loaded to True when LoRA is loaded using load_lora_adapter in PeftAdapterMixin class by @kentdan3msu in #11155
  • WanI2V encode_image by @hlky in #11164
  • [Docs] Update Wan Docs with memory optimizations by @DN6 in #11089
  • Fix LatteTransformer3DModel dtype mismatch with enable_temporal_attentions by @hlky in #11139
  • Raise warning and round down if Wan num_frames is not 4k + 1 by @a-r-r-o-w in #11167
  • [Docs] Fix environment variables in installation.md by @remarkablemark in #11179
  • Add latents_mean and latents_std to SDXLLongPromptWeightingPipeline by @hlky in #11034
  • Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is already set by @kakukakujirori in #10918
  • [tests] no hard-coded cuda by @faaany in #11186
  • [WIP] Add Wan Video2Video by @DN6 in #11053
  • map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on XPU by @yao-matrix in #11191
  • fix autocast by @jiqing-feng in #11190
  • fix: for checking mandatory and optional pipeline components by @elismasilva in #11189
  • remove unnecessary call to F.pad by @bm-synth in #10620
  • allow models to run with a user-provided dtype map instead of a single dtype by @hlky in #10301
  • [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU by @faaany in #11197
  • Revert save_model in ModelMixin save_pretrained and use safe_serialization=False in test by @hlky in #11196
  • [docs] torch_dtype map by @hlky in #11194
  • Fix enable_sequential_cpu_offload in CogView4Pipeline by @hlky in #11195
  • SchedulerMixin from_pretrained and ConfigMixin Self type annotation by @hlky in #11192
  • Update import_utils.py by @Lakshaysharma048 in #10329
  • Add CacheMixin to Wan and LTX Transformers by @DN6 in #11187
  • feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline by @elismasilva in #11188
  • [Model Card] standardize advanced diffusion training sdxl lora by @chiral-carbon in #7615
  • Change KolorsPipeline LoRA Loader to StableDiffusion by @BasileLewan in #11198
  • Update Style Bot workflow by @hanouticelina in #11202
  • Fixed requests.get function call by adding timeout parameter. by @kghamilton89 in #11156
  • Fix Single File loading for LTX VAE by @DN6 in #11200
  • [feat]Add strength in flux_fill pipeline (denoising strength for fluxfill) by @Suprhimp in #10603
  • [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning by @tolgacangoz in #11174
  • Add Wan with STG as a community pipeline by @Ednaordinary in #11184
  • Add missing MochiEncoder3D.gradient_checkpointing attribute by @mjkvaak-amd in #11146
  • enable 1 case on XPU by @yao-matrix in #11219
  • ensure dtype match between diffused latents and vae weights by @heyalexchoi in #8391
  • [docs] MPS update by @stevhliu in #11212
  • Add support to pass image embeddings to the WAN I2V pipeline. by @goiri in #11175
  • [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8461
  • [Training] Better image interpolation in training scripts by @asomoza in #11206
  • [LoRA] Implement hot-swapping of LoRA by @BenjaminBossan in #9453
  • introduce compute arch specific expectations and fix test_sd3_img2img_inference failure by @yao-matrix in #11227
  • [Flux LoRA] fix issues in flux lora scripts by @linoytsaban in #11111
  • Flux quantized with lora by @hlky in #10990
  • [feat] implement record_stream when using CUDA streams during group offloading by @sayakpaul in #11081
  • [bistandbytes] improve replacement warnings for bnb by @sayakpaul in #11132
  • minor update to sana sprint docs. by @sayakpaul in #11236
  • [docs] minor updates to dtype map docs. by @sayakpaul in #11237
  • [LoRA] support more comyui loras for Flux 🚨 by @sayakpaul in #10985
  • fix: SD3 ControlNet validation so that it runs on a A100. by @sayakpaul in #11238
  • AudioLDM2 Fixes by @hlky in #11244
  • AutoModel by @hlky in #11115
  • fix FluxReduxSlowTests::test_flux_redux_inference case failure on XPU by @yao-matrix in #11245
  • [docs] AutoModel by @hlky in #11250
  • Update Ruff to latest Version by @DN6 in #10919
  • fix flux controlnet bug by @free001style in #11152
  • fix timeout constant by @sayakpaul in #11252
  • fix consisid imports by @sayakpaul in #11254
  • Release: v0.33.0 by @sayakpaul (direct commit on v0.33.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @guiyrt
    • IP-Adapter for StableDiffusion3Img2ImgPipeline (#10589)
    • [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint (#10597)
    • MultiControlNetUnionModel on SDXL (#10747)
    • SD3 IP-Adapter runtime checkpoint conversion (#10718)
    • Comprehensive type checking for from_pretrained kwargs (#10758)
    • Multi IP-Adapter for Flux pipelines (#10867)
  • @chengzeyi
    • [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo (#10544)
    • Fix Graph Breaks When Compiling CogView4 (#10959)
    • Fix Wan I2V Quality (#11087)
  • @entrpn
    • implementing flux on TPUs with ptxla (#10515)
    • reverts accidental change that removes attn_mask in attn. Improves fl… (#11065)
    • update readme instructions. (#11096)
  • @SHYuanBest
    • [core] ConsisID (#10140)
  • @faaany
    • [tests] make tests device-agnostic (part 3) (#10437)
    • make tensors contiguous before passing to safetensors (#10761)
    • [tests] make tests device-agnostic (part 4) (#10508)
    • [tests] enable bnb tests on xpu (#11001)
    • [tests] make cuda only tests device-agnostic (#11058)
    • [tests] no hard-coded cuda (#11186)
    • [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU (#11197)
  • @yiyixuxu
    • fix offload gpu tests etc (#10366)
    • follow-up refactor on lumina2 (#10776)
    • [Alibaba Wan Team] continue on #10921 Wan2.1 (#10922)
    • update check_input for cogview4 (#10966)
    • remove F.rms_norm for now (#11126)
    • add sana-sprint (#11074)
  • @DN6
    • [CI] Update HF_TOKEN in all workflows (#10613)
    • [CI] Fix Truffle Hog failure (#10769)
    • [Single File] Add Single File support for Lumina Image 2.0 Transformer (#10781)
    • [CI] Fix incorrectly named test module for Hunyuan DiT (#10854)
    • [CI] Update always test Pipelines list in Pipeline fetcher (#10856)
    • [Docs] Fix toctree sorting (#10894)
    • [CI] Improvements to conditional GPU PR tests (#10859)
    • [CI] Fix Fast GPU tests on PR (#10912)
    • [CI] Fix for failing IP Adapter test in Fast GPU PR tests (#10915)
    • [CI] Update Stylebot Permissions (#10931)
    • [Single File] Add user agent to SF download requests. (#10979)
    • [Single File] Add single file support for Wan T2V/I2V (#10991)
    • Fix for fetching variants only (#10646)
    • [Quantization] Add Quanto backend (#10756)
    • [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 (#11018)
    • [Refactor] Clean up import utils boilerplate (#11026)
    • Provide option to reduce CPU RAM usage in Group Offload (#11106)
    • [Quantization] dtype fix for GGUF + fix BnB tests (#11159)
    • [Docs] Update Wan Docs with memory optimizations (#11089)
    • [WIP] Add Wan Video2Video (#11053)
    • Add CacheMixin to Wan and LTX Transformers (#11187)
    • Fix Single File loading for LTX VAE (#11200)
    • Update Ruff to latest Version (#10919)
  • @Anonym0u3
    • Add pipeline_stable_diffusion_xl_attentive_eraser (#10579)
  • @lavinal712
    • create a script to train autoencoderkl (#10605)
    • [BUG] Fix Autoencoderkl train script (#11113)
  • @Marlon154
    • Add community pipeline for semantic guidance for FLUX (#10610)
  • @ParagEkbote
    • Fix Documentation about Image-to-Image Pipeline (#10704)
    • Notebooks for Community Scripts-6 (#10713)
    • Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines (#10746)
    • Notebooks for Community Scripts-7 (#10846)
    • Add Example of IPAdapterScaleCutoffCallback to Docs (#10934)
    • Notebooks for Community Scripts-8 (#11128)
  • @suzukimain
    • [Community] Enhanced Model Search (#10417)
  • @staoxiao
    • Add OmniGen (#10148)
  • @elismasilva
    • feat: new community mixture_tiling_sdxl pipeline for SDXL (#10759)
    • fix: [Community pipeline] Fix flattened elements on image (#10774)
    • feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL (#10951)
    • fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings (#11012)
    • fix: for checking mandatory and optional pipeline components (#11189)
    • feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline (#11188)
  • @zhuole1025
    • Add support for lumina2 (#10642)
  • @zRzRzRzRzRzRzR
    • CogView4 (supports different length c and uc) (#10649)
    • Update pipeline_cogview4.py (#10944)
    • [Docs] CogView4 comment fix (#10957)
    • CogView4 Control Block (#10809)
    • Modify the implementation of retrieve_timesteps in CogView4-Control. (#11125)
  • @toshas
    • Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation (#10884)
  • @bubbliiiing
    • Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model (#10626)
  • @LittleNyima
    • Add CogVideoX DDIM Inversion to Community Pipelines (#10956)
  • @kinam0252
    • Add STG to community pipelines (#10960)
  • @tolgacangoz
    • [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)
    • Update README and example code for AnyText usage (#11028)
    • [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning (#11174)
  • @Ednaordinary
    • Add Wan with STG as a community pipeline (#11184)
Jan 15, 2025

Fixes for Flux Single File loading, LoRA loading for 4bit BnB Flux, Hunyuan Video

This patch release:

  • Fixes a regression in loading ComfyUI-format single-file checkpoints for Flux
  • Fixes a regression in loading LoRAs with bitsandbytes 4bit quantized Flux models
  • Adds unload_lora_weights for Flux Control
  • Fixes a bug that prevents Hunyuan Video from running with batch size > 1
  • Allows Hunyuan Video to load LoRAs created with the original repository code

All commits

  • [Single File] Fix loading Flux Dev finetunes with Comfy Prefix by @DN6 in #10545
  • [CI] Update HF Token on Fast GPU Model Tests by @DN6 in #10570
  • [CI] Update HF Token in Fast GPU Tests by @DN6 in #10568
  • Fix batch > 1 in HunyuanVideo by @hlky in #10548
  • Fix HunyuanVideo produces NaN on PyTorch<2.5 by @hlky in #10482
  • Fix hunyuan video attention mask dim by @a-r-r-o-w in #10454
  • [LoRA] Support original format loras for HunyuanVideo by @a-r-r-o-w in #10376
  • [LoRA] feat: support loading loras into 4bit quantized Flux models. by @sayakpaul in #10578
  • [LoRA] clean up load_lora_into_text_encoder() and fuse_lora() copied from by @sayakpaul in #10495
  • [LoRA] feat: support unload_lora_weights() for Flux Control. by @sayakpaul in #10206
  • Fix Flux multiple Lora loading bug by @maxs-kan in #10388
  • [LoRA] fix: lora unloading when using expanded Flux LoRAs. by @sayakpaul in #10397
Dec 25, 2024

TorchAO Quantizer fixes

This patch release fixes a few bugs related to the TorchAO Quantizer introduced in v0.32.0.

  • Importing Diffusers would raise an error in PyTorch versions lower than 2.3.0. This should no longer be a problem.
  • Passing a device map when using the quantizer did not work as expected, so we now raise an error if one is provided. Support for using device maps with the different quantization backends will be added in the near future.
  • Quantization was not performed due to faulty logic. This is now fixed and better tested.

Refer to our documentation to learn more about how to use different quantization backends.

All commits

Dec 23, 2024
Diffusers 0.32.0: New video pipelines, new image pipelines, new quantization backends, new training scripts, and more

https://github.com/user-attachments/assets/34d5f7ca-8e33-4401-8109-5c245ce7595f

This release took a while, but it has many exciting updates. It contains several new pipelines for image and video generation, new quantization backends, and more.

Going forward, to provide more transparency to the community about ongoing developments and releases in Diffusers, we will be making use of a roadmap tracker.

New Video Generation Pipelines 📹

Open video generation models are on the rise, and we’re pleased to provide comprehensive integration support for them. Several new video pipelines are bundled in this release.

Check out this section to learn more about the fine-tuning options available for these new video models.

New Image Generation Pipelines

Important Note about the new Flux Models

Regular Flux.1 Dev LoRAs can be combined with Flux Control LoRAs, Flux Control, and Flux Fill. For example, you can enable few-step inference with Flux Fill using:

from diffusers import FluxFillPipeline
from diffusers.utils import load_image
import torch

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Load the Turbo LoRA so far fewer denoising steps are needed.
adapter_id = "alimama-creative/FLUX.1-Turbo-Alpha"
pipe.load_lora_weights(adapter_id)

image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png")

# With the Turbo LoRA loaded, 8 steps suffice instead of the usual ~50.
image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    guidance_scale=30,
    num_inference_steps=8,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-fill-dev.png")

To learn more, check out the documentation.
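Combining a regular Flux.1 Dev LoRA with a Flux Control LoRA follows the same pattern. A minimal sketch (the style LoRA repository id below is a placeholder and the adapter names are arbitrary; the Canny Control LoRA repository is real):

```python
import torch
from diffusers import FluxControlPipeline

pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Flux Control (Canny) ships as a LoRA, so it can coexist with other adapters.
pipe.load_lora_weights(
    "black-forest-labs/FLUX.1-Canny-dev-lora", adapter_name="canny"
)
# Any regular Flux.1 Dev style LoRA (placeholder repository id).
pipe.load_lora_weights("your-username/your-style-lora", adapter_name="style")

# Activate both adapters and blend their contributions.
pipe.set_adapters(["canny", "style"], adapter_weights=[0.85, 1.0])
```

Adapter weights can then be tuned per generation to trade off structural control against style strength.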

[!NOTE]
SANA is a small model compared to others like Flux: Sana-0.6B can be deployed on a 16GB laptop GPU and takes less than a second to generate a 1024×1024 resolution image. We also support LoRA fine-tuning of SANA. Check out this section for more details.

Acknowledgements

  • Shoutout to @lawrence-cj and @chenjy2003 for contributing SANA in this PR. SANA also features a Deep Compression Autoencoder, which was contributed by @lawrence-cj in this PR.
  • Shoutout to @guiyrt for contributing SD3.5 IP Adapter in this PR.

New Quantization Backends

Please be aware of the following caveats:

  • TorchAO quantized checkpoints cannot be serialized in safetensors currently. This may change in the future.
  • GGUF currently only supports loading pre-quantized checkpoints into models in this release. Support for saving models with GGUF quantization will be added in the future.
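Both backends are driven by a quantization_config passed at load time. A minimal sketch with the TorchAO backend (the "int8wo" weight-only scheme and the model id are illustrative; torchao must be installed):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

# Quantize the transformer's weights to int8 (weight-only) while loading.
quantization_config = TorchAoConfig("int8wo")
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
```

The same pattern applies to GGUF: point from_single_file at a pre-quantized .gguf checkpoint and pass a GGUFQuantizationConfig instead.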

New training scripts

This release features many new training scripts for the community to play with.

All commits

  • post-release 0.31.0 by @sayakpaul in #9742
  • fix bug in require_accelerate_version_greater by @faaany in #9746
  • [Official callbacks] SDXL Controlnet CFG Cutoff by @asomoza in #9311
  • [SD3-5 dreambooth lora] update model cards by @linoytsaban in #9749
  • config attribute not foud error for FluxImagetoImage Pipeline for multi controlnet solved by @rshah240 in #9586
  • Some minor updates to the nightly and push workflows by @sayakpaul in #9759
  • [Docs] fix docstring typo in SD3 pipeline by @shenzhiy21 in #9765
  • [bugfix] bugfix for npu free memory by @leisuzz in #9640
  • [research_projects] add flux training script with quantization by @sayakpaul in #9754
  • Add a doc for AWS Neuron in Diffusers by @JingyaHuang in #9766
  • [refactor] enhance readability of flux related pipelines by @Luciennnnnnn in #9711
  • Added Support of Xlabs controlnet to FluxControlNetInpaintPipeline by @SahilCarterr in #9770
  • [research_projects] Update README.md to include a note about NF5 T5-xxl by @sayakpaul in #9775
  • [Fix] train_dreambooth_lora_flux_advanced ValueError: unexpected save model: <class 'transformers.models.t5.modeling_t5.T5EncoderModel'> by @rootonchair in #9777
  • [Fix] remove setting lr for T5 text encoder when using prodigy in flux dreambooth lora script by @biswaroop1547 in #9473
  • [SD 3.5 Dreambooth LoRA] support configurable training block & layers by @linoytsaban in #9762
  • [flux dreambooth lora training] make LoRA target modules configurable + small bug fix by @linoytsaban in #9646
  • adds the pipeline for pixart alpha controlnet by @raulc0399 in #8857
  • [core] Allegro T2V by @a-r-r-o-w in #9736
  • Allegro VAE fix by @a-r-r-o-w in #9811
  • [CI] add new runner for testing by @sayakpaul in #9699
  • [training] fixes to the quantization training script and add AdEMAMix optimizer as an option by @sayakpaul in #9806
  • [training] use the lr when using 8bit adam. by @sayakpaul in #9796
  • [Tests] clean up and refactor gradient checkpointing tests by @sayakpaul in #9494
  • [CI] add a big GPU marker to run memory-intensive tests separately on CI by @sayakpaul in #9691
  • [LoRA] fix: lora loading when using with a device_mapped model. by @sayakpaul in #9449
  • Revert "[LoRA] fix: lora loading when using with a device_mapped mode… by @yiyixuxu in #9823
  • [Model Card] standardize advanced diffusion training sd15 lora by @chiral-carbon in #7613
  • NPU Adaption for FLUX by @leisuzz in #9751
  • Fixes EMAModel "from_pretrained" method by @SahilCarterr in #9779
  • Update train_controlnet_flux.py,Fix size mismatch issue in validation by @ScilenceForest in #9679
  • Handling mixed precision for dreambooth flux lora training by @icsl-Jeon in #9565
  • Reduce Memory Cost in Flux Training by @leisuzz in #9829
  • Add Diffusion Policy for Reinforcement Learning by @DorsaRoh in #9824
  • [feat] add load_lora_adapter() for compatible models by @sayakpaul in #9712
  • Refac training utils.py by @RogerSinghChugh in #9815
  • [core] Mochi T2V by @a-r-r-o-w in #9769
  • [Fix] Test of sd3 lora by @SahilCarterr in #9843
  • Fix: Remove duplicated comma in distributed_inference.md by @vahidaskari in #9868
  • Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] ComA by @jellyheadandrew in #9228
  • Updated _encode_prompt_with_clip and encode_prompt in train_dreamboth_sd3 by @SahilCarterr in #9800
  • [Core] introduce controlnet module by @sayakpaul in #8768
  • [Flux] reduce explicit device transfers and typecasting in flux. by @sayakpaul in #9817
  • Improve downloads of sharded variants by @DN6 in #9869
  • [fix] Replaced shutil.copy with shutil.copyfile by @SahilCarterr in #9885
  • Enabling gradient checkpointing in eval() mode by @MikeTkachuk in #9878
  • [FIX] Fix TypeError in DreamBooth SDXL when use_dora is False by @SahilCarterr in #9879
  • [Advanced LoRA v1.5] fix: gradient unscaling problem by @sayakpaul in #7018
  • Revert "[Flux] reduce explicit device transfers and typecasting in flux." by @sayakpaul in #9896
  • Feature IP Adapter Xformers Attention Processor by @elismasilva in #9881
  • Notebooks for Community Scripts Examples by @ParagEkbote in #9905
  • Fix Progress Bar Updates in SD 1.5 PAG Img2Img pipeline by @painebenjamin in #9925
  • Update pipeline_flux_img2img.py by @example-git in #9928
  • add depth controlnet sd3 pre-trained checkpoints to docs by @pureexe in #9937
  • Move Wuerstchen Dreambooth to research_projects by @ParagEkbote in #9935
  • Update ip_adapter.py by @mkknightr in #8882
  • Modify apply_overlay for inpainting with padding_mask_crop (Inpainting area: "Only Masked") by @clarkkent0618 in #8793
  • Correct pipeline_output.py to the type Mochi by @twobob in #9945
  • Add all AttnProcessor classes in AttentionProcessor type by @Prgckwb in #9909
  • Fixed Nits in Docs and Example Script by @ParagEkbote in #9940
  • Add server example by @thealmightygrant in #9918
  • CogVideoX 1.5 by @zRzRzRzRzRzRzR in #9877
  • Notebooks for Community Scripts-2 by @ParagEkbote in #9952
  • [advanced flux training] bug fix + reduce memory cost as in #9829 by @linoytsaban in #9838
  • [LoRA] feat: save_lora_adapter() by @sayakpaul in #9862
  • Make CogVideoX RoPE implementation consistent by @a-r-r-o-w in #9963
  • [CI] Unpin torch<2.5 in CI by @DN6 in #9961
  • Move IP Adapter Scripts to research project by @ParagEkbote in #9960
  • add skip_layers argument to SD3 transformer model class by @bghira in #9880
  • Fix beta and exponential sigmas + add tests by @hlky in #9954
  • Flux latents fix by @DN6 in #9929
  • [LoRA] enable LoRA for Mochi-1 by @sayakpaul in #9943
  • Improve control net block index for sd3 by @linjiapro in #9758
  • Update handle single blocks on _convert_xlabs_flux_lora_to_diffusers by @raulmosa in #9915
  • fix controlnet module refactor by @yiyixuxu in #9968
  • Fix prepare latent image ids and vae sample generators for flux by @a-r-r-o-w in #9981
  • [Tests] skip nan lora tests on PyTorch 2.5.1 CPU. by @sayakpaul in #9975
  • make pipelines tests device-agnostic (part1) by @faaany in #9399
  • ControlNet from_single_file when already converted by @hlky in #9978
  • Flux Fill, Canny, Depth, Redux by @a-r-r-o-w in #9985
  • [SD3 dreambooth lora] smol fix to checkpoint saving by @linoytsaban in #9993
  • [Docs] add: missing pipelines from the spec. by @sayakpaul in #10005
  • Add prompt about wandb in examples/dreambooth/readme. by @SkyCol in #10014
  • [docs] Fix CogVideoX table by @a-r-r-o-w in #10008
  • Notebooks for Community Scripts-3 by @ParagEkbote in #10032
  • Sd35 controlnet by @yiyixuxu in #10020
  • Add beta, exponential and karras sigmas to FlowMatchEulerDiscreteScheduler by @hlky in #10001
  • Update sdxl reference pipeline to latest sdxl pipeline by @dimitribarbot in #9938
  • [Community Pipeline] Add some feature for regional prompting pipeline by @cjkangme in #9874
  • Add sdxl controlnet reference community pipeline by @dimitribarbot in #9893
  • Change image_gen_aux repository URL by @asomoza in #10048
  • make pipelines tests device-agnostic (part2) by @faaany in #9400
  • [Mochi-1] ensuring to compute the fourier features in FP32 in Mochi encoder by @sayakpaul in #10031
  • [Fix] Syntax error by @SahilCarterr in #10068
  • [CI] Add quantization by @sayakpaul in #9832
  • Add sigmas to Flux pipelines by @hlky in #10081
  • Fixed Nits in Evaluation Docs by @ParagEkbote in #10063
  • fix link in the docs by @coding-famer in #10058
  • fix offloading for sd3.5 controlnets by @yiyixuxu in #10072
  • [Single File] Fix SD3.5 single file loading by @DN6 in #10077
  • Fix num_images_per_prompt>1 with Skip Guidance Layers in StableDiffusion3Pipeline by @hlky in #10086
  • [Single File] Pass token when fetching interpreted config by @DN6 in #10082
  • Interpolate fix on cuda for large output tensors by @pcuenca in #10067
  • Convert sigmas to np.array in FlowMatch set_timesteps by @hlky in #10088
  • fix: missing AutoencoderKL lora adapter by @beniz in #9807
  • Let server decide default repo visibility by @Wauplin in #10047
  • Fix some documentation in ./src/diffusers/models/embeddings.py for demo by @DTG2005 in #9579
  • Don't stale close-to-merge by @pcuenca in #10096
  • Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG by @painebenjamin in #9932
  • Notebooks for Community Scripts-4 by @ParagEkbote in #10094
  • Fix Broken Link in Optimization Docs by @ParagEkbote in #10105
  • DPM++ third order fixes by @StAlKeR7779 in #9104
  • update by @aihao2000 in #7067
  • Avoid compiling a progress bar. by @lsb in #10098
  • [Bug fix] "previous_timestep()" in DDPM scheduling compatible with "trailing" and "linspace" options by @AnandK27 in #9384
  • Fix multi-prompt inference by @hlky in #10103
  • Test skip_guidance_layers in SD3 pipeline by @hlky in #10102
  • Use parameters + buffers when deciding upscale_dtype by @universome in #9882
  • [tests] refactor vae tests by @sayakpaul in #9808
  • add torch_xla support in pipeline_stable_audio.py by @<NOT FOUND> in #10109
  • Fix pipeline_stable_audio formating by @hlky in #10114
  • [bitsandbytes] allow directly CUDA placements of pipelines loaded with bnb components by @sayakpaul in #9840
  • Fix Broken Links in ReadMe by @ParagEkbote in #10117
  • Add sigmas to pipelines using FlowMatch by @hlky in #10116
  • [Flux Redux] add prompt & multiple image input by @linoytsaban in #10056
  • Fix a bug in the state dict judgment in ip_adapter.py. by @zhangp365 in #10095
  • Fix a bug for SD35 control net training and improve control net block index by @linjiapro in #10065
  • pass attn mask arg for flux by @yiyixuxu in #10122
  • [docs] load_lora_adapter by @stevhliu in #10119
  • Use torch.device instead of current device index for BnB quantizer by @a-r-r-o-w in #10069
  • [Tests] fix condition argument in xfail. by @sayakpaul in #10099
  • [Tests] xfail incompatible SD configs. by @sayakpaul in #10127
  • [FIX] Bug in FluxPosEmbed by @SahilCarterr in #10115
  • [Guide] Quantize your Diffusion Models with bnb by @ariG23498 in #10012
  • Remove duplicate checks for len(generator) != batch_size when generator is a list by @a-r-r-o-w in #10134
  • [community] Load Models from Sources like Civitai into Existing Pipelines by @suzukimain in #9986
  • [DC-AE] Add the official Deep Compression Autoencoder code(32x,64x,128x compression ratio); by @lawrence-cj in #9708
  • fixed a dtype bfloat16 bug in torch_utils.py by @zhangp365 in #10125
  • [LoRA] depcrecate save_attn_procs(). by @sayakpaul in #10126
  • Update ptxla training by @entrpn in #9864
  • support sd3.5 for controlnet example by @DavyMorgan in #9860
  • [Single file] Support revision argument when loading single file config by @a-r-r-o-w in #10168
  • [community pipeline] Add RF-inversion Flux pipeline by @linoytsaban in #9816
  • Improve post-processing performance by @soof-golan in #10170
  • Use torch in get_3d_rotary_pos_embed/_allegro by @hlky in #10161
  • Flux Control LoRA by @a-r-r-o-w in #9999
  • Add PAG Support for Stable Diffusion Inpaint Pipeline by @darshil0805 in #9386
  • [community pipeline rf-inversion] - fix example in doc by @linoytsaban in #10179
  • Fix Nonetype attribute error when loading multiple Flux loras by @jonathanyin12 in #10182
  • Added Error when len(gligen_images ) is not equal to len(gligen_phrases) in StableDiffusionGLIGENTextImagePipeline by @SahilCarterr in #10176
  • [Single File] Add single file support for AutoencoderDC by @DN6 in #10183
  • Add ControlNetUnion by @hlky in #10131
  • fix min-snr implementation by @ethansmith2000 in #8466
  • Add support for XFormers in SD3 by @CanvaChen in #8583
  • [LoRA] add a test to ensure set_adapters() and attn kwargs outs match by @sayakpaul in #10110
  • [CI] merge peft pr workflow into the main pr workflow. by @sayakpaul in #10042
  • [WIP][Training] Flux Control LoRA training script by @sayakpaul in #10130
  • [core] LTX Video by @a-r-r-o-w in #10021
  • Ci update tpu by @paulinebm in #10197
  • Remove negative_* from SDXL callback by @hlky in #10203
  • refactor StableDiffusionXLControlNetUnion by @hlky in #10200
  • update StableDiffusion3Img2ImgPipeline.add image size validation by @ZHJ19970917 in #10166
  • Remove mps workaround for fp16 GELU, which is now supported natively by @skotapati in #10133
  • [RF inversion community pipeline] add eta_decay by @linoytsaban in #10199
  • Allow image resolutions multiple of 8 instead of 64 in SVD pipeline by @mlfarinha in #6646
  • Use torch in get_2d_sincos_pos_embed and get_3d_sincos_pos_embed by @hlky in #10156
  • add reshape to fix use_memory_efficient_attention in flax by @entrpn in #7918
  • Add offload option in flux-control training by @Adenialzz in #10225
  • Test error raised when loading normal and expanding loras together in Flux by @a-r-r-o-w in #10188
  • [Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-sovler and so on. by @lawrence-cj in #9982
  • [Tests] update always test pipelines list. by @sayakpaul in #10143
  • Update sana.md with minor corrections by @sayakpaul in #10232
  • [docs] minor stuff to ltx video docs. by @sayakpaul in #10229
  • Fix format issue in push_test yml by @DN6 in #10235
  • [core] Hunyuan Video by @a-r-r-o-w in #10136
  • Update pipeline_controlnet.py add support for pytorch_xla by @<NOT FOUND> in #10222
  • [Docs] add rest of the lora loader mixins to the docs. by @sayakpaul in #10230
  • Use t instead of timestep in _apply_perturbed_attention_guidance by @hlky in #10243
  • Add dynamic_shifting to SD3 by @hlky in #10236
  • Fix use_flow_sigmas by @hlky in #10242
  • Fix ControlNetUnion _callback_tensor_inputs by @hlky in #10218
  • Use non-human subject in StableDiffusion3ControlNetPipeline example by @hlky in #10214
  • Add enable_vae_tiling to AllegroPipeline, fix example by @hlky in #10212
  • Fix checkpoint in CogView3PlusPipeline example by @hlky in #10211
  • Fix RePaint Scheduler by @hlky in #10185
  • Add ControlNetUnion to AutoPipeline from_pretrained by @hlky in #10219
  • fix downsample bug in MidResTemporalBlock1D by @holmosaint in #10250
  • [core] TorchAO Quantizer by @a-r-r-o-w in #10009
  • [docs] Add missing AttnProcessors by @stevhliu in #10246
  • [chore] add contribution note for lawrence. by @sayakpaul in #10253
  • Fix copied from comment in Mochi lora loader by @a-r-r-o-w in #10255
  • [LoRA] Support LTX Video by @a-r-r-o-w in #10228
  • [docs] Clarify dtypes for Sana by @a-r-r-o-w in #10248
  • [Single File] Add GGUF support by @DN6 in #9964
  • Fix Mochi Quality Issues by @DN6 in #10033
  • [tests] Remove/rename unsupported quantization torchao type by @a-r-r-o-w in #10263
  • [docs] delete_adapters() by @stevhliu in #10245
  • [Community Pipeline] Fix typo that cause error on regional prompting pipeline by @cjkangme in #10251
  • Add set_shift to FlowMatchEulerDiscreteScheduler by @hlky in #10269
  • [LoRA] feat: lora support for SANA. by @sayakpaul in #10234
  • [chore] fix: licensing headers in mochi and ltx by @sayakpaul in #10275
  • Use torch in get_2d_rotary_pos_embed by @hlky in #10155
  • [chore] fix: reamde -> readme by @sayakpaul in #10276
  • Make time_embed_dim of UNet2DModel changeable by @Bichidian in #10262
  • Support pass kwargs to sd3 custom attention processor by @Matrix53 in #9818
  • Flux Control(Depth/Canny) + Inpaint by @affromero in #10192
  • Fix sigma_last with use_flow_sigmas by @hlky in #10267
  • Fix Doc links in GGUF and Quantization overview docs by @DN6 in #10279
  • Make zeroing prompt embeds for Mochi Pipeline configurable by @DN6 in #10284
  • [Single File] Add single file support for Flux Canny, Depth and Fill by @DN6 in #10288
  • [tests] Fix broken cuda, nightly and lora tests on main for CogVideoX by @a-r-r-o-w in #10270
  • Rename Mochi integration test correctly by @a-r-r-o-w in #10220
  • [tests] remove nullop import checks from lora tests by @a-r-r-o-w in #10273
  • [chore] Update README_sana.md to update the default model by @sayakpaul in #10285
  • Hunyuan VAE tiling fixes and transformer docs by @a-r-r-o-w in #10295
  • Add Flux Control to AutoPipeline by @hlky in #10292
  • Update lora_conversion_utils.py by @zhaowendao30 in #9980
  • Check correct model type is passed to from_pretrained by @hlky in #10189
  • [LoRA] Support HunyuanVideo by @SHYuanBest in #10254
  • [Single File] Add single file support for Mochi Transformer by @DN6 in #10268
  • Allow Mochi Transformer to be split across multiple GPUs by @DN6 in #10300
  • Fix local_files_only for checkpoints with shards by @hlky in #10294
  • Fix failing lora tests after HunyuanVideo lora by @a-r-r-o-w in #10307
  • unet's sample_size attribute is to accept tuple(h, w) in StableDiffusionPipeline by @Foundsheep in #10181
  • Enable Gradient Checkpointing for UNet2DModel (New) by @dg845 in #7201
  • [WIP] SD3.5 IP-Adapter Pipeline Integration by @guiyrt in #9987
  • Add support for sharded models when TorchAO quantization is enabled by @a-r-r-o-w in #10256
  • Make tensors in ResNet contiguous for Hunyuan VAE by @a-r-r-o-w in #10309
  • [Single File] Add GGUF support for LTX by @DN6 in #10298
  • [LoRA] feat: support loading regular Flux LoRAs into Flux Control, and Fill by @sayakpaul in #10259
  • [Tests] add integration tests for lora expansion stuff in Flux. by @sayakpaul in #10318
  • Mochi docs by @DN6 in #9934
  • [Docs] Update ltx_video.md to remove generator from from_pretrained() by @sayakpaul in #10316
  • docs: fix a mistake in docstring by @Leojc in #10319
  • [BUG FIX] [Stable Audio Pipeline] Resolve torch.Tensor.new_zeros() TypeError in function prepare_latents caused by audio_vae_length by @syntaxticsugr in #10306
  • [docs] Fix quantization links by @stevhliu in #10323
  • [Sana]add 2K related model for Sana by @lawrence-cj in #10322
  • [Docs] Update gguf.md to remove generator from the pipeline from_pretrained by @sayakpaul in #10299
  • Fix push_tests_mps.yml by @hlky in #10326
  • Fix EMAModel test_from_pretrained by @hlky in #10325
  • Support Flux IP Adapter by @hlky in #10261
  • flux controlnet inpaint config bug by @yigitozgenc in #10291
  • Community hosted weights for diffusers format HunyuanVideo weights by @a-r-r-o-w in #10344
  • Fix enable_sequential_cpu_offload in test_kandinsky_combined by @hlky in #10324
  • update get_parameter_dtype by @yiyixuxu in #10342
  • [Single File] Add Single File support for HunYuan video by @DN6 in #10320
  • [Sana bug] bug fix for 2K model config by @lawrence-cj in #10340
  • .from_single_file() - Add missing .shape by @gau-nernst in #10332
  • Bump minimum TorchAO version to 0.7.0 by @a-r-r-o-w in #10293
  • [docs] fix: torchao example. by @sayakpaul in #10278
  • [tests] Refactor TorchAO serialization fast tests by @a-r-r-o-w in #10271
  • [SANA LoRA] sana lora training tests and misc. by @sayakpaul in #10296
  • [Single File] Fix loading by @DN6 in #10349
  • [Tests] QoL improvements to the LoRA test suite by @sayakpaul in #10304
  • Fix FluxIPAdapterTesterMixin by @hlky in #10354
  • Fix failing CogVideoX LoRA fuse test by @a-r-r-o-w in #10352
  • Rename LTX blocks and docs title by @a-r-r-o-w in #10213
  • [LoRA] test fix by @sayakpaul in #10351
  • [Tests] Fix more tests sayak by @sayakpaul in #10359
  • [core] LTX Video 0.9.1 by @a-r-r-o-w in #10330
  • Release: v0.32.0 by @sayakpaul (direct commit on v0.32.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @faaany
    • fix bug in require_accelerate_version_greater (#9746)
    • make pipelines tests device-agnostic (part1) (#9399)
    • make pipelines tests device-agnostic (part2) (#9400)
  • @linoytsaban
    • [SD3-5 dreambooth lora] update model cards (#9749)
    • [SD 3.5 Dreambooth LoRA] support configurable training block & layers (#9762)
    • [flux dreambooth lora training] make LoRA target modules configurable + small bug fix (#9646)
    • [advanced flux training] bug fix + reduce memory cost as in #9829 (#9838)
    • [SD3 dreambooth lora] smol fix to checkpoint saving (#9993)
    • [Flux Redux] add prompt & multiple image input (#10056)
    • [community pipeline] Add RF-inversion Flux pipeline (#9816)
    • [community pipeline rf-inversion] - fix example in doc (#10179)
    • [RF inversion community pipeline] add eta_decay (#10199)
  • @raulc0399
    • adds the pipeline for pixart alpha controlnet (#8857)
  • @yiyixuxu
    • Revert "[LoRA] fix: lora loading when using with a device_mapped mode… (#9823)
    • fix controlnet module refactor (#9968)
    • Sd35 controlnet (#10020)
    • fix offloading for sd3.5 controlnets (#10072)
    • pass attn mask arg for flux (#10122)
    • update get_parameter_dtype (#10342)
  • @jellyheadandrew
    • Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] ComA (#9228)
  • @DN6
    • Improve downloads of sharded variants (#9869)
    • [CI] Unpin torch<2.5 in CI (#9961)
    • Flux latents fix (#9929)
    • [Single File] Fix SD3.5 single file loading (#10077)
    • [Single File] Pass token when fetching interpreted config (#10082)
    • [Single File] Add single file support for AutoencoderDC (#10183)
    • Fix format issue in push_test yml (#10235)
    • [Single File] Add GGUF support (#9964)
    • Fix Mochi Quality Issues (#10033)
    • Fix Doc links in GGUF and Quantization overview docs (#10279)
    • Make zeroing prompt embeds for Mochi Pipeline configurable (#10284)
    • [Single File] Add single file support for Flux Canny, Depth and Fill (#10288)
    • [Single File] Add single file support for Mochi Transformer (#10268)
    • Allow Mochi Transformer to be split across multiple GPUs (#10300)
    • [Single File] Add GGUF support for LTX (#10298)
    • Mochi docs (#9934)
    • [Single File] Add Single File support for HunYuan video (#10320)
    • [Single File] Fix loading (#10349)
  • @ParagEkbote
    • Notebooks for Community Scripts Examples (#9905)
    • Move Wuerstchen Dreambooth to research_projects (#9935)
    • Fixed Nits in Docs and Example Script (#9940)
    • Notebooks for Community Scripts-2 (#9952)
    • Move IP Adapter Scripts to research project (#9960)
    • Notebooks for Community Scripts-3 (#10032)
    • Fixed Nits in Evaluation Docs (#10063)
    • Notebooks for Community Scripts-4 (#10094)
    • Fix Broken Link in Optimization Docs (#10105)
    • Fix Broken Links in ReadMe (#10117)
  • @painebenjamin
    • Fix Progress Bar Updates in SD 1.5 PAG Img2Img pipeline (#9925)
    • Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)
  • @hlky
    • Fix beta and exponential sigmas + add tests (#9954)
    • ControlNet from_single_file when already converted (#9978)
    • Add beta, exponential and karras sigmas to FlowMatchEulerDiscreteScheduler (#10001)
    • Add sigmas to Flux pipelines (#10081)
    • Fix num_images_per_prompt>1 with Skip Guidance Layers in StableDiffusion3Pipeline (#10086)
    • Convert sigmas to np.array in FlowMatch set_timesteps (#10088)
    • Fix multi-prompt inference (#10103)
    • Test skip_guidance_layers in SD3 pipeline (#10102)
    • Fix pipeline_stable_audio formating (#10114)
    • Add sigmas to pipelines using FlowMatch (#10116)
    • Use torch in get_3d_rotary_pos_embed/_allegro (#10161)
    • Add ControlNetUnion (#10131)
    • Remove negative_* from SDXL callback (#10203)
    • refactor StableDiffusionXLControlNetUnion (#10200)
    • Use torch in get_2d_sincos_pos_embed and get_3d_sincos_pos_embed (#10156)
    • Use t instead of timestep in _apply_perturbed_attention_guidance (#10243)
    • Add dynamic_shifting to SD3 (#10236)
    • Fix use_flow_sigmas (#10242)
    • Fix ControlNetUnion _callback_tensor_inputs (#10218)
    • Use non-human subject in StableDiffusion3ControlNetPipeline example (#10214)
    • Add enable_vae_tiling to AllegroPipeline, fix example (#10212)
    • Fix checkpoint in CogView3PlusPipeline example (#10211)
    • Fix RePaint Scheduler (#10185)
    • Add ControlNetUnion to AutoPipeline from_pretrained (#10219)
    • Add set_shift to FlowMatchEulerDiscreteScheduler (#10269)
    • Use torch in get_2d_rotary_pos_embed (#10155)
    • Fix sigma_last with use_flow_sigmas (#10267)
    • Add Flux Control to AutoPipeline (#10292)
    • Check correct model type is passed to from_pretrained (#10189)
    • Fix local_files_only for checkpoints with shards (#10294)
    • Fix push_tests_mps.yml (#10326)
    • Fix EMAModel test_from_pretrained (#10325)
    • Support Flux IP Adapter (#10261)
    • Fix enable_sequential_cpu_offload in test_kandinsky_combined (#10324)
    • Fix FluxIPAdapterTesterMixin (#10354)
  • @dimitribarbot
    • Update sdxl reference pipeline to latest sdxl pipeline (#9938)
    • Add sdxl controlnet reference community pipeline (#9893)
  • @suzukimain
    • [community] Load Models from Sources like Civitai into Existing Pipelines (#9986)
  • @lawrence-cj
    • [DC-AE] Add the official Deep Compression Autoencoder code(32x,64x,128x compression ratio); (#9708)
    • [Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-sovler and so on. (#9982)
    • [Sana]add 2K related model for Sana (#10322)
    • [Sana bug] bug fix for 2K model config (#10340)
  • @darshil0805
    • Add PAG Support for Stable Diffusion Inpaint Pipeline (#9386)
  • @affromero
    • Flux Control(Depth/Canny) + Inpaint (#10192)
  • @SHYuanBest
    • [LoRA] Support HunyuanVideo (#10254)
  • @guiyrt
    • [WIP] SD3.5 IP-Adapter Pipeline Integration (#9987)
Oct 22, 2024

v0.31.0: Stable Diffusion 3.5 Large, CogView3, Quantization, Training Scripts, and more

Stable Diffusion 3.5 Large

Stable Diffusion 3.5 Large is Stability AI’s latest text-to-image generation model and the next iteration of Stable Diffusion 3. It comes with two checkpoints (both with 8B parameters):

  • A regular one
  • A timestep-distilled one enabling few-step inference

Make sure to fill out the form on the model page, then run huggingface-cli login before running the code below.

# make sure to update diffusers
# pip install -U diffusers
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
	"stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]

image.save("sd3_hello_world.png")

Follow the documentation to know more.
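The timestep-distilled checkpoint trades guidance for speed: it runs in very few steps with classifier-free guidance disabled. A minimal sketch, with the turbo repo id and the step/guidance settings being assumptions rather than something stated in these notes (imports are deferred so the function can be defined without a GPU):

```python
def generate_turbo(prompt: str, num_inference_steps: int = 4):
    """Few-step inference with the timestep-distilled SD3.5 Large checkpoint.

    The repo id and settings below are assumptions for illustration.
    """
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-large-turbo",  # assumed repo id
        torch_dtype=torch.bfloat16,
    ).to("cuda")
    # Distilled models run in a handful of steps with guidance turned off.
    return pipe(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=0.0,
    ).images[0]
```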

CogView3-Plus

We added a new text-to-image model, CogView3-Plus, from the THUDM team! The model is DiT-based and supports image generation at resolutions from 512px to 2048px. Thanks to @zRzRzRzRzRzRzR for contributing it!

from diffusers import CogView3PlusPipeline
import torch

pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.float16).to("cuda")

# Enable CPU offloading and VAE slicing/tiling to reduce GPU memory usage
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."

image = pipe(
    prompt=prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview3.png")

Refer to the documentation to know more.

Quantization

We have landed native quantization support in Diffusers, starting with bitsandbytes as its first quantization backend. With this, we hope to see large diffusion models becoming much more accessible to run on consumer hardware.

The example below shows how to run Flux.1 Dev with the NF4 data-type. Make sure to install the required libraries first:

pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes
pip install -Uq diffusers

from diffusers import BitsAndBytesConfig, FluxTransformer2DModel
import torch

ckpt_id = "black-forest-labs/FLUX.1-dev"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = FluxTransformer2DModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)

Then, we use model_nf4 to instantiate the FluxPipeline:


from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    ckpt_id, 
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree.  As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"

image = pipeline(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=50,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")

Follow the documentation here to know more. Additionally, check out this Colab Notebook that runs Flux.1 Dev in an end-to-end manner with NF4 quantization.
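As a rough back-of-the-envelope check of why NF4 helps, 4-bit weights take about half a byte per parameter versus two bytes for bf16. The sketch below estimates the weight-only footprint; the ~12B parameter count for the Flux transformer is an approximation, and the small per-block quantization constants are ignored:

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight-only memory footprint in GB.

    Ignores activation memory and the small per-block quantization
    constants that NF4 stores alongside the weights.
    """
    return num_params * bits_per_param / 8 / 1e9

# The Flux.1 Dev transformer has roughly 12B parameters (approximate figure).
flux_params = 12e9
bf16_gb = weight_footprint_gb(flux_params, 16)  # ~24 GB
nf4_gb = weight_footprint_gb(flux_params, 4)    # ~6 GB
```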

Training scripts

We have a fresh bucket of training scripts with this release:

Video model fine-tuning can be quite expensive, so we have put together a separate repository, cogvideox-factory, which provides memory-optimized scripts for fine-tuning the Cog family of models.

Misc

  • We now support the loading of different kinds of Flux LoRAs, including Kohya, TheLastBen, and Xlabs.
  • Loading of Xlabs Flux ControlNets is also now supported. Thanks to @Anghellia for contributing it!
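Loading these non-diffusers LoRA formats goes through the usual load_lora_weights API, which detects and converts the checkpoint format for you. A minimal sketch; the repo id and filename below are placeholders, not real checkpoints:

```python
def load_flux_lora(pipe, repo_id: str = "some-user/flux-lora-kohya",
                   weight_name: str = "lora.safetensors"):
    """Load a Kohya/TheLastBen/Xlabs-style Flux LoRA into a FluxPipeline.

    The repo id and weight filename are hypothetical placeholders.
    load_lora_weights converts the external format internally.
    """
    pipe.load_lora_weights(repo_id, weight_name=weight_name)
    return pipe
```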

All commits

  • Feature flux controlnet img2img and inpaint pipeline by @ighoshsubho in #9408
  • Remove CogVideoX mentions from single file docs; Test updates by @a-r-r-o-w in #9444
  • set max_shard_size to None for pipeline save_pretrained by @a-r-r-o-w in #9447
  • adapt masked im2im pipeline for SDXL by @noskill in #7790
  • [Flux] add lora integration tests. by @sayakpaul in #9353
  • [training] CogVideoX Lora by @a-r-r-o-w in #9302
  • Several fixes to Flux ControlNet pipelines by @vladmandic in #9472
  • [refactor] LoRA tests by @a-r-r-o-w in #9481
  • [CI] fix nightly model tests by @sayakpaul in #9483
  • [Cog] some minor fixes and nits by @sayakpaul in #9466
  • [Tests] Reduce the model size in the lumina test by @saqlain2204 in #8985
  • Fix the bug of sd3 controlnet training when using gradient checkpointing. by @pibbo88 in #9498
  • [Schedulers] Add exponential sigmas / exponential noise schedule by @hlky in #9499
  • Allow DDPMPipeline half precision by @sbinnee in #9222
  • Add Noise Schedule/Schedule Type to Schedulers Overview documentation by @hlky in #9504
  • fix bugs for sd3 controlnet training by @xduzhangjiayu in #9489
  • [Doc] Fix path and and also import imageio by @LukeLIN-web in #9506
  • [CI] allow faster downloads from the Hub in CI. by @sayakpaul in #9478
  • a few fix for SingleFile tests by @yiyixuxu in #9522
  • Add exponential sigmas to other schedulers and update docs by @hlky in #9518
  • [Community Pipeline] Batched implementation of Flux with CFG by @sayakpaul in #9513
  • Update community_projects.md by @lee101 in #9266
  • [docs] Model sharding by @stevhliu in #9521
  • update get_parameter_dtype by @yiyixuxu in #9526
  • [Doc] Improved level of clarity for latents_to_rgb. by @LagPixelLOL in #9529
  • [Schedulers] Add beta sigmas / beta noise schedule by @hlky in #9509
  • flux controlnet fix (control_modes batch & others) by @yiyixuxu in #9507
  • [Tests] Fix ChatGLMTokenizer by @asomoza in #9536
  • [bug] Precedence of operations in VAE should be slicing -> tiling by @a-r-r-o-w in #9342
  • [LoRA] make set_adapters() method more robust. by @sayakpaul in #9535
  • [examples] add train flux-controlnet scripts in example. by @PromeAIpro in #9324
  • [Tests] [LoRA] clean up the serialization stuff. by @sayakpaul in #9512
  • [Core] fix variant-identification. by @sayakpaul in #9253
  • [refactor] remove conv_cache from CogVideoX VAE by @a-r-r-o-w in #9524
  • [train_instruct_pix2pix.py]Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @AnandK27 in #9316
  • [chore] fix: retain memory utility. by @sayakpaul in #9543
  • [LoRA] support Kohya Flux LoRAs that have text encoders as well by @sayakpaul in #9542
  • Add beta sigmas to other schedulers and update docs by @hlky in #9538
  • Add PAG support to StableDiffusionControlNetPAGInpaintPipeline by @juancopi81 in #8875
  • Support bfloat16 for Upsample2D by @darhsu in #9480
  • fix cogvideox autoencoder decode by @Xiang-cd in #9569
  • [sd3] make sure height and size are divisible by 16 by @yiyixuxu in #9573
  • fix xlabs FLUX lora conversion typo by @Clement-Lelievre in #9581
  • [Chore] add a note on the versions in Flux LoRA integration tests by @sayakpaul in #9598
  • fix vae dtype when accelerate config using --mixed_precision="fp16" by @xduzhangjiayu in #9601
  • refac: docstrings in import_utils.py by @yijun-lee in #9583
  • Fix for use_safetensors parameters, allow use of parameter on loading submodels by @elismasilva in #9576
  • Update distributed_inference.md to include transformer.device_map by @sayakpaul in #9553
  • fix: CogVideox train dataset _preprocess_data crop video by @glide-the in #9574
  • [LoRA] Handle DoRA better by @sayakpaul in #9547
  • Fixed noise_pred_text referenced before assignment. by @LagPixelLOL in #9537
  • Fix the bug that joint_attention_kwargs is not passed to the FLUX's transformer attention processors by @HorizonWind2004 in #9517
  • refac/pipeline_output by @yijun-lee in #9582
  • [LoRA] allow loras to be loaded with low_cpu_mem_usage. by @sayakpaul in #9510
  • add PAG support for SD Img2Img by @SahilCarterr in #9463
  • make controlnet support interrupt by @pureexe in #9620
  • [LoRA] fix dora test to catch the warning properly. by @sayakpaul in #9627
  • flux controlnet control_guidance_start and control_guidance_end implement by @ighoshsubho in #9571
  • fix IsADirectoryError when running the training code for sd3_dreambooth_lora_16gb.ipynb by @alaister123 in #9634
  • Add Differential Diffusion to Kolors by @saqlain2204 in #9423
  • FluxMultiControlNetModel by @hlky in #9647
  • [CI] replace ubuntu version to 22.04. by @sayakpaul in #9656
  • [docs] Fix xDiT doc image damage by @Eigensystem in #9655
  • [Tests] increase transformers version in test_low_cpu_mem_usage_with_loading by @sayakpaul in #9662
  • Flux - soft inpainting via differential diffusion by @ryanlyn in #9268
  • CogView3Plus DiT by @zRzRzRzRzRzRzR in #9570
  • Improve the performance and suitable for NPU computing by @leisuzz in #9642
  • [Community Pipeline] Add 🪆Matryoshka Diffusion Models by @tolgacangoz in #9157
  • Added Lora Support to SD3 Img2Img Pipeline by @SahilCarterr in #9659
  • Add pred_original_sample to if not return_dict path by @hlky in #9649
  • Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel by @hlky in #9652
  • Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel by @hlky in #9651
  • Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 by @hlky in #9650
  • Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral by @hlky in #9616
  • [Fix] when run load pretain with local_files_only, local variable 'cached_folder' referenced before assignment by @RobinXL in #9376
  • [Chore] fix import of EntryNotFoundError. by @sayakpaul in #9676
  • Dreambooth lora flux bug 3dtensor to 2dtensor by @0x-74 in #9653
  • refactor image_processor.py file by @charchit7 in #9608
  • [doc] Fix some docstrings in src/diffusers/training_utils.py by @mreraser in #9606
  • [docs] refactoring docstrings in community/hd_painter.py by @Jwaminju in #9593
  • [docs] refactoring docstrings in models/embeddings_flax.py by @Jwaminju in #9592
  • Fix some documentation in ./src/diffusers/models/adapter.py by @ahnjj in #9591
  • [training] CogVideoX-I2V LoRA by @a-r-r-o-w in #9482
  • [authored by @Anghellia] Add support of Xlabs Controlnets #9638 by @yiyixuxu in #9687
  • Docs: CogVideoX by @glide-the in #9578
  • Resolves [BUG] 'GatheredParameters' object is not callable by @charchit7 in #9614
  • [LoRA] log a warning when there are missing keys in the LoRA loading. by @sayakpaul in #9622
  • [SD3 dreambooth-lora training] small updates + bug fixes by @linoytsaban in #9682
  • [peft] simple update when unscale by @sweetcocoa in #9689
  • [pipeline] CogVideoX-Fun Control by @a-r-r-o-w in #9671
  • [core] improve VAE encode/decode framewise batching by @a-r-r-o-w in #9684
  • [tests] fix name and unskip CogI2V integration test by @a-r-r-o-w in #9683
  • [Flux] Add advanced training script + support textual inversion inference by @linoytsaban in #9434
  • [refactor] DiffusionPipeline.download by @a-r-r-o-w in #9557
  • [advanced flux lora script] minor updates to readme by @linoytsaban in #9705
  • Fix bug in Textual Inversion Unloading by @bonlime in #9304
  • Add prompt scheduling callback to community scripts by @hlky in #9718
  • [CI] pin max torch version to fix CI errors by @a-r-r-o-w in #9709
  • [Docker] pin torch versions in the dockerfiles. by @sayakpaul in #9721
  • make deps_table_update to fix CI tests by @a-r-r-o-w in #9720
  • [Quantization] Add quantization support for bitsandbytes by @sayakpaul in #9213
  • Fix typo in cogvideo pipeline by @lichenyu20 in #9722
  • [Docs] docs to xlabs controlnets. by @sayakpaul in #9688
  • [docs] add docstrings in pipline_stable_diffusion.py by @jeongiin in #9590
  • minor doc/test update by @yiyixuxu in #9734
  • [bugfix] reduce float value error when adding noise by @gameofdimension in #9004
  • fix singlestep dpm tests by @yiyixuxu in #9716
  • Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models by @tolgacangoz in #9723
  • Update sd3 controlnet example by @DavyMorgan in #9735
  • [Fix] Using sharded checkpoints with gated repositories by @asomoza in #9737
  • [bitsandbbytes] follow-ups by @sayakpaul in #9730
  • Fix typos by @DN6 in #9739
  • is_safetensors_compatible fix by @DN6 in #9741
  • Release: v0.31.0 by @sayakpaul (direct commit on v0.31.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @ighoshsubho
    • Feature flux controlnet img2img and inpaint pipeline (#9408)
    • flux controlnet control_guidance_start and control_guidance_end implement (#9571)
  • @noskill
    • adapt masked im2im pipeline for SDXL (#7790)
  • @saqlain2204
    • [Tests] Reduce the model size in the lumina test (#8985)
    • Add Differential Diffusion to Kolors (#9423)
  • @hlky
    • [Schedulers] Add exponential sigmas / exponential noise schedule (#9499)
    • Add Noise Schedule/Schedule Type to Schedulers Overview documentation (#9504)
    • Add exponential sigmas to other schedulers and update docs (#9518)
    • [Schedulers] Add beta sigmas / beta noise schedule (#9509)
    • Add beta sigmas to other schedulers and update docs (#9538)
    • FluxMultiControlNetModel (#9647)
    • Add pred_original_sample to if not return_dict path (#9649)
    • Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel (#9652)
    • Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel (#9651)
    • Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 (#9650)
    • Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral (#9616)
    • Add prompt scheduling callback to community scripts (#9718)
  • @yiyixuxu
    • a few fix for SingleFile tests (#9522)
    • update get_parameter_dtype (#9526)
    • flux controlnet fix (control_modes batch & others) (#9507)
    • [sd3] make sure height and size are divisible by 16 (#9573)
    • [authored by @Anghellia] Add support of Xlabs Controlnets #9638 (#9687)
    • minor doc/test update (#9734)
    • fix singlestep dpm tests (#9716)
  • @PromeAIpro
    • [examples] add train flux-controlnet scripts in example. (#9324)
  • @juancopi81
    • Add PAG support to StableDiffusionControlNetPAGInpaintPipeline (#8875)
  • @glide-the
    • fix: CogVideox train dataset _preprocess_data crop video (#9574)
    • Docs: CogVideoX (#9578)
  • @SahilCarterr
    • add PAG support for SD Img2Img (#9463)
    • Added Lora Support to SD3 Img2Img Pipeline (#9659)
  • @ryanlyn
    • Flux - soft inpainting via differential diffusion (#9268)
  • @zRzRzRzRzRzRzR
    • CogView3Plus DiT (#9570)
  • @tolgacangoz
    • [Community Pipeline] Add 🪆Matryoshka Diffusion Models (#9157)
    • Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models (#9723)
  • @linoytsaban
    • [SD3 dreambooth-lora training] small updates + bug fixes (#9682)
    • [Flux] Add advanced training script + support textual inversion inference (#9434)
    • [advanced flux lora script] minor updates to readme (#9705)
Sep 17, 2024
v0.30.3: CogVideoX Image-to-Video and Video-to-Video

This patch release adds Diffusers support for the upcoming CogVideoX-5B-I2V release (an Image-to-Video generation model)! The model weights will be available by the end of the week on the HF Hub at THUDM/CogVideoX-5b-I2V. Stay tuned for the release!

This release features two new pipelines:

  • CogVideoXImageToVideoPipeline
  • CogVideoXVideoToVideoPipeline

Additionally, we now have support for tiled encoding in the CogVideoX VAE. This can be enabled by calling the vae.enable_tiling() method, and it is used in the new Video-to-Video pipeline to encode sample videos to latents in a memory-efficient manner.
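Under the hood, tiling splits each large frame into overlapping tiles that are encoded independently and then blended back together. The helper below is a hypothetical illustration of the tile layout only (`tile_ranges` is not a diffusers API, and the real implementation also blends the overlapping regions):

```python
# Illustrative sketch of how tiled VAE encoding covers one spatial
# dimension with overlapping tiles. Hypothetical helper, not diffusers code.

def tile_ranges(size: int, tile: int, overlap: int):
    """Return (start, end) spans covering `size`, sharing `overlap` pixels."""
    stride = tile - overlap
    spans = []
    start = 0
    while True:
        end = min(start + tile, size)
        spans.append((start, end))
        if end == size:
            break
        start += stride
    return spans

# A 768-pixel edge with 512-pixel tiles and 64-pixel overlap:
print(tile_ranges(768, 512, 64))  # [(0, 512), (448, 768)]
```

Each tile fits in memory on its own, which is why tiling trades a little extra compute (the overlapping regions) for a much smaller peak memory footprint.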

CogVideoXImageToVideoPipeline

The code below demonstrates how to use the new image-to-video pipeline:

import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Optionally, enable memory optimizations.
# If enabling CPU offloading, remember to remove `pipe.to("cuda")` above
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
)
video = pipe(image, prompt, use_dynamic_cfg=True)
export_to_video(video.frames[0], "output.mp4", fps=8)
<table align=center> <tr> <td align=center colspan=1><img src="https://github.com/user-attachments/assets/1c7c1d86-f97e-44dd-9b17-4fec2bbc2b1a" /></td> <td align=center colspan=1><video src="https://github.com/user-attachments/assets/a115372e-c539-4ca0-b0d0-770d62862257"> Your browser does not support the video tag. </video></td> </tr> </table>

CogVideoXVideoToVideoPipeline

The code below demonstrates how to use the new video-to-video pipeline:

import torch
from diffusers import CogVideoXDPMScheduler, CogVideoXVideoToVideoPipeline
from diffusers.utils import export_to_video, load_video

# Models: "THUDM/CogVideoX-2b" or "THUDM/CogVideoX-5b"
pipe = CogVideoXVideoToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

input_video = load_video(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/hiker.mp4"
)
prompt = (
    "An astronaut stands triumphantly at the peak of a towering mountain. Panorama of rugged peaks and "
    "valleys. Very futuristic vibe and animated aesthetic. Highlights of purple and golden colors in "
    "the scene. The sky looks like an animated/cartoonish dream of galaxies, nebulae, stars, planets, "
    "moons, but the remainder of the scene is mostly realistic."
)

video = pipe(
    video=input_video, prompt=prompt, strength=0.8, guidance_scale=6, num_inference_steps=50
).frames[0]
export_to_video(video, "output.mp4", fps=8)
<table align=center> <tr> <td align=center><video src="https://github.com/user-attachments/assets/bc9273ff-e459-42f9-af1e-c9b084b28f4d"> Your browser does not support the video tag. </video></td> </tr> </table>

Shoutout to @tin2tin for the awesome demonstration!

Refer to our documentation to learn more about it.

All commits

  • [core] Support VideoToVideo with CogVideoX by @a-r-r-o-w in #9333
  • [core] CogVideoX memory optimizations in VAE encode by @a-r-r-o-w in #9340
  • [CI] Quick fix for Cog Video Test by @DN6 in #9373
  • [refactor] move positional embeddings to patch embed layer for CogVideoX by @a-r-r-o-w in #9263
  • CogVideoX-5b-I2V support by @zRzRzRzRzRzRzR in #9418
Aug 31, 2024
v0.30.2: Update from single file default repository

All commits

  • update runway repo for single_file by @yiyixuxu in #9323
  • Fix Flux CLIP prompt embeds repeat for num_images_per_prompt > 1 by @DN6 in #9280
  • [IP Adapter] Fix cache_dir and local_files_only for image encoder by @asomoza in #9272
Aug 24, 2024
v0.30.1: CogVideoX-5B & Bug fixes

CogVideoX-5B

This patch release adds Diffusers support for the upcoming CogVideoX-5B release! The model weights will be available next week on the Hugging Face Hub at THUDM/CogVideoX-5b. Stay tuned for the release!

Additionally, we have implemented a VAE tiling feature that reduces the memory requirements of the CogVideoX models. With this update, the total memory requirement is now 12GB for CogVideoX-2B and 21GB for CogVideoX-5B (with CPU offloading). To enable this feature, simply call enable_tiling() on the VAE.

The code below shows how to generate a video with CogVideoX-5B:

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

prompt = "Tracking shot, late afternoon light casting long shadows, a cyclist in athletic gear pedaling down a scenic mountain road, winding path with trees and a lake in the background, invigorating and adventurous atmosphere."

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16
)

pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
).frames[0]

export_to_video(video, "output.mp4", fps=8)

https://github.com/user-attachments/assets/c2d4f7e8-ef86-4da6-8085-cb9f83f47f34

Refer to our documentation to learn more about it.

All commits

  • Update Video Loading/Export to use imageio by @DN6 in #9094
  • [refactor] CogVideoX followups + tiled decoding support by @a-r-r-o-w in #9150
  • Add Learned PE selection for Auraflow by @cloneofsimo in #9182
  • [Single File] Fix configuring scheduler via legacy kwargs by @DN6 in #9229
  • [Flux LoRA] support parsing alpha from a flux lora state dict. by @sayakpaul in #9236
  • [tests] fix broken xformers tests by @a-r-r-o-w in #9206
  • Cogvideox-5B Model adapter change by @zRzRzRzRzRzRzR in #9203
  • [Single File] Support loading Comfy UI Flux checkpoints by @DN6 in #9243
Aug 7, 2024
v0.30.0: New Pipelines (Flux, Stable Audio, Kolors, CogVideoX, Latte, and more), New Methods (FreeNoise, SparseCtrl), and New Refactors

New pipelines

Image taken from the Lumina’s GitHub.

This release features many new pipelines. Below, we provide a list:

Audio pipelines 🎼

Video pipelines 📹

  • Latte (thanks to @maxin-cn for the contribution through #8404)
  • CogVideoX (thanks to @zRzRzRzRzRzRzR for the contribution through #9082)

Image pipelines 🎇

Be sure to check out the respective docs to know more about these pipelines. Some additional pointers are below for curious minds:

  • Lumina introduces a new DiT architecture that is multilingual in nature.
  • Kolors is inspired by SDXL and is also multilingual in nature.
  • Flux introduces the largest (more than 12B parameters!) open-sourced DiT variant available to date. For efficient DreamBooth + LoRA training, we recommend @bghira’s guide here.
  • We have worked on a guide that shows how to quantize these large pipelines for memory efficiency with optimum.quanto. Check it out here.
  • CogVideoX introduces a novel and truly 3D VAE into Diffusers.

Perturbed Attention Guidance (PAG)

Without PAG | With PAG (comparison images)

We already had community pipelines for PAG, but given its usefulness, we decided to make it a first-class citizen of the library. We have a central usage guide for PAG here, which should be the entry point for a user interested in understanding and using PAG for their use cases. We currently support the following pipelines with PAG:

  • StableDiffusionPAGPipeline
  • StableDiffusion3PAGPipeline
  • StableDiffusionControlNetPAGPipeline
  • StableDiffusionXLPAGPipeline
  • StableDiffusionXLPAGImg2ImgPipeline
  • StableDiffusionXLPAGInpaintPipeline
  • StableDiffusionXLControlNetPAGPipeline
  • PixArtSigmaPAGPipeline
  • HunyuanDiTPAGPipeline
  • AnimateDiffPAGPipeline
  • KolorsPAGPipeline

If you’re interested in helping us extend our PAG support for other pipelines, please check out this thread. Special thanks to Ahn Donghoon (@sunovivid), the author of PAG, for helping us with the integration and adding PAG support to SD3.
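Conceptually, PAG adds a second guidance term, computed against a "perturbed" attention forward pass, alongside the usual classifier-free guidance term. A minimal arithmetic sketch of that combination follows; `apply_pag` is a hypothetical helper (not a library function), and the formula is the CFG-plus-PAG pattern described in the PAG paper, not the exact library code:

```python
def apply_pag(pred_uncond, pred_cond, pred_perturbed, guidance_scale, pag_scale):
    """Combine the three noise predictions element-wise (illustrative only):
    uncond + cfg * (cond - uncond) + pag * (cond - perturbed)."""
    return [
        u + guidance_scale * (c - u) + pag_scale * (c - p)
        for u, c, p in zip(pred_uncond, pred_cond, pred_perturbed)
    ]

# Toy 1-element "predictions" with guidance_scale=7.0 and pag_scale=3.0:
print(apply_pag([0.0], [1.0], [0.5], 7.0, 3.0))  # [8.5]
```

Setting pag_scale to zero recovers plain classifier-free guidance, which is why PAG can be layered onto existing pipelines without changing their defaults.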

AnimateDiff with SparseCtrl

SparseCtrl introduces controllability into text-to-video diffusion models by leveraging signals such as line/edge sketches, depth maps, and RGB images. Inspired by ControlNet, it incorporates an additional condition encoder to process these signals within the AnimateDiff framework. It can be applied to a diverse set of applications, such as interpolation or video prediction (filling in the gaps between a sequence of images for animation), personalized image animation, sketch-to-video, depth-to-video, and more. It was introduced in SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models.
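The "sparse" in SparseCtrl refers to conditioning only a few of the generated frames rather than all of them. A toy sketch of that idea as a per-frame mask (`sparse_condition_mask` is a hypothetical helper for illustration, not the model's actual input format):

```python
def sparse_condition_mask(num_frames: int, condition_frame_indices):
    """1 marks frames that receive a conditioning signal, 0 elsewhere."""
    return [1 if i in condition_frame_indices else 0 for i in range(num_frames)]

# Condition the first, middle, and last frames of a 16-frame clip,
# matching the condition_frame_indices used in the example below:
print(sparse_condition_mask(16, [0, 8, 15]))
# [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```

The unconditioned frames are filled in by the diffusion model itself, which is what makes applications like interpolation possible.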

The authors have made two SparseCtrl-specific checkpoints and a Motion LoRA available.

Scribble Interpolation Example:

<table> <tr> <td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png" alt="Image 1"></td> <td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-2.png" alt="Image 2"></td> <td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-3.png" alt="Image 3"></td> </tr> <tr> <td colspan="3" style="text-align: center; vertical-align: middle;"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-sparsectrl-scribble-results.gif" alt="Image 4"></td> </tr> </table>
import torch

from diffusers import AnimateDiffSparseControlNetPipeline, AutoencoderKL, MotionAdapter, SparseControlNetModel
from diffusers.schedulers import DPMSolverMultistepScheduler
from diffusers.utils import export_to_gif, load_image

device = "cuda"

motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16).to(device)
controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-scribble", torch_dtype=torch.float16).to(device)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to(device)
pipe = AnimateDiffSparseControlNetPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=motion_adapter,
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
).to(device)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, beta_schedule="linear", algorithm_type="dpmsolver++", use_karras_sigmas=True)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-v1-5-3", adapter_name="motion_lora")
pipe.fuse_lora(lora_scale=1.0)

prompt = "an aerial view of a cyberpunk city, night time, neon lights, masterpiece, high quality"
negative_prompt = "low quality, worst quality, letterboxed"

image_files = [
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-2.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-3.png"
]
condition_frame_indices = [0, 8, 15]
conditioning_frames = [load_image(img_file) for img_file in image_files]

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    conditioning_frames=conditioning_frames,
    controlnet_conditioning_scale=1.0,
    controlnet_frame_indices=condition_frame_indices,
    generator=torch.Generator().manual_seed(1337),
).frames[0]
export_to_gif(video, "output.gif")

📜 Check out the docs here.

FreeNoise for AnimateDiff

FreeNoise is a training-free method that allows extending the generative capabilities of pretrained video diffusion models beyond their existing context/frame limits.

Instead of initializing noises for all frames, FreeNoise reschedules a sequence of noises for long-range correlation and performs temporal attention over them using a window-based function. We have added FreeNoise to the AnimateDiff family of models in Diffusers, allowing them to generate videos beyond their default 32 frame limit.
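The window-based processing described above can be sketched as overlapping frame windows over which temporal attention is computed. The helper below is hypothetical and only illustrates the window layout; the actual context scheduling in diffusers differs in detail:

```python
def frame_windows(num_frames: int, window: int, stride: int):
    """Overlapping (start, end) frame windows for window-based attention."""
    starts = range(0, max(num_frames - window, 0) + 1, stride)
    return [(s, s + window) for s in starts]

# 64 frames processed in 16-frame windows that advance 4 frames at a time:
print(frame_windows(64, 16, 4)[:3])  # [(0, 16), (4, 20), (8, 24)]
```

Because adjacent windows overlap, frames near window boundaries are attended to in several windows, which is what preserves long-range temporal correlation.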
 

import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, EulerAncestralDiscreteScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)
pipe.scheduler = EulerAncestralDiscreteScheduler(
    beta_schedule="linear",
    beta_start=0.00085,
    beta_end=0.012,
)

pipe.enable_free_noise()
pipe.vae.enable_slicing()

pipe.enable_model_cpu_offload()
frames = pipe(
    "An astronaut riding a horse on Mars.",
    num_frames=64,
    num_inference_steps=20,
    guidance_scale=7.0,
    decode_chunk_size=2,
).frames[0]

export_to_gif(frames, "freenoise-64.gif")

LoRA refactor

We have significantly refactored the loader classes associated with LoRA. Going forward, this will help in adding LoRA support for new pipelines and models. We now have a LoraBaseMixin class which is subclassed by the different pipeline-level LoRA loading classes such as StableDiffusionXLLoraLoaderMixin. This document provides an overview of the available classes.

Additionally, we have increased the coverage of methods within the PeftAdapterMixin class. This refactoring allows all the supported models to share common LoRA functionalities such as set_adapter(), add_adapter(), and so on.

To learn more details, please follow this PR. If you see any LoRA-related issues stemming from these refactors, please open an issue.
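To illustrate the design (not the actual implementation), a shared base class can centralize adapter bookkeeping while subclasses add only pipeline-specific loading logic. The class names below are hypothetical sketches in the spirit of LoraBaseMixin and StableDiffusionXLLoraLoaderMixin:

```python
class LoraBaseSketch:
    """Shared adapter bookkeeping, analogous in spirit to LoraBaseMixin."""

    def __init__(self):
        self._adapters = {}
        self.active_adapters = []

    def add_adapter(self, name, weights):
        self._adapters[name] = weights

    def set_adapters(self, names, adapter_weights=None):
        missing = [n for n in names if n not in self._adapters]
        if missing:
            raise ValueError(f"Unknown adapters: {missing}")
        weights = adapter_weights or [1.0] * len(names)
        self.active_adapters = list(zip(names, weights))


class SDXLLoraLoaderSketch(LoraBaseSketch):
    """A pipeline-level loader would only add checkpoint-format specifics."""


loader = SDXLLoraLoaderSketch()
loader.add_adapter("style", weights={"rank": 4})
loader.set_adapters(["style"], adapter_weights=[0.8])
print(loader.active_adapters)  # [('style', 0.8)]
```

Centralizing the bookkeeping this way is what lets every new pipeline inherit set_adapters() and friends for free instead of re-implementing them.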

🚨 Fixing attention projection fusion

We discovered that the implementation of fuse_qkv_projections() was broken. This was fixed in this PR. Additionally, this PR added the fusion support to AuraFlow and PixArt Sigma. Reasoning about where this kind of fusion might be useful is available here.
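To see why fusing is safe, note that stacking the Q, K, and V weight matrices row-wise and performing one matrix multiply yields exactly the concatenation of the three separate projections, so three kernel launches become one. A toy numeric sketch (plain Python, not the library implementation):

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Separate Q, K, V projection weights (toy 2x2 matrices).
wq = [[1, 0], [0, 1]]
wk = [[2, 0], [0, 2]]
wv = [[0, 1], [1, 0]]
x = [3, 4]

# Three separate projections, concatenated.
separate = matvec(wq, x) + matvec(wk, x) + matvec(wv, x)

# Fused: stack the weights row-wise, then do a single (larger) matmul.
w_fused = wq + wk + wv
fused = matvec(w_fused, x)

print(fused == separate)  # True
print(fused)  # [3, 4, 6, 8, 4, 3]
```

The fused result is then split back into Q, K, and V slices before attention proper, so the math is unchanged while kernel-launch overhead drops.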

All commits

  • [Release notification] add some info when there is an error. by @sayakpaul in #8718
  • Modify FlowMatch Scale Noise by @asomoza in #8678
  • Fix json WindowsPath crash by @vincedovy in #8662
  • Motion Model / Adapter versatility by @Arlaz in #8301
  • [Chore] perform better deprecation for vqmodeloutput by @sayakpaul in #8719
  • [Advanced dreambooth lora] adjustments to align with canonical script by @linoytsaban in #8406
  • [Tests] Fix precision related issues in slow pipeline tests by @DN6 in #8720
  • fix: ValueError when using FromOriginalModelMixin in subclasses #8440 by @fkcptlst in #8454
  • [Community pipeline] SD3 Differential Diffusion Img2Img Pipeline by @asomoza in #8679
  • Benchmarking workflow fix by @sayakpaul in #8389
  • add PAG support for SD architecture by @shauray8 in #8725
  • shift cache in benchmarking. by @sayakpaul in #8740
  • [train_controlnet_sdxl.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8476
  • fix the LR schedulers for dreambooth_lora by @WenheLI in #8510
  • [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support by @gnobitab in #8747
  • Always raise from previous error by @Wauplin in #8751
  • [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart by @yiyixuxu in #8735
  • Remove legacy single file model loading mixins by @DN6 in #8754
  • Allow from_transformer in SD3ControlNetModel by @haofanwang in #8749
  • [SD3 LoRA Training] Fix errors when not training text encoders by @asomoza in #8743
  • [Tests] add test suite for SD3 DreamBooth by @sayakpaul in #8650
  • [hunyuan-dit] refactor HunyuanCombinedTimestepTextSizeStyleEmbedding by @yiyixuxu in #8761
  • Enforce ordering when running Pipeline slow tests by @DN6 in #8763
  • Fix warning in UNetMotionModel by @DN6 in #8756
  • Fix indent in dreambooth lora advanced SD 15 script by @DN6 in #8753
  • Fix mistake in Single File Docs page by @DN6 in #8765
  • Reflect few contributions on philosophy.md that were not reflected on #8294 by @mreraser in #8690
  • correct attention_head_dim for JointTransformerBlock by @yiyixuxu in #8608
  • [LoRA] introduce LoraBaseMixin to promote reusability. by @sayakpaul in #8670
  • Revert "[LoRA] introduce LoraBaseMixin to promote reusability." by @sayakpaul in #8773
  • Allow SD3 DreamBooth LoRA fine-tuning on a free-tier Colab by @sayakpaul in #8762
  • Update README.md to include Colab link by @sayakpaul in #8775
  • [Chore] add dummy lora attention processors to prevent failures in other libs by @sayakpaul in #8777
  • [advanced dreambooth lora] add clip_skip arg by @linoytsaban in #8715
  • [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet by @gnobitab in #8783
  • Fix minor bug in SD3 img2img test by @a-r-r-o-w in #8779
  • [Tests] fix sharding tests by @sayakpaul in #8764
  • Add vae_roundtrip.py example by @thomaseding in #7104
  • [Single File] Allow loading T5 encoder in mixed precision by @DN6 in #8778
  • Fix saving text encoder weights and kohya weights in advanced dreambooth lora script by @DN6 in #8766
  • Improve model card for push_to_hub trainers by @apolinario in #8697
  • fix loading sharded checkpoints from subfolder by @yiyixuxu in #8798
  • [Alpha-VLLM Team] Add Lumina-T2X to diffusers by @PommesPeter in #8652
  • Fix static typing and doc typos by @zhuoqun-chen in #8807
  • Remove unnecessary lines by @tolgacangoz in #8569
  • Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference by @IrohXu in #8709
  • [Tests] fix more sharding tests by @sayakpaul in #8797
  • Reformat docstring for get_timestep_embedding by @alanhdu in #8811
  • Latte: Latent Diffusion Transformer for Video Generation by @maxin-cn in #8404
  • [Core] Add Kolors by @asomoza in #8812
  • [Core] Add AuraFlow by @sayakpaul in #8796
  • Add VAE tiling option for SD3 by @DN6 in #8791
  • Add single file loading support for AnimateDiff by @DN6 in #8819
  • [Docs] add AuraFlow docs by @sayakpaul in #8851
  • [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU by @ustcuna in #8643
  • add PAG support sd15 controlnet by @tuanh123789 in #8820
  • [tests] fix typo in pag tests by @a-r-r-o-w in #8845
  • [Docker] include python3.10 dev and solve header missing problem by @sayakpaul in #8865
  • [Cont'd] Add the SDE variant of DPM-Solver and DPM-Solver++ to DPM Single Step by @tolgacangoz in #8269
  • modify pocs. by @sayakpaul in #8867
  • [Core] fix: shard loading and saving when variant is provided. by @sayakpaul in #8869
  • [Chore] allow auraflow latest to be torch compile compatible. by @sayakpaul in #8859
  • Add AuraFlowPipeline and KolorsPipeline to auto map by @Beinsezii in #8849
  • Fix multi-gpu case for train_cm_ct_unconditional.py by @tolgacangoz in #8653
  • [docs] pipeline docs for latte by @a-r-r-o-w in #8844
  • [Chore] add disable forward chunking to SD3 transformer. by @sayakpaul in #8838
  • [Core] remove resume_download from Hub related stuff by @sayakpaul in #8648
  • Add option to SSH into CPU runner. by @DN6 in #8884
  • SSH into cpu runner fix by @DN6 in #8888
  • SSH into cpu runner additional fix by @DN6 in #8893
  • [SDXL] Fix uncaught error with image to image by @asomoza in #8856
  • fix loop bug in SlicedAttnProcessor by @shinetzh in #8836
  • [fix code annotation] Adjust the dimensions of the rotary positional embedding. by @wangqixun in #8890
  • allow tensors in several schedulers step() call by @catwell in #8905
  • Use model_info.id instead of model_info.modelId by @Wauplin in #8912
  • [Training] SD3 training fixes by @sayakpaul in #8917
  • 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) by @Snailpong in #8804
  • [Docs] small fixes to pag guide. by @sayakpaul in #8920
  • Reflect few contributions on ethical_guidelines.md that were not reflected on #8294 by @mreraser in #8914
  • [Tests] proper skipping of request caching test by @sayakpaul in #8908
  • Add attentionless VAE support by @Gothos in #8769
  • [Benchmarking] check if runner helps to restore benchmarking by @sayakpaul in #8929
  • Update pipeline test fetcher by @DN6 in #8931
  • [Tests] reduce the model size in the audioldm2 fast test by @ariG23498 in #7846
  • fix: checkpoint save issue in advanced dreambooth lora sdxl script by @akbaig in #8926
  • [Tests] Improve transformers model test suite coverage - Temporal Transformer by @rootonchair in #8932
  • Fix Colab and Notebook checks for diffusers-cli env by @tolgacangoz in #8408
  • Fix name when saving text inversion embeddings in dreambooth advanced scripts by @DN6 in #8927
  • [Core] fix QKV fusion for attention by @sayakpaul in #8829
  • remove residual i from auraflow. by @sayakpaul in #8949
  • [CI] Skip flaky download tests in PR CI by @DN6 in #8945
  • [AuraFlow] fix long prompt handling by @sayakpaul in #8937
  • Added Code for Gradient Accumulation to work for basic_training by @RandomGamingDev in #8961
  • [AudioLDM2] Fix cache pos for GPT-2 generation by @sanchit-gandhi in #8964
  • [Tests] fix slices of 26 tests (first half) by @sayakpaul in #8959
  • [CI] Slow Test Updates by @DN6 in #8870
  • [tests] speed up animatediff tests by @a-r-r-o-w in #8846
  • [LoRA] introduce LoraBaseMixin to promote reusability. by @sayakpaul in #8774
  • Update TensorRT img2img community pipeline by @asfiyab-nvidia in #8899
  • Enable CivitAI SDXL Inpainting Models Conversion by @mazharosama in #8795
  • Revert "[LoRA] introduce LoraBaseMixin to promote reusability." by @yiyixuxu in #8976
  • fix guidance_scale value not equal to the value in comments by @efwfe in #8941
  • [Chore] remove all is from auraflow. by @sayakpaul in #8980
  • [Chore] add LoraLoaderMixin to the inits by @sayakpaul in #8981
  • Added accelerator based gradient accumulation for basic_example by @RandomGamingDev in #8966
  • [CI] Fix parallelism in nightly tests by @DN6 in #8983
  • [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix by @DN6 in #8986
  • [fix] FreeInit step index out of bounds by @a-r-r-o-w in #8969
  • [core] AnimateDiff SparseCtrl by @a-r-r-o-w in #8897
  • remove unused code from pag attn procs by @a-r-r-o-w in #8928
  • [Kolors] Add IP Adapter by @asomoza in #8901
  • [CI] Update runner configuration for setup and nightly tests by @XciD in #9005
  • [Docs] credit where it's due for Lumina and Latte. by @sayakpaul in #9000
  • handle lora scale and clip skip in lpw sd and sdxl community pipelines by @noskill in #8988
  • [LoRA] fix: animate diff lora stuff. by @sayakpaul in #8995
  • Stable Audio integration by @ylacombe in #8716
  • [core] Move community AnimateDiff ControlNet to core by @a-r-r-o-w in #8972
  • Fix Stable Audio repository id by @ylacombe in #9016
  • PAG variant for AnimateDiff by @a-r-r-o-w in #8789
  • Updates deps for pipeline test fetcher by @DN6 in #9033
  • fix load sharded checkpoint from a subfolder (local path) by @yiyixuxu in #8913
  • [docs] fix pia example by @a-r-r-o-w in #9015
  • Flux pipeline by @sayakpaul in #9043
  • [Core] Add PAG support for PixArtSigma by @sayakpaul in #8921
  • [Flux] allow tests to run by @sayakpaul in #9050
  • Fix Nightly Deps by @DN6 in #9036
  • Update transformer_flux.py by @haofanwang in #9060
  • Errata: Fix typos & \s+$ by @tolgacangoz in #9008
  • [refactor] create modeling blocks specific to AnimateDiff by @a-r-r-o-w in #8979
  • Fix grammar mistake. by @prideout in #9072
  • [Flux] minor documentation fixes for flux. by @sayakpaul in #9048
  • Update TensorRT txt2img and inpaint community pipelines by @asfiyab-nvidia in #9037
  • type get_attention_scores as optional in get_attention_scores by @psychedelicious in #9075
  • [refactor] apply qk norm in attention processors by @a-r-r-o-w in #9071
  • [FLUX] support LoRA by @sayakpaul in #9057
  • [Tests] Improve transformers model test suite coverage - Latte by @rootonchair in #8919
  • PAG variant for HunyuanDiT, PAG refactor by @a-r-r-o-w in #8936
  • [Docs] add stable cascade unet doc. by @sayakpaul in #9066
  • add sentencepiece as a soft dependency by @yiyixuxu in #9065
  • Fix typos by @omahs in #9077
  • Update CLIPFeatureExtractor to CLIPImageProcessor and DPTFeatureExtractor to DPTImageProcessor by @tolgacangoz in #9002
  • [Core] add QKV fusion to AuraFlow and PixArt Sigma by @sayakpaul in #8952
  • [bug] remove unreachable norm_type=ada_norm_continuous from norm3 initialization conditions by @a-r-r-o-w in #9006
  • [Tests] Improve transformers model test suite coverage - Hunyuan DiT by @rootonchair in #8916
  • update by @DN6 (direct commit on v0.30.0-release)
  • [Docs] Add community projects section to docs by @DN6 in #9013
  • add PAG support for Stable Diffusion 3 by @sunovivid in #8861
  • Fix loading sharded checkpoints when we have variants by @SunMarc in #9061
  • [Single File] Add single file support for Flux Transformer by @DN6 in #9083
  • [Kolors] Add PAG by @asomoza in #8934
  • fix train_dreambooth_lora_sd3.py loading hook by @sayakpaul in #9107
  • [core] FreeNoise by @a-r-r-o-w in #8948
  • Flux fp16 inference fix by @latentCall145 in #9097
  • [feat] allow sparsectrl to be loaded from single file by @a-r-r-o-w in #9073
  • Freenoise change vae_batch_size to decode_chunk_size by @DN6 in #9110
  • Add CogVideoX text-to-video generation model by @zRzRzRzRzRzRzR in #9082
  • Release: v0.30.0 by @sayakpaul (direct commit on v0.30.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @DN6
    • [Tests] Fix precision related issues in slow pipeline tests (#8720)
    • Remove legacy single file model loading mixins (#8754)
    • Enforce ordering when running Pipeline slow tests (#8763)
    • Fix warning in UNetMotionModel (#8756)
    • Fix indent in dreambooth lora advanced SD 15 script (#8753)
    • Fix mistake in Single File Docs page (#8765)
    • [Single File] Allow loading T5 encoder in mixed precision (#8778)
    • Fix saving text encoder weights and kohya weights in advanced dreambooth lora script (#8766)
    • Add VAE tiling option for SD3 (#8791)
    • Add single file loading support for AnimateDiff (#8819)
    • Add option to SSH into CPU runner. (#8884)
    • SSH into cpu runner fix (#8888)
    • SSH into cpu runner additional fix (#8893)
    • Update pipeline test fetcher (#8931)
    • Fix name when saving text inversion embeddings in dreambooth advanced scripts (#8927)
    • [CI] Skip flaky download tests in PR CI (#8945)
    • [CI] Slow Test Updates (#8870)
    • [CI] Fix parallelism in nightly tests (#8983)
    • [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix (#8986)
    • Updates deps for pipeline test fetcher (#9033)
    • Fix Nightly Deps (#9036)
    • update
    • [Docs] Add community projects section to docs (#9013)
    • [Single File] Add single file support for Flux Transformer (#9083)
    • Freenoise change vae_batch_size to decode_chunk_size (#9110)
  • @shauray8
    • add PAG support for SD architecture (#8725)
  • @gnobitab
    • [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support (#8747)
    • [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet (#8783)
  • @yiyixuxu
    • [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart (#8735)
    • [hunyuan-dit] refactor HunyuanCombinedTimestepTextSizeStyleEmbedding (#8761)
    • correct attention_head_dim for JointTransformerBlock (#8608)
    • fix loading sharded checkpoints from subfolder (#8798)
    • Revert "[LoRA] introduce LoraBaseMixin to promote reusability." (#8976)
    • fix load sharded checkpoint from a subfolder (local path) (#8913)
    • add sentencepiece as a soft dependency (#9065)
  • @PommesPeter
    • [Alpha-VLLM Team] Add Lumina-T2X to diffusers (#8652)
  • @IrohXu
    • Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference (#8709)
  • @maxin-cn
    • Latte: Latent Diffusion Transformer for Video Generation (#8404)
  • @ustcuna
    • [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU (#8643)
  • @tuanh123789
    • add PAG support sd15 controlnet (#8820)
  • @Snailpong
    • 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) (#8804)
  • @asfiyab-nvidia
    • Update TensorRT img2img community pipeline (#8899)
    • Update TensorRT txt2img and inpaint community pipelines (#9037)
  • @ylacombe
    • Stable Audio integration (#8716)
    • Fix Stable Audio repository id (#9016)
  • @sunovivid
    • add PAG support for Stable Diffusion 3 (#8861)
  • @zRzRzRzRzRzRzR
    • Add CogVideoX text-to-video generation model (#9082)
Jun 27, 2024
v0.29.2: fix deprecation and LoRA bugs 🐞

All commits

  • [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in #8558
  • [LoRA] refactor lora conversion utility. by @sayakpaul in #8295
  • [LoRA] fix conversion utility so that lora dora loads correctly by @sayakpaul in #8688
  • [Chore] remove deprecation from transformer2d regarding the output class. by @sayakpaul in #8698
  • [LoRA] fix vanilla fine-tuned lora loading. by @sayakpaul in #8691
  • Release: v0.29.2 by @sayakpaul (direct commit on v0.29.2-patch)
Jun 21, 2024
v0.29.1: SD3 ControlNet, Expanded SD3 `from_single_file` support, Using long Prompts with T5 Text Encoder & Bug fixes

SD3 ControlNet

<img width="624" alt="image" src="https://github.com/huggingface/diffusers/assets/46553287/db384753-cfbb-488c-bc74-8280f9bee24e">
import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
from diffusers.utils import load_image

controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16)

pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")
control_image = load_image("https://huggingface.co/InstantX/SD3-Controlnet-Canny/resolve/main/canny.jpg")
prompt = "A girl holding a sign that says InstantX"
image = pipe(prompt, control_image=control_image, controlnet_conditioning_scale=0.7).images[0]
image.save("sd3.png")

📜 Refer to the official docs here to learn more about it.

Thanks to @haofanwang @wangqixun from the @ResearcherXman team for contributing this pipeline!

Expanded single file support

We now support all available single-file checkpoints for SD3 in diffusers! To load a single-file checkpoint that includes the T5 text encoder:

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips_t5xxlfp8.safetensors",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

image = pipe("a picture of a cat holding a sign that says hello world").images[0]
image.save('sd3-single-file-t5-fp8.png')

Using Long Prompts with the T5 Text Encoder

We increased the default maximum sequence length for the T5 Text Encoder from 77 to 256 tokens! It can be adjusted to accept fewer or more tokens by setting `max_sequence_length`, up to a maximum of 512. Keep in mind that longer sequences require additional resources and result in longer generation times; this effect is particularly noticeable during batch inference.

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus. This imaginative creature features the distinctive, bulky body of a hippo, but with a texture and appearance resembling a golden-brown, crispy waffle. The creature might have elements like waffle squares across its skin and a syrup-like sheen. It’s set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting, possibly including oversized utensils or plates in the background. The image should evoke a sense of playful absurdity and culinary fantasy."

# `pipe` is the StableDiffusion3Pipeline loaded as shown above
image = pipe(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
Comparison images: before, max_sequence_length=256, and max_sequence_length=512.

All commits

  • Release: v0.29.0 by @sayakpaul (direct commit on v0.29.1-patch)
  • prepare for patch release by @yiyixuxu (direct commit on v0.29.1-patch)
  • fix warning log for Transformer SD3 by @sayakpaul in #8496
  • Add SD3 AutoPipeline mappings by @Beinsezii in #8489
  • Add Hunyuan AutoPipe mapping by @Beinsezii in #8505
  • Expand Single File support in SD3 Pipeline by @DN6 in #8517
  • [Single File Loading] Handle unexpected keys in CLIP models when accelerate isn't installed. by @DN6 in #8462
  • Fix sharding when no device_map is passed by @SunMarc in #8531
  • [SD3 Inference] T5 Token limit by @asomoza in #8506
  • Fix gradient checkpointing issue for Stable Diffusion 3 by @Carolinabanana in #8542
  • Support SD3 ControlNet and Multi-ControlNet. by @wangqixun in #8566
  • fix from_single_file for checkpoints with t5 by @yiyixuxu in #8631
  • [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in #8558

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @wangqixun
    • Support SD3 ControlNet and Multi-ControlNet. (#8566)
Jun 12, 2024
v0.29.0: Stable Diffusion 3

This release emphasizes Stable Diffusion 3, Stability AI’s latest iteration of the Stable Diffusion family of models. It was introduced in Scaling Rectified Flow Transformers for High-Resolution Image Synthesis by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach.

As the model is gated, before using it with diffusers, you first need to go to the Stable Diffusion 3 Medium Hugging Face page, fill in the form, and accept the gate. Once you have access, log in so that your system knows you’ve accepted the gate:

huggingface-cli login

The code below shows how to perform text-to-image generation with SD3:

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image

Refer to our documentation for learning all the optimizations you can apply to SD3 as well as the image-to-image pipeline.

Additionally, we support DreamBooth + LoRA fine-tuning of Stable Diffusion 3 through rectified flow. Check out this directory for more details.
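As a rough sketch, such a fine-tuning run can be launched with the `train_dreambooth_lora_sd3.py` script from that directory via `accelerate`. The dataset path, prompt, output directory, and hyperparameters below are illustrative placeholders, not values from the release notes; consult the script's README for the full set of supported arguments.

```shell
# Hypothetical invocation of the SD3 DreamBooth LoRA example script;
# all paths, the instance prompt, and the hyperparameters are
# placeholders to adapt to your own setup.
accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
  --instance_data_dir="./dog" \
  --instance_prompt="a photo of sks dog" \
  --output_dir="sd3-dreambooth-lora" \
  --resolution=1024 \
  --train_batch_size=1 \
  --learning_rate=1e-4 \
  --max_train_steps=500 \
  --mixed_precision="fp16"
```

The resulting LoRA weights can then be loaded into a `StableDiffusion3Pipeline` with `pipe.load_lora_weights(...)` for inference.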

Latest: v0.37.1
Tracking since: Jul 21, 2022
Last fetched: Apr 18, 2026