{"id":"src_-khuqUkENPl1hYY6-1dXV","slug":"diffusers","name":"Diffusers","type":"github","url":"https://github.com/huggingface/diffusers","orgId":"org_GDdYeYynEgCEBNBwy-m6s","org":{"slug":"hugging-face","name":"Hugging Face"},"isPrimary":false,"metadata":"{\"evaluatedMethod\":\"github\",\"evaluatedAt\":\"2026-04-07T17:19:13.586Z\",\"changelogDetectedAt\":\"2026-04-07T17:27:26.845Z\"}","releaseCount":90,"releasesLast30Days":1,"avgReleasesPerWeek":0.2,"latestVersion":"v0.37.1","latestDate":"2026-03-25T08:23:43.000Z","changelogUrl":null,"hasChangelogFile":false,"lastFetchedAt":"2026-04-18T14:04:48.066Z","trackingSince":"2022-07-21T14:52:31.000Z","releases":[{"id":"rel_TdmtCG-Re804wAMDMM_dF","version":"v0.37.1","title":"Fixes for AutoModel type hints in Modular Pipelines and Flux Klein LoRA loading ","summary":"-  Fix for loading `ModularPipelines` with `AutoModel` type hints in their `modular_model_index.json` #13271\r\n-  Fix Flux Klein LoRA loading #13313\r\n-...","content":"-  Fix for loading `ModularPipelines` with `AutoModel` type hints in their `modular_model_index.json` #13271\r\n-  Fix Flux Klein LoRA loading #13313\r\n-  Fix unguarded `torchvision` import in Cosmos Predict 2.5 #13321    \r\n","publishedAt":"2026-03-25T08:23:43.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.37.1","media":[]},{"id":"rel_KkHFN8zt4taA9jQA70fAE","version":"v0.37.0","title":"Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥","summary":"## Modular Diffusers\r\n\r\nModular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pi...","content":"## Modular Diffusers\r\n\r\nModular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can now mix and match building blocks to create custom workflows tailored to your specific needs! This complements the existing `DiffusionPipeline` class, providing a more flexible way to create custom diffusion pipelines.\r\n\r\nFind more details on how to get started with Modular Diffusers [here](https://huggingface.co/docs/diffusers/en/modular_diffusers/quickstart), and also check out the [announcement post](https://huggingface.co/blog/modular-diffusers).\r\n\r\n## New Pipelines and Models\r\n\r\n### Image 🌆\r\n\r\n- [Z Image Omni Base](https://huggingface.co/docs/diffusers/en/api/pipelines/z_image): Z-Image is the foundation model of the Z-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom. Thanks to @RuoyiDufor for contributing this in #12857.\r\n- [Flux2 Klein](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2#diffusers.Flux2KleinPipeline):FLUX.2 [Klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in as low as under a second. Built for applications that require real-time image generation without sacrificing quality, and runs on consumer hardware, with as little as 13GB VRAM.\r\n- [Qwen Image Layered](https://huggingface.co/Qwen/Qwen-Image-Layered): Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content. Thanks to @naykun for contributing this in #12853.\r\n- [FIBO Edit](https://huggingface.co/docs/diffusers/main/en/api/pipelines/bria_fibo_edit): Fibo Edit is an 8B parameter image-to-image model that introduces a new paradigm of structured control, operating on JSON inputs paired with source images to enable deterministic and repeatable editing workflows. Featuring native masking for granular precision, it moves beyond simple prompt-based diffusion to offer explicit, interpretable control optimized for production environments. Its lightweight architecture is designed for deep customization, empowering researchers to build specialized “Edit” models for domain-specific tasks while delivering top-tier aesthetic quality. Thanks galbria for contributing it in [https://github.com/huggingface/diffusers/pull/12930](https://github.com/huggingface/diffusers/pull/12930).\r\n- [Cosmos Predict2.5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos): Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world. Thanks to @miguelmartin75 for contributing it in #12852.\r\n- [Cosmos Transfer2.5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos): Cosmos-Transfer2.5 is a conditional world generation model with adaptive multimodal control, that produces high-quality world simulations conditioned on multiple control inputs. These inputs can take different modalities—including edges, blurred video, segmentation maps, and depth maps. Thanks to @miguelmartin75 for contributing it in #13066.\r\n- [GLM-Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/glm_image): GLM-Image is an image generation model adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained details. In general image generation quality, it aligns with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios. Thanks to @zRzRzRzRzRzRzR for contributing it in [https://github.com/huggingface/diffusers/pull/12973](https://github.com/huggingface/diffusers/pull/12973).\r\n- [RAE](https://huggingface.co/docs/diffusers/main/api/models/autoencoder_rae): Representation Autoencoders (aka RAE) are an exciting alternative to traditional VAEs, typically used in the area of latent-space diffusion models of image generation. RAEs leverage pre-trained vision encoders and train lightweight decoders for the task of reconstruction.\r\n\r\n### Video + audio 🎥 🎼\r\n\r\n- [LTX-2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2): LTX-2 is an audio-conditioned text-to-video generation model that can generate videos with synced audio. Full and distilled model inference, as well as two-stage inference with spatial sampling, is supported. We also support a conditioning pipeline that allows for passing different conditions (such as images, series of images, etc.). Check out the docs to learn more!\r\n- [Helios](https://huggingface.co/docs/diffusers/main/api/pipelines/helios): Helios is a 14B video generation model that runs at 17 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching a strong baseline in quality. Thanks to @SHYuanBest for contributing this in [https://github.com/huggingface/diffusers/pull/13208](https://github.com/huggingface/diffusers/pull/13208).\r\n\r\n## Improvements to Core Library\r\n\r\n### New caching methods\r\n\r\n- [MagCache](https://github.com/huggingface/diffusers/pull/12744) — thanks to @AlanPonnachan!\r\n- [TaylorSeer](https://github.com/huggingface/diffusers/pull/12648/) — thanks to @toilaluan!\r\n\r\n### New context-parallelism (CP) backends\r\n\r\n- [Unified Sequence Parallel attention](https://github.com/huggingface/diffusers/pull/12693) — thanks to @Bissmella!\r\n- [Ulysses Anything Attention](https://github.com/huggingface/diffusers/pull/12996) — thanks to @DefTruth!\r\n\r\n### Misc\r\n\r\n- Mambo-G Guidance: New guider implementation (#12862)\r\n- Laplace Scheduler for DDPM (#11320)\r\n- Custom Sigmas in UniPCMultistepScheduler (#12109)\r\n- MultiControlNet support for SD3 Inpainting (#11251)\r\n- Context parallel in native flash attention (#12829)\r\n- NPU Ulysses Attention Support (#12919)\r\n- Fix Wan 2.1 I2V Context Parallel Inference (#12909)\r\n- Fix Qwen-Image Context Parallel Inference (#12970)\r\n- Introduction to `@apply_lora_scale` decorator for simplifying model definitions (#12994)\r\n- Introduction of pipeline-level “cpu” `device_map` (#12811)\r\n- Enable CP for kernels-based attention backends (#12812)\r\n- Diffusers is fully functional with Transformers V5 (https://github.com/huggingface/diffusers/pull/12976)\r\n\r\nA lot of the above features/improvements came as part of the [MVP program](https://github.com/huggingface/diffusers/issues/12635) we have been running. Immense thanks to the contributors!\r\n\r\n## Bug Fixes\r\n\r\n- Fix QwenImageEditPlus on NPU (#13017)\r\n- Fix MT5Tokenizer → use `T5Tokenizer` for Transformers v5.0+ compatibility (#12877)\r\n- Fix Wan/WanI2V patchification (#13038)\r\n- Fix LTX-2 inference with `num_videos_per_prompt > 1` and CFG (#13121)\r\n- Fix Flux2 img2img prediction (#12855)\r\n- Fix QwenImage `txt_seq_lens` handling (#12702)\r\n- Fix `prefix_token_len` bug (#12845)\r\n- Fix ftfy imports in Wan and SkyReels-V2 (#12314, #13113)\r\n- Fix `is_fsdp` determination (#12960)\r\n- Fix GLM-Image `get_image_features` API (#13052)\r\n- Fix Wan 2.2 when either transformer isn't present (#13055)\r\n- Fix guider issue (#13147)\r\n- Fix torchao quantizer for new versions (#12901)\r\n- Fix GGUF for unquantized types with unquantize kernels (#12498)\r\n- Make Qwen hidden states contiguous for torchao (#13081)\r\n- Make Flux hidden states contiguous (#13068)\r\n- Fix Kandinsky 5 hardcoded CUDA autocast (#12814)\r\n- Fix `aiter` availability check (#13059)\r\n- Fix attention mask check for unsupported backends (#12892)\r\n- Allow `prompt` and `prior_token_ids` simultaneously in `GlmImagePipeline` (#13092)\r\n- GLM-Image batch support (#13007)\r\n- Cosmos 2.5 Video2World frame extraction fix (#13018)\r\n- ResNet: only use contiguous in training mode (#12977)\r\n\r\n## All commits\r\n\r\n* [PRX] Improve model compilation  by @WaterKnight1998 in #12787\r\n* Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py  by @delmalih in #12798\r\n* [Modular]z-image  by @yiyixuxu in #12808\r\n* Fix Qwen Edit Plus modular for multi-image input  by @sayakpaul in #12601\r\n* [WIP] Add Flux2 modular  by @DN6 in #12763\r\n* [docs] improve distributed inference cp docs.  by @sayakpaul in #12810\r\n* post release 0.36.0  by @sayakpaul in #12804\r\n* Update distributed_inference.md to correct syntax  by @sayakpaul in #12827\r\n* [lora] Remove lora docs unneeded and add \" # Copied from ...\"  by @sayakpaul in #12824\r\n* support CP in native flash attention  by @sywangyi in #12829\r\n* [qwen-image] edit 2511 support  by @naykun in #12839\r\n* fix pytest tests/pipelines/pixart_sigma/test_pixart.py::PixArtSigmaPi…  by @sywangyi in #12842\r\n* Support for control-lora  by @lavinal712 in #10686\r\n* Add support for LongCat-Image  by @junqiangwu in #12828\r\n* fix the prefix_token_len bug  by @junqiangwu in #12845\r\n* extend TorchAoTest::test_model_memory_usage to other platform  by @sywangyi in #12768\r\n* Qwen Image Layered Support  by @naykun in #12853\r\n* Z-Image-Turbo ControlNet  by @hlky in #12792\r\n* Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion  by @miguelmartin75 in #12852\r\n* more update in modular   by @yiyixuxu in #12560\r\n* Feature: Add Mambo-G Guidance as Guider  by @MatrixTeam-AI in #12862\r\n* Add `OvisImagePipeline` in `AUTO_TEXT2IMAGE_PIPELINES_MAPPING`  by @alvarobartt in #12876\r\n* Cosmos Predict2.5 14b Conversion  by @miguelmartin75 in #12863\r\n* Use `T5Tokenizer` instead of `MT5Tokenizer` (removed in Transformers v5.0+)  by @alvarobartt in #12877\r\n* Add z-image-omni-base implementation  by @RuoyiDu in #12857\r\n* fix torchao quantizer for new torchao versions  by @vkuzo in #12901\r\n* fix Qwen Image Transformer single file loading mapping function to be consistent with other loader APIs  by @mbalabanski in #12894\r\n* Z-Image-Turbo from_single_file fix  by @hlky in #12888\r\n* chore: fix dev version in setup.py  by @DefTruth in #12904\r\n* Community Pipeline: Add z-image differential img2img  by @r4inm4ker in #12882\r\n* Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py  by @miguelmartin75 in #12914\r\n* Fix wan 2.1 i2v context parallel  by @DefTruth in #12909\r\n* fix the use of device_map in CP docs  by @sayakpaul in #12902\r\n* [core] remove unneeded autoencoder methods when subclassing from `AutoencoderMixin`  by @sayakpaul in #12873\r\n* Detect 2.0 vs 2.1 ZImageControlNetModel  by @hlky in #12861\r\n* Refactor environment variable assignments in workflow  by @paulinebm in #12916\r\n* Add codeQL workflow  by @paulinebm in #12917\r\n* Delete .github/workflows/codeql.yml by @paulinebm (direct commit on v0.37.0-release)\r\n* CodeQL workflow for security analysis by @paulinebm (direct commit on v0.37.0-release)\r\n* Check for attention mask in backends that don't support it  by @dxqb in #12892\r\n* [Flux.1] improve pos embed for ascend npu by computing on npu  by @zhangtao0408 in #12897\r\n* LTX Video 0.9.8  long multi prompt  by @yaoqih in #12614\r\n* Add FSDP option for Flux2  by @leisuzz in #12860\r\n* Add transformer cache context for SkyReels-V2 pipelines & Update docs  by @tolgacangoz in #12837\r\n* [docs] fix torchao typo.  by @sayakpaul in #12883\r\n* Update wan.md to remove unneeded hfoptions  by @sayakpaul in #12890\r\n* Improve docstrings and type hints in scheduling_edm_euler.py  by @delmalih in #12871\r\n* [Modular] Video for Mellon  by @asomoza in #12924\r\n* Add LTX 2.0 Video Pipelines  by @dg845 in #12915\r\n* Add environment variables to checkout step  by @paulinebm in #12927\r\n* Improve docstrings and type hints in scheduling_consistency_decoder.py  by @delmalih in #12928\r\n* Fix: Remove hardcoded CUDA autocast in Kandinsky 5 to fix import warning  by @adi776borate in #12814\r\n* Upgrade GitHub Actions for Node 24 compatibility  by @salmanmkc in #12865\r\n* fix the warning torch_dtype is deprecated  by @msdsm in #12841\r\n* [NPU] npu attention enable ulysses  by @TmacAaron in #12919\r\n* Torchao floatx version guard  by @howardzhang-cv in #12923\r\n* Bugfix for dreambooth flux2 img2img2  by @leisuzz in #12825\r\n* [Modular] qwen refactor  by @yiyixuxu in #12872\r\n* [modular] Tests for custom blocks in modular diffusers  by @sayakpaul in #12557\r\n* [chore] remove controlnet implementations outside controlnet module.  by @sayakpaul in #12152\r\n* [core] Handle progress bar and logging in distributed environments  by @sayakpaul in #12806\r\n* Improve docstrings and type hints in scheduling_consistency_models.py  by @delmalih in #12931\r\n* [Feature] MultiControlNet support for SD3Impainting  by @ishan-modi in #11251\r\n* Laplace Scheduler for DDPM  by @gapatron in #11320\r\n* Store vae.config.scaling_factor to prevent missing attr reference (sdxl advanced dreambooth training script)  by @Teriks in #12346\r\n* Add thread-safe wrappers for components in pipeline (examples/server-async/utils/requestscopedpipeline.py)  by @FredyRivera-dev in #12515\r\n* [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL  by @kashif in #11573\r\n* Change timestep device to cpu for xla  by @bhavya01 in #11501\r\n* [LoRA] add lora_alpha to sana README  by @linoytsaban in #11780\r\n* Fix wrong param types, docs, and handles noise=None in scale_noise of FlowMatching schedulers  by @Promisery in #11669\r\n* [docs] Remote inference  by @stevhliu in #12372\r\n* Align HunyuanVideoConditionEmbedding with CombinedTimestepGuidanceTextProjEmbeddings  by @samutamm in #12316\r\n* [Fix] syntax in QwenImageEditPlusPipeline  by @SahilCarterr in #12371\r\n* Fix ftfy name error in Wan pipeline  by @dsocek in #12314\r\n* [modular] error early in `enable_auto_cpu_offload`  by @sayakpaul in #12578\r\n* [ChronoEdit] support multiple loras  by @zhangjiewu in #12679\r\n* fix how `is_fsdp` is determined  by @sayakpaul in #12960\r\n* [LoRA] add LoRA support to LTX-2  by @sayakpaul in #12933\r\n* Fix: typo in autoencoder_dc.py  by @tvelovraf in #12687\r\n* [Modular] better docstring  by @yiyixuxu in #12932\r\n* [docs] polish caching docs.  by @sayakpaul in #12684\r\n* Fix typos  by @omahs in #12705\r\n* Fix link to diffedit implementation reference  by @JuanFKurucz in #12708\r\n*  Fix QwenImage txt_seq_lens handling  by @kashif in #12702\r\n* Bugfix for flux2 img2img2 prediction  by @leisuzz in #12855\r\n* Add Flag to `PeftLoraLoaderMixinTests` to Enable/Disable Text Encoder LoRA Tests  by @dg845 in #12962\r\n* Add Unified Sequence Parallel attention  by @Bissmella in #12693\r\n* [Modular] Changes for using WAN I2V  by @asomoza in #12959\r\n* Z rz rz rz rz rz rz r cogview  by @sayakpaul in #12973\r\n* Update distributed_inference.md to reposition sections  by @sayakpaul in #12971\r\n* [chore] make transformers version check stricter for glm image.  by @sayakpaul in #12974\r\n* Remove 8bit device restriction  by @SunMarc in #12972\r\n* `disable_mmap` in pipeline `from_pretrained`  by @hlky in #12854\r\n* [Modular] mellon utils  by @yiyixuxu in #12978\r\n* LongCat Image pipeline: Allow offloading/quantization of text_encoder component  by @Yahweasel in #12963\r\n* Add `ChromaInpaintPipeline`  by @hameerabbasi in #12848\r\n* fix Qwen-Image series context parallel  by @DefTruth in #12970\r\n* Flux2 klein  by @yiyixuxu in #12982\r\n* [modular] fix a bug in mellon param & improve docstrings  by @yiyixuxu in #12980\r\n* add klein docs.  by @sayakpaul in #12984\r\n* LTX 2 Single File Support  by @dg845 in #12983\r\n* [core] gracefully error out when attn-backend x cp combo isn't supported.  by @sayakpaul in #12832\r\n* Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py  by @delmalih in #12936\r\n* [Docs] Replace root CONTRIBUTING.md with symlink to source docs  by @delmalih in #12986\r\n* make style && make quality by @sayakpaul (direct commit on v0.37.0-release)\r\n* Revert \"make style && make quality\" by @sayakpaul (direct commit on v0.37.0-release)\r\n* [chore] make style to push new changes.  by @sayakpaul in #12998\r\n* Fibo edit pipeline  by @galbria in #12930\r\n* Fix variable name in docstring for PeftAdapterMixin.set_adapters  by @geekuillaume in #13003\r\n* Improve docstrings and type hints in scheduling_ddim_cogvideox.py  by @delmalih in #12992\r\n* [scheduler] Support custom sigmas in UniPCMultistepScheduler  by @a-r-r-o-w in #12109\r\n* feat: accelerate longcat-image with regional compile  by @lgyStoic in #13019\r\n* Improve docstrings and type hints in scheduling_ddim_flax.py  by @delmalih in #13010\r\n* Improve docstrings and type hints in scheduling_ddim_inverse.py  by @delmalih in #13020\r\n* fix Dockerfiles for cuda and xformers.  by @sayakpaul in #13022\r\n* Resnet only use contiguous in training mode.  by @jiqing-feng in #12977\r\n* feat: add qkv projection fuse for longcat transformers  by @lgyStoic in #13021\r\n* Improve docstrings and type hints in scheduling_ddim_parallel.py  by @delmalih in #13023\r\n* Improve docstrings and type hints in scheduling_ddpm_flax.py  by @delmalih in #13024\r\n* Improve docstrings and type hints in scheduling_ddpm_parallel.py  by @delmalih in #13027\r\n* Remove `*pooled_*` mentions from Chroma inpaint  by @hameerabbasi in #13026\r\n* Flag Flax schedulers as deprecated  by @delmalih in #13031\r\n* [modular] add auto_docstring & more doc related refactors   by @yiyixuxu in #12958\r\n* Upgrade GitHub Actions to latest versions  by @salmanmkc in #12866\r\n* [From Single File] support `from_single_file` method for `WanAnimateTransformer3DModel`  by @samadwar in #12691\r\n* Fix: Cosmos2.5 Video2World frame extraction and add default negative prompt  by @adi776borate in #13018\r\n* [GLM-Image] Add batch support for GlmImagePipeline  by @JaredforReal in #13007\r\n* [Qwen] avoid creating attention masks when there is no padding  by @kashif in #12987\r\n* [modular]support klein  by @yiyixuxu in #13002\r\n* [QwenImage] fix prompt isolation tests  by @sayakpaul in #13042\r\n* fast tok update  by @itazap in #13036\r\n* change to CUDA 12.9.  by @sayakpaul in #13045\r\n* remove torchao autoquant from diffusers docs  by @vkuzo in #13048\r\n* docs: improve docstring scheduling_dpm_cogvideox.py  by @delmalih in #13044\r\n* Fix Wan/WanI2V patchification  by @Jayce-Ping in #13038\r\n* LTX2 distilled checkpoint support  by @rootonchair in #12934\r\n* [wan] fix layerwise upcasting tests on CPU  by @sayakpaul in #13039\r\n* [ci] uniform run times and wheels for pytorch cuda.  by @sayakpaul in #13047\r\n* docs: fix grammar in fp16_safetensors CLI warning  by @Olexandr88 in #13040\r\n* [wan] fix wan 2.2 when either of the transformers isn't present.  by @sayakpaul in #13055\r\n* [bug fix] GLM-Image fit new `get_image_features` API  by @JaredforReal in #13052\r\n* Fix aiter availability check  by @lauri9 in #13059\r\n* [Modular]add a real quick start guide  by @yiyixuxu in #13029\r\n* feat: support Ulysses Anything Attention  by @DefTruth in #12996\r\n* Refactor Model Tests  by @DN6 in #12822\r\n* [Flux2] Fix LoRA loading for Flux2 Klein by adaptively enumerating transformer blocks  by @songkey in #13030\r\n* [Modular] loader related  by @yiyixuxu in #13025\r\n* [Modular] mellon doc etc  by @yiyixuxu in #13051\r\n* [modular] change the template modular pipeline card  by @sayakpaul in #13072\r\n* Add support for Magcache   by @AlanPonnachan in #12744\r\n* [docs] Fix syntax error in quantization configuration  by @sayakpaul in #13076\r\n* docs: improve docstring scheduling_dpmsolver_multistep_inverse.py  by @delmalih in #13083\r\n* [core] make flux hidden states contiguous  by @sayakpaul in #13068\r\n* [core] make qwen hidden states contiguous to make torchao happy.  by @sayakpaul in #13081\r\n* Feature/zimage inpaint pipeline  by @CalamitousFelicitousness in #13006\r\n* GGUF fix for unquantized types when using unquantize kernels  by @dxqb in #12498\r\n* docs: improve docstring scheduling_dpmsolver_multistep_inverse.py  by @delmalih in #13085\r\n* [modular]simplify components manager doc  by @yiyixuxu in #13088\r\n* ZImageControlNet cfg  by @hlky in #13080\r\n* [Modular] refactor Wan: modular pipelines by task etc  by @yiyixuxu in #13063\r\n* [Modular] guard `ModularPipeline.blocks` attribute  by @yiyixuxu in #13014\r\n* LTX 2 Improve `encode_video` by Accepting More Input Types  by @dg845 in #13057\r\n* Z image lora training  by @linoytsaban in #13056\r\n* [modular] add modular tests for Z-Image and Wan  by @sayakpaul in #13078\r\n* [Docs] Add guide for AutoModel with custom code  by @DN6 in #13099\r\n* [SkyReelsV2] Fix ftfy import  by @asomoza in #13113\r\n* [lora] fix non-diffusers lora key handling for flux2  by @sayakpaul in #13119\r\n* [CI] Refactor Wan Model Tests  by @DN6 in #13082\r\n* docs: improve docstring scheduling_edm_dpmsolver_multistep.py  by @delmalih in #13122\r\n* [Fix]Allow `prompt` and `prior_token_ids` to be provided simultaneously in `GlmImagePipeline`  by @JaredforReal in #13092\r\n* docs: improve docstring scheduling_flow_match_euler_discrete.py  by @delmalih in #13127\r\n* Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge}  by @miguelmartin75 in #13066\r\n* [modular] add tests for robust model loading.  by @sayakpaul in #13120\r\n* Fix LTX-2 Inference when `num_videos_per_prompt > 1` and CFG is Enabled  by @dg845 in #13121\r\n* [CI] Fix `setuptools` `pkg_resources` Errors  by @dg845 in #13129\r\n* docs: improve docstring scheduling_flow_match_heun_discrete.py  by @delmalih in #13130\r\n* [CI] Fix `setuptools` `pkg_resources` Bug for PR GPU Tests  by @dg845 in #13132\r\n* fix cosmos transformer typing.  by @sayakpaul in #13134\r\n* Sunset Python 3.8 & get rid of explicit `typing` exports where possible  by @sayakpaul in #12524\r\n* feat: implement apply_lora_scale to remove boilerplate.  by @sayakpaul in #12994\r\n* [docs] fix ltx2 i2v docstring.  by @sayakpaul in #13135\r\n* [Modular] add different pipeine blocks to init  by @yiyixuxu in #13145\r\n* fix MT5Tokenizer  by @yiyixuxu in #13146\r\n* fix guider   by @yiyixuxu in #13147\r\n* [Modular] update doc for `ModularPipeline`  by @yiyixuxu in #13100\r\n* [Modular] add explicit workflow support   by @yiyixuxu in #13028\r\n* [LTX2] Fix wrong lora mixin  by @asomoza in #13144\r\n* [Pipelines] Remove k-diffusion  by @DN6 in #13152\r\n* [tests] accept recompile_limit from the user in tests  by @sayakpaul in #13150\r\n* [core] support device type device_maps to work with offloading.  by @sayakpaul in #12811\r\n* [Bug] Fix QwenImageEditPlus Series on NPU  by @zhangtao0408 in #13017\r\n* [CI] Add ftfy as a test dependency  by @DN6 in #13155\r\n* docs: improve docstring scheduling_flow_match_lcm.py  by @delmalih in #13160\r\n* [docs] add docs for qwenimagelayered  by @stevhliu in #13158\r\n* Flux2: Tensor tuples can cause issues for checkpointing  by @dxqb in #12777\r\n* [CI] Revert `setuptools` CI Fix as the Failing Pipelines are Deprecated  by @dg845 in #13149\r\n* Fix `ftfy` import for PRX Pipeline  by @dg845 in #13154\r\n* [core] Enable CP for kernels-based attention backends  by @sayakpaul in #12812\r\n* remove deps related to test from ci  by @sayakpaul in #13164\r\n* [CI] Fix new LoRAHotswap tests  by @DN6 in #13163\r\n* [gguf][torch.compile time] Convert to plain tensor earlier in dequantize_gguf_tensor  by @anijain2305 in #13166\r\n* Support Flux Klein peft (fal) lora format  by @asomoza in #13169\r\n* Fix T5GemmaEncoder loading for transformers 5.x composite T5GemmaConfig  by @DavidBert in #13143\r\n* Allow Automodel to use  `from_config` with custom code.  by @DN6 in #13123\r\n* Fix AutoModel `typing` Import Error  by @dg845 in #13178\r\n* migrate to `transformers` v5  by @sayakpaul in #12976\r\n* fix: graceful fallback when attention backends fail to import  by @sym-bot in #13060\r\n* [docs] Fix torchrun command argument order in docs  by @sayakpaul in #13181\r\n* [attention backends] use dedicated wrappers from fa3 for cp.  by @sayakpaul in #13165\r\n* Cosmos Transfer2.5 Auto-Regressive Inference Pipeline  by @miguelmartin75 in #13114\r\n* Fix wrong `do_classifier_free_guidance` threshold in ZImagePipeline  by @kirillsst in #13183\r\n* Fix Flash Attention 3 interface for new FA3 return format  by @veeceey in #13173\r\n* Fix LTX-2 image-to-video generation failure in two stages generation  by @Songrui625 in #13187\r\n* Fixing Kohya loras loading: Flux.1-dev loras with TE (\"lora_te1_\" prefix)  by @christopher5106 in #13188\r\n* [Modular] update the auto pipeline blocks doc  by @yiyixuxu in #13148\r\n* [tests] consistency tests for modular index  by @sayakpaul in #13192\r\n* [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline  by @yiyixuxu in #13193\r\n* [chore] updates in the pypi publication workflow.  by @sayakpaul in #12805\r\n* [tests] enable cpu offload test in torchao without compilation.  by @sayakpaul in #12704\r\n* remove db utils from benchmarking  by @sayakpaul in #13199\r\n* [AutoModel] Fix bug with subfolders and local model paths when loading custom code  by @DN6 in #13197\r\n* [AutoModel] Allow registering `auto_map` to model config  by @DN6 in #13186\r\n* [Modular] Save Modular Pipeline weights to Hub  by @DN6 in #13168\r\n* docs: improve docstring scheduling_ipndm.py  by @delmalih in #13198\r\n* Clean up accidental files  by @DN6 in #13202\r\n* [modular]Update model card to include workflow  by @yiyixuxu in #13195\r\n* [modular] not pass trust_remote_code to external repos   by @yiyixuxu in #13204\r\n* [Modular] implement requirements validation for custom blocks  by @sayakpaul in #12196\r\n* cogvideo example: Distribute VAE video encoding across processes in CogVideoX LoRA training  by @jiqing-feng in #13207\r\n* Fix group-offloading bug  by @SHYuanBest in #13211\r\n* Add Helios-14B Video Generation Pipelines  by @dg845 in #13208\r\n* [Z-Image] Fix more `do_classifier_free_guidance` thresholds  by @asomoza in #13212\r\n* [lora] fix zimage lora conversion to support for more lora.  by @sayakpaul in #13209\r\n* adding lora support to z-image controlnet pipelines  by @christopher5106 in #13200\r\n* Add LTX2 Condition Pipeline  by @dg845 in #13058\r\n* Fix Helios paper link in documentation  by @SHYuanBest in #13213\r\n* [attention backends] change to updated repo and version.  by @sayakpaul in #13161\r\n* feat: implement rae autoencoder.  by @Ando233 in #13046\r\n* Release: v0.37.0-release by @sayakpaul (direct commit on v0.37.0-release)\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @delmalih\r\n    * Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py (#12798)\r\n    * Improve docstrings and type hints in scheduling_edm_euler.py (#12871)\r\n    * Improve docstrings and type hints in scheduling_consistency_decoder.py (#12928)\r\n    * Improve docstrings and type hints in scheduling_consistency_models.py (#12931)\r\n    * Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py (#12936)\r\n    * [Docs] Replace root CONTRIBUTING.md with symlink to source docs (#12986)\r\n    * Improve docstrings and type hints in scheduling_ddim_cogvideox.py (#12992)\r\n    * Improve docstrings and type hints in scheduling_ddim_flax.py (#13010)\r\n    * Improve docstrings and type hints in scheduling_ddim_inverse.py (#13020)\r\n    * Improve docstrings and type hints in scheduling_ddim_parallel.py (#13023)\r\n    * Improve docstrings and type hints in scheduling_ddpm_flax.py (#13024)\r\n    * Improve docstrings and type hints in scheduling_ddpm_parallel.py (#13027)\r\n    * Flag Flax schedulers as deprecated (#13031)\r\n    * docs: improve docstring scheduling_dpm_cogvideox.py (#13044)\r\n    * docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13083)\r\n    * docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13085)\r\n    * docs: improve docstring scheduling_edm_dpmsolver_multistep.py (#13122)\r\n    * docs: improve docstring scheduling_flow_match_euler_discrete.py (#13127)\r\n    * docs: improve docstring scheduling_flow_match_heun_discrete.py (#13130)\r\n    * docs: improve docstring scheduling_flow_match_lcm.py (#13160)\r\n    * docs: improve docstring scheduling_ipndm.py (#13198)\r\n* @yiyixuxu\r\n    * [Modular]z-image (#12808)\r\n    * more update in modular  (#12560)\r\n    * [Modular] qwen refactor (#12872)\r\n    * [Modular] better docstring (#12932)\r\n    * [Modular] mellon utils (#12978)\r\n    * Flux2 klein (#12982)\r\n    * [modular] fix a bug in mellon param & improve docstrings (#12980)\r\n    * [modular] add auto_docstring & more doc related refactors  (#12958)\r\n    * [modular]support klein (#13002)\r\n    * [Modular]add a real quick start guide (#13029)\r\n    * [Modular] loader related (#13025)\r\n    * [Modular] mellon doc etc (#13051)\r\n    * [modular]simplify components manager doc (#13088)\r\n    * [Modular] refactor Wan: modular pipelines by task etc (#13063)\r\n    * [Modular] guard `ModularPipeline.blocks` attribute (#13014)\r\n    * [Modular] add different pipeine blocks to init (#13145)\r\n    * fix MT5Tokenizer (#13146)\r\n    * fix guider  (#13147)\r\n    * [Modular] update doc for `ModularPipeline` (#13100)\r\n    * [Modular] add explicit workflow support  (#13028)\r\n    * [Modular] update the auto pipeline blocks doc (#13148)\r\n    * [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline (#13193)\r\n    * [modular]Update model card to include workflow (#13195)\r\n    * [modular] not pass trust_remote_code to external repos  (#13204)\r\n* @sayakpaul\r\n    * Fix Qwen Edit Plus modular for multi-image input (#12601)\r\n    * [docs] improve distributed inference cp docs. (#12810)\r\n    * post release 0.36.0 (#12804)\r\n    * Update distributed_inference.md to correct syntax (#12827)\r\n    * [lora] Remove lora docs unneeded and add \" # Copied from ...\" (#12824)\r\n    * fix the use of device_map in CP docs (#12902)\r\n    * [core] remove unneeded autoencoder methods when subclassing from `AutoencoderMixin` (#12873)\r\n    * [docs] fix torchao typo. (#12883)\r\n    * Update wan.md to remove unneeded hfoptions (#12890)\r\n    * [modular] Tests for custom blocks in modular diffusers (#12557)\r\n    * [chore] remove controlnet implementations outside controlnet module. (#12152)\r\n    * [core] Handle progress bar and logging in distributed environments (#12806)\r\n    * [modular] error early in `enable_auto_cpu_offload` (#12578)\r\n    * fix how `is_fsdp` is determined (#12960)\r\n    * [LoRA] add LoRA support to LTX-2 (#12933)\r\n    * [docs] polish caching docs. (#12684)\r\n    * Z rz rz rz rz rz rz r cogview (#12973)\r\n    * Update distributed_inference.md to reposition sections (#12971)\r\n    * [chore] make transformers version check stricter for glm image. (#12974)\r\n    * add klein docs. (#12984)\r\n    * [core] gracefully error out when attn-backend x cp combo isn't supported. (#12832)\r\n    * make style && make quality\r\n    * Revert \"make style && make quality\"\r\n    * [chore] make style to push new changes. (#12998)\r\n    * fix Dockerfiles for cuda and xformers. (#13022)\r\n    * [QwenImage] fix prompt isolation tests (#13042)\r\n    * change to CUDA 12.9. (#13045)\r\n    * [wan] fix layerwise upcasting tests on CPU (#13039)\r\n    * [ci] uniform run times and wheels for pytorch cuda. (#13047)\r\n    * [wan] fix wan 2.2 when either of the transformers isn't present. (#13055)\r\n    * [modular] change the template modular pipeline card (#13072)\r\n    * [docs] Fix syntax error in quantization configuration (#13076)\r\n    * [core] make flux hidden states contiguous (#13068)\r\n    * [core] make qwen hidden states contiguous to make torchao happy. (#13081)\r\n    * [modular] add modular tests for Z-Image and Wan (#13078)\r\n    * [lora] fix non-diffusers lora key handling for flux2 (#13119)\r\n    * [modular] add tests for robust model loading. (#13120)\r\n    * fix cosmos transformer typing. (#13134)\r\n    * Sunset Python 3.8 & get rid of explicit `typing` exports where possible (#12524)\r\n    * feat: implement apply_lora_scale to remove boilerplate. (#12994)\r\n    * [docs] fix ltx2 i2v docstring. (#13135)\r\n    * [tests] accept recompile_limit from the user in tests (#13150)\r\n    * [core] support device type device_maps to work with offloading. (#12811)\r\n    * [core] Enable CP for kernels-based attention backends (#12812)\r\n    * remove deps related to test from ci (#13164)\r\n    * migrate to `transformers` v5 (#12976)\r\n    * [docs] Fix torchrun command argument order in docs (#13181)\r\n    * [attention backends] use dedicated wrappers from fa3 for cp. (#13165)\r\n    * [tests] consistency tests for modular index (#13192)\r\n    * [chore] updates in the pypi publication workflow. (#12805)\r\n    * [tests] enable cpu offload test in torchao without compilation. (#12704)\r\n    * remove db utils from benchmarking (#13199)\r\n    * [Modular] implement requirements validation for custom blocks (#12196)\r\n    * [lora] fix zimage lora conversion to support for more lora. (#13209)\r\n    * [attention backends] change to updated repo and version. (#13161)\r\n    * Release: v0.37.0-release\r\n* @DN6\r\n    * [WIP] Add Flux2 modular (#12763)\r\n    * Refactor Model Tests (#12822)\r\n    * [Docs] Add guide for AutoModel with custom code (#13099)\r\n    * [CI] Refactor Wan Model Tests (#13082)\r\n    * [Pipelines] Remove k-diffusion (#13152)\r\n    * [CI] Add ftfy as a test dependency (#13155)\r\n    * [CI] Fix new LoRAHotswap tests (#13163)\r\n    * Allow Automodel to use  `from_config` with custom code. (#13123)\r\n    * [AutoModel] Fix bug with subfolders and local model paths when loading custom code (#13197)\r\n    * [AutoModel] Allow registering `auto_map` to model config (#13186)\r\n    * [Modular] Save Modular Pipeline weights to Hub (#13168)\r\n    * Clean up accidental files (#13202)\r\n* @naykun\r\n    * [qwen-image] edit 2511 support (#12839)\r\n    * Qwen Image Layered Support (#12853)\r\n* @junqiangwu\r\n    * Add support for LongCat-Image (#12828)\r\n    * fix the prefix_token_len bug (#12845)\r\n* @hlky\r\n    * Z-Image-Turbo ControlNet (#12792)\r\n    * Z-Image-Turbo from_single_file fix (#12888)\r\n    * Detect 2.0 vs 2.1 ZImageControlNetModel (#12861)\r\n    * `disable_mmap` in pipeline `from_pretrained` (#12854)\r\n    * ZImageControlNet cfg (#13080)\r\n* @miguelmartin75\r\n    * Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion (#12852)\r\n    * Cosmos Predict2.5 14b Conversion (#12863)\r\n    * Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py (#12914)\r\n    * Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} (#13066)\r\n    * Cosmos Transfer2.5 Auto-Regressive Inference Pipeline (#13114)\r\n* @RuoyiDu\r\n    * Add z-image-omni-base implementation (#12857)\r\n* @r4inm4ker\r\n    * Community Pipeline: Add z-image differential img2img (#12882)\r\n* @yaoqih\r\n    * LTX Video 0.9.8  long multi prompt (#12614)\r\n* @dg845\r\n    * Add LTX 2.0 Video Pipelines (#12915)\r\n    * Add Flag to `PeftLoraLoaderMixinTests` to Enable/Disable Text Encoder LoRA Tests (#12962)\r\n    * LTX 2 Single File Support (#12983)\r\n    * LTX 2 Improve `encode_video` by Accepting More Input Types (#13057)\r\n    * Fix LTX-2 Inference when `num_videos_per_prompt > 1` and CFG is Enabled (#13121)\r\n    * [CI] Fix `setuptools` `pkg_resources` Errors (#13129)\r\n    * [CI] Fix `setuptools` `pkg_resources` Bug for PR GPU Tests (#13132)\r\n    * [CI] Revert `setuptools` CI Fix as the Failing Pipelines are Deprecated (#13149)\r\n    * Fix `ftfy` import for PRX Pipeline (#13154)\r\n    * Fix AutoModel `typing` Import Error (#13178)\r\n    * Add Helios-14B Video Generation Pipelines (#13208)\r\n    * Add LTX2 Condition Pipeline (#13058)\r\n* @kashif\r\n    * [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL (#11573)\r\n    *  Fix QwenImage txt_seq_lens handling (#12702)\r\n    * [Qwen] avoid creating attention masks when there is no padding (#12987)\r\n* @bhavya01\r\n    * Change timestep device to cpu for xla (#11501)\r\n* @linoytsaban\r\n    * [LoRA] add lora_alpha to sana README (#11780)\r\n    * Z image lora training (#13056)\r\n* @stevhliu\r\n    * [docs] Remote inference (#12372)\r\n    * [docs] add docs for qwenimagelayered (#13158)\r\n* @hameerabbasi\r\n    * Add `ChromaInpaintPipeline` (#12848)\r\n    * Remove `*pooled_*` mentions from Chroma inpaint (#13026)\r\n* @galbria\r\n    * Fibo edit pipeline (#12930)\r\n* @JaredforReal\r\n    * [GLM-Image] Add batch support for GlmImagePipeline (#13007)\r\n    * [bug fix] GLM-Image fit new `get_image_features` API (#13052)\r\n    * [Fix]Allow `prompt` and `prior_token_ids` to be provided simultaneously in `GlmImagePipeline` (#13092)\r\n* @rootonchair\r\n    * LTX2 distilled checkpoint support (#12934)\r\n* @AlanPonnachan\r\n    * Add support for Magcache  (#12744)\r\n* @CalamitousFelicitousness\r\n    * Feature/zimage inpaint pipeline (#13006)\r\n* @Ando233\r\n    * feat: implement rae autoencoder. (#13046)\r\n","publishedAt":"2026-03-05T15:05:21.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.37.0","media":[]},{"id":"rel_OHA3BZv5YFYfyO3e9eTZD","version":"v0.36.0","title":"Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄","summary":"The release features a number of new image and video pipelines, a new caching method, a new training script, new `kernels` - powered attention backend...","content":"The release features a number of new image and video pipelines, a new caching method, a new training script, new `kernels` - powered attention backends, and more. It is quite packed with a lot of new stuff, so make sure you read the release notes fully 🚀\r\n\r\n## New image pipelines\r\n\r\n- [Flux2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2): Flux2 is the latest generation of image generation and editing model from Black Forest Labs. It’s capable of taking multiple input images as reference, making it versatile for different use cases.\r\n- [Z-Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/z_image): Z-Image is a best-of-its-kind image generation model in the 6B param regime. Thanks to @JerryWu-code in [https://github.com/huggingface/diffusers/pull/12703](https://github.com/huggingface/diffusers/pull/12703).\r\n- [QwenImage Edit Plus](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwenimage): It’s an upgrade of QwenImage Edit and is capable of taking multiple input images as references. It can act as both a generation and an editing model. Thanks to @naykun for contributing in https://github.com/huggingface/diffusers/issues/12357.\r\n- [Bria FIBO:](https://huggingface.co/docs/diffusers/main/en/api/pipelines/bria_fibo) FIBO is trained on structured JSON captions up to 1,000+ words and designed to understand and control different visual parameters such as lighting, composition, color, and camera settings, enabling precise and reproducible outputs. Thanks to @galbria for contributing this in [https://github.com/huggingface/diffusers/pull/12545](https://github.com/huggingface/diffusers/pull/12545).\r\n- [Kandinsky Image Lite](https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky5_image): Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters). Thanks to @leffff for contributing this in [https://github.com/huggingface/diffusers/pull/12664](https://github.com/huggingface/diffusers/pull/12664).\r\n- [ChronoEdit](https://huggingface.co/docs/diffusers/main/en/api/pipelines/chronoedit): ChronoEdit reframes image editing as a video generation task, using input and edited images as start/end frames to leverage pretrained video models with temporal consistency. A temporal reasoning stage introduces reasoning tokens to ensure physically plausible edits and visualize the editing trajectory. Thanks to @zhangjiewu for contributing this in https://github.com/huggingface/diffusers/pull/12593.\r\n\r\n## New video pipelines\r\n\r\n- [Sana-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana_video): Sana-Video is a fast and efficient video generation model, equipped to handle long video sequences, thanks to its incorporation of linear attention. Thanks to @lawrence-cj for contributing this in [https://github.com/huggingface/diffusers/pull/12634](https://github.com/huggingface/diffusers/pull/12634).\r\n- [Kandinsky 5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky5_video): Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger models and offers the best understanding of Russian concepts in the open-source ecosystem. Thanks to @leffff for contributing this in [https://github.com/huggingface/diffusers/pull/12478](https://github.com/huggingface/diffusers/pull/12478).\r\n- [Hunyuan 1.5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video15): HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs.\r\n- [Wan Animate](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#wan-animate-unified-character-animation-and-replacement-with-holistic-replication): Wan-Animate is a state-of-the-art character animation and replacement video model based on Wan2.1. Given a reference character image and driving motion video, it can either animate the character with motion from the driving video, or replace the existing character in that video with that character.\r\n\r\n## New `kernels`-powered attention backends\r\n\r\nThe `kernels` [library](https://github.com/huggingface/kernels) helps you save a lot of time by providing pre-built kernel interfaces for various environments and accelerators. This release features three new `kernels`-powered attention backends:\r\n\r\n- Flash Attention 3 (+ its `varlen` variant)\r\n- Flash Attention 2 (+ its `varlen` variant)\r\n- SAGE\r\n\r\nThis means if any of the above backend is supported by your development environment, you should be able to skip the manual process of building the corresponding kernels and just use:\r\n\r\n```python\r\n# Make sure you have `kernels` installed: `pip install kernels`.\r\n# You can choose `flash_hub` or `sage_hub`, too.\r\npipe.transformer.set_attention_backend(\"_flash_3_hub\")\r\n```\r\n\r\nFor more details, check out the [documentation](https://huggingface.co/docs/diffusers/main/en/optimization/attention_backends). \r\n\r\n## TaylorSeer cache\r\n\r\nTaylorSeer is now supported in Diffusers, delivering upto 3x speedups with negligible-to-none quality compromise. Thanks to @toilaluan for contributing this in [https://github.com/huggingface/diffusers/pull/12648](https://github.com/huggingface/diffusers/pull/12648). Check out the documentation [here](https://huggingface.co/docs/diffusers/main/en/optimization/cache#taylorseer-cache).\r\n\r\n## New training script\r\n\r\nOur Flux.2 integration features a LoRA fine-tuning script that you can check out [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux2.md). We provide a number of optimizations to help make it run on consumer GPUs.\r\n\r\n## Misc\r\n\r\n- Reusing `AttentionMixin`: Making certain compatible models subclass from the `AttentionMixin` class helped us get rid of 2K LoC. Going forward, users can expect more such refactorings that will help make the library leaner and simpler. Check out https://github.com/huggingface/diffusers/pull/12463 for more details.\r\n- Diffusers backend in SGLang: https://github.com/sgl-project/sglang/pull/14112.\r\n- We started the [Diffusers MVP program](https://github.com/huggingface/diffusers/issues/12635) to work with talented community members who will help us improve the library across multiple fronts. Check out the link for more information.\r\n\r\n## All commits\r\n\r\n* remove unneeded checkpoint imports.  by @sayakpaul in #12488\r\n* [tests] fix clapconfig for text backbone in audioldm2  by @sayakpaul in #12490\r\n* ltx0.9.8 (without IC lora,  autoregressive sampling)  by @yiyixuxu in #12493\r\n* [docs] Attention checks  by @stevhliu in #12486\r\n* [CI] Check links  by @stevhliu in #12491\r\n* [ci] xfail more incorrect transformer imports.  by @sayakpaul in #12455\r\n* [tests] introduce `VAETesterMixin` to consolidate tests for slicing and tiling  by @sayakpaul in #12374\r\n* docs: cleanup of runway model  by @EazyAl in #12503\r\n* Kandinsky 5 is finally in Diffusers!  by @leffff in #12478\r\n* Remove Qwen Image Redundant RoPE Cache  by @dg845 in #12452\r\n* Raise warning instead of error when imports are missing for custom code   by @DN6 in #12513\r\n* Fix: Use incorrect temporary variable key when replacing adapter name…  by @FeiXie8 in #12502\r\n* [docs] Organize toctree by modality  by @stevhliu in #12514\r\n* styling issues.  by @sayakpaul in #12522\r\n* Add Photon model and pipeline support  by @DavidBert in #12456\r\n* purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet  by @Vaibhavs10 in #12497\r\n* Prx  by @DavidBert in #12525\r\n* [core] `AutoencoderMixin` to abstract common methods  by @sayakpaul in #12473\r\n* Kandinsky5 No cfg fix  by @asomoza in #12527\r\n* Fix: Add _skip_keys for AutoencoderKLWan  by @yiyixuxu in #12523\r\n* [CI] xfail the test_wuerstchen_prior test  by @sayakpaul in #12530\r\n* [tests] Test attention backends  by @sayakpaul in #12388\r\n* fix CI bug for kandinsky3_img2img case  by @kaixuanliu in #12474\r\n* Fix MPS compatibility in get_1d_sincos_pos_embed_from_grid #12432  by @Aishwarya0811 in #12449\r\n* Handle deprecated transformer classes  by @DN6 in #12517\r\n* fix constants.py to user `upper()`  by @sayakpaul in #12479\r\n* HunyuanImage21  by @yiyixuxu in #12333\r\n* Loose the criteria tolerance appropriately for Intel XPU devices  by @kaixuanliu in #12460\r\n* Deprecate Stable Cascade  by @DN6 in #12537\r\n* [chore] Move guiders experimental warning  by @sayakpaul in #12543\r\n* Fix Chroma attention padding order and update docs to use `lodestones/Chroma1-HD`  by @josephrocca in #12508\r\n* Add AITER attention backend  by @lauri9 in #12549\r\n* Fix small inconsistency in output dimension of \"_get_t5_prompt_embeds\" function in sd3 pipeline  by @alirezafarashah in #12531\r\n* Kandinsky 5 10 sec (NABLA suport)  by @leffff in #12520\r\n* Improve pos embed for Flux.1 inference on Ascend NPU  by @gameofdimension in #12534\r\n* support latest few-step wan LoRA.  by @sayakpaul in #12541\r\n* [Pipelines] Enable Wan VACE to run since single transformer  by @DN6 in #12428\r\n* fix crash if tiling mode is enabled  by @sywangyi in #12521\r\n* Fix typos in kandinsky5 docs  by @Meatfucker in #12552\r\n* [ci] don't run sana layerwise casting tests in CI.  by @sayakpaul in #12551\r\n* Bria fibo  by @galbria in #12545\r\n* Avoiding graph break by changing the way we infer dtype in vae.decoder  by @ppadjinTT in #12512\r\n* [Modular] Fix for custom block kwargs  by @DN6 in #12561\r\n* [Modular] Allow custom blocks to be saved to `local_dir`  by @DN6 in #12381\r\n* Fix Stable Diffusion 3.x pooled prompt embedding with multiple images  by @friedrich in #12306\r\n* Fix custom code loading in Automodel  by @DN6 in #12571\r\n* [modular] better warn message  by @yiyixuxu in #12573\r\n* [tests] add tests for flux modular (t2i, i2i, kontext)  by @sayakpaul in #12566\r\n* [modular]pass hub_kwargs to load_config  by @yiyixuxu in #12577\r\n* ulysses enabling in native attention path  by @sywangyi in #12563\r\n* Kandinsky 5.0 Docs fixes  by @leffff in #12582\r\n* [docs] sort doc  by @sayakpaul in #12586\r\n* [LoRA] add support for more Qwen LoRAs  by @linoytsaban in #12581\r\n* [Modular] Allow ModularPipeline to load from revisions  by @DN6 in #12592\r\n* Add optional precision-preserving preprocessing for examples/unconditional_image_generation/train_unconditional.py  by @turian in #12596\r\n* [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference  by @lawrence-cj in #12584\r\n* Fix overflow and dtype handling in rgblike_to_depthmap (NumPy + PyTorch)  by @MohammadSadeghSalehi in #12546\r\n* [Modular] Some clean up for Modular tests  by @DN6 in #12579\r\n* feat: enable attention dispatch for huanyuan video  by @DefTruth in #12591\r\n* fix the crash in Wan-AI/Wan2.2-TI2V-5B-Diffusers if CP is enabled  by @sywangyi in #12562\r\n* [CI] Push test fix  by @DN6 in #12617\r\n* add ChronoEdit  by @zhangjiewu in #12593\r\n* [modular] wan!  by @yiyixuxu in #12611\r\n* [CI] Fix typo in uv install  by @DN6 in #12618\r\n* fix: correct import path for load_model_dict_into_meta in conversion scripts  by @yashwantbezawada in #12616\r\n* Fix Context Parallel validation checks  by @DN6 in #12446\r\n* [Modular] Clean up docs  by @DN6 in #12604\r\n* Fix: update type hints for Tuple parameters across multiple files to support variable-length tuples  by @cesaryuan in #12544\r\n* [CI] Remove unittest dependency from `testing_utils.py`  by @DN6 in #12621\r\n* Fix rotary positional embedding dimension mismatch in Wan and SkyReels V2 transformers  by @charchit7 in #12594\r\n* fix copies  by @yiyixuxu in #12637\r\n* Add MLU Support.  by @a120092009 in #12629\r\n* fix dispatch_attention_fn check  by @yiyixuxu in #12636\r\n* [modular] add tests for qwen modular  by @sayakpaul in #12585\r\n* ArXiv -> HF Papers  by @qgallouedec in #12583\r\n* [docs] Update install instructions  by @stevhliu in #12626\r\n* [modular] add a check   by @yiyixuxu in #12628\r\n* Improve docstrings and type hints in scheduling_amused.py  by @delmalih in #12623\r\n* [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz)  by @dg845 in #12526\r\n* adjust unit tests for `test_save_load_float16`  by @kaixuanliu in #12500\r\n* skip autoencoderdl layerwise casting memory  by @sayakpaul in #12647\r\n* [utils] Update check_doc_toc  by @stevhliu in #12642\r\n* [docs] AutoModel  by @stevhliu in #12644\r\n* Improve docstrings and type hints in scheduling_ddim.py  by @delmalih in #12622\r\n* Improve docstrings and type hints in scheduling_ddpm.py  by @delmalih in #12651\r\n* [Modular] Add Custom Blocks guide to doc  by @DN6 in #12339\r\n* Improve docstrings and type hints in scheduling_euler_discrete.py  by @delmalih in #12654\r\n* Update Wan Animate Docs  by @dg845 in #12658\r\n* Rope in float32 for mps or npu compatibility  by @DavidBert in #12665\r\n* [PRX pipeline]: add 1024 resolution ratio bins  by @DavidBert in #12670\r\n* SANA-Video Image to Video pipeline `SanaImageToVideoPipeline` support  by @lawrence-cj in #12634\r\n* [CI] Make CI logs less verbose   by @DN6 in #12674\r\n* Revert `AutoencoderKLWan`'s `dim_mult` default value back to list  by @dg845 in #12640\r\n* [CI] Temporarily pin transformers  by @DN6 in #12677\r\n* [core] Refactor hub attn kernels  by @sayakpaul in #12475\r\n* [CI] Fix indentation issue in workflow files  by @DN6 in #12685\r\n* [CI] Fix failing Pipeline CPU tests  by @DN6 in #12681\r\n* Improve docstrings and type hints in scheduling_pndm.py  by @delmalih in #12676\r\n* Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet  by @pratim4dasude in #12649\r\n* Improve docstrings and type hints in scheduling_lms_discrete.py  by @delmalih in #12678\r\n* Add FluxLoraLoaderMixin to Fibo pipeline  by @SwayStar123 in #12688\r\n* bugfix: fix chrono-edit context parallel  by @DefTruth in #12660\r\n* [core] support sage attention + FA2 through `kernels`  by @sayakpaul in #12439\r\n* [i8n-pt] Fix grammar and expand Portuguese documentation  by @cdutr in #12598\r\n* Fix variable naming typos in community FluxControlNetFillInpaintPipeline  by @sqhuang in #12701\r\n* fix typo in docs  by @lawrence-cj in #12675\r\n* Add Support for Z-Image Series  by @JerryWu-code in #12703\r\n* let's go Flux2 🚀  by @sayakpaul in #12711\r\n* Update script names in README for Flux2 training  by @anvilarth in #12713\r\n* [lora]: Fix Flux2 LoRA NaN test  by @sayakpaul in #12714\r\n* [docs] Correct flux2 links  by @sayakpaul in #12716\r\n* [docs] put autopipeline after overview and hunyuanimage in images  by @sayakpaul in #12548\r\n* Improve docstrings and type hints in scheduling_dpmsolver_multistep.py  by @delmalih in #12710\r\n* Support unittest for Z-image ⚡️  by @JerryWu-code in #12715\r\n* [chore] remove torch.save from remnant code.  by @sayakpaul in #12717\r\n* Enable regional compilation on z-image transformer model  by @sayakpaul in #12736\r\n* Fix examples not loading LoRA adapter weights from checkpoint  by @SurAyush in #12690\r\n* [Modular] Add single file support to Modular  by @DN6 in #12383\r\n* fix type-check for z-image transformer  by @DefTruth in #12739\r\n* Hunyuanvideo15  by @yiyixuxu in #12696\r\n* [Docs] Update Imagen Video paper link in schedulers  by @delmalih in #12724\r\n* Improve docstrings and type hints in scheduling_heun_discrete.py  by @delmalih in #12726\r\n* Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py  by @delmalih in #12766\r\n* fix FLUX.2 context parallel  by @DefTruth in #12737\r\n* Rename BriaPipeline to BriaFiboPipeline in documentation  by @galbria in #12758\r\n* Update bria_fibo.md with minor fixes  by @sayakpaul in #12731\r\n* [feat]: implement \"local\" caption upsampling for Flux.2  by @sayakpaul in #12718\r\n* Add ZImage LoRA support and integrate into ZImagePipeline  by @CalamitousFelicitousness in #12750\r\n* Add support for Ovis-Image  by @DoctorKey in #12740\r\n* Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim.  by @JerryWu-code in #12770\r\n* Fixes #12673. `record_stream` in group offloading is not working properly  by @KimbingNg in #12721\r\n* [core] start varlen variants for attn backend kernels.  by @sayakpaul in #12765\r\n* [core] reuse `AttentionMixin` for compatible classes  by @sayakpaul in #12463\r\n* Deprecate `upcast_vae` in SDXL based pipelines  by @DN6 in #12619\r\n* Kandinsky 5.0 Video Pro and Image Lite  by @leffff in #12664\r\n* Fix: leaf_level offloading breaks after delete_adapters  by @adi776borate in #12639\r\n* [tests] fix hunuyanvideo 1.5 offloading tests.  by @sayakpaul in #12782\r\n* [Z-Image] various small changes, Z-Image transformer tests, etc.  by @sayakpaul in #12741\r\n* Z-Image-Turbo `from_single_file`  by @hlky in #12756\r\n* Update attention_backends.md to format kernels  by @sayakpaul in #12757\r\n* Improve docstrings and type hints in scheduling_unipc_multistep.py  by @delmalih in #12767\r\n* fix spatial compression ratio error for AutoEncoderKLWan doing tiled encode  by @jerry2102 in #12753\r\n* [lora] support more ZImage LoRAs  by @sayakpaul in #12790\r\n* PRX Set downscale_freq_shift to 0 for consistency with internal implementation  by @DavidBert in #12791\r\n* Fix broken group offloading with block_level for models with standalone layers  by @rycerzes in #12692\r\n* [Docs] Add Z-Image docs  by @asomoza in #12775\r\n* move kandisnky docs. by @sayakpaul (direct commit on v0.36.0-release)\r\n* [docs] minor fixes to kandinsky docs  by @sayakpaul in #12797\r\n* Improve docstrings and type hints in scheduling_deis_multistep.py  by @delmalih in #12796\r\n* [Feat] TaylorSeer Cache  by @toilaluan in #12648\r\n* Update the TensorRT-ModelOPT to Nvidia-ModelOPT  by @jingyu-ml in #12793\r\n* add post init for safty checker  by @jiqing-feng in #12794\r\n* [HunyuanVideo1.5] support step-distilled  by @yiyixuxu in #12802\r\n* Add ZImageImg2ImgPipeline  by @CalamitousFelicitousness in #12751\r\n* Release: v0.36.0-release by @sayakpaul (direct commit on v0.36.0-release)\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @yiyixuxu\r\n    * ltx0.9.8 (without IC lora,  autoregressive sampling) (#12493)\r\n    * Fix: Add _skip_keys for AutoencoderKLWan (#12523)\r\n    * HunyuanImage21 (#12333)\r\n    * [modular] better warn message (#12573)\r\n    * [modular]pass hub_kwargs to load_config (#12577)\r\n    * [modular] wan! (#12611)\r\n    * fix copies (#12637)\r\n    * fix dispatch_attention_fn check (#12636)\r\n    * [modular] add a check  (#12628)\r\n    * Hunyuanvideo15 (#12696)\r\n    * [HunyuanVideo1.5] support step-distilled (#12802)\r\n* @leffff\r\n    * Kandinsky 5 is finally in Diffusers! (#12478)\r\n    * Kandinsky 5 10 sec (NABLA suport) (#12520)\r\n    * Kandinsky 5.0 Docs fixes (#12582)\r\n    * Kandinsky 5.0 Video Pro and Image Lite (#12664)\r\n* @dg845\r\n    * Remove Qwen Image Redundant RoPE Cache (#12452)\r\n    * [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) (#12526)\r\n    * Update Wan Animate Docs (#12658)\r\n    * Revert `AutoencoderKLWan`'s `dim_mult` default value back to list (#12640)\r\n* @DN6\r\n    * Raise warning instead of error when imports are missing for custom code  (#12513)\r\n    * Handle deprecated transformer classes (#12517)\r\n    * Deprecate Stable Cascade (#12537)\r\n    * [Pipelines] Enable Wan VACE to run since single transformer (#12428)\r\n    * [Modular] Fix for custom block kwargs (#12561)\r\n    * [Modular] Allow custom blocks to be saved to `local_dir` (#12381)\r\n    * Fix custom code loading in Automodel (#12571)\r\n    * [Modular] Allow ModularPipeline to load from revisions (#12592)\r\n    * [Modular] Some clean up for Modular tests (#12579)\r\n    * [CI] Push test fix (#12617)\r\n    * [CI] Fix typo in uv install (#12618)\r\n    * Fix Context Parallel validation checks (#12446)\r\n    * [Modular] Clean up docs (#12604)\r\n    * [CI] Remove unittest dependency from `testing_utils.py` (#12621)\r\n    * [Modular] Add Custom Blocks guide to doc (#12339)\r\n    * [CI] Make CI logs less verbose  (#12674)\r\n    * [CI] Temporarily pin transformers (#12677)\r\n    * [CI] Fix indentation issue in workflow files (#12685)\r\n    * [CI] Fix failing Pipeline CPU tests (#12681)\r\n    * [Modular] Add single file support to Modular (#12383)\r\n    * Deprecate `upcast_vae` in SDXL based pipelines (#12619)\r\n* @DavidBert\r\n    * Add Photon model and pipeline support (#12456)\r\n    * Prx (#12525)\r\n    * Rope in float32 for mps or npu compatibility (#12665)\r\n    * [PRX pipeline]: add 1024 resolution ratio bins (#12670)\r\n    * PRX Set downscale_freq_shift to 0 for consistency with internal implementation (#12791)\r\n* @galbria\r\n    * Bria fibo (#12545)\r\n    * Rename BriaPipeline to BriaFiboPipeline in documentation (#12758)\r\n* @lawrence-cj\r\n    * [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference (#12584)\r\n    * SANA-Video Image to Video pipeline `SanaImageToVideoPipeline` support (#12634)\r\n    * fix typo in docs (#12675)\r\n* @zhangjiewu\r\n    * add ChronoEdit (#12593)\r\n* @delmalih\r\n    * Improve docstrings and type hints in scheduling_amused.py (#12623)\r\n    * Improve docstrings and type hints in scheduling_ddim.py (#12622)\r\n    * Improve docstrings and type hints in scheduling_ddpm.py (#12651)\r\n    * Improve docstrings and type hints in scheduling_euler_discrete.py (#12654)\r\n    * Improve docstrings and type hints in scheduling_pndm.py (#12676)\r\n    * Improve docstrings and type hints in scheduling_lms_discrete.py (#12678)\r\n    * Improve docstrings and type hints in scheduling_dpmsolver_multistep.py (#12710)\r\n    * [Docs] Update Imagen Video paper link in schedulers (#12724)\r\n    * Improve docstrings and type hints in scheduling_heun_discrete.py (#12726)\r\n    * Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py (#12766)\r\n    * Improve docstrings and type hints in scheduling_unipc_multistep.py (#12767)\r\n    * Improve docstrings and type hints in scheduling_deis_multistep.py (#12796)\r\n* @pratim4dasude\r\n    * Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet (#12649)\r\n* @JerryWu-code\r\n    * Add Support for Z-Image Series (#12703)\r\n    * Support unittest for Z-image ⚡️ (#12715)\r\n    * Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim. (#12770)\r\n* @CalamitousFelicitousness\r\n    * Add ZImage LoRA support and integrate into ZImagePipeline (#12750)\r\n    * Add ZImageImg2ImgPipeline (#12751)\r\n* @DoctorKey\r\n    * Add support for Ovis-Image (#12740)\r\n","publishedAt":"2025-12-08T10:17:31.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.36.0","media":[]},{"id":"rel_WE76FCxKwvrtWfsbSEQ67","version":"v0.35.2","title":"🐞 fixes for `transformers` models, imports, ","summary":"## All commits\r\n\r\n* Release: v0.35.1-patch by @sayakpaul (direct commit on v0.35.2-patch)\r\n* handle offload_state_dict when initing transformers model...","content":"## All commits\r\n\r\n* Release: v0.35.1-patch by @sayakpaul (direct commit on v0.35.2-patch)\r\n* handle offload_state_dict when initing transformers models  by @sayakpaul in #12438\r\n* [CI] Fix TRANSFORMERS_FLAX_WEIGHTS_NAME import issue  by @DN6 in #12354\r\n* Fix PyTorch 2.3.1 compatibility: add version guard for torch.library.…  by @Aishwarya0811 in #12206\r\n* fix scale_shift_factor being on cpu for wan and ltx  by @vladmandic in #12347\r\n* Release: v0.35.2-patch by @sayakpaul (direct commit on v0.35.2-patch)","publishedAt":"2025-10-15T04:14:33.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.35.2","media":[]},{"id":"rel_IULuOPBxAM6SXCJQjg8Of","version":"v0.35.1","title":"v0.35.1 for improvements in Qwen-Image Edit","summary":"Thanks to @naykun for the following PRs that improve Qwen-Image Edit:\r\n\r\n* https://github.com/huggingface/diffusers/pull/12188\r\n* https://github.com/h...","content":"Thanks to @naykun for the following PRs that improve Qwen-Image Edit:\r\n\r\n* https://github.com/huggingface/diffusers/pull/12188\r\n* https://github.com/huggingface/diffusers/pull/12190","publishedAt":"2025-08-20T04:17:30.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.35.1","media":[]},{"id":"rel_9J6dD6bkZBvsiBIS_73l6","version":"v0.35.0","title":"Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more","summary":"This release comes packed with new image generation and editing pipelines, a new video pipeline, new training scripts, quality-of-life improvements, a...","content":"This release comes packed with new image generation and editing pipelines, a new video pipeline, new training scripts, quality-of-life improvements, and much more. Read the rest of the release notes fully to not miss out on the fun stuff.\r\n\r\n## New pipelines 🧨\r\n\r\nWe welcomed new pipelines in this release:\r\n\r\n- Wan 2.2\r\n- Flux-Kontext\r\n- Qwen-Image\r\n- Qwen-Image-Edit\r\n\r\n### Wan 2.2 📹\r\n\r\nThis update to Wan provides significant improvements in video fidelity, prompt adherence, and style. Please check out the [official doc](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan) to learn more.\r\n\r\n### Flux-Kontext 🎇\r\n\r\nFlux-Kontext is a 12-billion-parameter rectified flow transformer capable of editing images based on text instructions. Please check out the [official doc](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#kontext) to learn more about it.\r\n\r\n### Qwen-Image 🌅\r\n\r\nAfter a successful run of delivering language models and vision-language models, the Qwen team is back with an image generation model, which is Apache-2.0 licensed! It achieves significant advances in complex text rendering and precise image editing. To learn more about this powerful model, refer to our [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwenimage).\r\n\r\nThanks to @naykun for contributing both Qwen-Image and Qwen-Image-Edit via [this PR](https://github.com/huggingface/diffusers//issues/12055) and [this PR](https://github.com/huggingface/diffusers/pull/12164/).\r\n\r\n## New training scripts 🎛️\r\n\r\nMake these newly added models your own with our training scripts:\r\n\r\n- [Kontext trainer](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md#training-kontext)\r\n- [Qwen-Image trainer](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md#training-kontext)\r\n\r\n## Single-file modeling implementations\r\n\r\nFollowing the 🤗 Transformers’ [philosophy](https://huggingface.co/blog/transformers-design-philosophy) of single-file modeling implementations, we have started implementing modeling code in single and self-contained files. The Flux Transformer [code](https://github.com/huggingface/diffusers/blob/baa9b582f348e52aa2fc245e366611f454e1082b/src/diffusers/models/transformers/transformer_flux.py) is one example of this.\r\n\r\n## Attention refactor\r\n\r\nWe have massively refactored how we do attention in the models. This allows us to provide support for different attention backends (such as PyTorch native `scaled_dot_product_attention`, Flash Attention 3, SAGE attention, etc.) in the library seamlessly. \r\n\r\nHaving attention supported this way also allows us to integrate different parallelization mechanisms, which we’re actively working on. Follow [this PR](https://github.com/huggingface/diffusers/pull/11941) if you’re interested.\r\n\r\nUsers shouldn’t be affected at all by these changes. Please open an issue if you face any problems.\r\n\r\n## Regional compilation\r\n\r\nRegional compilation trims cold-start latency by only compiling the small and frequently-repeated block(s) of a model - typically a transformer layer - and enables reusing compiled artifacts for every subsequent occurrence. For many diffusion architectures, this delivers the same runtime speedups as full-graph compilation and reduces compile time by 8–10x. Refer to [this doc](https://huggingface.co/docs/diffusers/main/en/optimization/fp16#regional-compilation) to learn more.\r\n\r\nThanks to @anijain2305 for contributing this feature in [this PR](https://github.com/huggingface/diffusers/pull/11705).\r\n\r\nWe have also authored a number of posts that center around the use of `torch.compile`. You can check them out at the links below:\r\n\r\n- [Presenting Flux Fast: Making Flux go brrr on H100s](https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/)\r\n- [torch.compile and Diffusers: A Hands-On Guide to Peak Performance](https://pytorch.org/blog/torch-compile-and-diffusers-a-hands-on-guide-to-peak-performance/)\r\n- [Fast LoRA inference for Flux with Diffusers and PEFT](https://huggingface.co/blog/lora-fast)\r\n\r\n## Faster pipeline loading ⚡️\r\n\r\nUsers can now load pipelines directly on an accelerator device leading to significantly faster load times. This particularly becomes evident when loading large pipelines like Wan and Qwen-Image.\r\n\r\n```diff\r\nfrom diffusers import DiffusionPipeline\r\nimport torch \r\n\r\nckpt_id = \"Qwen/Qwen-Image\"\r\npipe = DiffusionPipeline.from_pretrained(\r\n-    ckpt_id, torch_dtype=torch.bfloat16\r\n- ).to(\"cuda\")\r\n+    ckpt_id, torch_dtype=torch.bfloat16, device_map=\"cuda\"\r\n+ )\t\t\t\t\r\n```\r\n\r\nYou can speed up loading even more by enabling parallelized loading of state dict shards. This is particularly helpful when you’re working with large models like Wan and Qwen-Image, where the model state dicts are typically sharded across multiple files. \r\n\r\n```python\r\nimport os\r\nos.environ[\"HF_ENABLE_PARALLEL_LOADING\"] = \"yes\"\r\n\r\n# rest of the loading code\r\n....\r\n```\r\n\r\n## Better GGUF integration\r\n\r\n@Isotr0py contributed support for native GGUF CUDA kernels in [this PR](https://github.com/huggingface/diffusers/pull/11869). This should provide an approximately 10% improvement in inference speed. \r\n\r\nWe have also worked on a tool for converting regular checkpoints to GGUF, letting the community easily share their GGUF checkpoints. Learn more [here](https://huggingface.co/spaces/diffusers-internal-dev/diffusers-to-gguf/).\r\n\r\nWe now support loading of Diffusers format GGUF checkpoints. \r\n\r\nYou can learn more about all of this in our [GGUF official docs](https://huggingface.co/docs/diffusers/main/en/quantization/gguf).\r\n\r\n## Modular Diffusers (Experimental)\r\n\r\nModular Diffusers is a system for building diffusion pipelines pipelines with *individual* *pipeline blocks*. It is highly customisable, with blocks that can be mixed and matched to adapt to or create a pipeline for a specific workflow or multiple workflows.  \r\n\r\nThe API is currently in active development and is being released as an experimental feature. Learn more in our [docs](https://huggingface.co/docs/diffusers/main/en/modular_diffusers/overview).\r\n\r\n## All commits\r\n\r\n* [tests] skip instead of returning.  by @sayakpaul in #11793\r\n* adjust to get CI test cases passed on XPU  by @kaixuanliu in #11759\r\n* fix deprecation in lora after 0.34.0 release  by @sayakpaul in #11802\r\n* [chore] post release v0.34.0  by @sayakpaul in #11800\r\n* Follow up for Group Offload to Disk   by @DN6 in #11760\r\n* [rfc][compile] compile method for DiffusionPipeline  by @anijain2305 in #11705\r\n* [tests] add a test on torch compile for varied resolutions  by @sayakpaul in #11776\r\n* adjust tolerance criteria for `test_float16_inference` in unit test  by @kaixuanliu in #11809\r\n* Flux Kontext  by @a-r-r-o-w in #11812\r\n* Kontext training  by @sayakpaul in #11813\r\n* Kontext fixes  by @a-r-r-o-w in #11815\r\n* remove syncs before denoising in Kontext  by @sayakpaul in #11818\r\n* [CI] disable onnx, mps, flax from the CI  by @sayakpaul in #11803\r\n* TorchAO compile + offloading tests  by @a-r-r-o-w in #11697\r\n* Support dynamically loading/unloading loras with group offloading  by @a-r-r-o-w in #11804\r\n* [lora] fix: lora unloading behvaiour  by @sayakpaul in #11822\r\n* [lora]feat: use exclude modules to loraconfig.  by @sayakpaul in #11806\r\n* ENH: Improve speed of function expanding LoRA scales  by @BenjaminBossan in #11834\r\n* Remove print statement in SCM Scheduler  by @a-r-r-o-w in #11836\r\n* [tests] add test for hotswapping + compilation on resolution changes  by @sayakpaul in #11825\r\n* reset deterministic in tearDownClass  by @jiqing-feng in #11785\r\n* [tests] Fix failing float16 cuda tests  by @a-r-r-o-w in #11835\r\n* [single file] Cosmos  by @a-r-r-o-w in #11801\r\n* [docs] fix single_file example.  by @sayakpaul in #11847\r\n* Use real-valued instead of complex tensors in Wan2.1 RoPE  by @mjkvaak-amd in #11649\r\n* [docs] Batch generation  by @stevhliu in #11841\r\n* [docs] Deprecated pipelines  by @stevhliu in #11838\r\n* fix norm not training in train_control_lora_flux.py  by @Luo-Yihang in #11832\r\n* [From Single File] support `from_single_file` method for `WanVACE3DTransformer`  by @J4BEZ in #11807\r\n* [lora] tests for `exclude_modules` with Wan VACE  by @sayakpaul in #11843\r\n* update: FluxKontextInpaintPipeline support  by @vuongminh1907 in #11820\r\n* [Flux Kontext] Support Fal Kontext LoRA  by @linoytsaban in #11823\r\n* [docs] Add a note of `_keep_in_fp32_modules`  by @a-r-r-o-w in #11851\r\n* [benchmarks] overhaul benchmarks  by @sayakpaul in #11565\r\n* FIX set_lora_device when target layers differ  by @BenjaminBossan in #11844\r\n* Fix Wan AccVideo/CausVid fuse_lora  by @a-r-r-o-w in #11856\r\n* [chore] deprecate blip controlnet pipeline.  by @sayakpaul in #11877\r\n* [docs] fix references in flux pipelines.  by @sayakpaul in #11857\r\n* [tests] remove tests for deprecated pipelines.  by @sayakpaul in #11879\r\n* [docs] LoRA metadata  by @stevhliu in #11848\r\n* [training ] add Kontext i2i training  by @sayakpaul in #11858\r\n* [CI] Fix big GPU test marker  by @DN6 in #11786\r\n* First Block Cache  by @a-r-r-o-w in #11180\r\n* [tests] annotate compilation test classes with bnb  by @sayakpaul in #11715\r\n* Update chroma.md  by @shm4r7 in #11891\r\n* [CI] Speed up GPU PR Tests  by @DN6 in #11887\r\n* Pin k-diffusion for CI  by @sayakpaul in #11894\r\n* [Docker] update doc builder dockerfile to include quant libs.  by @sayakpaul in #11728\r\n* [tests] Remove more deprecated tests  by @sayakpaul in #11895\r\n* [tests] mark the wanvace lora tester flaky  by @sayakpaul in #11883\r\n* [tests] add compile + offload tests for GGUF.  by @sayakpaul in #11740\r\n* feat: add multiple input image support in Flux Kontext  by @Net-Mist in #11880\r\n* Fix unique memory address when doing group-offloading with disk  by @sayakpaul in #11767\r\n* [SD3] CFG Cutoff fix and official callback  by @asomoza in #11890\r\n* The Modular Diffusers  by @yiyixuxu in #9672\r\n* [quant] QoL improvements for pipeline-level quant config  by @sayakpaul in #11876\r\n* Bump torch from 2.4.1 to 2.7.0 in /examples/server  by @dependabot[bot] in #11429\r\n* [LoRA] fix: disabling hooks when loading loras.  by @sayakpaul in #11896\r\n* [utils] account for MPS when available in get_device().  by @sayakpaul in #11905\r\n* [ControlnetUnion] Multiple Fixes  by @asomoza in #11888\r\n* Avoid creating tensor in CosmosAttnProcessor2_0  by @chenxiao111222 in #11761) \r\n* [tests] Unify compilation + offloading tests in quantization  by @sayakpaul in #11910\r\n* Speedup model loading by 4-5x ⚡  by @a-r-r-o-w in #11904\r\n* [docs] torch.compile blog post  by @stevhliu in #11837\r\n* Flux: pass joint_attention_kwargs when using gradient_checkpointing  by @piercus in #11814\r\n* Fix: Align VAE processing in ControlNet SD3 training with inference  by @Henry-Bi in #11909\r\n* Bump aiohttp from 3.10.10 to 3.12.14 in /examples/server  by @dependabot[bot] in #11924\r\n* [tests] Improve Flux tests  by @a-r-r-o-w in #11919\r\n* Remove device synchronization when loading weights  by @a-r-r-o-w in #11927\r\n* Remove forced float64 from onnx stable diffusion pipelines  by @lostdisc in #11054\r\n* Fixed bug: Uncontrolled recursive calls that caused an infinite loop when loading certain pipelines containing Transformer2DModel  by @lengmo1996 in #11923\r\n* [ControlnetUnion] Propagate #11888 to img2img  by @asomoza in #11929\r\n* enable flux pipeline compatible with unipc and dpm-solver  by @gameofdimension in #11908\r\n* [training] add an offload utility that can be used as a context manager.  by @sayakpaul in #11775\r\n* Add SkyReels V2: Infinite-Length Film Generative Model  by @tolgacangoz in #11518\r\n* [refactor] Flux/Chroma single file implementation + Attention Dispatcher  by @a-r-r-o-w in #11916\r\n* [docs] clarify the mapping between `Transformer2DModel` and finegrained variants.  by @sayakpaul in #11947\r\n* [Modular] Updates for Custom Pipeline Blocks  by @DN6 in #11940\r\n* [docs] Update toctree  by @stevhliu in #11936\r\n* [docs] include bp link.  by @sayakpaul in #11952\r\n* Fix kontext finetune issue when batch size >1   by @mymusise in #11921\r\n* [tests] Add test slices for Hunyuan Video  by @a-r-r-o-w in #11954\r\n* [tests] Add test slices for Cosmos  by @a-r-r-o-w in #11955\r\n* [tests] Add fast test slices for HiDream-Image  by @a-r-r-o-w in #11953\r\n* [Modular] update the collection behavior  by @yiyixuxu in #11963\r\n* fix \"Expected all tensors to be on the same device, but found at least two devices\" error  by @yao-matrix in #11690\r\n* Remove logger warnings for attention backends and hard error during runtime instead  by @a-r-r-o-w in #11967\r\n* [Examples] Uniform notations in train_flux_lora  by @tomguluson92 in #10011\r\n* fix style   by @yiyixuxu in #11975\r\n* [tests] Add test slices for Wan  by @a-r-r-o-w in #11920\r\n* [docs] update `guidance_scale` docstring for guidance_distilled models.  by @sayakpaul in #11935\r\n* [tests] enforce torch version in the compilation tests.  by @sayakpaul in #11979\r\n* [modular diffusers] Wan  by @a-r-r-o-w in #11913\r\n* [compile] logger statements create unnecessary guards during dynamo tracing  by @a-r-r-o-w in #11987\r\n* enable quantcompile test on xpu  by @yao-matrix in #11988\r\n* [WIP] Wan2.2  by @yiyixuxu in #12004\r\n* [refactor] some shared parts between hooks + docs  by @a-r-r-o-w in #11968\r\n* [refactor] Wan single file implementation  by @a-r-r-o-w in #11918\r\n* Fix huggingface-hub failing tests  by @asomoza in #11994\r\n* feat: add flux kontext  by @jlonge4 in #11985\r\n* [modular] add Modular flux for text-to-image  by @sayakpaul in #11995\r\n* [docs] include lora fast post.  by @sayakpaul in #11993\r\n* [docs] quant_kwargs  by @stevhliu in #11712\r\n* [docs] Fix link  by @stevhliu in #12018\r\n* [wan2.2] add 5b i2v  by @yiyixuxu in #12006\r\n* wan2.2 i2v FirstBlockCache fix  by @okaris in #12013\r\n* [core] support attention backends for LTX  by @sayakpaul in #12021\r\n* [docs] Update index  by @stevhliu in #12020\r\n* [Fix] huggingface-cli to hf missed files  by @asomoza in #12008\r\n* [training-scripts] Make pytorch examples UV-compatible  by @sayakpaul in #12000\r\n* [wan2.2] fix vae patches  by @yiyixuxu in #12041\r\n* Allow SD pipeline to use newer schedulers, eg: FlowMatch  by @ppbrown in #12015\r\n* [LoRA] support lightx2v lora in wan  by @sayakpaul in #12040\r\n* Fix type of force_upcast to bool  by @BerndDoser in #12046\r\n* Update autoencoder_kl_cosmos.py  by @tanuj-rai in #12045\r\n* Qwen-Image  by @naykun in #12055\r\n* [wan2.2] follow-up  by @yiyixuxu in #12024\r\n* tests + minor refactor for QwenImage  by @a-r-r-o-w in #12057\r\n* Cross attention module to Wan Attention  by @samuelt0 in #12058\r\n* fix(qwen-image): update vae license  by @naykun in #12063\r\n* CI fixing  by @paulinebm in #12059\r\n* enable all gpus when running ci.  by @sayakpaul in #12062\r\n* fix the rest for all GPUs in CI  by @sayakpaul in #12064\r\n* [docs] Install  by @stevhliu in #12026\r\n* [wip] feat: support lora in qwen image and training script  by @sayakpaul in #12056\r\n* [docs] small corrections to the example in the Qwen docs  by @sayakpaul in #12068\r\n* [tests] Fix Qwen test_inference slices  by @a-r-r-o-w in #12070\r\n* [tests] deal with the failing AudioLDM2 tests  by @sayakpaul in #12069\r\n* optimize QwenImagePipeline to reduce unnecessary CUDA synchronization  by @chengzeyi in #12072\r\n* Add cuda kernel support for GGUF inference  by @Isotr0py in #11869\r\n* fix input shape for WanGGUFTexttoVideoSingleFileTests  by @jiqing-feng in #12081\r\n* [refactor] condense group offloading   by @a-r-r-o-w in #11990\r\n* Fix group offloading synchronization bug for parameter-only GroupModule's  by @a-r-r-o-w in #12077\r\n* Helper functions to return skip-layer compatible layers  by @a-r-r-o-w in #12048\r\n* Make `prompt_2` optional in Flux Pipelines   by @DN6 in #12073\r\n* [tests] tighten compilation tests for quantization  by @sayakpaul in #12002\r\n* Implement Frequency-Decoupled Guidance (FDG) as a Guider  by @dg845 in #11976\r\n* fix flux type hint  by @DefTruth in #12089\r\n* [qwen] device typo  by @yiyixuxu in #12099\r\n* [lora] adapt new LoRA config injection method  by @sayakpaul in #11999\r\n* lora_conversion_utils: replace lora up/down with a/b even if `transformer.` in key  by @Beinsezii in #12101\r\n* [tests] device placement for non-denoiser components in group offloading LoRA tests  by @sayakpaul in #12103\r\n* [Modular] Fast Tests  by @yiyixuxu in #11937\r\n* [GGUF] feat: support loading diffusers format gguf checkpoints.  by @sayakpaul in #11684\r\n* [docs] diffusers gguf checkpoints  by @sayakpaul in #12092\r\n* [core] add modular support for Flux I2I  by @sayakpaul in #12086\r\n* [lora] support loading loras from `lightx2v/Qwen-Image-Lightning`  by @sayakpaul in #12119\r\n* [Modular] More Updates for Custom Code Loading  by @DN6 in #11969\r\n* enable compilation in qwen image.  by @sayakpaul in #12061\r\n* [tests] Add inference test slices for SD3 and remove unnecessary tests  by @a-r-r-o-w in #12106\r\n* [chore] complete the licensing statement.  by @sayakpaul in #12001\r\n* [docs] Cache link  by @stevhliu in #12105\r\n* [Modular] Add experimental feature warning for Modular Diffusers  by @DN6 in #12127\r\n* Add low_cpu_mem_usage option to from_single_file to align with from_pretrained  by @IrisRainbowNeko in #12114\r\n* [docs] Modular diffusers  by @stevhliu in #11931\r\n* [Bugfix] typo fix in NPU FA  by @leisuzz in #12129\r\n* Add QwenImage Inpainting and Img2Img pipeline  by @Trgtuan10 in #12117\r\n* [core] parallel loading of shards  by @sayakpaul in #12028\r\n* try to use deepseek with an agent to auto i18n to zh  by @SamYuan1990 in #12032\r\n* [docs] Refresh effective and efficient doc  by @stevhliu in #12134\r\n* Fix bf15/fp16 for pipeline_wan_vace.py  by @SlimRG in #12143\r\n* make parallel loading flag a part of constants.  by @sayakpaul in #12137\r\n* [docs] Parallel loading of shards  by @stevhliu in #12135\r\n* feat: cuda device_map for pipelines.  by @sayakpaul in #12122\r\n* [core] respect `local_files_only=True` when using sharded checkpoints  by @sayakpaul in #12005\r\n* support `hf_quantizer` in cache warmup.  by @sayakpaul in #12043\r\n* make test_gguf all pass on xpu  by @yao-matrix in #12158\r\n* [docs] Quickstart  by @stevhliu in #12128\r\n* Qwen Image Edit Support  by @naykun in #12164\r\n* remove silu for CogView4  by @lambertwjh in #12150\r\n* [qwen] Qwen image edit followups  by @sayakpaul in #12166\r\n* Minor modification to support DC-AE-turbo  by @chenjy2003 in #12169\r\n* [Docs] typo error in qwen image  by @leisuzz in #12144\r\n* fix: caching allocator behaviour for quantization.  by @sayakpaul in #12172\r\n* fix(training_utils): wrap device in list for DiffusionPipeline  by @MengAiDev in #12178\r\n* [docs] Clarify guidance scale in Qwen pipelines  by @sayakpaul in #12181\r\n* [LoRA] feat: support more Qwen LoRAs from the community.  by @sayakpaul in #12170\r\n* Update README.md  by @Taechai in #12182\r\n* [chore] add lora button to qwenimage docs  by @sayakpaul in #12183\r\n* [Wan 2.2 LoRA] add support for 2nd transformer lora loading + wan 2.2 lightx2v lora   by @linoytsaban in #12074\r\n* Release: v0.35.0 by @sayakpaul (direct commit on v0.35.0-release)\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @vuongminh1907\r\n    * update: FluxKontextInpaintPipeline support (#11820)\r\n* @Net-Mist\r\n    * feat: add multiple input image support in Flux Kontext (#11880)\r\n* @tolgacangoz\r\n    * Add SkyReels V2: Infinite-Length Film Generative Model (#11518)\r\n* @naykun\r\n    * Qwen-Image (#12055)\r\n    * fix(qwen-image): update vae license (#12063)\r\n    * Qwen Image Edit Support (#12164)\r\n* @Trgtuan10\r\n    * Add QwenImage Inpainting and Img2Img pipeline (#12117)\r\n* @SamYuan1990\r\n    * try to use deepseek with an agent to auto i18n to zh (#12032)\r\n","publishedAt":"2025-08-19T03:28:44.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.35.0","media":[]},{"id":"rel_RPTKNQVq6NvbhyOjIccg_","version":"v0.34.0","title":"Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more","summary":"## 📹 New video generation pipelines\r\n\r\n### Wan VACE\r\n\r\nWan VACE supports various generation techniques which achieve controllable video generation. I...","content":"## 📹 New video generation pipelines\r\n\r\n### Wan VACE\r\n\r\nWan VACE supports various generation techniques which achieve controllable video generation. It comes in two variants: a 1.3B model for fast iteration & prototyping, and a 14B for high quality generation. Some of the capabilities include:\r\n\r\n- Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Boundary Box, etc.). Recommended library for preprocessing videos to obtain control videos: [**huggingface/controlnet_aux**](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan)\r\n- Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)\r\n- Inpainting and Outpainting\r\n- Subject to Video (faces, object, characters, etc.)\r\n- Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)\r\n\r\nThe code snippets available in [**this**](https://github.com/huggingface/diffusers/pull/11582) pull request demonstrate some examples of how videos can be generated with controllability signals.\r\n\r\nCheck out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#any-to-video-controllable-generation) to learn more.\r\n\r\n### Cosmos Predict2 Video2World\r\n\r\nCosmos-Predict2 is a key branch of the [Cosmos World Foundation Models](https://www.nvidia.com/en-us/ai/cosmos) (WFMs) ecosystem for Physical AI, specializing in future state prediction through advanced world modeling. It offers two powerful capabilities: text-to-image generation for creating high-quality images from text descriptions, and video-to-world generation for producing visual simulations from video inputs.\r\n\r\nThe Video2World model comes in a 2B and 14B variant. Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos) to learn more.\r\n\r\n### LTX 0.9.7 and Distilled\r\n\r\nLTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks.\r\n\r\nCheck out the [docs](https://huggingface.co/docs/diffusers/en/api/pipelines/ltx_video) to learn more.\r\n\r\n### Hunyuan Video Framepack and F1\r\n\r\n[Framepack](https://github.com/lllyasviel/FramePack) is a novel method for enabling long video generation. There are two released variants of Hunyuan Video trained using this technique. Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/framepack#framepack) to learn more.\r\n\r\n### FusionX\r\n\r\nThe [FusionX family of models and LoRAs](https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX), built on top of Wan2.1-14B, should already be supported. To load the model, use `from_single_file()`:\r\n\r\n```python\r\nfrom diffusers import WanTransformer3DModel\r\n\r\ntransformer = WanTransformer3DModel.from_single_file(\r\n    \"https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/blob/main/Wan14Bi2vFusioniX_fp16.safetensors\",\r\n    torch_dtype=torch.bfloat16\r\n)\r\n```\r\n\r\nTo load the LoRAs, use `load_lora_weights()`:\r\n\r\n```python\r\npipe = DiffusionPipeline.from_pretrained(\r\n    \"Wan-AI/Wan2.1-T2V-14B-Diffusers\",\r\n    torch_dtype=torch.bfloat16\r\n).to(\"cuda\")\r\npipe.load_lora_weights(\r\n    \"vrgamedevgirl84/Wan14BT2VFusioniX\", weight_name=\"FusionX_LoRa/Wan2.1_T2V_14B_FusionX_LoRA.safetensors\"\r\n)\r\n```\r\n\r\n### AccVideo and CausVid (only LoRAs)\r\n\r\n[AccVideo](https://github.com/aejion/AccVideo) and [CausVid](https://github.com/tianweiy/CausVid) are two novel distillation techniques that speed up the generation time of video diffusion models while preserving quality. Diffusers supports loading their extracted LoRAs with their respective models.\r\n\r\n## 🌠 New image generation pipelines\r\n\r\n### Cosmos Predict2 Text2Image\r\n\r\nText-to-image models from the Cosmos-Predict2 release. The models comes in a 2B and 14B variant. Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos) to learn more.\r\n\r\n### Chroma\r\n\r\nChroma is a **8.9B** parameter model based on **FLUX.1-schnell.** It’s fully **Apache 2.0 licensed**, ensuring that **anyone** can use, modify, and build on top of it. Checkout the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/chroma#chroma) to learn more\r\n\r\nThanks to @Ednaordinary for contributing it in [this PR](https://github.com/huggingface/diffusers/pull/11698)!\r\n\r\n### VisualCloze\r\n\r\n[VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning](https://huggingface.co/papers/2504.07960) is an innovative in-context learning framework based universal image generation framework that offers key capabilities:\r\n\r\n1. Support for various in-domain tasks\r\n2. Generalization to unseen tasks through in-context learning\r\n3. Unify multiple tasks into one step and generate both target image and intermediate results\r\n4. Support reverse-engineering conditions from target images\r\n\r\nCheck out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/visualcloze) to learn more. Thanks to @lzyhha for contributing this in [this PR](https://github.com/huggingface/diffusers/pull/11377)!\r\n\r\n## Better `torch.compile` support\r\n\r\nWe have worked with the PyTorch team to improve how we provide `torch.compile()` compatibility throughout the library. More specifically, we now test the widely used models like Flux for any recompilation and graph break issues which can get in the way of fully realizing `torch.compile()` benefits. Refer to the following links to learn more:\r\n\r\n- https://github.com/huggingface/diffusers/pull/11085\r\n- https://github.com/huggingface/diffusers/issues/11430\r\n\r\nAdditionally, users can combine offloading with compilation to get a better speed-memory trade-off. Below is an example:\r\n\r\n<details>\r\n<summary>Code</summary>\r\n\r\n```py\r\nimport torch\r\nfrom diffusers import DiffusionPipeline\r\ntorch._dynamo.config.cache_size_limit = 10000\r\n\r\npipeline = DiffusionPipeline.from_pretrained(\r\n    \"black-forest-labs/FLUX.1-schnell\", torch_dtype=torch.bfloat16\r\n)\r\npipline.enable_model_cpu_offload()\r\n# Compile.\r\npipeline.transformer.compile()\r\n\r\nimage = pipeline(\r\n    prompt=\"An astronaut riding a horse on Mars\",\r\n    guidance_scale=0.,\r\n    height=768,\r\n    width=1360,\r\n    num_inference_steps=4,\r\n    max_sequence_length=256,\r\n).images[0]\r\nprint(f\"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB\")\r\n```\r\n\r\n</details>\r\n\r\nThis is compatible with [group offloading](https://huggingface.co/docs/diffusers/main/en/optimization/memory#group-offloading), too. Interested readers can check out the concerned PRs below:\r\n\r\n- https://github.com/huggingface/diffusers/pull/11605\r\n- https://github.com/huggingface/diffusers/pull/11670\r\n\r\nYou can substantially reduce memory requirements by combining quantization with offloading and then improving speed with `torch.compile()`. Below is an example:\r\n\r\n<details>\r\n<summary>Code</summary>\r\n\r\n```py\r\nfrom diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig\r\nfrom transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig\r\nfrom diffusers import AutoModel, FluxPipeline\r\nfrom transformers import T5EncoderModel\r\n\r\nimport torch\r\ntorch._dynamo.config.recompile_limit = 1000 \r\n\r\nquant_kwargs = {\"load_in_4bit\": True, \"bnb_4bit_compute_dtype\": torch_dtype, \"bnb_4bit_quant_type\": \"nf4\"}\r\ntext_encoder_2_quant_config = TransformersBitsAndBytesConfig(**quant_kwargs)\r\ndit_quant_config = DiffusersBitsAndBytesConfig(**quant_kwargs)\r\n\r\nckpt_id = \"black-forest-labs/FLUX.1-dev\"\r\ntext_encoder_2 = T5EncoderModel.from_pretrained(\r\n    ckpt_id,\r\n    subfolder=\"text_encoder_2\",\r\n    quantization_config=text_encoder_2_quant_config,\r\n    torch_dtype=torch_dtype,\r\n)\r\ntransformer = AutoModel.from_pretrained(\r\n    ckpt_id,\r\n    subfolder=\"transformer\",\r\n    quantization_config=dit_quant_config,\r\n    torch_dtype=torch_dtype,\r\n)\r\npipe = FluxPipeline.from_pretrained(\r\n    ckpt_id,\r\n    transformer=transformer,\r\n    text_encoder_2=text_encoder_2,\r\n    torch_dtype=torch_dtype,\r\n)\r\npipe.enable_model_cpu_offload()\r\npipe.transformer.compile()\r\n\r\nimage = pipeline(\r\n    prompt=\"An astronaut riding a horse on Mars\",\r\n    guidance_scale=3.5,\r\n    height=768,\r\n    width=1360,\r\n    num_inference_steps=28,\r\n    max_sequence_length=512,\r\n).images[0]\r\n```\r\n\r\n</details>\r\n    \r\nStarting from `bitsandbytes==0.46.0` onwards, bnb-quantized models should be fully compatible with `torch.compile()` without graph-breaks. This means that when compiling a bnb-quantized model, users can do: `model.compile(fullgraph=True)`. This can significantly improve speed while still providing memory benefits. The figure below provides a comparison with Flux.1-Dev. Refer to [this benchmarking script](https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d) to learn more. \r\n\r\n![image](https://github.com/user-attachments/assets/d0fd34bf-618e-41d2-b2bb-e821fe02f59b)\r\n\r\nNote that for 4bit bnb models, it’s currently needed to install PyTorch nightly if `fullgraph=True` is specified during compilation.\r\n\r\nHuge shoutout to @anijain2305 and @StrongerXi from the PyTorch team for the incredible support.\r\n\r\n## PipelineQuantizationConfig\r\n\r\nUsers can now provide a quantization config while initializing a pipeline:\r\n\r\n```python\r\nimport torch\r\nfrom diffusers import DiffusionPipeline\r\nfrom diffusers.quantizers import PipelineQuantizationConfig\r\n\r\npipeline_quant_config = PipelineQuantizationConfig(\r\n     quant_backend=\"bitsandbytes_4bit\",\r\n     quant_kwargs={\"load_in_4bit\": True, \"bnb_4bit_quant_type\": \"nf4\", \"bnb_4bit_compute_dtype\": torch.bfloat16},\r\n     components_to_quantize=[\"transformer\", \"text_encoder_2\"],\r\n)\r\npipe = DiffusionPipeline.from_pretrained(\r\n    \"black-forest-labs/FLUX.1-dev\",\r\n    quantization_config=pipeline_quant_config,\r\n    torch_dtype=torch.bfloat16,\r\n).to(\"cuda\")\r\n\r\nimage = pipe(\"photo of a cute dog\").images[0]\r\n```\r\n\r\nThis reduces the barrier to entry for our users willing to use quantization without having to write too much code. Refer to the documentation to learn more about [different configurations](https://huggingface.co/docs/diffusers/main/en/quantization/overview#pipeline-level-quantization) allowed through `PipelineQuantizationConfig`.\r\n\r\n## Group offloading with disk\r\n\r\nIn the [previous release](https://github.com/huggingface/diffusers/releases/tag/v0.33.0), we shipped “group offloading” which lets you offload blocks/nodes within a model, optimizing its memory consumption. It also lets you overlap this offloading with computation, providing a good speed-memory trade-off, especially in low VRAM environments.\r\n\r\nHowever, you still need a considerable amount of system RAM to make offloading work effectively. So, low VRAM and low RAM environments would still not work. \r\n\r\nStarting this release, users will additionally have the option to offload to disk instead of RAM, further lowering memory consumption. Set the `offload_to_disk_path` to enable this feature.\r\n\r\n```python\r\npipeline.transformer.enable_group_offload(\r\n    onload_device=\"cuda\", \r\n    offload_device=\"cpu\", \r\n    offload_type=\"leaf_level\", \r\n    offload_to_disk_path=\"path/to/disk\"\r\n)\r\n```\r\n\r\nRefer to these [two](https://github.com/huggingface/diffusers/pull/11682#issue-3129365363) [tables](https://github.com/huggingface/diffusers/pull/11682#issuecomment-2955715126) to compare the speed and memory trade-offs.\r\n\r\n## LoRA metadata parsing\r\n\r\nIt is beneficial to include the `LoraConfig` in a LoRA state dict that was used to train the LoRA. In its absence, users were restricted to using the same LoRA alpha as the LoRA rank. We have modified the most popular training scripts to allow passing custom `lora_alpha` through the CLI. Refer to [this thread](https://github.com/huggingface/diffusers/pull/11723) for more updates. Refer to [this comment](https://github.com/huggingface/diffusers/pull/11324#issuecomment-2998516012) for some extended clarifications.\r\n\r\n## New training scripts\r\n\r\n- We now have a capable training script for training robust timestep-distilled models through the SANA Sprint framework. Check out [this resource](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/sana) for more details. Thanks to @scxue and @lawrence-cj for contributing it in [this PR](https://github.com/huggingface/diffusers/pull/11514).\r\n- HiDream LoRA DreamBooth training script ([docs](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_hidream.md)). The script supports training with quantization. [HiDream](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream) is an MIT-licensed model. So, make it yours with this training script.\r\n\r\n## Updates on educational materials on quantization\r\n\r\nWe have worked on a two-part series discussing the support of quantization in Diffusers. Check them out:\r\n\r\n- [Exploring Quantization Backends in Diffusers](https://huggingface.co/blog/diffusers-quantization)\r\n- [(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware](https://huggingface.co/blog/flux-qlora)\r\n\r\n## All commits\r\n\r\n* [LoRA] support musubi wan loras.  by @sayakpaul in #11243\r\n* fix test_vanilla_funetuning failure on XPU and A100  by @yao-matrix in #11263\r\n* make test_stable_diffusion_inpaint_fp16 pass on XPU  by @yao-matrix in #11264\r\n* make test_dict_tuple_outputs_equivalent pass on XPU  by @yao-matrix in #11265\r\n* add onnxruntime-qnn & onnxruntime-cann  by @xieofxie in #11269\r\n* make test_instant_style_multiple_masks pass on XPU  by @yao-matrix in #11266\r\n* [BUG] Fix convert_vae_pt_to_diffusers bug  by @lavinal712 in #11078\r\n* Fix LTX 0.9.5 single file  by @hlky in #11271\r\n* [Tests] Cleanup lora tests utils  by @sayakpaul in #11276\r\n* [CI] relax tolerance for unclip further  by @sayakpaul in #11268\r\n* do not use `DIFFUSERS_REQUEST_TIMEOUT` for notification bot  by @sayakpaul in #11273\r\n* Fix incorrect tile_latent_min_width calculation in AutoencoderKLMochi  by @kuantuna in #11294\r\n* HiDream Image  by @hlky in #11231\r\n* flow matching lcm scheduler  by @quickjkee in #11170\r\n* Update autoencoderkl_allegro.md  by @Forbu in #11303\r\n* Hidream refactoring follow ups  by @a-r-r-o-w in #11299\r\n* Fix incorrect tile_latent_min_width calculations  by @kuantuna in #11305\r\n* [ControlNet] Adds controlnet for SanaTransformer  by @ishan-modi in #11040\r\n* make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU  by @yao-matrix in #11308\r\n* make test_stable_diffusion_karras_sigmas pass on XPU  by @yao-matrix in #11310\r\n* make `KolorsPipelineFastTests::test_inference_batch_single_identical` pass on XPU  by @faaany in #11313\r\n* [LoRA] support more SDXL loras.  by @sayakpaul in #11292\r\n* [HiDream] code example  by @linoytsaban in #11317\r\n* import for FlowMatchLCMScheduler  by @asomoza in #11318\r\n* Use float32 on mps or npu in transformer_hidream_image's rope  by @hlky in #11316\r\n* Add `skrample` section to `community_projects.md`  by @Beinsezii in #11319\r\n* [docs] Promote `AutoModel` usage  by @sayakpaul in #11300\r\n* [LoRA] Add LoRA support to AuraFlow  by @hameerabbasi in #10216\r\n* Fix vae.Decoder prev_output_channel  by @hlky in #11280\r\n* fix CPU offloading related fail cases on XPU  by @yao-matrix in #11288\r\n* [docs] fix hidream docstrings.  by @sayakpaul in #11325\r\n* Rewrite AuraFlowPatchEmbed.pe_selection_index_based_on_dim to be torch.compile compatible  by @AstraliteHeart in #11297\r\n* post release 0.33.0  by @sayakpaul in #11255\r\n* another fix for FlowMatchLCMScheduler forgotten import  by @asomoza in #11330\r\n* Fix Hunyuan I2V for `transformers>4.47.1`    by @DN6 in #11293\r\n* unpin torch versions for onnx Dockerfile  by @sayakpaul in #11290\r\n* [single file] enable telemetry for single file loading when using GGUF.  by @sayakpaul in #11284\r\n* [docs] add a snippet for compilation in the auraflow docs.  by @sayakpaul in #11327\r\n* Hunyuan I2V fast tests fix  by @DN6 in #11341\r\n* [BUG] fixed _toctree.yml alphabetical ordering  by @ishan-modi in #11277\r\n* Fix wrong dtype argument name as torch_dtype  by @nPeppon in #11346\r\n* [chore] fix lora docs utils  by @sayakpaul in #11338\r\n* [docs] add note about use_duck_shape in auraflow docs.  by @sayakpaul in #11348\r\n* [LoRA] Propagate `hotswap` better  by @sayakpaul in #11333\r\n* [Hi Dream] follow-up  by @yiyixuxu in #11296\r\n* [bitsandbytes] improve dtype mismatch handling for bnb + lora.  by @sayakpaul in #11270\r\n* Update controlnet_flux.py  by @haofanwang in #11350\r\n* enable 2 test cases on XPU  by @yao-matrix in #11332\r\n* [BNB] Fix test_moving_to_cpu_throws_warning   by @SunMarc in #11356\r\n* support Wan-FLF2V  by @yiyixuxu in #11353\r\n* Fix: `StableDiffusionXLControlNetAdapterInpaintPipeline` incorrectly inherited `StableDiffusionLoraLoaderMixin`  by @Kazuki-Yoda in #11357\r\n* update output for Hidream transformer  by @yiyixuxu in #11366\r\n* [Wan2.1-FLF2V] update conversion script  by @yiyixuxu in #11365\r\n* [Flux LoRAs] fix lr scheduler bug in distributed scenarios  by @linoytsaban in #11242\r\n* [train_dreambooth_lora_sdxl.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env  by @kghamilton89 in #11240\r\n* fix issue that training flux controlnet was unstable and validation r…  by @PromeAIpro in #11373\r\n* Fix Wan I2V prepare_latents dtype  by @a-r-r-o-w in #11371\r\n* [BUG] fixes in kadinsky pipeline  by @ishan-modi in #11080\r\n* Add Serialized Type Name kwarg in Model Output  by @anzr299 in #10502\r\n* [cogview4][feat] Support attention mechanism with variable-length support and batch packing  by @OleehyO in #11349\r\n* Support different-length pos/neg prompts for FLUX.1-schnell variants like Chroma  by @josephrocca in #11120\r\n* [Refactor] Minor Improvement for import utils  by @ishan-modi in #11161\r\n* Add stochastic sampling to FlowMatchEulerDiscreteScheduler  by @apolinario in #11369\r\n* [LoRA] add LoRA support to HiDream and fine-tuning script  by @linoytsaban in #11281\r\n* Update modeling imports  by @a-r-r-o-w in #11129\r\n* [HiDream] move deprecation to 0.35.0  by @yiyixuxu in #11384\r\n* Update README_hidream.md  by @AMEERAZAM08 in #11386\r\n* Fix group offloading with block_level and use_stream=True  by @a-r-r-o-w in #11375\r\n* [train_dreambooth_flux] Add LANCZOS as the default interpolation mode for image resizing  by @ishandutta0098 in #11395\r\n* [Feature] Added Xlab Controlnet support  by @ishan-modi in #11249\r\n* Kolors additional pipelines, community contrib  by @Teriks in #11372\r\n* [HiDream LoRA] optimizations + small updates  by @linoytsaban in #11381\r\n* Fix Flux IP adapter argument in the pipeline example  by @AeroDEmi in #11402\r\n* [BUG] fixed WAN docstring  by @ishan-modi in #11226\r\n* Fix typos in strings and comments  by @co63oc in #11407\r\n* [train_dreambooth_lora.py] Set LANCZOS as default interpolation mode for resizing  by @merterbak in #11421\r\n* [tests] add tests to check for graph breaks, recompilation, cuda syncs in pipelines during torch.compile()  by @sayakpaul in #11085\r\n* enable group_offload cases and quanto cases on XPU  by @yao-matrix in #11405\r\n* enable test_layerwise_casting_memory cases on XPU  by @yao-matrix in #11406\r\n* [tests] fix import.  by @sayakpaul in #11434\r\n* [train_text_to_image] Better image interpolation in training scripts follow up  by @tongyu0924 in #11426\r\n* [train_text_to_image_lora] Better image interpolation in training scripts follow up  by @tongyu0924 in #11427\r\n* enable 28 GGUF test cases on XPU  by @yao-matrix in #11404\r\n* [Hi-Dream LoRA] fix bug in validation  by @linoytsaban in #11439\r\n* Fixing missing provider options argument  by @urpetkov-amd in #11397\r\n* Set LANCZOS as the default interpolation for image resizing in ControlNet training  by @YoulunPeng in #11449\r\n* Raise warning instead of error for block offloading with streams  by @a-r-r-o-w in #11425\r\n* enable marigold_intrinsics cases on XPU  by @yao-matrix in #11445\r\n* `torch.compile` fullgraph compatibility for Hunyuan Video  by @a-r-r-o-w in #11457\r\n* enable consistency test cases on XPU, all passed  by @yao-matrix in #11446\r\n* enable unidiffuser test cases on xpu  by @yao-matrix in #11444\r\n* Add generic support for Intel Gaudi accelerator (hpu device)  by @dsocek in #11328\r\n* Add StableDiffusion3InstructPix2PixPipeline  by @xduzhangjiayu in #11378\r\n* make safe diffusion test cases pass on XPU and A100  by @yao-matrix in #11458\r\n* [test_models_transformer_hunyuan_video] help us test torch.compile() for impactful models  by @tongyu0924 in #11431\r\n* Add LANCZOS as default interplotation mode.  by @Va16hav07 in #11463\r\n* make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu  by @yao-matrix in #11461\r\n* [WAN] fix recompilation issues  by @sayakpaul in #11475\r\n* Fix typos in docs and comments  by @co63oc in #11416\r\n* [tests] xfail recent pipeline tests for specific methods.  by @sayakpaul in #11469\r\n* cache packages_distributions  by @vladmandic in #11453\r\n* [docs] Memory optims  by @stevhliu in #11385\r\n* [docs] Adapters  by @stevhliu in #11331\r\n* [train_dreambooth_lora_sdxl_advanced] Add LANCZOS as the default interpolation mode for image resizing  by @yuanjua in #11471\r\n* [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default interpolation mode for image resizing  by @ysurs in #11472\r\n* enable semantic diffusion and stable diffusion panorama cases on XPU  by @yao-matrix in #11459\r\n* [Feature] Implement tiled VAE encoding/decoding for Wan model.  by @c8ef in #11414\r\n* [train_text_to_image_sdxl]Add LANCZOS as default interpolation mode for image resizing  by @ParagEkbote in #11455\r\n* [train_dreambooth_lora_sdxl] Add --image_interpolation_mode option for image resizing (default to lanczos)  by @MinJu-Ha in #11490\r\n* [train_dreambooth_lora_lumina2] Add LANCZOS as the default interpolation mode for image resizing  by @cjfghk5697 in #11491\r\n* [training] feat: enable quantization for hidream lora training.  by @sayakpaul in #11494\r\n* Set LANCZOS as the default interpolation method for image resizing.  by @yijun-lee in #11492\r\n* Update training script for txt to img sdxl with lora supp with new interpolation.  by @RogerSinghChugh in #11496\r\n* Fix torchao docs typo for fp8 granular quantization  by @a-r-r-o-w in #11473\r\n* Update setup.py to pin min version of `peft`  by @sayakpaul in #11502\r\n* update dep table.  by @sayakpaul in #11504\r\n* [LoRA] use `removeprefix` to preserve sanity.  by @sayakpaul in #11493\r\n* Hunyuan Video Framepack  by @a-r-r-o-w in #11428\r\n* enable lora cases on XPU  by @yao-matrix in #11506\r\n* [lora_conversion] Enhance key handling for OneTrainer components in LORA conversion utility  by @iamwavecut in #11441) \r\n* [docs] minor updates to bitsandbytes docs.  by @sayakpaul in #11509\r\n* Cosmos  by @a-r-r-o-w in #10660\r\n* clean up the __Init__ for stable_diffusion  by @yiyixuxu in #11500\r\n* fix audioldm by @sayakpaul (direct commit on v0.34.0-release)\r\n* Revert \"fix audioldm\" by @sayakpaul (direct commit on v0.34.0-release)\r\n* [LoRA] make lora alpha and dropout configurable   by @linoytsaban in #11467\r\n* Add cross attention type for Sana-Sprint training in diffusers.  by @scxue in #11514\r\n* Conditionally import torchvision in Cosmos transformer  by @a-r-r-o-w in #11524\r\n* [tests] fix audioldm2 for transformers main.  by @sayakpaul in #11522\r\n* feat: pipeline-level quantization config  by @sayakpaul in #11130\r\n* [Tests] Enable more general testing for `torch.compile()` with LoRA hotswapping  by @sayakpaul in #11322\r\n* [LoRA] support non-diffusers hidream loras  by @sayakpaul in #11532\r\n* enable 7 cases on XPU  by @yao-matrix in #11503\r\n* [LTXPipeline] Update latents dtype to match VAE dtype  by @james-p-xu in #11533\r\n* enable dit integration cases on xpu  by @yao-matrix in #11523\r\n* enable print_env on xpu  by @yao-matrix in #11507\r\n* Change Framepack transformer layer initialization order  by @a-r-r-o-w in #11535\r\n* [tests] add tests for framepack transformer model.  by @sayakpaul in #11520\r\n* Hunyuan Video Framepack F1  by @a-r-r-o-w in #11534\r\n* enable several pipeline integration tests on XPU  by @yao-matrix in #11526\r\n* [test_models_transformer_ltx.py] help us test torch.compile() for impactful models  by @cjfghk5697 in #11512\r\n* Add VisualCloze  by @lzyhha in #11377\r\n* Fix typo in train_diffusion_orpo_sdxl_lora_wds.py  by @Meeex2 in #11541\r\n* fix: remove `torch_dtype=\"auto\"` option from docstrings  by @johannaSommer in #11513\r\n* [train_dreambooth.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env  by @kghamilton89 in #11239\r\n* [LoRA] small change to support Hunyuan LoRA Loading for FramePack  by @linoytsaban in #11546\r\n* LTX Video 0.9.7  by @a-r-r-o-w in #11516\r\n* [tests] Enable testing for HiDream transformer  by @sayakpaul in #11478\r\n* Update pipeline_flux_img2img.py to add missing vae_slicing and vae_tiling calls.  by @Meatfucker in #11545\r\n*  Fix deprecation warnings in test_ltx_image2video.py  by @AChowdhury1211 in #11538\r\n* [tests] Add torch.compile test for UNet2DConditionModel  by @olccihyeon in #11537\r\n* [Single File] GGUF/Single File Support for HiDream  by @DN6 in #11550\r\n* [gguf] Refactor __torch_function__ to avoid unnecessary computation  by @anijain2305 in #11551\r\n* [tests] add tests for combining layerwise upcasting and groupoffloading.  by @sayakpaul in #11558\r\n* [docs] Regional compilation docs  by @sayakpaul in #11556\r\n* enhance value guard of _device_agnostic_dispatch  by @yao-matrix in #11553\r\n* Doc update  by @Player256 in #11531\r\n* Revert error to warning when loading LoRA from repo with multiple weights  by @apolinario in #11568\r\n* [docs] tip for group offloding + quantization  by @sayakpaul in #11576\r\n* [LoRA] support non-diffusers LTX-Video loras  by @linoytsaban in #11572\r\n* [WIP][LoRA] start supporting kijai wan lora.  by @sayakpaul in #11579\r\n* [Single File] Fix loading for LTX 0.9.7 transformer  by @DN6 in #11578\r\n* Use HF Papers  by @qgallouedec in #11567\r\n* LTX 0.9.7-distilled; documentation improvements  by @a-r-r-o-w in #11571\r\n* [LoRA] kijai wan lora support for I2V  by @linoytsaban in #11588\r\n* docs: fix invalid links  by @osrm in #11505\r\n* [docs] Remove fast diffusion tutorial  by @stevhliu in #11583\r\n* RegionalPrompting: Inherit from Stable Diffusion  by @b-sai in #11525\r\n* [chore] allow string device to be passed to randn_tensor.  by @sayakpaul in #11559\r\n* Type annotation fix  by @DN6 in #11597\r\n* [LoRA] minor fix for `load_lora_weights()` for Flux and a test  by @sayakpaul in #11595\r\n* Update Intel Gaudi doc  by @regisss in #11479\r\n* enable pipeline test cases on xpu  by @yao-matrix in #11527\r\n* [Feature] AutoModel can load components using model_index.json  by @ishan-modi in #11401\r\n* [docs] Pipeline-level quantization  by @stevhliu in #11604\r\n* Fix bug when `variant` and `safetensor` file does not match  by @kaixuanliu in #11587\r\n* [tests] Changes to the `torch.compile()` CI and tests  by @sayakpaul in #11508\r\n* Fix mixed variant downloading  by @DN6 in #11611\r\n* fix security issue in build docker ci  by @sayakpaul in #11614\r\n* Make group offloading compatible with torch.compile()  by @sayakpaul in #11605\r\n* [training docs] smol update to README files   by @linoytsaban in #11616\r\n* Adding NPU for get device function  by @leisuzz in #11617\r\n* [LoRA] improve LoRA fusion tests  by @sayakpaul in #11274\r\n* [Sana Sprint] add image-to-image pipeline   by @linoytsaban in #11602\r\n* [CI] fix the filename for displaying failures in lora ci.  by @sayakpaul in #11600\r\n* [docs] PyTorch 2.0  by @stevhliu in #11618\r\n* [textual_inversion_sdxl.py] fix lr scheduler steps count  by @yuanjua in #11557\r\n* Fix wrong indent for examples of controlnet script  by @Justin900429 in #11632\r\n* removing unnecessary else statement  by @YanivDorGalron in #11624\r\n* enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed  by @yao-matrix in #11620\r\n* Bug: Fixed Image 2 Image example  by @vltmedia in #11619\r\n* typo fix in pipeline_flux.py  by @YanivDorGalron in #11623\r\n* Fix typos in strings and comments  by @co63oc in #11476\r\n* [docs] update torchao doc link  by @sayakpaul in #11634\r\n* Use float32 RoPE freqs in Wan with MPS backends  by @hvaara in #11643\r\n* [chore] misc changes in the bnb tests for consistency.  by @sayakpaul in #11355\r\n* [tests] chore: rename lora model-level tests.  by @sayakpaul in #11481\r\n* [docs] Caching methods  by @stevhliu in #11625\r\n* [docs] Model cards  by @stevhliu in #11112\r\n* [CI] Some improvements to Nightly reports summaries  by @DN6 in #11166\r\n* [chore] bring PipelineQuantizationConfig at the top of the import chain.  by @sayakpaul in #11656\r\n* [examples] flux-control: use num_training_steps_for_scheduler  by @Markus-Pobitzer in #11662\r\n* use deterministic to get stable result  by @jiqing-feng in #11663\r\n* [tests] add test for torch.compile + group offloading  by @sayakpaul in #11670\r\n* Wan VACE  by @a-r-r-o-w in #11582\r\n* fixed axes_dims_rope init (huggingface#11641)  by @sofinvalery in #11678\r\n* [tests] Fix how compiler mixin classes are used  by @sayakpaul in #11680\r\n* Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process  by @DN6 in #11596\r\n* Add community class StableDiffusionXL_T5Pipeline  by @ppbrown in #11626\r\n* Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area  by @Meatfucker in #11658\r\n* Allow remote code repo names to contain \".\"  by @akasharidas in #11652\r\n* [LoRA] support Flux Control LoRA with bnb 8bit.  by @sayakpaul in #11655\r\n* [`Wan`] Fix VAE sampling mode in `WanVideoToVideoPipeline`  by @tolgacangoz in #11639\r\n* enable torchao test cases on XPU and switch to device agnostic APIs for test cases  by @yao-matrix in #11654\r\n* [tests] tests for compilation + quantization (bnb)  by @sayakpaul in #11672\r\n* [tests] model-level `device_map` clarifications  by @sayakpaul in #11681\r\n* Improve Wan docstrings  by @a-r-r-o-w in #11689\r\n* Set _torch_version to N/A if torch is disabled.  by @rasmi in #11645\r\n* Avoid DtoH sync from access of nonzero() item in scheduler  by @jbschlosser in #11696\r\n* Apply Occam's Razor in position embedding calculation  by @tolgacangoz in #11562\r\n* [docs] add compilation bits to the bitsandbytes docs.  by @sayakpaul in #11693\r\n* swap out token for style bot.  by @sayakpaul in #11701\r\n* [docs] mention fp8 benefits on supported hardware.  by @sayakpaul in #11699\r\n* Support Wan AccVideo lora  by @a-r-r-o-w in #11704\r\n* [LoRA] parse metadata from LoRA and save metadata  by @sayakpaul in #11324\r\n* Cosmos Predict2  by @a-r-r-o-w in #11695\r\n* Chroma Pipeline  by @Ednaordinary in #11698\r\n* [LoRA ]fix flux lora loader when return_metadata is true for non-diffusers  by @sayakpaul in #11716\r\n* [training] show how metadata stuff should be incorporated in training scripts.  by @sayakpaul in #11707\r\n* Fix misleading comment  by @carlthome in #11722\r\n* Add Pruna optimization framework documentation  by @davidberenstein1957 in #11688\r\n* Support more Wan loras (VACE)  by @a-r-r-o-w in #11726\r\n* [LoRA training] update metadata use for lora alpha + README  by @linoytsaban in #11723\r\n* ⚡️ Speed up method `AutoencoderKLWan.clear_cache` by 886%  by @misrasaurabh1 in #11665\r\n* [training] add ds support to lora hidream  by @leisuzz in #11737\r\n* [tests] device_map tests for all models.  by @sayakpaul in #11708\r\n* [chore] change to 2025 licensing for remaining  by @sayakpaul in #11741\r\n* Chroma Follow Up   by @DN6 in #11725\r\n* [Quantizers] add `is_compileable` property to quantizers.  by @sayakpaul in #11736\r\n* Update more licenses to 2025  by @a-r-r-o-w in #11746\r\n* Add missing HiDream license  by @a-r-r-o-w in #11747\r\n* Bump urllib3 from 2.2.3 to 2.5.0 in /examples/server  by @dependabot[bot] in #11748\r\n* [LoRA] refactor lora loading at the model-level  by @sayakpaul in #11719\r\n* [CI] Fix WAN VACE tests  by @DN6 in #11757\r\n* [CI] Fix SANA tests  by @DN6 in #11756\r\n* Fix HiDream pipeline test module   by @DN6 in #11754\r\n* make group offloading work with disk/nvme transfers  by @sayakpaul in #11682\r\n* Update Chroma Docs  by @DN6 in #11753\r\n* fix invalid component handling behaviour in `PipelineQuantizationConfig`  by @sayakpaul in #11750\r\n* Fix failing cpu offload test for LTX Latent Upscale  by @DN6 in #11755\r\n* [docs] Quantization + torch.compile + offloading  by @stevhliu in #11703\r\n* [docs] device_map  by @stevhliu in #11711\r\n* [docs] LoRA scale scheduling  by @stevhliu in #11727\r\n* Fix dimensionalities in `apply_rotary_emb` functions' comments  by @tolgacangoz in #11717\r\n* enable deterministic in bnb 4 bit tests  by @jiqing-feng in #11738\r\n* enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU  by @yao-matrix in #11671\r\n* [tests] properly skip tests instead of `return`  by @sayakpaul in #11771\r\n* [CI] Skip ONNX Upscale tests  by @DN6 in #11774\r\n* [Wan] Fix mask padding in Wan VACE pipeline.  by @bennyguo in #11778\r\n* Add --lora_alpha and metadata handling to train_dreambooth_lora_sana.py  by @imbr92 in #11744\r\n* [docs] minor cleanups in the lora docs.  by @sayakpaul in #11770\r\n* [lora] only remove hooks that we add back  by @yiyixuxu in #11768\r\n* [tests] Fix HunyuanVideo Framepack device tests  by @a-r-r-o-w in #11789\r\n* [chore] raise as early as possible in group offloading  by @sayakpaul in #11792\r\n* [tests] Fix group offloading and layerwise casting test interaction  by @a-r-r-o-w in #11796\r\n* guard omnigen processor.  by @sayakpaul in #11799\r\n* Release: v0.34.0 by @sayakpaul (direct commit on v0.34.0-release)\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @yao-matrix\r\n    * fix test_vanilla_funetuning failure on XPU and A100 (#11263)\r\n    * make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)\r\n    * make test_dict_tuple_outputs_equivalent pass on XPU (#11265)\r\n    * make test_instant_style_multiple_masks pass on XPU (#11266)\r\n    * make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)\r\n    * make test_stable_diffusion_karras_sigmas pass on XPU (#11310)\r\n    * fix CPU offloading related fail cases on XPU (#11288)\r\n    * enable 2 test cases on XPU (#11332)\r\n    * enable group_offload cases and quanto cases on XPU (#11405)\r\n    * enable test_layerwise_casting_memory cases on XPU (#11406)\r\n    * enable 28 GGUF test cases on XPU (#11404)\r\n    * enable marigold_intrinsics cases on XPU (#11445)\r\n    * enable consistency test cases on XPU, all passed (#11446)\r\n    * enable unidiffuser test cases on xpu (#11444)\r\n    * make safe diffusion test cases pass on XPU and A100 (#11458)\r\n    * make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu (#11461)\r\n    * enable semantic diffusion and stable diffusion panorama cases on XPU (#11459)\r\n    * enable lora cases on XPU (#11506)\r\n    * enable 7 cases on XPU (#11503)\r\n    * enable dit integration cases on xpu (#11523)\r\n    * enable print_env on xpu (#11507)\r\n    * enable several pipeline integration tests on XPU (#11526)\r\n    * enhance value guard of _device_agnostic_dispatch (#11553)\r\n    * enable pipeline test cases on xpu (#11527)\r\n    * enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed (#11620)\r\n    * enable torchao test cases on XPU and switch to device agnostic APIs for test cases (#11654)\r\n    * enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU (#11671)\r\n* @hlky\r\n    * Fix LTX 0.9.5 single file (#11271)\r\n    * HiDream Image (#11231)\r\n    * Use float32 on mps or npu in transformer_hidream_image's rope (#11316)\r\n    * Fix vae.Decoder prev_output_channel (#11280)\r\n* @quickjkee\r\n    * flow matching lcm scheduler (#11170)\r\n* @ishan-modi\r\n    * [ControlNet] Adds controlnet for SanaTransformer (#11040)\r\n    * [BUG] fixed _toctree.yml alphabetical ordering (#11277)\r\n    * [BUG] fixes in kadinsky pipeline (#11080)\r\n    * [Refactor] Minor Improvement for import utils (#11161)\r\n    * [Feature] Added Xlab Controlnet support (#11249)\r\n    * [BUG] fixed WAN docstring (#11226)\r\n    * [Feature] AutoModel can load components using model_index.json (#11401)\r\n* @linoytsaban\r\n    * [HiDream] code example (#11317)\r\n    * [Flux LoRAs] fix lr scheduler bug in distributed scenarios (#11242)\r\n    * [LoRA] add LoRA support to HiDream and fine-tuning script (#11281)\r\n    * [HiDream LoRA] optimizations + small updates (#11381)\r\n    * [Hi-Dream LoRA] fix bug in validation (#11439)\r\n    * [LoRA] make lora alpha and dropout configurable  (#11467)\r\n    * [LoRA] small change to support Hunyuan LoRA Loading for FramePack (#11546)\r\n    * [LoRA] support non-diffusers LTX-Video loras (#11572)\r\n    * [LoRA] kijai wan lora support for I2V (#11588)\r\n    * [training docs] smol update to README files  (#11616)\r\n    * [Sana Sprint] add image-to-image pipeline  (#11602)\r\n    * [LoRA training] update metadata use for lora alpha + README (#11723)\r\n* @hameerabbasi\r\n    * [LoRA] Add LoRA support to AuraFlow (#10216)\r\n* @DN6\r\n    * Fix Hunyuan I2V for `transformers>4.47.1`   (#11293)\r\n    * Hunyuan I2V fast tests fix (#11341)\r\n    * [Single File] GGUF/Single File Support for HiDream (#11550)\r\n    * [Single File] Fix loading for LTX 0.9.7 transformer (#11578)\r\n    * Type annotation fix (#11597)\r\n    * Fix mixed variant downloading (#11611)\r\n    * [CI] Some improvements to Nightly reports summaries (#11166)\r\n    * Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process (#11596)\r\n    * Chroma Follow Up  (#11725)\r\n    * [CI] Fix WAN VACE tests (#11757)\r\n    * [CI] Fix SANA tests (#11756)\r\n    * Fix HiDream pipeline test module  (#11754)\r\n    * Update Chroma Docs (#11753)\r\n    * Fix failing cpu offload test for LTX Latent Upscale (#11755)\r\n    * [CI] Skip ONNX Upscale tests (#11774)\r\n* @yiyixuxu\r\n    * [Hi Dream] follow-up (#11296)\r\n    * support Wan-FLF2V (#11353)\r\n    * update output for Hidream transformer (#11366)\r\n    * [Wan2.1-FLF2V] update conversion script (#11365)\r\n    * [HiDream] move deprecation to 0.35.0 (#11384)\r\n    * clean up the __Init__ for stable_diffusion (#11500)\r\n    * [lora] only remove hooks that we add back (#11768)\r\n* @Teriks\r\n    * Kolors additional pipelines, community contrib (#11372)\r\n* @co63oc\r\n    * Fix typos in strings and comments (#11407)\r\n    * Fix typos in docs and comments (#11416)\r\n    * Fix typos in strings and comments (#11476)\r\n* @xduzhangjiayu\r\n    * Add StableDiffusion3InstructPix2PixPipeline (#11378)\r\n* @scxue\r\n    * Add cross attention type for Sana-Sprint training in diffusers. (#11514)\r\n* @lzyhha\r\n    * Add VisualCloze (#11377)\r\n* @b-sai\r\n    * RegionalPrompting: Inherit from Stable Diffusion (#11525)\r\n* @Ednaordinary\r\n    * Chroma Pipeline (#11698)\r\n","publishedAt":"2025-06-24T15:13:19.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.34.0","media":[]},{"id":"rel_6xiTUXms4SEka8cLupkWu","version":"v0.33.1","title":"v0.33.1: fix ftfy import","summary":"## All commits\r\n* fix ftfy import for wan pipelines by @yiyixuxu  in #11262","content":"## All commits\r\n* fix ftfy import for wan pipelines by @yiyixuxu  in #11262","publishedAt":"2025-04-10T05:38:36.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.33.1","media":[]},{"id":"rel_0Xwo8msOca6Tk968FwEnH","version":"v0.33.0","title":"Diffusers 0.33.0: New Image and Video Models, Memory Optimizations, Caching Methods, Remote VAEs, New Training Scripts, and more","summary":"## New Pipelines for Video Generation\r\n\r\n### Wan 2.1\r\n\r\nWan2.1 is a comprehensive and open suite of video foundation models that pushes the boundaries...","content":"## New Pipelines for Video Generation\r\n\r\n### Wan 2.1\r\n\r\nWan2.1 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. The model release includes 4 different model variants and three different pipelines for Text to Video, Image to Video and Video to Video.\r\n\r\n- `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`\r\n- `Wan-AI/Wan2.1-T2V-14B-Diffusers`\r\n- `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers`\r\n- `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers`\r\n\r\nCheck out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan) to learn more. \r\n\r\n### LTX Video 0.9.5\r\n\r\nLTX Video 0.9.5 is the updated version of the super-fast LTX Video model series. The latest model introduces additional conditioning options, such as keyframe-based animation and video extension (both forward and backward). \r\n\r\nTo support these additional conditioning inputs, we’ve introduced the `LTXConditionPipeline` and `LTXVideoCondition` object. \r\n\r\nTo learn more about the usage, check out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).\r\n\r\n### Hunyuan Image to Video\r\n\r\nHunyuan utilizes a pre-trained Multimodal Large Language Model (MLLM) with a Decoder-Only architecture as the text encoder.  The input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data and seamlessly integrating information from both the image and its associated caption.\r\n\r\nTo learn more, check out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).\r\n\r\n### Others\r\n\r\n- [EasyAnimateV5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/easyanimate) (thanks to @bubbliiiing for contributing this in [this PR](https://github.com/huggingface/diffusers/pull/10626))\r\n- [ConsisID](https://huggingface.co/docs/diffusers/main/en/using-diffusers/consisid) (thanks to @SHYuanBest for contributing this in [this PR](https://github.com/huggingface/diffusers/pull/10140))\r\n\r\n## New Pipelines for Image Generation\r\n\r\n### Sana-Sprint\r\n\r\nSANA-Sprint is an efficient diffusion model for ultra-fast text-to-image generation. SANA-Sprint is built on a pre-trained foundation model and augmented with hybrid distillation, dramatically reducing inference steps from 20 to 1-4, rivaling the quality of models like Flux.\r\n\r\nShoutout to @lawrence-cj for their help and guidance on [this PR](https://github.com/huggingface/diffusers/pull/11074).\r\n\r\nCheck out the [pipeline docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana_sprint) of SANA-Sprint to learn more.\r\n\r\n### Lumina2\r\n\r\nLumina-Image-2.0 is a 2B parameter flow-based diffusion transformer for text-to-image generation released under the Apache 2.0 license. \r\n\r\nCheck out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina2) to learn more. Thanks to @zhuole1025 for contributing this through [this PR](https://github.com/huggingface/diffusers/pull/10642).\r\n\r\nOne can also LoRA fine-tune Lumina2, taking advantage of its Apach2.0 licensing. Check out [the guide](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_lumina2.md) for more details.\r\n\r\n### Omnigen\r\n\r\nOmniGen is a unified image generation model that can handle multiple tasks including text-to-image, image editing, subject-driven generation, and various computer vision tasks within a single framework. The model consists of a VAE, and a single transformer based on Phi-3 that handles text and image encoding as well as the diffusion process.\r\n\r\nCheck out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/omnigen) to learn more about OmniGen. Thanks to @staoxiao for contributing OmniGen in [this PR](https://github.com/huggingface/diffusers/pull/10148).\r\n\r\n### Others\r\n\r\n- [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4) (thanks to @zRzRzRzRzRzRzR for contributing CogView4 in [this PR](https://github.com/huggingface/diffusers/pull/10649))\r\n\r\n## New Memory Optimizations\r\n\r\n### Layerwise Casting\r\n\r\nPyTorch supports `torch.float8_e4m3fn` and `torch.float8_e5m2` as weight storage `dtypes`, but they can’t be used for computation on many devices due to unimplemented kernel support. \r\n\r\nHowever, you can still use these `dtypes` to store model weights in FP8 precision and upcast them to a widely supported dtype such as `torch.float16` or `torch.bfloat16` on-the-fly when the layers are used in the forward pass. This is known as layerwise weight-casting. This can potentially cut down the VRAM requirements of a model by 50%.   \r\n\r\n<details>\r\n<summary>Code</summary>\r\n    \r\n```py\r\nimport torch\r\nfrom diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel\r\nfrom diffusers.utils import export_to_video\r\n\r\nmodel_id = \"THUDM/CogVideoX-5b\"\r\n\r\n# Load the model in bfloat16 and enable layerwise casting\r\ntransformer = CogVideoXTransformer3DModel.from_pretrained(model_id, subfolder=\"transformer\", torch_dtype=torch.bfloat16)\r\ntransformer.enable_layerwise_casting(storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16)\r\n\r\n# Load the pipeline\r\npipe = CogVideoXPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)\r\npipe.to(\"cuda\")\r\n\r\nprompt = (\r\n    \"A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. \"\r\n    \"The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other \"\r\n    \"pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, \"\r\n    \"casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. \"\r\n    \"The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical \"\r\n    \"atmosphere of this unique musical performance.\"\r\n)\r\nvideo = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]\r\nexport_to_video(video, \"output.mp4\", fps=8)\r\n```\r\n\r\n</details>\r\n\r\n### Group Offloading\r\n\r\nGroup offloading is the middle ground between sequential and model offloading. It works by offloading groups of internal layers (either `torch.nn.ModuleList` or `torch.nn.Sequential`), which uses less memory than model-level offloading. It is also faster than sequential-level offloading because the number of device synchronizations is reduced. \r\n\r\nOn CUDA devices, we also have the option to enable using layer prefetching with [CUDA Streams](https://pytorch.org/docs/stable/generated/torch.cuda.Stream.html). The next layer to be executed is loaded onto the accelerator device while the current layer is being executed which makes inference substantially faster while still keeping VRAM requirements very low. With this, we introduce the idea of overlapping computation with data transfer.\r\n\r\nOne thing to note is that using CUDA streams can cause a considerable spike in CPU RAM usage. Please ensure that the available CPU RAM is 2 times the size of the model if you choose to set `use_stream=True`. You can reduce CPU RAM usage by setting `low_cpu_mem_usage=True`. This should limit the CPU RAM used to be roughly the same as the size of the model, but will introduce slight latency in the inference process.  \r\n\r\nYou can also use `record_stream=True` when using `use_stream=True` to obtain more speedups at the expense of slightly increased memory usage.\r\n\r\n<details>\r\n<summary>Code</summary>\r\n\r\n```py\r\nimport torch\r\nfrom diffusers import CogVideoXPipeline\r\nfrom diffusers.utils import export_to_video\r\n\r\n# Load the pipeline\r\nonload_device = torch.device(\"cuda\")\r\noffload_device = torch.device(\"cpu\")\r\npipe = CogVideoXPipeline.from_pretrained(\"THUDM/CogVideoX-5b\", torch_dtype=torch.bfloat16)\r\n\r\n# We can utilize the enable_group_offload method for Diffusers model implementations\r\npipe.transformer.enable_group_offload(\r\n\tonload_device=onload_device, \r\n\toffload_device=offload_device, \r\n\toffload_type=\"leaf_level\", \r\n\tuse_stream=True\r\n)\r\n\r\nprompt = (\r\n    \"A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. \"\r\n    \"The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other \"\r\n    \"pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, \"\r\n    \"casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. \"\r\n    \"The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical \"\r\n    \"atmosphere of this unique musical performance.\"\r\n)\r\nvideo = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]\r\n# This utilized about 14.79 GB. It can be further reduced by using tiling and using leaf_level offloading throughout the pipeline.\r\nprint(f\"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB\")\r\nexport_to_video(video, \"output.mp4\", fps=8)\r\n```\r\n\r\n</details>\r\n    \r\n\r\nGroup offloading can also be applied to non-Diffusers models such as text encoders from the `transformers` library.    \r\n\r\n<details>\r\n<summary>Code</summary>\r\n\r\n```py\r\nimport torch\r\nfrom diffusers import CogVideoXPipeline\r\nfrom diffusers.hooks import apply_group_offloading\r\nfrom diffusers.utils import export_to_video\r\n\r\n# Load the pipeline\r\nonload_device = torch.device(\"cuda\")\r\noffload_device = torch.device(\"cpu\")\r\npipe = CogVideoXPipeline.from_pretrained(\"THUDM/CogVideoX-5b\", torch_dtype=torch.bfloat16)\r\n\r\n# For any other model implementations, the apply_group_offloading function can be used\r\napply_group_offloading(pipe.text_encoder, onload_device=onload_device, offload_type=\"block_level\", num_blocks_per_group=2)\r\n```\r\n\r\n</details>\r\n    \r\n\r\n## Remote Components\r\n\r\nRemote components are an experimental feature designed to offload memory-intensive steps of the inference pipeline to remote endpoints. The initial implementation focuses primarily on VAE decoding operations. Below are the currently supported model endpoints:\r\n\r\n| Model               | Endpoint                                                            | Model                                |\r\n|---------------------|---------------------------------------------------------------------|--------------------------------------|\r\n| Stable Diffusion v1 | https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud    | stabilityai/sd-vae-ft-mse            |\r\n| Stable Diffusion XL | https://x2dmsqunjd6k9prw.us-east-1.aws.endpoints.huggingface.cloud    | madebyollin/sdxl-vae-fp16-fix          |\r\n| Flux                | https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud    | black-forest-labs/FLUX.1-schnell       |\r\n| HunyuanVideo        | https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud    | hunyuanvideo-community/HunyuanVideo   |\r\n\r\n\r\nThis is an example of using remote decoding with the Hunyuan Video pipeline:\r\n\r\n<details>\r\n<summary>Code</summary>\r\n\r\n```py\r\nfrom diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel\r\n\r\nmodel_id = \"hunyuanvideo-community/HunyuanVideo\"\r\ntransformer = HunyuanVideoTransformer3DModel.from_pretrained(\r\n    model_id, subfolder=\"transformer\", torch_dtype=torch.bfloat16\r\n)\r\npipe = HunyuanVideoPipeline.from_pretrained(\r\n    model_id, transformer=transformer, vae=None, torch_dtype=torch.float16\r\n).to(\"cuda\")\r\n\r\nlatent = pipe(\r\n    prompt=\"A cat walks on the grass, realistic\",\r\n    height=320,\r\n    width=512,\r\n    num_frames=61,\r\n    num_inference_steps=30,\r\n    output_type=\"latent\",\r\n).frames\r\n\r\nvideo = remote_decode(\r\n    endpoint=\"https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud/\",\r\n    tensor=latent,\r\n    output_type=\"mp4\",\r\n)\r\n\r\nif isinstance(video, bytes):\r\n    with open(\"video.mp4\", \"wb\") as f:\r\n        f.write(video) \r\n```\r\n\r\n</details>\r\n\r\nCheck out [the docs](https://huggingface.co/docs/diffusers/main/en/hybrid_inference/overview) to know more.\r\n\r\n## Introducing Cached Inference for DiTs\r\n\r\nCached Inference for Diffusion Transformer models is a performance optimization that significantly accelerates the denoising process by caching intermediate values. This technique reduces redundant computations across timesteps, resulting in faster generation with a slight dip in output quality.\r\n\r\nCheck out the [docs](https://huggingface.co/docs/diffusers/main/en/api/cache) to learn more about the available caching methods.\r\n\r\n**Pyramind Attention Broadcast**\r\n\r\n<details>\r\n<summary>Code</summary>\r\n\r\n```py\r\nimport torch\r\nfrom diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig\r\n\r\npipe = CogVideoXPipeline.from_pretrained(\"THUDM/CogVideoX-5b\", torch_dtype=torch.bfloat16)\r\npipe.to(\"cuda\")\r\n\r\nconfig = PyramidAttentionBroadcastConfig(\r\n    spatial_attention_block_skip_range=2,\r\n    spatial_attention_timestep_skip_range=(100, 800),\r\n    current_timestep_callback=lambda: pipe.current_timestep,\r\n)\r\npipe.transformer.enable_cache(config)\r\n```\r\n\r\n</details>\r\n\r\n**FasterCache**\r\n\r\n<details>\r\n<summary>Code</summary>\r\n\r\n```py\r\nimport torch\r\nfrom diffusers import CogVideoXPipeline, FasterCacheConfig\r\n\r\npipe = CogVideoXPipeline.from_pretrained(\"THUDM/CogVideoX-5b\", torch_dtype=torch.bfloat16)\r\npipe.to(\"cuda\")\r\n\r\nconfig = FasterCacheConfig(\r\n        spatial_attention_block_skip_range=2,\r\n        spatial_attention_timestep_skip_range=(-1, 901),\r\n        unconditional_batch_skip_range=2,\r\n        attention_weight_callback=lambda _: 0.5,\r\n        is_guidance_distilled=True,\r\n)\r\npipe.transformer.enable_cache(config)\r\n```\r\n\r\n</details>\r\n    \r\n\r\n## Quantization\r\n\r\n### Quanto Backend\r\n\r\nDiffusers now has support for the [Quanto quantization backend](https://huggingface.co/docs/diffusers/main/en/quantization/quanto), which provides `float8` , `int8` , `int4` and `int2` quantization dtypes.  \r\n\r\n```python\r\nimport torch\r\nfrom diffusers import FluxTransformer2DModel, QuantoConfig\r\n\r\nmodel_id = \"black-forest-labs/FLUX.1-dev\"\r\nquantization_config = QuantoConfig(weights_dtype=\"float8\")\r\ntransformer = FluxTransformer2DModel.from_pretrained(\r\n      model_id,\r\n      subfolder=\"transformer\",\r\n      quantization_config=quantization_config,\r\n      torch_dtype=torch.bfloat16,\r\n)\r\n```\r\n\r\nQuanto `int8` models are also compatible with `torch.compile` :\r\n\r\n<details>\r\n<summary>Code</summary>\r\n\r\n```py\r\nimport torch\r\nfrom diffusers import FluxTransformer2DModel, QuantoConfig\r\n\r\nmodel_id = \"black-forest-labs/FLUX.1-dev\"\r\nquantization_config = QuantoConfig(weights_dtype=\"float8\")\r\ntransformer = FluxTransformer2DModel.from_pretrained(\r\n      model_id,\r\n      subfolder=\"transformer\",\r\n      quantization_config=quantization_config,\r\n      torch_dtype=torch.bfloat16,\r\n)\r\ntransformer.compile()\r\n```\r\n\r\n</details>    \r\n\r\n### Improved loading for `uintx` TorchAO checkpoints with `torch>=2.6`\r\n\r\nTorchAO checkpoints currently have to be serialized using pickle. For some quantization dtypes using the `uintx` format, such as `uint4wo` this involves saving subclassed TorchAO Tensor objects in the model file. This made loading the models directly with Diffusers a bit tricky since we do not allow deserializing artbitary Python objects from pickle files.    \r\n\r\nTorch 2.6 allows adding expected Tensors to torch safe globals, which lets us directly load TorchAO checkpoints with these objects. \r\n\r\n```diff\r\n- state_dict = torch.load(\"/path/to/flux_uint4wo/diffusion_pytorch_model.bin\", weights_only=False, map_location=\"cpu\")\r\n- with init_empty_weights():\r\n-     transformer = FluxTransformer2DModel.from_config(\"/path/to/flux_uint4wo/config.json\")\r\n- transformer.load_state_dict(state_dict, strict=True, assign=True)\r\n+ transformer = FluxTransformer2DModel.from_pretrained(\"/path/to/flux_uint4wo/\")  \r\n```\r\n\r\n## LoRAs\r\n\r\nWe have shipped a couple of improvements on the LoRA front in this release.\r\n\r\n**🚨 Improved coverage for loading non-diffusers LoRA checkpoints for Flux**\r\n\r\nTake note of the breaking change introduced in [this PR](https://github.com/huggingface/diffusers/pull/10985) 🚨 We suggest you upgrade your `peft` installation to the latest version - `pip install -U peft` especially when dealing with Flux LoRAs.\r\n\r\n**`torch.compile()` support when hotswapping LoRAs without triggering recompilation**\r\n\r\nA common use case when serving multiple adapters is to load one adapter first, generate images, load another adapter, generate more images, load another adapter, etc. This workflow normally requires calling [load_lora_weights()](https://huggingface.co/docs/diffusers/v0.33.0/en/api/loaders/lora#diffusers.loaders.StableDiffusionLoraLoaderMixin.load_lora_weights), set_adapters(), and possibly [delete_adapters()](https://huggingface.co/docs/diffusers/v0.33.0/en/api/loaders/peft#diffusers.loaders.PeftAdapterMixin.delete_adapters) to save memory. Moreover, if the model is compiled using torch.compile, performing these steps requires recompilation, which takes time.\r\n\r\nTo better support this common workflow, you can “hotswap” a LoRA adapter, to avoid accumulating memory and in some cases, recompilation. It requires an adapter to already be loaded, and the new adapter weights are swapped in-place for the existing adapter.\r\n\r\nCheck out the [docs](https://huggingface.co/docs/diffusers/en/using-diffusers/loading_adapters#hotswapping-lora-adapters) to learn more about this feature.\r\n\r\nThe other major change is the support for\r\n\r\n- Loading LoRAs into quantized model checkpoints\r\n\r\n## `dtype` Maps for Pipelines\r\n\r\nSince various pipelines require their components to run in different compute dtypes, we now support passing a dtype map when initializing a pipeline:\r\n\r\n```python\r\nfrom diffusers import HunyuanVideoPipeline\r\nimport torch\r\n\r\npipe = HunyuanVideoPipeline.from_pretrained(\r\n    \"hunyuanvideo-community/HunyuanVideo\",\r\n    torch_dtype={\"transformer\": torch.bfloat16, \"default\": torch.float16},\r\n)\r\nprint(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)\r\n```\r\n\r\n## AutoModel\r\n\r\nThis release includes an AutoModel object similar to the one found in `transformers` that automatically fetches the appropriate model class for the provided repo. \r\n\r\n```python\r\nfrom diffusers import AutoModel\r\n\r\nunet = AutoModel.from_pretrained(\"runwayml/stable-diffusion-v1-5\", subfolder=\"unet\")\r\n```\r\n\r\n## All commits\r\n\r\n* [Sana 4K] Add vae tiling option to avoid OOM  by @leisuzz in #10583\r\n* IP-Adapter for `StableDiffusion3Img2ImgPipeline`  by @guiyrt in #10589\r\n* [DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16  by @chenjy2003 in #10595\r\n* Move buffers to device  by @hlky in #10523\r\n* [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint  by @guiyrt in #10597\r\n* Scheduling fixes on MPS  by @hlky in #10549\r\n* [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo  by @chengzeyi in #10544\r\n* NPU adaption for RMSNorm  by @leisuzz in #10534\r\n* implementing flux on TPUs with ptxla  by @entrpn in #10515\r\n* [core] ConsisID  by @SHYuanBest in #10140\r\n* [training] set rest of the blocks with `requires_grad` False.  by @sayakpaul in #10607\r\n* chore: remove redundant words  by @sunxunle in #10609\r\n* bugfix for npu not support float64  by @baymax591 in #10123\r\n* [chore] change licensing to 2025 from 2024.  by @sayakpaul in #10615\r\n* Enable dreambooth lora finetune example on other devices  by @jiqing-feng in #10602\r\n* Remove the FP32 Wrapper when evaluating  by @lmxyy in #10617\r\n* [tests] make tests device-agnostic (part 3)   by @faaany in #10437\r\n* fix offload gpu tests etc  by @yiyixuxu in #10366\r\n* Remove cache migration script  by @Wauplin in #10619\r\n* [core] Layerwise Upcasting  by @a-r-r-o-w in #10347\r\n* Improve TorchAO error message  by @a-r-r-o-w in #10627\r\n* [CI] Update HF_TOKEN in all workflows  by @DN6 in #10613\r\n* add onnxruntime-migraphx as part of check for onnxruntime in import_utils.py  by @kahmed10 in #10624\r\n* [Tests] modify the test slices for the failing flax test  by @sayakpaul in #10630\r\n* [docs] fix image path in para attention docs  by @sayakpaul in #10632\r\n* [docs] uv installation  by @stevhliu in #10622\r\n* width and height are mixed-up  by @raulc0399 in #10629\r\n* Add IP-Adapter example to Flux docs  by @hlky in #10633\r\n* removing redundant requires_grad = False  by @YanivDorGalron in #10628\r\n* [chore] add a script to extract loras from full fine-tuned models  by @sayakpaul in #10631\r\n* Add pipeline_stable_diffusion_xl_attentive_eraser  by @Anonym0u3 in #10579\r\n* NPU Adaption for Sanna  by @leisuzz in #10409\r\n* Add sigmoid scheduler in `scheduling_ddpm.py` docs  by @JacobHelwig in #10648\r\n* create a script to train autoencoderkl  by @lavinal712 in #10605\r\n* Add community pipeline for semantic guidance for FLUX  by @Marlon154 in #10610\r\n* ControlNet Union controlnet_conditioning_scale for multiple control inputs  by @hlky in #10666\r\n* [training] Convert to ImageFolder script  by @hlky in #10664\r\n* Add provider_options to OnnxRuntimeModel  by @hlky in #10661\r\n* fix check_inputs func in LuminaText2ImgPipeline  by @victolee0 in #10651\r\n* SDXL ControlNet Union pipelines, make control_image argument immutible  by @Teriks in #10663\r\n* Revert RePaint scheduler 'fix'  by @GiusCat in #10644\r\n* [core] Pyramid Attention Broadcast  by @a-r-r-o-w in #9562\r\n* [fix] refer use_framewise_encoding on AutoencoderKLHunyuanVideo._encode  by @hanchchch in #10600\r\n* Refactor gradient checkpointing  by @a-r-r-o-w in #10611\r\n* [Tests] conditionally check `fp8_e4m3_bf16_max_memory < fp8_e4m3_fp32_max_memory`   by @sayakpaul in #10669\r\n* Fix pipeline dtype unexpected change when using SDXL reference community pipelines in float16 mode  by @dimitribarbot in #10670\r\n* [tests] update llamatokenizer in hunyuanvideo tests  by @sayakpaul in #10681\r\n* support StableDiffusionAdapterPipeline.from_single_file  by @Teriks in #10552\r\n* fix(hunyuan-video): typo in height and width input check  by @badayvedat in #10684\r\n* [FIX] check_inputs function in Auraflow Pipeline  by @SahilCarterr in #10678\r\n* Fix enable memory efficient attention on ROCm  by @tenpercent in #10564\r\n* Fix inconsistent random transform in instruct pix2pix  by @Luvata in #10698\r\n* feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling  by @badayvedat in #10699\r\n* Fixed grammar in \"write_own_pipeline\" readme  by @N0-Flux-given in #10706\r\n* Fix Documentation about Image-to-Image Pipeline  by @ParagEkbote in #10704\r\n* [bitsandbytes] Simplify bnb int8 dequant  by @sayakpaul in #10401\r\n* Fix train_text_to_image.py --help  by @nkthiebaut in #10711\r\n* Notebooks for Community Scripts-6  by @ParagEkbote in #10713\r\n* [Fix] Type Hint in from_pretrained() to Ensure Correct Type Inference  by @SahilCarterr in #10714\r\n* add provider_options in from_pretrained  by @xieofxie in #10719\r\n* [Community] Enhanced `Model Search`  by @suzukimain in #10417\r\n* [bugfix] NPU Adaption for Sana  by @leisuzz in #10724\r\n* Quantized Flux with IP-Adapter  by @hlky in #10728\r\n* EDMEulerScheduler accept sigmas, add final_sigmas_type  by @hlky in #10734\r\n* [LoRA] fix peft state dict parsing  by @sayakpaul in #10532\r\n* Add `Self` type hint to `ModelMixin`'s `from_pretrained`  by @hlky in #10742\r\n* [Tests] Test layerwise casting with training  by @sayakpaul in #10765\r\n* speedup hunyuan encoder causal mask generation  by @dabeschte in #10764\r\n* [CI] Fix Truffle Hog failure  by @DN6 in #10769\r\n* Add OmniGen  by @staoxiao in #10148\r\n* feat: new community mixture_tiling_sdxl pipeline for SDXL  by @elismasilva in #10759\r\n* Add support for lumina2  by @zhuole1025 in #10642\r\n* Refactor OmniGen  by @a-r-r-o-w in #10771\r\n* Faster set_adapters  by @Luvata in #10777\r\n* [Single File] Add Single File support for Lumina Image 2.0 Transformer  by @DN6 in #10781\r\n* Fix `use_lu_lambdas` and `use_karras_sigmas` with `beta_schedule=squaredcos_cap_v2` in `DPMSolverMultistepScheduler`  by @hlky in #10740\r\n* `MultiControlNetUnionModel` on SDXL  by @guiyrt in #10747\r\n* fix: [Community pipeline] Fix flattened elements on image   by @elismasilva in #10774\r\n* make tensors contiguous before passing to safetensors  by @faaany in #10761\r\n* Disable PEFT input autocast when using fp8 layerwise casting  by @a-r-r-o-w in #10685\r\n* Update FlowMatch docstrings to mention correct output classes  by @a-r-r-o-w in #10788\r\n* Refactor CogVideoX transformer forward  by @a-r-r-o-w in #10789\r\n* Module Group Offloading  by @a-r-r-o-w in #10503\r\n* Update Custom Diffusion Documentation for Multiple Concept Inference to resolve issue #10791  by @puhuk in #10792\r\n* [FIX] check_inputs function in lumina2  by @SahilCarterr in #10784\r\n* follow-up refactor on lumina2  by @yiyixuxu in #10776\r\n* CogView4 (supports different length c and uc)  by @zRzRzRzRzRzRzR in #10649\r\n* typo fix  by @YanivDorGalron in #10802\r\n* Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines  by @ParagEkbote in #10746\r\n* [chore] update notes generation spaces  by @sayakpaul in #10592\r\n* [LoRA] improve lora support for flux.  by @sayakpaul in #10810\r\n* Fix max_shift value in flux and related functions to 1.15 (issue #10675)  by @puhuk in #10807\r\n* [docs] add missing entries to the lora docs.  by @sayakpaul in #10819\r\n* DiffusionPipeline mixin `to`+FromOriginalModelMixin/FromSingleFileMixin `from_single_file` type hint  by @hlky in #10811\r\n* [LoRA] make `set_adapters()` robust on silent failures.  by @sayakpaul in #9618\r\n* [FEAT] Model loading refactor  by @SunMarc in #10604\r\n* [misc] feat: introduce a style bot.  by @sayakpaul in #10274\r\n* Remove print statements  by @a-r-r-o-w in #10836\r\n* [tests] use proper gemma class and config in lumina2 tests.  by @sayakpaul in #10828\r\n* [LoRA] add LoRA support to Lumina2 and fine-tuning script  by @sayakpaul in #10818\r\n* [Utils] add utilities for checking if certain utilities are properly documented  by @sayakpaul in #7763\r\n* Add missing `isinstance` for arg checks in GGUFParameter  by @AstraliteHeart in #10834\r\n* [tests] test `encode_prompt()` in isolation  by @sayakpaul in #10438\r\n* store activation cls instead of function  by @SunMarc in #10832\r\n* fix: support transformer models' `generation_config` in pipeline  by @JeffersonQin in #10779\r\n* Notebooks for Community Scripts-7  by @ParagEkbote in #10846\r\n* [CI] install accelerate transformers from `main`  by @sayakpaul in #10289\r\n* [CI] run fast gpu tests conditionally on pull requests.  by @sayakpaul in #10310\r\n* SD3 IP-Adapter runtime checkpoint conversion  by @guiyrt in #10718\r\n* Some consistency-related fixes for HunyuanVideo  by @a-r-r-o-w in #10835\r\n* SkyReels Hunyuan T2V & I2V  by @a-r-r-o-w in #10837\r\n* fix: run tests from a pr workflow.  by @sayakpaul in #9696\r\n* [chore] template for remote vae.  by @sayakpaul in #10849\r\n* fix remote vae template  by @sayakpaul in #10852\r\n* [CI] Fix incorrectly named test module for Hunyuan DiT  by @DN6 in #10854\r\n* [CI] Update always test Pipelines list in Pipeline fetcher  by @DN6 in #10856\r\n* `device_map` in `load_model_dict_into_meta`  by @hlky in #10851\r\n* [Fix] Docs overview.md  by @SahilCarterr in #10858\r\n* remove format check for safetensors file  by @SunMarc in #10864\r\n* [docs] LoRA support  by @stevhliu in #10844\r\n* Comprehensive type checking for `from_pretrained` kwargs  by @guiyrt in #10758\r\n* Fix `torch_dtype` in Kolors text encoder with `transformers` v4.49  by @hlky in #10816\r\n* [LoRA] restrict certain keys to be checked for peft config update.  by @sayakpaul in #10808\r\n* Add SD3 ControlNet to AutoPipeline  by @hlky in #10888\r\n* [docs] Update prompt weighting docs  by @stevhliu in #10843\r\n* [docs] Flux group offload  by @stevhliu in #10847\r\n* [Fix] fp16 unscaling in train_dreambooth_lora_sdxl  by @SahilCarterr in #10889\r\n* [docs] Add CogVideoX Schedulers  by @a-r-r-o-w in #10885\r\n* [chore] correct qk norm list.  by @sayakpaul in #10876\r\n* [Docs] Fix toctree sorting   by @DN6 in #10894\r\n* [refactor] SD3 docs & remove additional code  by @a-r-r-o-w in #10882\r\n* [refactor] Remove additional Flux code  by @a-r-r-o-w in #10881\r\n* [CI] Improvements to conditional GPU PR tests  by @DN6 in #10859\r\n* Multi IP-Adapter for Flux pipelines  by @guiyrt in #10867\r\n* Fix Callback Tensor Inputs of the SDXL Controlnet Inpaint and Img2img Pipelines are missing \"controlnet_image\".  by @CyberVy in #10880\r\n* Security fix  by @ydshieh in #10905\r\n* Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation  by @toshas in #10884\r\n* [Tests] fix: lumina2 lora fuse_nan test  by @sayakpaul in #10911\r\n* Fix Callback Tensor Inputs of the SD Controlnet Pipelines are missing some elements.  by @CyberVy in #10907\r\n* [CI] Fix Fast GPU tests on PR  by @DN6 in #10912\r\n* [CI] Fix for failing IP Adapter test in Fast GPU PR tests  by @DN6 in #10915\r\n* Experimental per control type scale for ControlNet Union  by @hlky in #10723\r\n* [style bot] improve security for the stylebot.  by @sayakpaul in #10908\r\n* [CI] Update Stylebot Permissions  by @DN6 in #10931\r\n* [Alibaba Wan Team] continue on #10921 Wan2.1  by @yiyixuxu in #10922\r\n* Support IPAdapter for more Flux pipelines  by @hlky in #10708\r\n* Add `remote_decode` to `remote_utils`  by @hlky in #10898\r\n* Update VAE Decode endpoints  by @hlky in #10939\r\n* [chore] fix-copies to flux pipelines  by @sayakpaul in #10941\r\n* [Tests] Remove more encode prompts tests  by @sayakpaul in #10942\r\n* Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model   by @bubbliiiing in #10626\r\n* Fix SD2.X clip single file load projection_dim  by @Teriks in #10770\r\n* add from_single_file to animatediff   by @<NOT FOUND> in #10924\r\n* Add Example of IPAdapterScaleCutoffCallback to Docs  by @ParagEkbote in #10934\r\n* Update pipeline_cogview4.py  by @zRzRzRzRzRzRzR in #10944\r\n* Fix redundant prev_output_channel assignment in UNet2DModel  by @ahmedbelgacem in #10945\r\n* Improve load_ip_adapter RAM Usage  by @CyberVy in #10948\r\n* [tests] make tests device-agnostic (part 4)  by @faaany in #10508\r\n* Update evaluation.md  by @sayakpaul in #10938\r\n* [LoRA] feat: support non-diffusers lumina2 LoRAs.  by @sayakpaul in #10909\r\n* [Quantization]  support pass MappingType for TorchAoConfig  by @a120092009 in #10927\r\n* Fix the missing parentheses when calling is_torchao_available in quantization_config.py.  by @CyberVy in #10961\r\n* [LoRA] Support Wan  by @a-r-r-o-w in #10943\r\n* Fix incorrect seed initialization when args.seed is 0  by @azolotenkov in #10964\r\n* feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL  by @elismasilva in #10951\r\n* [Docs] CogView4 comment fix  by @zRzRzRzRzRzRzR in #10957\r\n* update check_input for cogview4  by @yiyixuxu in #10966\r\n* Add VAE Decode endpoint slow test  by @hlky in #10946\r\n* [flux lora training] fix t5 training bug  by @linoytsaban in #10845\r\n* use style bot GH Action from `huggingface_hub`  by @hanouticelina in #10970\r\n* [train_dreambooth_lora.py] Fix the LR Schedulers when `num_train_epochs` is passed in a distributed training env  by @flyxiv in #10973\r\n* [tests] fix tests for save load components  by @sayakpaul in #10977\r\n* Fix loading OneTrainer Flux LoRA  by @hlky in #10978\r\n* fix default values of Flux guidance_scale in docstrings  by @catwell in #10982\r\n* [CI] remove synchornized.  by @sayakpaul in #10980\r\n* Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/realfill  by @dependabot[bot] in #10984\r\n* Fix Flux Controlnet Pipeline _callback_tensor_inputs Missing Some Elements  by @CyberVy in #10974\r\n* [Single File] Add user agent to SF download requests.   by @DN6 in #10979\r\n* Add CogVideoX DDIM Inversion to Community Pipelines  by @LittleNyima in #10956\r\n* fix wan i2v pipeline bugs  by @yupeng1111 in #10975\r\n* Hunyuan I2V  by @a-r-r-o-w in #10983\r\n* Fix Graph Breaks When Compiling CogView4  by @chengzeyi in #10959\r\n* Wan VAE move scaling to pipeline  by @hlky in #10998\r\n* [LoRA] remove full key prefix from peft.  by @sayakpaul in #11004\r\n* [Single File] Add single file support for Wan T2V/I2V  by @DN6 in #10991\r\n* Add STG to community pipelines  by @kinam0252 in #10960\r\n* [LoRA] Improve copied from comments in the LoRA loader classes  by @sayakpaul in #10995\r\n* Fix for fetching variants only  by @DN6 in #10646\r\n* [Quantization] Add Quanto backend  by @DN6 in #10756\r\n* [Single File] Add single file loading for SANA Transformer  by @ishan-modi in #10947\r\n* [LoRA] Improve warning messages when LoRA loading becomes a no-op  by @sayakpaul in #10187\r\n* [LoRA] CogView4  by @a-r-r-o-w in #10981\r\n* [Tests] improve quantization tests by additionally measuring the inference memory savings  by @sayakpaul in #11021\r\n* [`Research Project`] Add AnyText: Multilingual Visual Text Generation And Editing  by @tolgacangoz in #8998\r\n* [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6   by @DN6 in #11018\r\n* fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings   by @elismasilva in #11012\r\n* [LoRA] support wan i2v loras from the world.  by @sayakpaul in #11025\r\n* Fix SD3 IPAdapter feature extractor  by @hlky in #11027\r\n* chore: fix help messages in advanced diffusion examples  by @wonderfan in #10923\r\n* Fix missing **kwargs in lora_pipeline.py  by @CyberVy in #11011\r\n* Fix for multi-GPU WAN inference  by @AmericanPresidentJimmyCarter in #10997\r\n* [Refactor] Clean up import utils boilerplate  by @DN6 in #11026\r\n* Use `output_size` in `repeat_interleave`  by @hlky in #11030\r\n* [hybrid inference 🍯🐝] Add VAE encode  by @hlky in #11017\r\n* Wan Pipeline scaling fix, type hint warning, multi generator fix  by @hlky in #11007\r\n* [LoRA] change to warning from info when notifying the users about a LoRA no-op  by @sayakpaul in #11044\r\n* Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline  by @hlky in #10827\r\n* making ```formatted_images``` initialization compact  by @YanivDorGalron in #10801\r\n* Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed  by @ZhengKai91 in #10820\r\n* [Tests] restrict memory tests for quanto for certain schemes.  by @sayakpaul in #11052\r\n* [LoRA] feat: support non-diffusers wan t2v loras.  by @sayakpaul in #11059\r\n* [examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch  by @andjoer in #11051\r\n* reverts accidental change that removes attn_mask in attn. Improves fl…  by @entrpn in #11065\r\n* Fix deterministic issue when getting pipeline dtype and device  by @dimitribarbot in #10696\r\n* [Tests] add requires peft decorator.  by @sayakpaul in #11037\r\n* CogView4 Control Block  by @zRzRzRzRzRzRzR in #10809\r\n* [CI] pin transformers version for benchmarking.  by @sayakpaul in #11067\r\n* Fix Wan I2V Quality  by @chengzeyi in #11087\r\n* LTX 0.9.5  by @a-r-r-o-w in #10968\r\n* make PR GPU tests conditioned on styling.  by @sayakpaul in #11099\r\n* Group offloading improvements  by @a-r-r-o-w in #11094\r\n* Fix pipeline_flux_controlnet.py  by @co63oc in #11095\r\n* update readme instructions.  by @entrpn in #11096\r\n* Resolve stride mismatch in UNet's ResNet to support Torch DDP  by @jinc7461 in #11098\r\n* Fix Group offloading behaviour when using streams  by @a-r-r-o-w in #11097\r\n* Quality options in `export_to_video`  by @hlky in #11090\r\n* [CI] uninstall deps properly from pr gpu tests.  by @sayakpaul in #11102\r\n* [BUG] Fix Autoencoderkl train script  by @lavinal712 in #11113\r\n* [Wan LoRAs] make T2V LoRAs compatible with Wan I2V  by @linoytsaban in #11107\r\n* [tests] enable bnb tests on xpu  by @faaany in #11001\r\n* [fix bug] PixArt inference_steps=1  by @lawrence-cj in #11079\r\n* Flux with Remote Encode  by @hlky in #11091\r\n* [tests] make cuda only tests device-agnostic   by @faaany in #11058\r\n* Provide option to reduce CPU RAM usage in Group Offload  by @DN6 in #11106\r\n* remove F.rms_norm for now  by @yiyixuxu in #11126\r\n* Notebooks for Community Scripts-8  by @ParagEkbote in #11128\r\n* fix _callback_tensor_inputs of sd controlnet inpaint pipeline missing some elements  by @CyberVy in #11073\r\n* [core] FasterCache  by @a-r-r-o-w in #10163\r\n* add sana-sprint  by @yiyixuxu in #11074\r\n* Don't override `torch_dtype` and don't use when `quantization_config` is set  by @hlky in #11039\r\n* Update README and example code for AnyText usage  by @tolgacangoz in #11028\r\n* Modify the implementation of retrieve_timesteps in CogView4-Control.  by @zRzRzRzRzRzRzR in #11125\r\n* [fix SANA-Sprint]  by @lawrence-cj in #11142\r\n* New HunyuanVideo-I2V  by @a-r-r-o-w in #11066\r\n* [doc] Fix Korean Controlnet Train doc  by @flyxiv in #11141\r\n* Improve information about group offloading and layerwise casting  by @a-r-r-o-w in #11101\r\n* add a timestep scale for sana-sprint teacher model  by @lawrence-cj in #11150\r\n* [Quantization] dtype fix for GGUF + fix BnB tests  by @DN6 in #11159\r\n* Set self._hf_peft_config_loaded to True when LoRA is loaded using `load_lora_adapter` in PeftAdapterMixin class  by @kentdan3msu in #11155\r\n* WanI2V encode_image  by @hlky in #11164\r\n* [Docs] Update Wan Docs with memory optimizations  by @DN6 in #11089\r\n* Fix LatteTransformer3DModel dtype mismatch with enable_temporal_attentions  by @hlky in #11139\r\n* Raise warning and round down if Wan num_frames is not 4k + 1  by @a-r-r-o-w in #11167\r\n* [Docs] Fix environment variables in `installation.md`  by @remarkablemark in #11179\r\n* Add `latents_mean` and `latents_std` to `SDXLLongPromptWeightingPipeline`  by @hlky in #11034\r\n* Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is already set  by @kakukakujirori in #10918\r\n* [tests] no hard-coded cuda   by @faaany in #11186\r\n* [WIP] Add Wan Video2Video  by @DN6 in #11053\r\n* map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on XPU  by @yao-matrix in #11191\r\n* fix autocast  by @jiqing-feng in #11190\r\n* fix: for checking mandatory and optional pipeline components  by @elismasilva in #11189\r\n* remove unnecessary call to `F.pad`  by @bm-synth in #10620\r\n* allow models to run with a user-provided dtype map instead of a single dtype  by @hlky in #10301\r\n* [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU  by @faaany in #11197\r\n* Revert `save_model` in ModelMixin save_pretrained and use safe_serialization=False in test  by @hlky in #11196\r\n* [docs] `torch_dtype` map  by @hlky in #11194\r\n* Fix enable_sequential_cpu_offload in CogView4Pipeline  by @hlky in #11195\r\n* SchedulerMixin from_pretrained and ConfigMixin Self type annotation  by @hlky in #11192\r\n* Update import_utils.py  by @Lakshaysharma048 in #10329\r\n* Add CacheMixin to Wan and LTX Transformers  by @DN6 in #11187\r\n* feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline  by @elismasilva in #11188\r\n* [Model Card] standardize advanced diffusion training sdxl lora  by @chiral-carbon in #7615\r\n* Change KolorsPipeline LoRA Loader to StableDiffusion  by @BasileLewan in #11198\r\n* Update Style Bot workflow  by @hanouticelina in #11202\r\n* Fixed requests.get function call by adding timeout parameter.  by @kghamilton89 in #11156\r\n* Fix Single File loading for LTX VAE   by @DN6 in #11200\r\n* [feat]Add strength in flux_fill pipeline (denoising strength for fluxfill)  by @Suprhimp in #10603\r\n* [LTX0.9.5] Refactor `LTXConditionPipeline` for text-only conditioning  by @tolgacangoz in #11174\r\n* Add Wan with STG as a community pipeline  by @Ednaordinary in #11184\r\n* Add missing MochiEncoder3D.gradient_checkpointing attribute  by @mjkvaak-amd in #11146\r\n* enable 1 case on XPU  by @yao-matrix in #11219\r\n* ensure dtype match between diffused latents and vae weights  by @heyalexchoi in #8391\r\n* [docs] MPS update  by @stevhliu in #11212\r\n* Add support to pass image embeddings to the WAN I2V pipeline.  by @goiri in #11175\r\n* [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env  by @Bhavay-2001 in #8461\r\n* [Training] Better image interpolation in training scripts  by @asomoza in #11206\r\n* [LoRA] Implement hot-swapping of LoRA  by @BenjaminBossan in #9453\r\n* introduce compute arch specific expectations and fix test_sd3_img2img_inference failure  by @yao-matrix in #11227\r\n* [Flux LoRA] fix issues in flux lora scripts  by @linoytsaban in #11111\r\n* Flux quantized with lora  by @hlky in #10990\r\n* [feat] implement `record_stream` when using CUDA streams during group offloading  by @sayakpaul in #11081\r\n* [bistandbytes] improve replacement warnings for bnb  by @sayakpaul in #11132\r\n* minor update to sana sprint docs.  by @sayakpaul in #11236\r\n* [docs] minor updates to dtype map docs.  by @sayakpaul in #11237\r\n* [LoRA] support more comyui loras for Flux 🚨  by @sayakpaul in #10985\r\n* fix: SD3 ControlNet validation so that it runs on a A100.  by @sayakpaul in #11238\r\n* AudioLDM2 Fixes  by @hlky in #11244\r\n* AutoModel  by @hlky in #11115\r\n* fix FluxReduxSlowTests::test_flux_redux_inference case failure on XPU  by @yao-matrix in #11245\r\n* [docs] AutoModel  by @hlky in #11250\r\n* Update Ruff to latest Version  by @DN6 in #10919\r\n* fix flux controlnet bug  by @free001style in #11152\r\n* fix timeout constant  by @sayakpaul in #11252\r\n* fix consisid imports  by @sayakpaul in #11254\r\n* Release: v0.33.0 by @sayakpaul (direct commit on v0.33.0-release)\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @guiyrt\r\n    * IP-Adapter for `StableDiffusion3Img2ImgPipeline` (#10589)\r\n    * [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint (#10597)\r\n    * `MultiControlNetUnionModel` on SDXL (#10747)\r\n    * SD3 IP-Adapter runtime checkpoint conversion (#10718)\r\n    * Comprehensive type checking for `from_pretrained` kwargs (#10758)\r\n    * Multi IP-Adapter for Flux pipelines (#10867)\r\n* @chengzeyi\r\n    * [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo (#10544)\r\n    * Fix Graph Breaks When Compiling CogView4 (#10959)\r\n    * Fix Wan I2V Quality (#11087)\r\n* @entrpn\r\n    * implementing flux on TPUs with ptxla (#10515)\r\n    * reverts accidental change that removes attn_mask in attn. Improves fl… (#11065)\r\n    * update readme instructions. (#11096)\r\n* @SHYuanBest\r\n    * [core] ConsisID (#10140)\r\n* @faaany\r\n    * [tests] make tests device-agnostic (part 3)  (#10437)\r\n    * make tensors contiguous before passing to safetensors (#10761)\r\n    * [tests] make tests device-agnostic (part 4) (#10508)\r\n    * [tests] enable bnb tests on xpu (#11001)\r\n    * [tests] make cuda only tests device-agnostic  (#11058)\r\n    * [tests] no hard-coded cuda  (#11186)\r\n    * [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU (#11197)\r\n* @yiyixuxu\r\n    * fix offload gpu tests etc (#10366)\r\n    * follow-up refactor on lumina2 (#10776)\r\n    * [Alibaba Wan Team] continue on #10921 Wan2.1 (#10922)\r\n    * update check_input for cogview4 (#10966)\r\n    * remove F.rms_norm for now (#11126)\r\n    * add sana-sprint (#11074)\r\n* @DN6\r\n    * [CI] Update HF_TOKEN in all workflows (#10613)\r\n    * [CI] Fix Truffle Hog failure (#10769)\r\n    * [Single File] Add Single File support for Lumina Image 2.0 Transformer (#10781)\r\n    * [CI] Fix incorrectly named test module for Hunyuan DiT (#10854)\r\n    * [CI] Update always test Pipelines list in Pipeline fetcher (#10856)\r\n    * [Docs] Fix toctree sorting  (#10894)\r\n    * [CI] Improvements to conditional GPU PR tests (#10859)\r\n    * [CI] Fix Fast GPU tests on PR (#10912)\r\n    * [CI] Fix for failing IP Adapter test in Fast GPU PR tests (#10915)\r\n    * [CI] Update Stylebot Permissions (#10931)\r\n    * [Single File] Add user agent to SF download requests.  (#10979)\r\n    * [Single File] Add single file support for Wan T2V/I2V (#10991)\r\n    * Fix for fetching variants only (#10646)\r\n    * [Quantization] Add Quanto backend (#10756)\r\n    * [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6  (#11018)\r\n    * [Refactor] Clean up import utils boilerplate (#11026)\r\n    * Provide option to reduce CPU RAM usage in Group Offload (#11106)\r\n    * [Quantization] dtype fix for GGUF + fix BnB tests (#11159)\r\n    * [Docs] Update Wan Docs with memory optimizations (#11089)\r\n    * [WIP] Add Wan Video2Video (#11053)\r\n    * Add CacheMixin to Wan and LTX Transformers (#11187)\r\n    * Fix Single File loading for LTX VAE  (#11200)\r\n    * Update Ruff to latest Version (#10919)\r\n* @Anonym0u3\r\n    * Add pipeline_stable_diffusion_xl_attentive_eraser (#10579)\r\n* @lavinal712\r\n    * create a script to train autoencoderkl (#10605)\r\n    * [BUG] Fix Autoencoderkl train script (#11113)\r\n* @Marlon154\r\n    * Add community pipeline for semantic guidance for FLUX (#10610)\r\n* @ParagEkbote\r\n    * Fix Documentation about Image-to-Image Pipeline (#10704)\r\n    * Notebooks for Community Scripts-6 (#10713)\r\n    * Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines (#10746)\r\n    * Notebooks for Community Scripts-7 (#10846)\r\n    * Add Example of IPAdapterScaleCutoffCallback to Docs (#10934)\r\n    * Notebooks for Community Scripts-8 (#11128)\r\n* @suzukimain\r\n    * [Community] Enhanced `Model Search` (#10417)\r\n* @staoxiao\r\n    * Add OmniGen (#10148)\r\n* @elismasilva\r\n    * feat: new community mixture_tiling_sdxl pipeline for SDXL (#10759)\r\n    * fix: [Community pipeline] Fix flattened elements on image  (#10774)\r\n    * feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL (#10951)\r\n    * fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings  (#11012)\r\n    * fix: for checking mandatory and optional pipeline components (#11189)\r\n    * feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline (#11188)\r\n* @zhuole1025\r\n    * Add support for lumina2 (#10642)\r\n* @zRzRzRzRzRzRzR\r\n    * CogView4 (supports different length c and uc) (#10649)\r\n    * Update pipeline_cogview4.py (#10944)\r\n    * [Docs] CogView4 comment fix (#10957)\r\n    * CogView4 Control Block (#10809)\r\n    * Modify the implementation of retrieve_timesteps in CogView4-Control. (#11125)\r\n* @toshas\r\n    * Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation (#10884)\r\n* @bubbliiiing\r\n    * Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model  (#10626)\r\n* @LittleNyima\r\n    * Add CogVideoX DDIM Inversion to Community Pipelines (#10956)\r\n* @kinam0252\r\n    * Add STG to community pipelines (#10960)\r\n* @tolgacangoz\r\n    * [`Research Project`] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)\r\n    * Update README and example code for AnyText usage (#11028)\r\n    * [LTX0.9.5] Refactor `LTXConditionPipeline` for text-only conditioning (#11174)\r\n* @Ednaordinary\r\n    * Add Wan with STG as a community pipeline (#11184)","publishedAt":"2025-04-09T13:37:21.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.33.0","media":[]},{"id":"rel_h6fVteyB5dOgeakqsDTLb","version":"v0.32.2","title":"v0.32.2","summary":"# Fixes for Flux Single File loading, LoRA loading for 4bit BnB Flux, Hunyuan Video \r\n\r\nThis patch release\r\n\r\n- Fixes a regression in loading Comfy UI...","content":"# Fixes for Flux Single File loading, LoRA loading for 4bit BnB Flux, Hunyuan Video \r\n\r\nThis patch release\r\n\r\n- Fixes a regression in loading Comfy UI format single file checkpoints for Flux\r\n- Fixes a regression in loading LoRAs with bitsandbytes 4bit quantized Flux models\r\n- Adds `unload_lora_weights` for Flux Control\r\n- Fixes a bug that prevents Hunyuan Video from running with batch size > 1\r\n- Allow Hunyuan Video to load LoRAs created from the original repository code \r\n\r\n## All commits\r\n* [Single File] Fix loading Flux Dev finetunes with Comfy Prefix  by @DN6 in #10545\r\n* [CI] Update HF Token on Fast GPU Model Tests by @DN6  #10570\r\n* [CI] Update HF Token in Fast GPU Tests by @DN6 #10568\r\n* Fix batch > 1 in HunyuanVideo  by @hlky in #10548\r\n* Fix HunyuanVideo produces NaN on PyTorch<2.5  by @hlky in #10482\r\n* Fix hunyuan video attention mask dim  by @a-r-r-o-w in #10454\r\n* [LoRA] Support original format loras for HunyuanVideo  by @a-r-r-o-w in #10376\r\n* [LoRA] feat: support loading loras into 4bit quantized Flux models.  by @sayakpaul in #10578\r\n* [LoRA] clean up `load_lora_into_text_encoder()` and `fuse_lora()` copied from  by @sayakpaul in #10495 \r\n* [LoRA] feat: support `unload_lora_weights()` for Flux Control.  by @sayakpaul in #10206\r\n* Fix Flux multiple Lora loading bug  by @maxs-kan in #10388\r\n* [LoRA] fix: lora unloading when using expanded Flux LoRAs.  by @sayakpaul in #10397\r\n\r\n\r\n\r\n\r\n ","publishedAt":"2025-01-15T16:46:08.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.32.2","media":[]},{"id":"rel_XbfqQqPIMzvPSUut9peIP","version":"v0.32.1","title":"v0.32.1","summary":"# TorchAO Quantizer fixes\r\n\r\nThis patch release fixes a few bugs related to the TorchAO Quantizer introduced in [v0.32.0]().\r\n\r\n- Importing Diffusers ...","content":"# TorchAO Quantizer fixes\r\n\r\nThis patch release fixes a few bugs related to the TorchAO Quantizer introduced in [v0.32.0]().\r\n\r\n- Importing Diffusers would raise an error in PyTorch versions lower than 2.3.0. This should no longer be a problem.\r\n- Device Map does not work as expected when using the quantizer. We now raise an error if it is used. Support for using device maps with different quantization backends will be added in the near future.\r\n- Quantization was not performed due to faulty logic. This is now fixed and better tested.\r\n\r\nRefer to our [documentation](https://huggingface.co/docs/diffusers/) to learn more about how to use different quantization backends.\r\n\r\n## All commits\r\n\r\n* make style for https://github.com/huggingface/diffusers/pull/10368  by @yiyixuxu in #10370\r\n* fix test pypi installation in the release workflow  by @sayakpaul in #10360\r\n* Fix TorchAO related bugs; revert device_map changes  by @a-r-r-o-w in #10371","publishedAt":"2024-12-25T12:34:54.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.32.1","media":[]},{"id":"rel_D_iX4wMgv0tc2RG_OOLMv","version":"v0.32.0","title":"Diffusers 0.32.0: New video pipelines, new image pipelines, new quantization backends, new training scripts, and more","summary":"https://github.com/user-attachments/assets/34d5f7ca-8e33-4401-8109-5c245ce7595f\r\n\r\nThis release took a while, but it has many exciting updates. It con...","content":"https://github.com/user-attachments/assets/34d5f7ca-8e33-4401-8109-5c245ce7595f\r\n\r\nThis release took a while, but it has many exciting updates. It contains several new pipelines for image and video generation, new quantization backends, and more. \r\n\r\nGoing forward, to provide more transparency to the community about ongoing developments and releases in Diffusers, we will be making use of a [roadmap tracker](https://github.com/orgs/huggingface/projects/61/views/1). \r\n\r\n## New Video Generation Pipelines 📹\r\n\r\nOpen video generation models are on the rise, and we’re pleased to provide comprehensive integration support for all of them. The following video pipelines are bundled in this release:\r\n\r\n- [Mochi-1](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi)\r\n- [Allegro](https://huggingface.co/docs/diffusers/main/en/api/pipelines/allegro)\r\n- [LTXVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video)\r\n- [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video)\r\n\r\nCheck out [this section](https://www.notion.so/Diffusers-0-32-0-release-15f1384ebcac8091ac5bf18c128639ab?pvs=21) to learn more about the fine-tuning options available for these new video models. \r\n\r\n## New Image Generation Pipelines\r\n\r\n- SANA\r\n    - [Text-to-image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana#diffusers.SanaPipeline)\r\n    - [PAG](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana#diffusers.SanaPAGPipeline)\r\n- Flux Control (including Control LoRA)\r\n    - [Depth Control](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#depth-control)\r\n    - [Canny Control](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#canny-control)\r\n- [Flux Redux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#redux)\r\n- [Flux Fill Inpainting / Outpainting](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#fill-inpaintingoutpainting)\r\n- [Flux RF-Inversion](https://github.com/huggingface/diffusers/pull/9816)\r\n- [SD3.5 ControlNet](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_3#image-prompting-with-ip-adapters)\r\n- [ControlNet Union XL](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_union)\r\n- [SD3.5 IP Adapter](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3#image-prompting-with-ip-adapters)\r\n- [Flux IP adapter](https://github.com/huggingface/diffusers/pull/10261)\r\n\r\n**Important Note about the new Flux Models**\r\n\r\nWe can combine the regular Flux.1 Dev LoRAs with Flux Control LoRAs, Flux Control, and Flux Fill. For example, you can enable few-steps inference with Flux Fill using:\r\n\r\n```python\r\nfrom diffusers import FluxFillPipeline\r\nfrom diffusers.utils import load_image\r\nimport torch\r\n\r\npipe = FluxFillPipeline.from_pretrained(\r\n    \"black-forest-labs/FLUX.1-Fill-dev\", torch_dtype=torch.bfloat16\r\n).to(\"cuda\")\r\n\r\nadapter_id = \"alimama-creative/FLUX.1-Turbo-Alpha\"\r\npipe.load_lora_weights(adapter_id)\r\n\r\nimage = load_image(\"https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png\")\r\nmask = load_image(\"https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png\")\r\n\r\nimage = pipe(\r\n    prompt=\"a white paper cup\",\r\n    image=image,\r\n    mask_image=mask,\r\n    height=1632,\r\n    width=1232,\r\n    guidance_scale=30,\r\n    num_inference_steps=8,\r\n    max_sequence_length=512,\r\n    generator=torch.Generator(\"cpu\").manual_seed(0)\r\n).images[0]\r\nimage.save(\"flux-fill-dev.png\")\r\n```\r\n\r\nTo learn more, check out the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#combining-flux-turbo-loras-with-flux-control-fill-and-redux). \r\n\r\n> [!NOTE]  \r\n> SANA is a small model compared to other models like Flux and Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 resolution image.  We support LoRA fine-tuning of SANA. Check out [this section](https://www.notion.so/Diffusers-0-32-0-release-15f1384ebcac8091ac5bf18c128639ab?pvs=21) for more details.\r\n\r\n### Acknowledgements\r\n\r\n- Shoutout to @lawrence-cj and @chenjy2003 for contributing SANA in [this PR](https://github.com/huggingface/diffusers/pull/9982). SANA also features a Deep Compression Autoencoder, which was contributed by @lawrence-cj in [this PR](https://github.com/huggingface/diffusers/pull/9708). \r\n- Shoutout to @guiyrt for contributing SD3.5 IP Adapter in [this PR](https://github.com/huggingface/diffusers/pull/9987).\r\n\r\n## New Quantization Backends\r\n\r\n- [TorchAO](https://huggingface.co/docs/diffusers/main/en/quantization/torchao)\r\n- [GGUF](https://huggingface.co/docs/diffusers/main/en/quantization/gguf)\r\n\r\nPlease be aware of the following caveats:\r\n\r\n- TorchAO quantized checkpoints cannot be serialized in `safetensors` currently. This may change in the future.\r\n- GGUF currently only supports loading pre-quantized checkpoints into models in this release. Support for saving models with GGUF quantization will be added in the future.\r\n\r\n## New training scripts\r\n\r\nThis release features many new training scripts for the community to play:\r\n\r\n- [Flux Control](https://github.com/huggingface/diffusers/tree/main/examples/flux-control)\r\n- [Mochi-1](https://github.com/a-r-r-o-w/finetrainers)\r\n- [LTXVideo](https://github.com/a-r-r-o-w/finetrainers?tab=readme-ov-file#quickstart)\r\n- [SANA](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sana.md)\r\n- [Hunyuan Video](https://github.com/a-r-r-o-w/finetrainers?tab=readme-ov-file#quickstart)\r\n\r\n## All commits\r\n\r\n* post-release 0.31.0  by @sayakpaul in #9742\r\n* fix bug in `require_accelerate_version_greater`  by @faaany in #9746\r\n* [Official callbacks] SDXL Controlnet CFG Cutoff  by @asomoza in #9311\r\n* [SD3-5 dreambooth lora] update model cards  by @linoytsaban in #9749\r\n* config attribute not foud error for FluxImagetoImage Pipeline for multi controlnet solved  by @rshah240 in #9586\r\n* Some minor updates to the nightly and push workflows  by @sayakpaul in #9759\r\n* [Docs] fix docstring typo in SD3 pipeline  by @shenzhiy21 in #9765\r\n* [bugfix] bugfix for npu free memory  by @leisuzz in #9640\r\n* [research_projects] add flux training script with quantization  by @sayakpaul in #9754\r\n* Add a doc for AWS Neuron in Diffusers  by @JingyaHuang in #9766\r\n* [refactor] enhance readability of flux related pipelines  by @Luciennnnnnn in #9711\r\n* Added Support of Xlabs controlnet to FluxControlNetInpaintPipeline  by @SahilCarterr in #9770\r\n* [research_projects] Update README.md to include a note about NF5 T5-xxl  by @sayakpaul in #9775\r\n* [Fix] train_dreambooth_lora_flux_advanced ValueError: unexpected save model: <class 'transformers.models.t5.modeling_t5.T5EncoderModel'>  by @rootonchair in #9777\r\n* [Fix] remove setting lr for T5 text encoder when using prodigy in flux dreambooth lora script  by @biswaroop1547 in #9473\r\n* [SD 3.5 Dreambooth LoRA] support configurable training block & layers  by @linoytsaban in #9762\r\n* [flux dreambooth lora training] make LoRA target modules configurable + small bug fix  by @linoytsaban in #9646\r\n* adds the pipeline for pixart alpha controlnet  by @raulc0399 in #8857\r\n* [core] Allegro T2V  by @a-r-r-o-w in #9736\r\n* Allegro VAE fix  by @a-r-r-o-w in #9811\r\n* [CI] add new runner for testing  by @sayakpaul in #9699\r\n* [training] fixes to the quantization training script and add AdEMAMix optimizer as an option  by @sayakpaul in #9806\r\n* [training] use the lr when using 8bit adam.  by @sayakpaul in #9796\r\n* [Tests] clean up and refactor gradient checkpointing tests  by @sayakpaul in #9494\r\n* [CI] add a big GPU marker to run memory-intensive tests separately on CI  by @sayakpaul in #9691\r\n* [LoRA] fix: lora loading when using with a device_mapped model.  by @sayakpaul in #9449\r\n* Revert \"[LoRA] fix: lora loading when using with a device_mapped mode…  by @yiyixuxu in #9823\r\n* [Model Card] standardize advanced diffusion training sd15 lora  by @chiral-carbon in #7613\r\n* NPU Adaption for FLUX  by @leisuzz in #9751\r\n* Fixes EMAModel \"from_pretrained\" method  by @SahilCarterr in #9779\r\n* Update train_controlnet_flux.py,Fix size mismatch issue in validation  by @ScilenceForest in #9679\r\n* Handling mixed precision for dreambooth flux lora training  by @icsl-Jeon in #9565\r\n* Reduce Memory Cost in Flux Training  by @leisuzz in #9829\r\n* Add Diffusion Policy for Reinforcement Learning  by @DorsaRoh in #9824\r\n* [feat] add `load_lora_adapter()` for compatible models  by @sayakpaul in #9712\r\n* Refac training utils.py  by @RogerSinghChugh in #9815\r\n* [core] Mochi T2V   by @a-r-r-o-w in #9769\r\n* [Fix] Test of sd3 lora  by @SahilCarterr in #9843\r\n* Fix: Remove duplicated comma in distributed_inference.md  by @vahidaskari in #9868\r\n* Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] ComA  by @jellyheadandrew in #9228\r\n* Updated _encode_prompt_with_clip and encode_prompt in train_dreamboth_sd3  by @SahilCarterr in #9800\r\n* [Core] introduce `controlnet` module  by @sayakpaul in #8768\r\n* [Flux] reduce explicit device transfers and typecasting in flux.  by @sayakpaul in #9817\r\n* Improve downloads of sharded variants  by @DN6 in #9869\r\n* [fix] Replaced shutil.copy with shutil.copyfile  by @SahilCarterr in #9885\r\n* Enabling gradient checkpointing in eval() mode  by @MikeTkachuk in #9878\r\n* [FIX] Fix TypeError in DreamBooth SDXL when use_dora is False  by @SahilCarterr in #9879\r\n* [Advanced LoRA v1.5] fix: gradient unscaling problem  by @sayakpaul in #7018\r\n* Revert \"[Flux] reduce explicit device transfers and typecasting in flux.\"  by @sayakpaul in #9896\r\n* Feature IP Adapter Xformers Attention Processor  by @elismasilva in #9881\r\n*  Notebooks for Community Scripts Examples  by @ParagEkbote in #9905\r\n* Fix Progress Bar Updates in SD 1.5 PAG Img2Img pipeline  by @painebenjamin in #9925\r\n* Update pipeline_flux_img2img.py  by @example-git in #9928\r\n* add depth controlnet sd3 pre-trained checkpoints to docs  by @pureexe in #9937\r\n* Move Wuerstchen Dreambooth to research_projects  by @ParagEkbote in #9935\r\n* Update ip_adapter.py  by @mkknightr in #8882\r\n* Modify apply_overlay for inpainting with padding_mask_crop (Inpainting area: \"Only Masked\")  by @clarkkent0618 in #8793\r\n* Correct pipeline_output.py to the type Mochi  by @twobob in #9945\r\n* Add all AttnProcessor classes in `AttentionProcessor` type  by @Prgckwb in #9909\r\n* Fixed Nits in Docs and Example Script  by @ParagEkbote in #9940\r\n* Add server example  by @thealmightygrant in #9918\r\n* CogVideoX 1.5  by @zRzRzRzRzRzRzR in #9877\r\n* Notebooks for Community Scripts-2  by @ParagEkbote in #9952\r\n* [advanced flux training] bug fix + reduce memory cost as in #9829  by @linoytsaban in #9838\r\n* [LoRA] feat: `save_lora_adapter()`  by @sayakpaul in #9862\r\n* Make CogVideoX RoPE implementation consistent  by @a-r-r-o-w in #9963\r\n* [CI] Unpin torch<2.5 in CI  by @DN6 in #9961\r\n* Move IP Adapter Scripts to research project  by @ParagEkbote in #9960\r\n* add skip_layers argument to SD3 transformer model class  by @bghira in #9880\r\n* Fix beta and exponential sigmas + add tests  by @hlky in #9954\r\n* Flux latents fix  by @DN6 in #9929\r\n* [LoRA] enable LoRA for Mochi-1  by @sayakpaul in #9943\r\n* Improve control net block index for sd3  by @linjiapro in #9758\r\n* Update handle single blocks on _convert_xlabs_flux_lora_to_diffusers  by @raulmosa in #9915\r\n* fix controlnet module refactor   by @yiyixuxu in #9968\r\n* Fix prepare latent image ids and vae sample generators for flux  by @a-r-r-o-w in #9981\r\n* [Tests] skip nan lora tests on PyTorch 2.5.1 CPU.  by @sayakpaul in #9975\r\n* make `pipelines` tests device-agnostic (part1)  by @faaany in #9399\r\n* ControlNet from_single_file when already converted  by @hlky in #9978\r\n* Flux Fill, Canny, Depth, Redux  by @a-r-r-o-w in #9985\r\n* [SD3 dreambooth lora] smol fix to checkpoint saving  by @linoytsaban in #9993\r\n* [Docs] add: missing pipelines from the spec.  by @sayakpaul in #10005\r\n* Add prompt about wandb in examples/dreambooth/readme.  by @SkyCol in #10014\r\n* [docs] Fix CogVideoX table  by @a-r-r-o-w in #10008\r\n* Notebooks for Community Scripts-3  by @ParagEkbote in #10032\r\n* Sd35 controlnet  by @yiyixuxu in #10020\r\n* Add `beta`, `exponential` and `karras` sigmas to `FlowMatchEulerDiscreteScheduler`  by @hlky in #10001\r\n* Update sdxl reference pipeline to latest sdxl pipeline  by @dimitribarbot in #9938\r\n* [Community Pipeline] Add some feature for regional prompting pipeline  by @cjkangme in #9874\r\n* Add sdxl controlnet reference community pipeline  by @dimitribarbot in #9893\r\n* Change image_gen_aux repository URL  by @asomoza in #10048\r\n* make `pipelines` tests device-agnostic (part2)  by @faaany in #9400\r\n* [Mochi-1] ensuring to compute the fourier features in FP32 in Mochi encoder  by @sayakpaul in #10031\r\n* [Fix] Syntax error  by @SahilCarterr in #10068\r\n* [CI] Add quantization  by @sayakpaul in #9832\r\n* Add `sigmas` to Flux pipelines  by @hlky in #10081\r\n* Fixed Nits in Evaluation Docs   by @ParagEkbote in #10063\r\n* fix link in the docs  by @coding-famer in #10058\r\n* fix offloading for sd3.5 controlnets  by @yiyixuxu in #10072\r\n* [Single File] Fix SD3.5 single file loading  by @DN6 in #10077\r\n* Fix `num_images_per_prompt>1` with Skip Guidance Layers in `StableDiffusion3Pipeline`  by @hlky in #10086\r\n* [Single File] Pass token when fetching interpreted config  by @DN6 in #10082\r\n* Interpolate fix on cuda for large output tensors  by @pcuenca in #10067\r\n* Convert `sigmas` to `np.array` in FlowMatch set_timesteps  by @hlky in #10088\r\n* fix: missing AutoencoderKL lora adapter  by @beniz in #9807\r\n* Let server decide default repo visibility  by @Wauplin in #10047\r\n* Fix some documentation in ./src/diffusers/models/embeddings.py for demo  by @DTG2005 in #9579\r\n* Don't stale close-to-merge  by @pcuenca in #10096\r\n* Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG  by @painebenjamin in #9932\r\n* Notebooks for Community Scripts-4  by @ParagEkbote in #10094\r\n* Fix Broken Link in Optimization Docs  by @ParagEkbote in #10105\r\n* DPM++ third order fixes  by @StAlKeR7779 in #9104\r\n* update  by @aihao2000 in #7067\r\n* Avoid compiling a progress bar.  by @lsb in #10098\r\n* [Bug fix] \"previous_timestep()\" in DDPM scheduling compatible with \"trailing\" and \"linspace\" options  by @AnandK27 in #9384\r\n* Fix multi-prompt inference  by @hlky in #10103\r\n* Test `skip_guidance_layers` in SD3 pipeline  by @hlky in #10102\r\n* Use parameters + buffers when deciding upscale_dtype  by @universome in #9882\r\n* [tests] refactor vae tests  by @sayakpaul in #9808\r\n* add torch_xla support in pipeline_stable_audio.py  by @<NOT FOUND> in #10109\r\n* Fix `pipeline_stable_audio` formating  by @hlky in #10114\r\n* [bitsandbytes] allow directly CUDA placements of pipelines loaded with bnb components  by @sayakpaul in #9840\r\n* Fix Broken Links in ReadMe  by @ParagEkbote in #10117\r\n* Add `sigmas` to pipelines using FlowMatch  by @hlky in #10116\r\n* [Flux Redux] add prompt & multiple image input   by @linoytsaban in #10056\r\n* Fix a bug in the state dict judgment in ip_adapter.py.  by @zhangp365 in #10095\r\n* Fix a bug for SD35 control net training and improve control net block index  by @linjiapro in #10065\r\n* pass attn mask arg for flux  by @yiyixuxu in #10122\r\n* [docs] load_lora_adapter  by @stevhliu in #10119\r\n* Use torch.device instead of current device index for BnB quantizer  by @a-r-r-o-w in #10069\r\n* [Tests] fix condition argument in xfail.  by @sayakpaul in #10099\r\n* [Tests] xfail incompatible SD configs.  by @sayakpaul in #10127\r\n* [FIX] Bug in FluxPosEmbed  by @SahilCarterr in #10115\r\n* [Guide] Quantize your Diffusion Models with `bnb`  by @ariG23498 in #10012\r\n* Remove duplicate checks for len(generator) != batch_size when generator is a list  by @a-r-r-o-w in #10134\r\n* [community] Load Models from Sources like `Civitai` into Existing Pipelines  by @suzukimain in #9986\r\n* [DC-AE] Add the official Deep Compression Autoencoder code(32x,64x,128x compression ratio);  by @lawrence-cj in #9708\r\n* fixed a dtype bfloat16 bug in torch_utils.py  by @zhangp365 in #10125\r\n* [LoRA] depcrecate save_attn_procs().  by @sayakpaul in #10126\r\n* Update ptxla training  by @entrpn in #9864\r\n* support sd3.5 for controlnet example  by @DavyMorgan in #9860\r\n* [Single file] Support `revision` argument when loading single file config  by @a-r-r-o-w in #10168\r\n* [community pipeline] Add RF-inversion Flux pipeline  by @linoytsaban in #9816\r\n* Improve post-processing performance  by @soof-golan in #10170\r\n* Use `torch` in `get_3d_rotary_pos_embed`/`_allegro`  by @hlky in #10161\r\n* Flux Control LoRA  by @a-r-r-o-w in #9999\r\n* Add PAG Support for Stable Diffusion Inpaint Pipeline  by @darshil0805 in #9386\r\n* [community pipeline rf-inversion] - fix example in doc  by @linoytsaban in #10179\r\n* Fix Nonetype attribute error when loading multiple Flux loras  by @jonathanyin12 in #10182\r\n* Added Error when len(gligen_images ) is not equal to len(gligen_phrases) in StableDiffusionGLIGENTextImagePipeline  by @SahilCarterr in #10176\r\n* [Single File] Add single file support for AutoencoderDC  by @DN6 in #10183\r\n* Add ControlNetUnion  by @hlky in #10131\r\n* fix min-snr implementation  by @ethansmith2000 in #8466\r\n* Add support for XFormers in SD3  by @CanvaChen in #8583\r\n* [LoRA] add a test to ensure `set_adapters()` and attn kwargs outs match  by @sayakpaul in #10110\r\n* [CI] merge peft pr workflow into the main pr workflow.  by @sayakpaul in #10042\r\n* [WIP][Training] Flux Control LoRA training script  by @sayakpaul in #10130\r\n* [core] LTX Video  by @a-r-r-o-w in #10021\r\n* Ci update tpu  by @paulinebm in #10197\r\n* Remove `negative_*` from SDXL callback  by @hlky in #10203\r\n* refactor  StableDiffusionXLControlNetUnion  by @hlky in #10200\r\n* update StableDiffusion3Img2ImgPipeline.add image size validation  by @ZHJ19970917 in #10166\r\n* Remove mps workaround for fp16 GELU, which is now supported natively  by @skotapati in #10133\r\n* [RF inversion community pipeline] add eta_decay   by @linoytsaban in #10199\r\n* Allow image resolutions multiple of 8 instead of 64 in SVD pipeline  by @mlfarinha in #6646\r\n* Use `torch` in `get_2d_sincos_pos_embed` and `get_3d_sincos_pos_embed`  by @hlky in #10156\r\n* add reshape to fix use_memory_efficient_attention in flax  by @entrpn in #7918\r\n* Add offload option in flux-control training  by @Adenialzz in #10225\r\n* Test error raised when loading normal and expanding loras together in Flux  by @a-r-r-o-w in #10188\r\n* [Sana] Add Sana, including `SanaPipeline`, `SanaPAGPipeline`, `LinearAttentionProcessor`, `Flow-based DPM-sovler` and so on.  by @lawrence-cj in #9982\r\n* [Tests] update always test pipelines list.  by @sayakpaul in #10143\r\n* Update sana.md with minor corrections  by @sayakpaul in #10232\r\n* [docs] minor stuff to ltx video docs.  by @sayakpaul in #10229\r\n* Fix format issue in push_test yml  by @DN6 in #10235\r\n* [core] Hunyuan Video  by @a-r-r-o-w in #10136\r\n* Update pipeline_controlnet.py add support for pytorch_xla  by @<NOT FOUND> in #10222\r\n* [Docs] add rest of the lora loader mixins to the docs.  by @sayakpaul in #10230\r\n* Use `t` instead of `timestep` in `_apply_perturbed_attention_guidance`  by @hlky in #10243\r\n* Add `dynamic_shifting` to SD3  by @hlky in #10236\r\n* Fix `use_flow_sigmas`  by @hlky in #10242\r\n* Fix ControlNetUnion _callback_tensor_inputs  by @hlky in #10218\r\n* Use non-human subject in StableDiffusion3ControlNetPipeline example  by @hlky in #10214\r\n* Add enable_vae_tiling to AllegroPipeline, fix example  by @hlky in #10212\r\n* Fix checkpoint in CogView3PlusPipeline example  by @hlky in #10211\r\n* Fix RePaint Scheduler  by @hlky in #10185\r\n* Add ControlNetUnion to AutoPipeline from_pretrained  by @hlky in #10219\r\n* fix downsample bug in MidResTemporalBlock1D  by @holmosaint in #10250\r\n* [core] TorchAO Quantizer  by @a-r-r-o-w in #10009\r\n* [docs] Add missing AttnProcessors  by @stevhliu in #10246\r\n* [chore] add contribution note for lawrence.  by @sayakpaul in #10253\r\n* Fix copied from comment in Mochi lora loader  by @a-r-r-o-w in #10255\r\n* [LoRA] Support LTX Video  by @a-r-r-o-w in #10228\r\n* [docs] Clarify dtypes for Sana  by @a-r-r-o-w in #10248\r\n* [Single File] Add GGUF support  by @DN6 in #9964\r\n* Fix Mochi Quality Issues  by @DN6 in #10033\r\n* [tests] Remove/rename unsupported quantization torchao type  by @a-r-r-o-w in #10263\r\n* [docs] delete_adapters()  by @stevhliu in #10245\r\n* [Community Pipeline] Fix typo that cause error on regional prompting pipeline  by @cjkangme in #10251\r\n* Add `set_shift` to FlowMatchEulerDiscreteScheduler  by @hlky in #10269\r\n* [LoRA] feat: lora support for SANA.  by @sayakpaul in #10234\r\n* [chore] fix: licensing headers in mochi and ltx  by @sayakpaul in #10275\r\n* Use `torch` in `get_2d_rotary_pos_embed`  by @hlky in #10155\r\n* [chore] fix: reamde -> readme  by @sayakpaul in #10276\r\n* Make `time_embed_dim` of `UNet2DModel` changeable  by @Bichidian in #10262\r\n* Support pass kwargs to sd3 custom attention processor  by @Matrix53 in #9818\r\n* Flux Control(Depth/Canny) + Inpaint  by @affromero in #10192\r\n* Fix sigma_last with use_flow_sigmas  by @hlky in #10267\r\n* Fix Doc links in GGUF and Quantization overview docs   by @DN6 in #10279\r\n* Make zeroing prompt embeds for Mochi Pipeline configurable  by @DN6 in #10284\r\n* [Single File] Add single file support for Flux Canny, Depth and Fill  by @DN6 in #10288\r\n* [tests] Fix broken cuda, nightly and lora tests on main for CogVideoX  by @a-r-r-o-w in #10270\r\n* Rename Mochi integration test correctly  by @a-r-r-o-w in #10220\r\n* [tests] remove nullop import checks from lora tests  by @a-r-r-o-w in #10273\r\n* [chore] Update README_sana.md to update the default model  by @sayakpaul in #10285\r\n* Hunyuan VAE tiling fixes and transformer docs  by @a-r-r-o-w in #10295\r\n* Add Flux Control to AutoPipeline  by @hlky in #10292\r\n* Update lora_conversion_utils.py  by @zhaowendao30 in #9980\r\n* Check correct model type is passed to `from_pretrained`  by @hlky in #10189\r\n* [LoRA] Support HunyuanVideo  by @SHYuanBest in #10254\r\n* [Single File] Add single file support for Mochi Transformer  by @DN6 in #10268\r\n* Allow Mochi Transformer to be split across multiple GPUs  by @DN6 in #10300\r\n* Fix `local_files_only` for checkpoints with shards  by @hlky in #10294\r\n* Fix failing lora tests after HunyuanVideo lora  by @a-r-r-o-w in #10307\r\n* unet's `sample_size` attribute is to accept tuple(h, w) in `StableDiffusionPipeline`  by @Foundsheep in #10181\r\n* Enable Gradient Checkpointing for UNet2DModel (New)  by @dg845 in #7201\r\n* [WIP] SD3.5 IP-Adapter Pipeline Integration  by @guiyrt in #9987\r\n* Add support for sharded models when TorchAO quantization is enabled  by @a-r-r-o-w in #10256\r\n* Make tensors in ResNet contiguous for Hunyuan VAE  by @a-r-r-o-w in #10309\r\n* [Single File] Add GGUF support for LTX  by @DN6 in #10298\r\n* [LoRA] feat: support loading regular Flux LoRAs into Flux Control, and Fill  by @sayakpaul in #10259\r\n* [Tests] add integration tests for lora expansion stuff in Flux.  by @sayakpaul in #10318\r\n* Mochi docs  by @DN6 in #9934\r\n* [Docs] Update ltx_video.md to remove generator from `from_pretrained()`  by @sayakpaul in #10316\r\n* docs: fix a mistake in docstring  by @Leojc in #10319\r\n* [BUG FIX] [Stable Audio Pipeline] Resolve torch.Tensor.new_zeros() TypeError in function prepare_latents caused by audio_vae_length  by @syntaxticsugr in #10306\r\n* [docs] Fix quantization links  by @stevhliu in #10323\r\n* [Sana]add 2K related model for Sana  by @lawrence-cj in #10322\r\n* [Docs] Update gguf.md to remove generator from the pipeline from_pretrained  by @sayakpaul in #10299\r\n* Fix push_tests_mps.yml  by @hlky in #10326\r\n* Fix EMAModel test_from_pretrained  by @hlky in #10325\r\n* Support Flux IP Adapter  by @hlky in #10261\r\n* flux controlnet inpaint config bug  by @yigitozgenc in #10291\r\n* Community hosted weights for diffusers format HunyuanVideo weights  by @a-r-r-o-w in #10344\r\n* Fix enable_sequential_cpu_offload in test_kandinsky_combined  by @hlky in #10324\r\n* update `get_parameter_dtype`  by @yiyixuxu in #10342\r\n* [Single File] Add Single File support for HunYuan video  by @DN6 in #10320\r\n* [Sana bug] bug fix for 2K model config  by @lawrence-cj in #10340\r\n* `.from_single_file()` - Add missing `.shape`  by @gau-nernst in #10332\r\n* Bump minimum TorchAO version to 0.7.0  by @a-r-r-o-w in #10293\r\n* [docs] fix: torchao example.  by @sayakpaul in #10278\r\n* [tests] Refactor TorchAO serialization fast tests  by @a-r-r-o-w in #10271\r\n* [SANA LoRA] sana lora training tests and misc.  by @sayakpaul in #10296\r\n* [Single File] Fix loading  by @DN6 in #10349\r\n* [Tests] QoL improvements to the LoRA test suite  by @sayakpaul in #10304\r\n* Fix FluxIPAdapterTesterMixin  by @hlky in #10354\r\n* Fix failing CogVideoX LoRA fuse test  by @a-r-r-o-w in #10352\r\n* Rename LTX blocks and docs title  by @a-r-r-o-w in #10213\r\n* [LoRA] test fix  by @sayakpaul in #10351\r\n* [Tests] Fix more tests sayak  by @sayakpaul in #10359\r\n* [core] LTX Video 0.9.1  by @a-r-r-o-w in #10330\r\n* Release: v0.32.0 by @sayakpaul (direct commit on v0.32.0-release)\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @faaany\r\n    * fix bug in `require_accelerate_version_greater` (#9746)\r\n    * make `pipelines` tests device-agnostic (part1) (#9399)\r\n    * make `pipelines` tests device-agnostic (part2) (#9400)\r\n* @linoytsaban\r\n    * [SD3-5 dreambooth lora] update model cards (#9749)\r\n    * [SD 3.5 Dreambooth LoRA] support configurable training block & layers (#9762)\r\n    * [flux dreambooth lora training] make LoRA target modules configurable + small bug fix (#9646)\r\n    * [advanced flux training] bug fix + reduce memory cost as in #9829 (#9838)\r\n    * [SD3 dreambooth lora] smol fix to checkpoint saving (#9993)\r\n    * [Flux Redux] add prompt & multiple image input  (#10056)\r\n    * [community pipeline] Add RF-inversion Flux pipeline (#9816)\r\n    * [community pipeline rf-inversion] - fix example in doc (#10179)\r\n    * [RF inversion community pipeline] add eta_decay  (#10199)\r\n* @raulc0399\r\n    * adds the pipeline for pixart alpha controlnet (#8857)\r\n* @yiyixuxu\r\n    * Revert \"[LoRA] fix: lora loading when using with a device_mapped mode… (#9823)\r\n    * fix controlnet module refactor  (#9968)\r\n    * Sd35 controlnet (#10020)\r\n    * fix offloading for sd3.5 controlnets (#10072)\r\n    * pass attn mask arg for flux (#10122)\r\n    * update `get_parameter_dtype` (#10342)\r\n* @jellyheadandrew\r\n    * Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] ComA (#9228)\r\n* @DN6\r\n    * Improve downloads of sharded variants (#9869)\r\n    * [CI] Unpin torch<2.5 in CI (#9961)\r\n    * Flux latents fix (#9929)\r\n    * [Single File] Fix SD3.5 single file loading (#10077)\r\n    * [Single File] Pass token when fetching interpreted config (#10082)\r\n    * [Single File] Add single file support for AutoencoderDC (#10183)\r\n    * Fix format issue in push_test yml (#10235)\r\n    * [Single File] Add GGUF support (#9964)\r\n    * Fix Mochi Quality Issues (#10033)\r\n    * Fix Doc links in GGUF and Quantization overview docs  (#10279)\r\n    * Make zeroing prompt embeds for Mochi Pipeline configurable (#10284)\r\n    * [Single File] Add single file support for Flux Canny, Depth and Fill (#10288)\r\n    * [Single File] Add single file support for Mochi Transformer (#10268)\r\n    * Allow Mochi Transformer to be split across multiple GPUs (#10300)\r\n    * [Single File] Add GGUF support for LTX (#10298)\r\n    * Mochi docs (#9934)\r\n    * [Single File] Add Single File support for HunYuan video (#10320)\r\n    * [Single File] Fix loading (#10349)\r\n* @ParagEkbote\r\n    *  Notebooks for Community Scripts Examples (#9905)\r\n    * Move Wuerstchen Dreambooth to research_projects (#9935)\r\n    * Fixed Nits in Docs and Example Script (#9940)\r\n    * Notebooks for Community Scripts-2 (#9952)\r\n    * Move IP Adapter Scripts to research project (#9960)\r\n    * Notebooks for Community Scripts-3 (#10032)\r\n    * Fixed Nits in Evaluation Docs  (#10063)\r\n    * Notebooks for Community Scripts-4 (#10094)\r\n    * Fix Broken Link in Optimization Docs (#10105)\r\n    * Fix Broken Links in ReadMe (#10117)\r\n* @painebenjamin\r\n    * Fix Progress Bar Updates in SD 1.5 PAG Img2Img pipeline (#9925)\r\n    * Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)\r\n* @hlky\r\n    * Fix beta and exponential sigmas + add tests (#9954)\r\n    * ControlNet from_single_file when already converted (#9978)\r\n    * Add `beta`, `exponential` and `karras` sigmas to `FlowMatchEulerDiscreteScheduler` (#10001)\r\n    * Add `sigmas` to Flux pipelines (#10081)\r\n    * Fix `num_images_per_prompt>1` with Skip Guidance Layers in `StableDiffusion3Pipeline` (#10086)\r\n    * Convert `sigmas` to `np.array` in FlowMatch set_timesteps (#10088)\r\n    * Fix multi-prompt inference (#10103)\r\n    * Test `skip_guidance_layers` in SD3 pipeline (#10102)\r\n    * Fix `pipeline_stable_audio` formating (#10114)\r\n    * Add `sigmas` to pipelines using FlowMatch (#10116)\r\n    * Use `torch` in `get_3d_rotary_pos_embed`/`_allegro` (#10161)\r\n    * Add ControlNetUnion (#10131)\r\n    * Remove `negative_*` from SDXL callback (#10203)\r\n    * refactor  StableDiffusionXLControlNetUnion (#10200)\r\n    * Use `torch` in `get_2d_sincos_pos_embed` and `get_3d_sincos_pos_embed` (#10156)\r\n    * Use `t` instead of `timestep` in `_apply_perturbed_attention_guidance` (#10243)\r\n    * Add `dynamic_shifting` to SD3 (#10236)\r\n    * Fix `use_flow_sigmas` (#10242)\r\n    * Fix ControlNetUnion _callback_tensor_inputs (#10218)\r\n    * Use non-human subject in StableDiffusion3ControlNetPipeline example (#10214)\r\n    * Add enable_vae_tiling to AllegroPipeline, fix example (#10212)\r\n    * Fix checkpoint in CogView3PlusPipeline example (#10211)\r\n    * Fix RePaint Scheduler (#10185)\r\n    * Add ControlNetUnion to AutoPipeline from_pretrained (#10219)\r\n    * Add `set_shift` to FlowMatchEulerDiscreteScheduler (#10269)\r\n    * Use `torch` in `get_2d_rotary_pos_embed` (#10155)\r\n    * Fix sigma_last with use_flow_sigmas (#10267)\r\n    * Add Flux Control to AutoPipeline (#10292)\r\n    * Check correct model type is passed to `from_pretrained` (#10189)\r\n    * Fix `local_files_only` for checkpoints with shards (#10294)\r\n    * Fix push_tests_mps.yml (#10326)\r\n    * Fix EMAModel test_from_pretrained (#10325)\r\n    * Support Flux IP Adapter (#10261)\r\n    * Fix enable_sequential_cpu_offload in test_kandinsky_combined (#10324)\r\n    * Fix FluxIPAdapterTesterMixin (#10354)\r\n* @dimitribarbot\r\n    * Update sdxl reference pipeline to latest sdxl pipeline (#9938)\r\n    * Add sdxl controlnet reference community pipeline (#9893)\r\n* @suzukimain\r\n    * [community] Load Models from Sources like `Civitai` into Existing Pipelines (#9986)\r\n* @lawrence-cj\r\n    * [DC-AE] Add the official Deep Compression Autoencoder code(32x,64x,128x compression ratio); (#9708)\r\n    * [Sana] Add Sana, including `SanaPipeline`, `SanaPAGPipeline`, `LinearAttentionProcessor`, `Flow-based DPM-sovler` and so on. (#9982)\r\n    * [Sana]add 2K related model for Sana (#10322)\r\n    * [Sana bug] bug fix for 2K model config (#10340)\r\n* @darshil0805\r\n    * Add PAG Support for Stable Diffusion Inpaint Pipeline (#9386)\r\n* @affromero\r\n    * Flux Control(Depth/Canny) + Inpaint (#10192)\r\n* @SHYuanBest\r\n    * [LoRA] Support HunyuanVideo (#10254)\r\n* @guiyrt\r\n    * [WIP] SD3.5 IP-Adapter Pipeline Integration (#9987)\r\n","publishedAt":"2024-12-23T16:00:11.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.32.0","media":[]},{"id":"rel_N3kbPB7f8qERPuT-bbHI5","version":"v0.31.0","title":"v0.31.0","summary":"# v0.31.0: Stable Diffusion 3.5 Large, CogView3, Quantization, Training Scripts, and more\r\n\r\n## Stable Diffusion 3.5 Large\r\n\r\nStability AI’s latest te...","content":"# v0.31.0: Stable Diffusion 3.5 Large, CogView3, Quantization, Training Scripts, and more\r\n\r\n## Stable Diffusion 3.5 Large\r\n\r\nStability AI’s latest text-to-image generation model is Stable Diffusion 3.5 Large. SD3.5 Large is the next iteration of Stable Diffusion 3. It comes with two checkpoints (both of which have 8B params):\r\n\r\n- A regular one\r\n- A timestep-distilled one enabling few-step inference\r\n\r\nMake sure to fill up the form by going to the [model page](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), and then run `huggingface-cli login` before running the code below. \r\n\r\n```python\r\n# make sure to update diffusers\r\n# pip install -U diffusers\r\nimport torch\r\nfrom diffusers import StableDiffusion3Pipeline\r\n\r\npipe = StableDiffusion3Pipeline.from_pretrained(\r\n\t\"stabilityai/stable-diffusion-3.5-large\", torch_dtype=torch.bfloat16\r\n).to(\"cuda\")\r\n\r\nimage = pipe(\r\n    prompt=\"a photo of a cat holding a sign that says hello world\",\r\n    negative_prompt=\"\",\r\n    num_inference_steps=40,\r\n    height=1024,\r\n    width=1024,\r\n    guidance_scale=4.5,\r\n).images[0]\r\n\r\nimage.save(\"sd3_hello_world.png\")\r\n```\r\n\r\nFollow the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) to know more. \r\n\r\n## Cogview3-plus\r\n\r\nWe added a new text-to-image model, Cogview3-plus, from the THUDM team! The model is DiT-based and supports image generation from 512 to 2048px. Thanks to @zRzRzRzRzRzRzR for contributing it! \r\n\r\n```python\r\nfrom diffusers import CogView3PlusPipeline\r\nimport torch\r\n\r\npipe = CogView3PlusPipeline.from_pretrained(\"THUDM/CogView3-Plus-3B\", torch_dtype=torch.float16).to(\"cuda\")\r\n\r\n# Enable it to reduce GPU memory usage\r\npipe.enable_model_cpu_offload()\r\npipe.vae.enable_slicing()\r\npipe.vae.enable_tiling()\r\n\r\nprompt = \"A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background.\"\r\n\r\nimage = pipe(\r\n    prompt=prompt,\r\n    guidance_scale=7.0,\r\n    num_images_per_prompt=1,\r\n    num_inference_steps=50,\r\n    width=1024,\r\n    height=1024,\r\n).images[0]\r\n\r\nimage.save(\"cogview3.png\")\r\n```\r\n\r\nRefer to the [documentation](https://huggingface.co/docs/diffusers/en/api/pipelines/cogview3) to know more. \r\n\r\n## Quantization\r\n\r\nWe have landed native quantization support in Diffusers, starting with `bitsandbytes` as its first quantization backend. With this, we hope to see large diffusion models becoming much more accessible to run on consumer hardware. \r\n\r\nThe example below shows how to run Flux.1 Dev with the NF4 data-type. Make sure you install the libraries:\r\n\r\n```bash\r\npip install -Uq git+https://github.com/huggingface/transformers@main\r\npip install -Uq bitsandbytes\r\npip install -Uq diffusers\r\n```\r\n\r\n```python\r\nfrom diffusers import BitsAndBytesConfig, FluxTransformer2DModel\r\nimport torch\r\n\r\nckpt_id = \"black-forest-labs/FLUX.1-dev\"\r\nnf4_config = BitsAndBytesConfig(\r\n    load_in_4bit=True,\r\n    bnb_4bit_quant_type=\"nf4\",\r\n    bnb_4bit_compute_dtype=torch.bfloat16\r\n)\r\nmodel_nf4 = FluxTransformer2DModel.from_pretrained(\r\n    ckpt_id,\r\n    subfolder=\"transformer\",\r\n    quantization_config=nf4_config,\r\n    torch_dtype=torch.bfloat16\r\n)\r\n```\r\n\r\nThen, we use `model_nf4` to instantiate the `FluxPipeline`:\r\n\r\n```python\r\n\r\nfrom diffusers import FluxPipeline\r\n\r\npipeline = StableDiffusion3Pipeline.from_pretrained(\r\n    ckpt_id, \r\n    transformer=model_nf4,\r\n    torch_dtype=torch.bfloat16\r\n)\r\npipeline.enable_model_cpu_offload()\r\n\r\nprompt = \"A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree.  As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight\"\r\n\r\nimage = pipeline(\r\n    prompt=prompt,\r\n    negative_prompt=\"\",\r\n    num_inference_steps=50,\r\n    guidance_scale=4.5,\r\n    max_sequence_length=512,\r\n).images[0]\r\nimage.save(\"whimsical.png\")\r\n```\r\n\r\nFollow the documentation [here](https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes) to know more. Additionally, check out this [Colab Notebook](https://colab.research.google.com/gist/sayakpaul/c76bd845b48759e11687ac550b99d8b4/potato-flux-dev.ipynb) that runs Flux.1 Dev in an end-to-end manner with NF4 quantization. \r\n\r\n## Training scripts\r\n\r\nWe have a fresh bucket of training scripts with this release:\r\n\r\n- [Advanced Flux.1 trainer](https://huggingface.co/blog/linoyts/new-advanced-flux-dreambooth-lora)\r\n- [CogVideoX trainer](https://github.com/huggingface/diffusers/tree/main/examples/cogvideo)\r\n\r\nVideo model fine-tuning can be quite expensive. So, we have worked on a repository, [cogvideox-factory](https://github.com/a-r-r-o-w/cogvideox-factory), which provides memory-optimized scripts to fine-tune the Cog family of models. \r\n\r\n## Misc\r\n\r\n- We now support the loading of different kinds of Flux LoRAs, including Kohya, TheLastBen, and Xlabs.\r\n- Loading of Xlabs Flux ControlNets is also now supported. Thanks to @Anghellia for contributing it! \r\n\r\n## All commits\r\n\r\n* Feature flux controlnet img2img and inpaint pipeline  by @ighoshsubho in #9408\r\n* Remove CogVideoX mentions from single file docs; Test updates  by @a-r-r-o-w in #9444\r\n* set max_shard_size to None for pipeline save_pretrained  by @a-r-r-o-w in #9447\r\n* adapt masked im2im pipeline for SDXL  by @noskill in #7790\r\n* [Flux] add lora integration tests.  by @sayakpaul in #9353\r\n* [training] CogVideoX Lora  by @a-r-r-o-w in #9302\r\n* Several fixes to Flux ControlNet pipelines  by @vladmandic in #9472\r\n* [refactor] LoRA tests  by @a-r-r-o-w in #9481\r\n* [CI] fix nightly model tests  by @sayakpaul in #9483\r\n* [Cog] some minor fixes and nits  by @sayakpaul in #9466\r\n* [Tests] Reduce the model size in the lumina test   by @saqlain2204 in #8985\r\n* Fix the bug of sd3 controlnet training when using gradient checkpointing.  by @pibbo88 in #9498\r\n* [Schedulers] Add exponential sigmas / exponential noise schedule  by @hlky in #9499\r\n* Allow DDPMPipeline half precision  by @sbinnee in #9222\r\n* Add Noise Schedule/Schedule Type to Schedulers Overview documentation  by @hlky in #9504\r\n* fix bugs for sd3 controlnet training  by @xduzhangjiayu in #9489\r\n* [Doc] Fix path and and also import imageio  by @LukeLIN-web in #9506\r\n* [CI] allow faster downloads from the Hub in CI.  by @sayakpaul in #9478\r\n* a few fix for SingleFile tests  by @yiyixuxu in #9522\r\n* Add exponential sigmas to other schedulers and update docs  by @hlky in #9518\r\n* [Community Pipeline] Batched implementation of Flux with CFG  by @sayakpaul in #9513\r\n* Update community_projects.md  by @lee101 in #9266\r\n* [docs] Model sharding  by @stevhliu in #9521\r\n* update get_parameter_dtype  by @yiyixuxu in #9526\r\n* [Doc] Improved level of clarity for latents_to_rgb.  by @LagPixelLOL in #9529\r\n* [Schedulers] Add beta sigmas / beta noise schedule  by @hlky in #9509\r\n* flux controlnet fix (control_modes batch & others)  by @yiyixuxu in #9507\r\n* [Tests] Fix ChatGLMTokenizer  by @asomoza in #9536\r\n* [bug] Precedence of operations in VAE should be slicing -> tiling  by @a-r-r-o-w in #9342\r\n* [LoRA] make set_adapters() method more robust.  by @sayakpaul in #9535\r\n* [examples] add train flux-controlnet scripts in example.  by @PromeAIpro in #9324\r\n* [Tests] [LoRA] clean up the serialization stuff.  by @sayakpaul in #9512\r\n* [Core] fix variant-identification.  by @sayakpaul in #9253\r\n* [refactor] remove conv_cache from CogVideoX VAE  by @a-r-r-o-w in #9524\r\n* [train_instruct_pix2pix.py]Fix the LR schedulers when `num_train_epochs` is passed in a distributed training env  by @AnandK27 in #9316\r\n* [chore] fix: retain memory utility.  by @sayakpaul in #9543\r\n* [LoRA] support Kohya Flux LoRAs that have text encoders as well  by @sayakpaul in #9542\r\n* Add beta sigmas to other schedulers and update docs  by @hlky in #9538\r\n* Add PAG support to StableDiffusionControlNetPAGInpaintPipeline   by @juancopi81 in #8875\r\n* Support bfloat16 for Upsample2D  by @darhsu in #9480\r\n* fix cogvideox autoencoder decode  by @Xiang-cd in #9569\r\n* [sd3] make sure height and size are divisible by `16`  by @yiyixuxu in #9573\r\n* fix xlabs FLUX lora conversion typo  by @Clement-Lelievre in #9581\r\n* [Chore] add a note on the versions in Flux LoRA integration tests  by @sayakpaul in #9598\r\n* fix vae dtype when accelerate config using --mixed_precision=\"fp16\"  by @xduzhangjiayu in #9601\r\n* refac: docstrings in import_utils.py  by @yijun-lee in #9583\r\n* Fix for use_safetensors parameters, allow use of parameter on loading submodels  by @elismasilva in #9576) \r\n* Update distributed_inference.md to include `transformer.device_map`  by @sayakpaul in #9553\r\n* fix: CogVideox train dataset _preprocess_data crop video  by @glide-the in #9574\r\n* [LoRA] Handle DoRA better  by @sayakpaul in #9547\r\n* Fixed noise_pred_text referenced before assignment.  by @LagPixelLOL in #9537\r\n* Fix the bug that `joint_attention_kwargs` is not passed to the FLUX's transformer attention processors  by @HorizonWind2004 in #9517\r\n* refac/pipeline_output  by @yijun-lee in #9582\r\n* [LoRA] allow loras to be loaded with low_cpu_mem_usage.  by @sayakpaul in #9510\r\n* add PAG support for SD Img2Img  by @SahilCarterr in #9463\r\n* make controlnet support interrupt  by @pureexe in #9620\r\n* [LoRA] fix dora test to catch the warning properly.  by @sayakpaul in #9627\r\n* flux controlnet control_guidance_start and control_guidance_end implement  by @ighoshsubho in #9571\r\n* fix IsADirectoryError when running the training code for sd3_dreambooth_lora_16gb.ipynb  by @alaister123 in #9634\r\n* Add Differential Diffusion to Kolors  by @saqlain2204 in #9423\r\n* FluxMultiControlNetModel  by @hlky in #9647\r\n* [CI] replace ubuntu version to 22.04.  by @sayakpaul in #9656\r\n* [docs] Fix xDiT doc image damage  by @Eigensystem in #9655\r\n* [Tests] increase transformers version in `test_low_cpu_mem_usage_with_loading`  by @sayakpaul in #9662\r\n* Flux - soft inpainting via differential diffusion  by @ryanlyn in #9268\r\n* CogView3Plus DiT  by @zRzRzRzRzRzRzR in #9570\r\n* Improve the performance and suitable for NPU computing  by @leisuzz in #9642\r\n* [`Community Pipeline`] Add 🪆Matryoshka Diffusion Models  by @tolgacangoz in #9157\r\n* Added Lora Support to SD3 Img2Img Pipeline  by @SahilCarterr in #9659\r\n* Add pred_original_sample to `if not return_dict` path  by @hlky in #9649\r\n* Convert list/tuple of `SD3ControlNetModel` to `SD3MultiControlNetModel`  by @hlky in #9652\r\n* Convert list/tuple of `HunyuanDiT2DControlNetModel` to `HunyuanDiT2DMultiControlNetModel`  by @hlky in #9651\r\n* Refactor SchedulerOutput and add pred_original_sample in `DPMSolverSDE`, `Heun`, `KDPM2Ancestral` and `KDPM2`  by @hlky in #9650\r\n* Slight performance improvement to `Euler`, `EDMEuler`, `FlowMatchHeun`, `KDPM2Ancestral`  by @hlky in #9616\r\n* [Fix] when run load pretain with local_files_only, local variable 'cached_folder' referenced before assignment  by @RobinXL in #9376\r\n* [Chore] fix import of EntryNotFoundError.  by @sayakpaul in #9676\r\n* Dreambooth lora flux bug 3dtensor to 2dtensor  by @0x-74 in #9653\r\n* refactor image_processor.py file  by @charchit7 in #9608\r\n* [doc] Fix some docstrings in `src/diffusers/training_utils.py`  by @mreraser in #9606\r\n* [docs] refactoring docstrings in `community/hd_painter.py`  by @Jwaminju in #9593\r\n* [docs] refactoring docstrings in `models/embeddings_flax.py`  by @Jwaminju in #9592\r\n* Fix some documentation in ./src/diffusers/models/adapter.py  by @ahnjj in #9591\r\n* [training] CogVideoX-I2V LoRA  by @a-r-r-o-w in #9482\r\n* [authored by @Anghellia) Add support of Xlabs Controlnets #9638  by @yiyixuxu in #9687\r\n* Docs: CogVideoX  by @glide-the in #9578\r\n* Resolves [BUG] 'GatheredParameters' object is not callable  by @charchit7 in #9614\r\n* [LoRA] log a warning when there are missing keys in the LoRA loading.  by @sayakpaul in #9622\r\n* [SD3 dreambooth-lora training] small updates + bug fixes   by @linoytsaban in #9682\r\n* [peft] simple update when unscale   by @sweetcocoa in #9689\r\n* [pipeline] CogVideoX-Fun Control  by @a-r-r-o-w in #9671\r\n* [core] improve VAE encode/decode framewise batching  by @a-r-r-o-w in #9684\r\n* [tests] fix name and unskip CogI2V integration test  by @a-r-r-o-w in #9683\r\n* [Flux] Add advanced training script + support textual inversion inference  by @linoytsaban in #9434\r\n* [refactor] DiffusionPipeline.download  by @a-r-r-o-w in #9557\r\n* [advanced flux lora script] minor updates to readme  by @linoytsaban in #9705\r\n* Fix bug in Textual Inversion Unloading  by @bonlime in #9304\r\n* Add prompt scheduling callback to community scripts  by @hlky in #9718\r\n* [CI] pin max torch version to fix CI errors  by @a-r-r-o-w in #9709\r\n* [Docker] pin torch versions in the dockerfiles.  by @sayakpaul in #9721\r\n* `make  deps_table_update` to fix CI tests  by @a-r-r-o-w in #9720\r\n* [Quantization] Add quantization support for `bitsandbytes`  by @sayakpaul in #9213\r\n* Fix typo in cogvideo pipeline  by @lichenyu20 in #9722\r\n* [Docs] docs to xlabs controlnets.  by @sayakpaul in #9688\r\n* [docs] add docstrings in `pipline_stable_diffusion.py`  by @jeongiin in #9590\r\n* minor doc/test update  by @yiyixuxu in #9734\r\n* [bugfix] reduce float value error when adding noise  by @gameofdimension in #9004\r\n* fix singlestep dpm tests  by @yiyixuxu in #9716\r\n* Fix `schedule_shifted_power` usage in 🪆Matryoshka Diffusion Models  by @tolgacangoz in #9723\r\n* Update sd3 controlnet example  by @DavyMorgan in #9735\r\n* [Fix] Using sharded checkpoints with gated repositories  by @asomoza in #9737\r\n* [bitsandbbytes] follow-ups  by @sayakpaul in #9730\r\n* Fix typos  by @DN6 in #9739\r\n* is_safetensors_compatible fix  by @DN6 in #9741\r\n* Release: v0.31.0 by @sayakpaul (direct commit on v0.31.0-release)\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @ighoshsubho\r\n    * Feature flux controlnet img2img and inpaint pipeline (#9408)\r\n    * flux controlnet control_guidance_start and control_guidance_end implement (#9571)\r\n* @noskill\r\n    * adapt masked im2im pipeline for SDXL (#7790)\r\n* @saqlain2204\r\n    * [Tests] Reduce the model size in the lumina test  (#8985)\r\n    * Add Differential Diffusion to Kolors (#9423)\r\n* @hlky\r\n    * [Schedulers] Add exponential sigmas / exponential noise schedule (#9499)\r\n    * Add Noise Schedule/Schedule Type to Schedulers Overview documentation (#9504)\r\n    * Add exponential sigmas to other schedulers and update docs (#9518)\r\n    * [Schedulers] Add beta sigmas / beta noise schedule (#9509)\r\n    * Add beta sigmas to other schedulers and update docs (#9538)\r\n    * FluxMultiControlNetModel (#9647)\r\n    * Add pred_original_sample to `if not return_dict` path (#9649)\r\n    * Convert list/tuple of `SD3ControlNetModel` to `SD3MultiControlNetModel` (#9652)\r\n    * Convert list/tuple of `HunyuanDiT2DControlNetModel` to `HunyuanDiT2DMultiControlNetModel` (#9651)\r\n    * Refactor SchedulerOutput and add pred_original_sample in `DPMSolverSDE`, `Heun`, `KDPM2Ancestral` and `KDPM2` (#9650)\r\n    * Slight performance improvement to `Euler`, `EDMEuler`, `FlowMatchHeun`, `KDPM2Ancestral` (#9616)\r\n    * Add prompt scheduling callback to community scripts (#9718)\r\n* @yiyixuxu\r\n    * a few fix for SingleFile tests (#9522)\r\n    * update get_parameter_dtype (#9526)\r\n    * flux controlnet fix (control_modes batch & others) (#9507)\r\n    * [sd3] make sure height and size are divisible by `16` (#9573)\r\n    * [authored by @Anghellia) Add support of Xlabs Controlnets #9638 (#9687)\r\n    * minor doc/test update (#9734)\r\n    * fix singlestep dpm tests (#9716)\r\n* @PromeAIpro\r\n    * [examples] add train flux-controlnet scripts in example. (#9324)\r\n* @juancopi81\r\n    * Add PAG support to StableDiffusionControlNetPAGInpaintPipeline  (#8875)\r\n* @glide-the\r\n    * fix: CogVideox train dataset _preprocess_data crop video (#9574)\r\n    * Docs: CogVideoX (#9578)\r\n* @SahilCarterr\r\n    * add PAG support for SD Img2Img (#9463)\r\n    * Added Lora Support to SD3 Img2Img Pipeline (#9659)\r\n* @ryanlyn\r\n    * Flux - soft inpainting via differential diffusion (#9268)\r\n* @zRzRzRzRzRzRzR\r\n    * CogView3Plus DiT (#9570)\r\n* @tolgacangoz\r\n    * [`Community Pipeline`] Add 🪆Matryoshka Diffusion Models (#9157)\r\n    * Fix `schedule_shifted_power` usage in 🪆Matryoshka Diffusion Models (#9723)\r\n* @linoytsaban\r\n    * [SD3 dreambooth-lora training] small updates + bug fixes  (#9682)\r\n    * [Flux] Add advanced training script + support textual inversion inference (#9434)\r\n    * [advanced flux lora script] minor updates to readme (#9705)","publishedAt":"2024-10-22T14:15:27.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.31.0","media":[]},{"id":"rel_5oAnnLT_-Sin1jKnBr4Ua","version":"v0.30.3","title":"v0.30.3: CogVideoX Image-to-Video and Video-to-Video","summary":"This patch release adds Diffusers support for the upcoming CogVideoX-5B-I2V release (an Image-to-Video generation model)! The model weights will be av...","content":"This patch release adds Diffusers support for the upcoming CogVideoX-5B-I2V release (an Image-to-Video generation model)! The model weights will be available by end of the week on the HF Hub at `THUDM/CogVideoX-5b-I2V` ([Link](https://huggingface.co/THUDM/CogVideoX-5b-I2V)). Stay tuned for the release!\r\n\r\nThis release features two new pipelines:\r\n\r\n- CogVideoXImageToVideoPipeline\r\n- CogVideoXVideoToVideoPipeline\r\n\r\nAdditionally, we now have support for tiled encoding in the CogVideoX VAE. This can be enabled by calling the `vae.enable_tiling()` method, and it is used in the new Video-to-Video pipeline to encode sample videos to latents in a memory-efficient manner.\r\n\r\n## CogVideoXImageToVideoPipeline\r\n\r\nThe code below demonstrates how to use the new image-to-video pipeline:\r\n\r\n```python\r\nimport torch\r\nfrom diffusers import CogVideoXImageToVideoPipeline\r\nfrom diffusers.utils import export_to_video, load_image\r\n\r\npipe = CogVideoXImageToVideoPipeline.from_pretrained(\"THUDM/CogVideoX-5b-I2V\", torch_dtype=torch.bfloat16)\r\npipe.to(\"cuda\")\r\n\r\n# Optionally, enable memory optimizations.\r\n# If enabling CPU offloading, remember to remove `pipe.to(\"cuda\")` above\r\npipe.enable_model_cpu_offload()\r\npipe.vae.enable_tiling()\r\n\r\nprompt = \"An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot.\"\r\nimage = load_image(\r\n    \"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg\"\r\n)\r\nvideo = pipe(image, prompt, use_dynamic_cfg=True)\r\nexport_to_video(video.frames[0], \"output.mp4\", fps=8)\r\n```\r\n\r\n<table align=center>\r\n<tr>\r\n  <td align=center colspan=1><img src=\"https://github.com/user-attachments/assets/1c7c1d86-f97e-44dd-9b17-4fec2bbc2b1a\" /></td>\r\n  <td align=center colspan=1><video src=\"https://github.com/user-attachments/assets/a115372e-c539-4ca0-b0d0-770d62862257\"> Your broswer does not support the video tag. </video></td>\r\n</tr>\r\n</table>\r\n\r\n## CogVideoXVideoToVideoPipeline\r\n\r\nThe code below demonstrates how to use the new video-to-video pipeline:\r\n\r\n```python\r\nimport torch\r\nfrom diffusers import CogVideoXDPMScheduler, CogVideoXVideoToVideoPipeline\r\nfrom diffusers.utils import export_to_video, load_video\r\n\r\n# Models: \"THUDM/CogVideoX-2b\" or \"THUDM/CogVideoX-5b\"\r\npipe = CogVideoXVideoToVideoPipeline.from_pretrained(\"THUDM/CogVideoX-5b-trial\", torch_dtype=torch.bfloat16)\r\npipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config)\r\npipe.to(\"cuda\")\r\n\r\ninput_video = load_video(\r\n    \"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/hiker.mp4\"\r\n)\r\nprompt = (\r\n    \"An astronaut stands triumphantly at the peak of a towering mountain. Panorama of rugged peaks and \"\r\n    \"valleys. Very futuristic vibe and animated aesthetic. Highlights of purple and golden colors in \"\r\n    \"the scene. The sky is looks like an animated/cartoonish dream of galaxies, nebulae, stars, planets, \"\r\n    \"moons, but the remainder of the scene is mostly realistic.\"\r\n)\r\n\r\nvideo = pipe(\r\n    video=input_video, prompt=prompt, strength=0.8, guidance_scale=6, num_inference_steps=50\r\n).frames[0]\r\nexport_to_video(video, \"output.mp4\", fps=8)\r\n```\r\n\r\n<table align=center>\r\n<tr>\r\n<td align=center><video src=\"https://github.com/user-attachments/assets/bc9273ff-e459-42f9-af1e-c9b084b28f4d\"> Your browser does not support the video tag. </video></td>\r\n</tr>\r\n</table>\r\n\r\nShoutout to @tin2tin for the awesome demonstration!\r\n\r\nRefer to our [documentation](https://huggingface.co/docs/diffusers/api/pipelines/cogvideox) to learn more about it.\r\n\r\n## All commits\r\n\r\n* [core] Support VideoToVideo with CogVideoX  by @a-r-r-o-w in #9333\r\n* [core] CogVideoX memory optimizations in VAE encode  by @a-r-r-o-w in #9340\r\n* [CI] Quick fix for Cog Video Test  by @DN6 in #9373\r\n* [refactor] move positional embeddings to patch embed layer for CogVideoX  by @a-r-r-o-w in #9263\r\n* CogVideoX-5b-I2V support  by @zRzRzRzRzRzRzR in #9418","publishedAt":"2024-09-17T06:22:05.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.30.3","media":[]},{"id":"rel_CeQ7MMqzC9UHGMDF7Lb_1","version":"v0.30.2","title":"v0.30.2: Update from single file default repository","summary":"## All commits\r\n\r\n* update runway repo for single_file  by @yiyixuxu in #9323\r\n* Fix Flux CLIP prompt embeds repeat for num_images_per_prompt > 1  by ...","content":"## All commits\r\n\r\n* update runway repo for single_file  by @yiyixuxu in #9323\r\n* Fix Flux CLIP prompt embeds repeat for num_images_per_prompt > 1  by @DN6  in #9280\r\n* [IP Adapter] Fix cache_dir and local_files_only for image encoder by @asomoza in #9272","publishedAt":"2024-08-31T00:23:16.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.30.2","media":[]},{"id":"rel_I9lO4y_2gGLZqD85QByl3","version":"v0.30.1","title":"V0.30.1: CogVideoX-5B & Bug fixes","summary":"## CogVideoX-5B\r\n\r\nThis patch release adds diffusers support for the upcoming CogVideoX-5B release! The model weights will be available next week on t...","content":"## CogVideoX-5B\r\n\r\nThis patch release adds diffusers support for the upcoming CogVideoX-5B release! The model weights will be available next week on the Huggingface Hub at `THUDM/CogVideoX-5b`. Stay tuned for the release!\r\n\r\nAdditionally, we have implemented VAE tiling feature, which reduces the memory requirement for CogVideoX models. With this update, the total memory requirement is now 12GB for CogVideoX-2B and 21GB for CogVideoX-5B (with CPU offloading). To Enable this feature, simply call `enable_tiling()` on the VAE.\r\n\r\nThe code below shows how to generate a video with CogVideoX-5B\r\n\r\n```python\r\nimport torch\r\nfrom diffusers import CogVideoXPipeline\r\nfrom diffusers.utils import export_to_video\r\n\r\nprompt = \"Tracking shot,late afternoon light casting long shadows,a cyclist in athletic gear pedaling down a scenic mountain road,winding path with trees and a lake in the background,invigorating and adventurous atmosphere.\"\r\n\r\npipe = CogVideoXPipeline.from_pretrained(\r\n    \"THUDM/CogVideoX-5b\",\r\n    torch_dtype=torch.bfloat16\r\n)\r\n\r\npipe.enable_model_cpu_offload()\r\npipe.vae.enable_tiling()\r\n\r\nvideo = pipe(\r\n    prompt=prompt,\r\n    num_videos_per_prompt=1,\r\n    num_inference_steps=50,\r\n    num_frames=49,\r\n    guidance_scale=6,\r\n).frames[0]\r\n\r\nexport_to_video(video, \"output.mp4\", fps=8)\r\n```\r\n\r\nhttps://github.com/user-attachments/assets/c2d4f7e8-ef86-4da6-8085-cb9f83f47f34\r\n\r\nRefer to our [documentation](https://huggingface.co/docs/diffusers/api/pipelines/cogvideox) to learn more about it. \r\n\r\n## All commits\r\n\r\n- Update Video Loading/Export to use `imageio` by @DN6 in #9094\r\n- [refactor] CogVideoX followups + tiled decoding support by @a-r-r-o-w in #9150\r\n- Add Learned PE selection for Auraflow by @cloneofsimo in #9182\r\n- [Single File] Fix configuring scheduler via legacy kwargs by @DN6 in #9229\r\n- [Flux LoRA] support parsing alpha from a flux lora state dict. by @sayakpaul in #9236\r\n- [tests] fix broken xformers tests by @a-r-r-o-w in #9206\r\n- Cogvideox-5B Model adapter change by @zRzRzRzRzRzRzR in #9203\r\n- [Single File] Support loading Comfy UI Flux checkpoints by @DN6 in #9243\r\n\r\n","publishedAt":"2024-08-24T07:26:30.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.30.1","media":[]},{"id":"rel_eyJowIxxxMrmqOUvuuTy6","version":"v0.30.0","title":"v0.30.0: New Pipelines (Flux, Stable Audio, Kolors, CogVideoX, Latte, and more), New Methods (FreeNoise, SparseCtrl), and New Refactors","summary":"## New pipelines\r\n\r\n![Untitled](https://github.com/user-attachments/assets/a313ceba-248b-4c09-9f0e-85050b4c3df7)\r\n\r\nImage taken from the [Lumina’s Git...","content":"## New pipelines\r\n\r\n![Untitled](https://github.com/user-attachments/assets/a313ceba-248b-4c09-9f0e-85050b4c3df7)\r\n\r\nImage taken from the [Lumina’s GitHub](https://github.com/Alpha-VLLM/Lumina-T2X/blob/main/assets/lumina-next.pdf). \r\n\r\nThis release features many new pipelines. Below, we provide a list:\r\n\r\n**Audio pipelines 🎼**\r\n\r\n- [Stable Audio](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_audio)\r\n\r\n**Video pipelines 📹**\r\n\r\n- [Latte](https://huggingface.co/docs/diffusers/main/en/api/pipelines/latte) (thanks to @maxin-cn for the contribution through #8404)\r\n- [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox) (thanks to @zRzRzRzRzRzRzR for the contribution through #9082)\r\n\r\n**Image pipelines 🎇**\r\n\r\n- [Lumina](https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina) (thanks to @PommesPeter for the contribution through #8652)\r\n- [Kolors](https://huggingface.co/docs/diffusers/main/en/api/pipelines/kolors)\r\n- [AuraFlow](https://huggingface.co/docs/diffusers/main/en/api/pipelines/aura_flow)\r\n- [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux)\r\n\r\nBe sure to check out the respective docs to know more about these pipelines. Some additional pointers are below for curious minds:\r\n\r\n- Lumina introduces a new DiT architecture that is multilingual in nature.\r\n- Kolors is inspired by SDXL and is also multilingual in nature.\r\n- Flux introduces the largest (more than 12B parameters!) open-sourced DiT variant available to date.  For efficient DreamBooth + LoRA training, we recommend @bghira’s guide [here](https://github.com/bghira/SimpleTuner/blob/main/documentation/quickstart/FLUX.md).\r\n- We have worked on a guide that shows how to quantize these large pipelines for memory efficiency with `optimum.quanto`. Check it out [here](https://huggingface.co/blog/quanto-diffusers).\r\n- CogVideoX introduces a novel and truly 3D VAE into Diffusers.\r\n\r\n## Perturbed Attention Guidance (PAG)\r\n\r\n| Without PAG | With PAG |\r\n|-------------|----------|\r\n| ![](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/pag_0.0_cfg_7.0_mid.png)   | ![](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/pag_3.0_cfg_7.0_mid.png)|\r\n\r\nWe already had community pipelines for [PAG](https://ku-cvlab.github.io/Perturbed-Attention-Guidance/), but given its usefulness, we decided to make it a first-class citizen of the library. We have a central usage guide for PAG [here](https://huggingface.co/docs/diffusers/main/en/using-diffusers/pag), which should be the entry point for a user interested in understanding and using PAG for their use cases. We currently [support](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pag) the following pipelines with PAG:\r\n\r\n- `StableDiffusionPAGPipeline`\r\n- `StableDiffusion3PAGPipeline`\r\n- `StableDiffusionControlNetPAGPipeline`\r\n- `StableDiffusionXLPAGPipeline`\r\n- `StableDiffusionXLPAGImg2ImgPipeline`\r\n- `StableDiffusionXLPAGInpaintPipeline`\r\n- `StableDiffusionXLControlNetPAGPipeline`\r\n- `StableDiffusion3PAGPipeline`\r\n- `PixArtSigmaPAGPipeline`\r\n- `HunyuanDiTPAGPipeline`\r\n- `AnimateDiffPAGPipeline`\r\n- `KolorsPAGPipeline`\r\n\r\nIf you’re interested in helping us extend our PAG support for other pipelines, please check out [this thread](https://github.com/huggingface/diffusers/issues/8785). \r\nSpecial thanks to Ahn Donghoon (@sunovivid), the author of PAG, for helping us with the integration and adding PAG support to SD3.\r\n\r\n## AnimateDiff with SparseCtrl\r\n\r\nSparseCtrl introduces methods of controllability into text-to-video diffusion models leveraging signals such as line/edge sketches, depth maps, and RGB images by incorporating an additional condition encoder, inspired by [ControlNet](https://arxiv.org/abs/2302.05543), to process these signals in the [AnimateDiff](https://arxiv.org/abs/2307.04725) framework. It can be applied to a diverse set of applications such as interpolation or video prediction (filling in the gaps between sequence of images for animation), personalized image animation, sketch-to-video, depth-to-video, and more. It was introduced in [SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://arxiv.org/abs/2311.16933).\r\n\r\nThere are two SparseCtrl-specific checkpoints and a Motion LoRA made available by the authors namely:\r\n\r\n- [SparseCtrl Scribble](https://huggingface.co/guoyww/animatediff-sparsectrl-scribble)\r\n- [SparseCtrl RGB](https://huggingface.co/guoyww/animatediff-sparsectrl-rgb)\r\n- [Motion LoRA v1-5-3](https://huggingface.co/guoyww/animatediff-motion-lora-v1-5-3)\r\n\r\n**Scribble Interpolation Example:**\r\n\r\n<table>\r\n    <tr>\r\n        <td><img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png\" alt=\"Image 1\"></td>\r\n        <td><img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-2.png\" alt=\"Image 2\"></td>\r\n        <td><img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-3.png\" alt=\"Image 3\"></td>\r\n    </tr>\r\n    <tr>\r\n        <td colspan=\"3\" style=\"text-align: center; vertical-align: middle;\"><img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-sparsectrl-scribble-results.gif\" alt=\"Image 4\"></td>\r\n    </tr>\r\n</table>\r\n\r\n```python\r\nimport torch\r\n\r\nfrom diffusers import AnimateDiffSparseControlNetPipeline, AutoencoderKL, MotionAdapter, SparseControlNetModel\r\nfrom diffusers.schedulers import DPMSolverMultistepScheduler\r\nfrom diffusers.utils import export_to_gif, load_image\r\n\r\nmotion_adapter = MotionAdapter.from_pretrained(\"guoyww/animatediff-motion-adapter-v1-5-3\", torch_dtype=torch.float16).to(device)\r\ncontrolnet = SparseControlNetModel.from_pretrained(\"guoyww/animatediff-sparsectrl-scribble\", torch_dtype=torch.float16).to(device)\r\nvae = AutoencoderKL.from_pretrained(\"stabilityai/sd-vae-ft-mse\", torch_dtype=torch.float16).to(device)\r\npipe = AnimateDiffSparseControlNetPipeline.from_pretrained(\r\n    \"SG161222/Realistic_Vision_V5.1_noVAE\",\r\n    motion_adapter=motion_adapter,\r\n    controlnet=controlnet,\r\n    vae=vae,\r\n    scheduler=scheduler,\r\n    torch_dtype=torch.float16,\r\n).to(device)\r\npipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, beta_schedule=\"linear\", algorithm_type=\"dpmsolver++\", use_karras_sigmas=True)\r\npipe.load_lora_weights(\"guoyww/animatediff-motion-lora-v1-5-3\", adapter_name=\"motion_lora\")\r\npipe.fuse_lora(lora_scale=1.0)\r\n\r\nprompt = \"an aerial view of a cyberpunk city, night time, neon lights, masterpiece, high quality\"\r\nnegative_prompt = \"low quality, worst quality, letterboxed\"\r\n\r\nimage_files = [\r\n    \"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png\",\r\n    \"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-2.png\",\r\n    \"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-3.png\"\r\n]\r\ncondition_frame_indices = [0, 8, 15]\r\nconditioning_frames = [load_image(img_file) for img_file in image_files]\r\n\r\nvideo = pipe(\r\n    prompt=prompt,\r\n    negative_prompt=negative_prompt,\r\n    num_inference_steps=25,\r\n    conditioning_frames=conditioning_frames,\r\n    controlnet_conditioning_scale=1.0,\r\n    controlnet_frame_indices=condition_frame_indices,\r\n    generator=torch.Generator().manual_seed(1337),\r\n).frames[0]\r\nexport_to_gif(video, \"output.gif\")\r\n```\r\n\r\n📜 Check out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/animatediff).\r\n\r\n## FreeNoise for AnimateDiff\r\n\r\nFreeNoise is a training-free method that allows extending the generative capabilities of pretrained video diffusion models beyond their existing context/frame limits.  \r\n\r\nInstead of initializing noises for all frames, FreeNoise reschedules a sequence of noises for long-range correlation and performs temporal attention over them using a window-based function. We have added FreeNoise to the AnimateDiff family of models in Diffusers, allowing them to generate videos beyond their default 32 frame limit.   \r\n \r\n\r\n```python\r\nimport torch\r\nfrom diffusers import AnimateDiffPipeline, MotionAdapter, EulerAncestralDiscreteScheduler\r\nfrom diffusers.utils import export_to_gif\r\n\r\nadapter = MotionAdapter.from_pretrained(\"guoyww/animatediff-motion-adapter-v1-5-2\", torch_dtype=torch.float16)\r\npipe = AnimateDiffPipeline.from_pretrained(\"SG161222/Realistic_Vision_V6.0_B1_noVAE\", motion_adapter=adapter, torch_dtype=torch.float16)\r\npipe.scheduler = EulerAncestralDiscreteScheduler(\r\n    beta_schedule=\"linear\",\r\n    beta_start=0.00085,\r\n    beta_end=0.012,\r\n)\r\n\r\npipe.enable_free_noise()\r\npipe.vae.enable_slicing()\r\n\r\npipe.enable_model_cpu_offload()\r\nframes = pipe(\r\n    \"An astronaut riding a horse on Mars.\",\r\n    num_frames=64,\r\n    num_inference_steps=20,\r\n    guidance_scale=7.0,\r\n    decode_chunk_size=2,\r\n).frames[0]\r\n\r\nexport_to_gif(frames, \"freenoise-64.gif\")\r\n```\r\n\r\n## LoRA refactor\r\n\r\nWe have significantly refactored the loader classes associated with LoRA. Going forward, this will help in adding LoRA support for new pipelines and models. We now have a `LoraBaseMixin` class which is subclassed by the different pipeline-level LoRA loading classes such as `StableDiffusionXLLoraLoaderMixin`. [This document](https://huggingface.co/docs/diffusers/main/en/api/loaders/lora) provides an overview of the available classes. \r\n\r\nAdditionally, we have increased the coverage of methods within the [`PeftAdapterMixin` class](https://huggingface.co/docs/diffusers/main/en/api/loaders/peft).  This refactoring allows all the supported models to share common LoRA functionalities such `set_adapter()`, `add_adapter()`, and so on. \r\n\r\nTo learn more details, please follow [this PR](https://github.com/huggingface/diffusers/pull/8774). If you see any LoRA-related issues stemming from these refactors, please open an issue. \r\n\r\n## 🚨 Fixing attention projection fusion\r\n\r\nWe discovered that the implementation of [`fuse_qkv_projections()`](https://github.com/huggingface/diffusers/pull/8774) was broken. This was fixed in [this PR](https://github.com/huggingface/diffusers/pull/8829). Additionally, [this PR](https://github.com/huggingface/diffusers/pull/8952) added the fusion support to AuraFlow and PixArt Sigma. A reasoning as to where this kind of fusion might be useful is available [here](https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834). \r\n\r\n## All commits\r\n\r\n* [Release notification] add some info when there is an error.  by @sayakpaul in #8718\r\n* Modify FlowMatch Scale Noise  by @asomoza in #8678\r\n* Fix json WindowsPath crash  by @vincedovy in #8662\r\n* Motion Model / Adapter versatility  by @Arlaz in #8301\r\n* [Chore] perform better deprecation for vqmodeloutput  by @sayakpaul in #8719\r\n* [Advanced dreambooth lora] adjustments to align with canonical script  by @linoytsaban in #8406\r\n* [Tests] Fix precision related issues in slow pipeline tests  by @DN6 in #8720\r\n* fix: ValueError when using FromOriginalModelMixin in subclasses #8440  by @fkcptlst in #8454\r\n* [Community pipeline] SD3 Differential Diffusion Img2Img Pipeline  by @asomoza in #8679\r\n* Benchmarking workflow fix  by @sayakpaul in #8389\r\n* add PAG support for SD architecture  by @shauray8 in #8725\r\n* shift cache in benchmarking.  by @sayakpaul in #8740\r\n* [train_controlnet_sdxl.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env  by @Bhavay-2001 in #8476\r\n* fix the LR schedulers  for `dreambooth_lora`  by @WenheLI in #8510\r\n* [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support  by @gnobitab in #8747\r\n* Always raise from previous error  by @Wauplin in #8751\r\n* [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart  by @yiyixuxu in #8735\r\n* Remove legacy single file model loading mixins  by @DN6 in #8754\r\n* Allow from_transformer in SD3ControlNetModel  by @haofanwang in #8749\r\n* [SD3 LoRA Training] Fix errors when not training text encoders  by @asomoza in #8743\r\n* [Tests] add test suite for SD3 DreamBooth  by @sayakpaul in #8650\r\n* [hunyuan-dit] refactor `HunyuanCombinedTimestepTextSizeStyleEmbedding`  by @yiyixuxu in #8761\r\n* Enforce ordering when running Pipeline slow tests   by @DN6 in #8763\r\n* Fix warning in UNetMotionModel  by @DN6 in #8756\r\n* Fix indent in dreambooth lora advanced  SD 15 script   by @DN6 in #8753\r\n* Fix mistake in Single File Docs page  by @DN6 in #8765\r\n* Reflect few contributions on `philosophy.md` that were not reflected on #8294  by @mreraser in #8690\r\n* correct `attention_head_dim` for `JointTransformerBlock`  by @yiyixuxu in #8608\r\n* [LoRA] introduce `LoraBaseMixin` to promote reusability.  by @sayakpaul in #8670\r\n* Revert \"[LoRA] introduce `LoraBaseMixin` to promote reusability.\"  by @sayakpaul in #8773\r\n* Allow SD3 DreamBooth LoRA fine-tuning on a free-tier Colab  by @sayakpaul in #8762\r\n* Update README.md to include Colab link  by @sayakpaul in #8775\r\n* [Chore] add dummy lora attention processors to prevent failures in other libs  by @sayakpaul in #8777\r\n* [advanced dreambooth lora] add clip_skip arg  by @linoytsaban in #8715\r\n* [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet  by @gnobitab in #8783\r\n* Fix minor bug in SD3 img2img test  by @a-r-r-o-w in #8779\r\n* [Tests] fix sharding tests  by @sayakpaul in #8764\r\n* Add vae_roundtrip.py example  by @thomaseding in #7104\r\n* [Single File] Allow loading T5 encoder in mixed precision   by @DN6 in #8778\r\n* Fix saving text encoder weights and kohya weights in advanced dreambooth lora script  by @DN6 in #8766\r\n* Improve model card for `push_to_hub` trainers  by @apolinario in #8697\r\n* fix loading sharded checkpoints from subfolder  by @yiyixuxu in #8798\r\n* [Alpha-VLLM Team] Add Lumina-T2X to diffusers  by @PommesPeter in #8652\r\n* Fix static typing and doc typos  by @zhuoqun-chen in #8807\r\n* Remove unnecessary lines  by @tolgacangoz in #8569\r\n* Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference  by @IrohXu in #8709\r\n* [Tests] fix more sharding tests  by @sayakpaul in #8797\r\n* Reformat docstring for `get_timestep_embedding`  by @alanhdu in #8811\r\n* Latte: Latent Diffusion Transformer for Video Generation  by @maxin-cn in #8404\r\n* [Core] Add Kolors  by @asomoza in #8812\r\n* [Core] Add AuraFlow  by @sayakpaul in #8796\r\n* Add VAE tiling option for SD3  by @DN6 in #8791\r\n* Add single file loading support for AnimateDiff   by @DN6 in #8819\r\n* [Docs] add AuraFlow docs  by @sayakpaul in #8851\r\n* [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU  by @ustcuna in #8643\r\n* add PAG support sd15 controlnet  by @tuanh123789 in #8820\r\n* [tests] fix typo in pag tests  by @a-r-r-o-w in #8845\r\n* [Docker] include python3.10 dev and solve header missing problem  by @sayakpaul in #8865\r\n* [`Cont'd`] Add the SDE variant of ~~DPM-Solver~~ and DPM-Solver++ to DPM Single Step  by @tolgacangoz in #8269\r\n* modify pocs.  by @sayakpaul in #8867\r\n* [Core] fix: shard loading and saving when variant is provided.  by @sayakpaul in #8869\r\n* [Chore] allow auraflow latest to be torch compile compatible.  by @sayakpaul in #8859\r\n* Add AuraFlowPipeline and KolorsPipeline to auto map  by @Beinsezii in #8849\r\n* Fix multi-gpu case for `train_cm_ct_unconditional.py`  by @tolgacangoz in #8653\r\n* [docs] pipeline docs for latte  by @a-r-r-o-w in #8844\r\n* [Chore] add disable forward chunking to SD3 transformer.  by @sayakpaul in #8838\r\n* [Core] remove `resume_download` from Hub related stuff  by @sayakpaul in #8648\r\n* Add option to SSH into CPU runner.   by @DN6 in #8884\r\n* SSH into cpu runner fix  by @DN6 in #8888\r\n* SSH into cpu runner additional fix  by @DN6 in #8893\r\n* [SDXL] Fix uncaught error with image to image  by @asomoza in #8856\r\n* fix loop bug in SlicedAttnProcessor  by @shinetzh in #8836\r\n* [fix code annotation] Adjust the dimensions of the rotary positional embedding.  by @wangqixun in #8890\r\n* allow tensors in several schedulers step() call  by @catwell in #8905\r\n* Use model_info.id instead of model_info.modelId  by @Wauplin in #8912\r\n* [Training] SD3 training fixes  by @sayakpaul in #8917\r\n* 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc)  by @Snailpong in #8804\r\n* [Docs] small fixes to pag guide.  by @sayakpaul in #8920\r\n* Reflect few contributions on `ethical_guidelines.md` that were not reflected on #8294  by @mreraser in #8914\r\n* [Tests] proper skipping of request caching test  by @sayakpaul in #8908\r\n* Add attentionless VAE support  by @Gothos in #8769\r\n* [Benchmarking] check if runner helps to restore benchmarking  by @sayakpaul in #8929\r\n* Update pipeline test fetcher  by @DN6 in #8931\r\n* [Tests] reduce the model size in the audioldm2 fast test  by @ariG23498 in #7846\r\n* fix: checkpoint save issue in advanced dreambooth lora sdxl script  by @akbaig in #8926\r\n* [Tests] Improve transformers model test suite coverage - Temporal Transformer  by @rootonchair in #8932\r\n* Fix Colab and Notebook checks for `diffusers-cli env`  by @tolgacangoz in #8408\r\n* Fix name when saving text inversion embeddings in dreambooth advanced scripts  by @DN6 in #8927\r\n* [Core] fix QKV fusion for attention  by @sayakpaul in #8829\r\n* remove residual i from auraflow.  by @sayakpaul in #8949\r\n* [CI] Skip flaky download tests in PR CI  by @DN6 in #8945\r\n* [AuraFlow] fix long prompt handling  by @sayakpaul in #8937\r\n* Added Code for Gradient Accumulation to work for basic_training  by @RandomGamingDev in #8961\r\n* [AudioLDM2] Fix cache pos for GPT-2 generation  by @sanchit-gandhi in #8964\r\n* [Tests] fix slices of 26 tests (first half)  by @sayakpaul in #8959\r\n* [CI] Slow Test Updates  by @DN6 in #8870\r\n* [tests] speed up animatediff tests  by @a-r-r-o-w in #8846\r\n* [LoRA] introduce LoraBaseMixin to promote reusability.  by @sayakpaul in #8774\r\n* Update TensorRT img2img community pipeline  by @asfiyab-nvidia in #8899\r\n* Enable CivitAI SDXL Inpainting Models Conversion  by @mazharosama in #8795\r\n* Revert \"[LoRA] introduce LoraBaseMixin to promote reusability.\"  by @yiyixuxu in #8976\r\n* fix guidance_scale value not equal to the value in comments  by @efwfe in #8941\r\n* [Chore] remove all is from auraflow.  by @sayakpaul in #8980\r\n* [Chore] add `LoraLoaderMixin` to the inits  by @sayakpaul in #8981\r\n* Added `accelerator` based gradient accumulation for basic_example  by @RandomGamingDev in #8966\r\n* [CI] Fix parallelism in nightly tests  by @DN6 in #8983\r\n* [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix   by @DN6 in #8986\r\n* [fix] FreeInit step index out of bounds  by @a-r-r-o-w in #8969\r\n* [core] AnimateDiff SparseCtrl  by @a-r-r-o-w in #8897\r\n* remove unused code from pag attn procs  by @a-r-r-o-w in #8928\r\n* [Kolors] Add IP Adapter  by @asomoza in #8901\r\n* [CI] Update runner configuration for setup and nightly tests  by @XciD in #9005\r\n* [Docs] credit where it's due for Lumina and Latte.  by @sayakpaul in #9000\r\n* handle lora scale and clip skip in lpw sd and sdxl community pipelines  by @noskill in #8988\r\n* [LoRA] fix: animate diff lora stuff.  by @sayakpaul in #8995\r\n* Stable Audio integration  by @ylacombe in #8716\r\n* [core] Move community AnimateDiff ControlNet to core  by @a-r-r-o-w in #8972\r\n* Fix Stable Audio repository id  by @ylacombe in #9016\r\n* PAG variant for AnimateDiff  by @a-r-r-o-w in #8789\r\n* Updates deps for pipeline test fetcher  by @DN6 in #9033\r\n* fix load sharded checkpoint from a subfolder (local path)   by @yiyixuxu in #8913\r\n* [docs] fix pia example  by @a-r-r-o-w in #9015\r\n* Flux pipeline  by @sayakpaul in #9043\r\n* [Core] Add PAG support for PixArtSigma   by @sayakpaul in #8921\r\n* [Flux] allow tests to run  by @sayakpaul in #9050\r\n* Fix Nightly Deps  by @DN6 in #9036\r\n* Update transformer_flux.py  by @haofanwang in #9060\r\n* Errata: Fix typos & `\\s+$`  by @tolgacangoz in #9008\r\n* [refactor] create modeling blocks specific to AnimateDiff  by @a-r-r-o-w in #8979\r\n* Fix grammar mistake.  by @prideout in #9072\r\n* [Flux] minor documentation fixes for flux.  by @sayakpaul in #9048\r\n* Update TensorRT txt2img and inpaint community pipelines  by @asfiyab-nvidia in #9037\r\n* type `get_attention_scores` as optional in `get_attention_scores`  by @psychedelicious in #9075\r\n* [refactor] apply qk norm in attention processors  by @a-r-r-o-w in #9071\r\n* [FLUX] support LoRA  by @sayakpaul in #9057\r\n* [Tests] Improve transformers model test suite coverage - Latte  by @rootonchair in #8919\r\n* PAG variant for HunyuanDiT, PAG refactor  by @a-r-r-o-w in #8936\r\n* [Docs] add stable cascade unet doc.  by @sayakpaul in #9066\r\n* add sentencepiece as a soft dependency   by @yiyixuxu in #9065\r\n* Fix typos  by @omahs in #9077\r\n* Update `CLIPFeatureExtractor` to `CLIPImageProcessor` and `DPTFeatureExtractor` to `DPTImageProcessor`  by @tolgacangoz in #9002\r\n* [Core] add QKV fusion to AuraFlow and PixArt Sigma  by @sayakpaul in #8952\r\n* [bug] remove unreachable norm_type=ada_norm_continuous from norm3 initialization conditions  by @a-r-r-o-w in #9006\r\n* [Tests] Improve transformers model test suite coverage - Hunyuan DiT  by @rootonchair in #8916\r\n* update by @DN6 (direct commit on v0.30.0-release)\r\n* [Docs] Add community projects section to docs  by @DN6 in #9013\r\n* add PAG support for Stable Diffusion 3  by @sunovivid in #8861\r\n* Fix loading sharded checkpoints when we have variants  by @SunMarc in #9061\r\n* [Single File] Add single file support for Flux Transformer  by @DN6 in #9083\r\n* [Kolors] Add PAG  by @asomoza in #8934\r\n* fix train_dreambooth_lora_sd3.py loading hook  by @sayakpaul in #9107\r\n* [core] FreeNoise  by @a-r-r-o-w in #8948\r\n* Flux fp16 inference fix  by @latentCall145 in #9097\r\n* [feat] allow sparsectrl to be loaded from single file  by @a-r-r-o-w in #9073\r\n* Freenoise change `vae_batch_size` to `decode_chunk_size`  by @DN6 in #9110\r\n* Add CogVideoX text-to-video generation model  by @zRzRzRzRzRzRzR in #9082\r\n* Release: v0.30.0 by @sayakpaul (direct commit on v0.30.0-release)\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @DN6\r\n    * [Tests] Fix precision related issues in slow pipeline tests (#8720)\r\n    * Remove legacy single file model loading mixins (#8754)\r\n    * Enforce ordering when running Pipeline slow tests  (#8763)\r\n    * Fix warning in UNetMotionModel (#8756)\r\n    * Fix indent in dreambooth lora advanced  SD 15 script  (#8753)\r\n    * Fix mistake in Single File Docs page (#8765)\r\n    * [Single File] Allow loading T5 encoder in mixed precision  (#8778)\r\n    * Fix saving text encoder weights and kohya weights in advanced dreambooth lora script (#8766)\r\n    * Add VAE tiling option for SD3 (#8791)\r\n    * Add single file loading support for AnimateDiff  (#8819)\r\n    * Add option to SSH into CPU runner.  (#8884)\r\n    * SSH into cpu runner fix (#8888)\r\n    * SSH into cpu runner additional fix (#8893)\r\n    * Update pipeline test fetcher (#8931)\r\n    * Fix name when saving text inversion embeddings in dreambooth advanced scripts (#8927)\r\n    * [CI] Skip flaky download tests in PR CI (#8945)\r\n    * [CI] Slow Test Updates (#8870)\r\n    * [CI] Fix parallelism in nightly tests (#8983)\r\n    * [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix  (#8986)\r\n    * Updates deps for pipeline test fetcher (#9033)\r\n    * Fix Nightly Deps (#9036)\r\n    * update\r\n    * [Docs] Add community projects section to docs (#9013)\r\n    * [Single File] Add single file support for Flux Transformer (#9083)\r\n    * Freenoise change `vae_batch_size` to `decode_chunk_size` (#9110)\r\n* @shauray8\r\n    * add PAG support for SD architecture (#8725)\r\n* @gnobitab\r\n    * [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support (#8747)\r\n    * [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet (#8783)\r\n* @yiyixuxu\r\n    * [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart (#8735)\r\n    * [hunyuan-dit] refactor `HunyuanCombinedTimestepTextSizeStyleEmbedding` (#8761)\r\n    * correct `attention_head_dim` for `JointTransformerBlock` (#8608)\r\n    * fix loading sharded checkpoints from subfolder (#8798)\r\n    * Revert \"[LoRA] introduce LoraBaseMixin to promote reusability.\" (#8976)\r\n    * fix load sharded checkpoint from a subfolder (local path)  (#8913)\r\n    * add sentencepiece as a soft dependency  (#9065)\r\n* @PommesPeter\r\n    * [Alpha-VLLM Team] Add Lumina-T2X to diffusers (#8652)\r\n* @IrohXu\r\n    * Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference (#8709)\r\n* @maxin-cn\r\n    * Latte: Latent Diffusion Transformer for Video Generation (#8404)\r\n* @ustcuna\r\n    * [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU (#8643)\r\n* @tuanh123789\r\n    * add PAG support sd15 controlnet (#8820)\r\n* @Snailpong\r\n    * 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) (#8804)\r\n* @asfiyab-nvidia\r\n    * Update TensorRT img2img community pipeline (#8899)\r\n    * Update TensorRT txt2img and inpaint community pipelines (#9037)\r\n* @ylacombe\r\n    * Stable Audio integration (#8716)\r\n    * Fix Stable Audio repository id (#9016)\r\n* @sunovivid\r\n    * add PAG support for Stable Diffusion 3 (#8861)\r\n* @zRzRzRzRzRzRzR\r\n    * Add CogVideoX text-to-video generation model (#9082)","publishedAt":"2024-08-07T07:47:28.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.30.0","media":[]},{"id":"rel_G8eR-RaQlm-7LKIPfJafv","version":"v0.29.2","title":"v0.29.2: fix deprecation and LoRA bugs 🐞","summary":"## All commits\r\n\r\n* [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None)  by @Dalanke in #8558\r\n* [LoRA] ...","content":"## All commits\r\n\r\n* [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None)  by @Dalanke in #8558\r\n* [LoRA] refactor lora conversion utility.  by @sayakpaul in #8295\r\n* [LoRA] fix conversion utility so that lora dora loads correctly  by @sayakpaul in #8688\r\n* [Chore] remove deprecation from transformer2d regarding the output class.  by @sayakpaul in #8698\r\n* [LoRA] fix vanilla fine-tuned lora loading.  by @sayakpaul in #8691\r\n* Release: v0.29.2 by @sayakpaul (direct commit on v0.29.2-patch)","publishedAt":"2024-06-27T03:59:48.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.29.2","media":[]},{"id":"rel_W3ifHDOYXjofu8hIzTP4T","version":"v0.29.1","title":"v0.29.1: SD3 ControlNet, Expanded SD3 `from_single_file` support, Using long Prompts with T5 Text Encoder & Bug fixes","summary":"## SD3 CntrolNet\r\n<img width=\"624\" alt=\"image\" src=\"https://github.com/huggingface/diffusers/assets/46553287/db384753-cfbb-488c-bc74-8280f9bee24e\">\r\n\r...","content":"## SD3 CntrolNet\r\n<img width=\"624\" alt=\"image\" src=\"https://github.com/huggingface/diffusers/assets/46553287/db384753-cfbb-488c-bc74-8280f9bee24e\">\r\n\r\n```python\r\nimport torch\r\nfrom diffusers import StableDiffusion3ControlNetPipeline\r\nfrom diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel\r\nfrom diffusers.utils import load_image\r\n\r\ncontrolnet = SD3ControlNetModel.from_pretrained(\"InstantX/SD3-Controlnet-Canny\", torch_dtype=torch.float16)\r\n\r\npipe = StableDiffusion3ControlNetPipeline.from_pretrained(\r\n    \"stabilityai/stable-diffusion-3-medium-diffusers\", controlnet=controlnet, torch_dtype=torch.float16\r\n)\r\npipe.to(\"cuda\")\r\ncontrol_image = load_image(\"https://huggingface.co/InstantX/SD3-Controlnet-Canny/resolve/main/canny.jpg\")\r\nprompt = \"A girl holding a sign that says InstantX\"\r\nimage = pipe(prompt, control_image=control_image, controlnet_conditioning_scale=0.7).images[0]\r\nimage.save(\"sd3.png\")\r\n```\r\n📜 Refer to the official docs [here](https://huggingface.co/docs/diffusers/api/pipelines/controlnet_sd3) to learn more about it. \r\n\r\nThanks to @haofanwang @wangqixun from the @ResearcherXman team for contributing this pipeline!\r\n\r\n\r\n## Expanded single file support \r\nWe now support all available single-file checkpoints for sd3 in diffusers! To load the single file checkpoint with t5\r\n\r\n```python\r\nimport torch\r\nfrom diffusers import StableDiffusion3Pipeline\r\n\r\npipe = StableDiffusion3Pipeline.from_single_file(\r\n    \"https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips_t5xxlfp8.safetensors\",\r\n    torch_dtype=torch.float16,\r\n)\r\npipe.enable_model_cpu_offload()\r\n\r\nimage = pipe(\"a picture of a cat holding a sign that says hello world\").images[0]\r\nimage.save('sd3-single-file-t5-fp8.png')\r\n```\r\n\r\n## Using Long Prompts with the T5 Text Encoder\r\nWe increased the default sequence length for the T5 Text Encoder from a maximum of `77` to `256`!  It can be adjusted to accept fewer or more tokens by setting the `max_sequence_length` to a maximum of `512`. Keep in mind that longer sequences require additional resources and will result in longer generation times. This effect is particularly noticeable during batch inference.\r\n\r\n```python\r\nprompt = \"A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus. This imaginative creature features the distinctive, bulky body of a hippo, but with a texture and appearance resembling a golden-brown, crispy waffle. The creature might have elements like waffle squares across its skin and a syrup-like sheen. It’s set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting, possibly including oversized utensils or plates in the background. The image should evoke a sense of playful absurdity and culinary fantasy.\"\r\n\r\nimage = pipe(\r\n    prompt=prompt,\r\n    negative_prompt=\"\",\r\n    num_inference_steps=28,\r\n    guidance_scale=4.5,\r\n    max_sequence_length=512,\r\n).images[0]\r\n```\r\n\r\n|Before|max_sequence_length=256|max_sequence_length=512\r\n|---|---|---|\r\n|![20240612204503_2888268196](https://github.com/huggingface/diffusers/assets/5442875/e5ab1053-f819-4314-b676-80bef759aa71)|![20240612204440_2888268196](https://github.com/huggingface/diffusers/assets/5442875/6bda088f-8ee4-42ff-88bc-ac3129a92d31)|![20240613195139_569754043](https://github.com/huggingface/diffusers/assets/5442875/ca6940d4-7459-451f-80f9-c591c611aba0)\r\n\r\n## All commits\r\n\r\n* Release: v0.29.0 by @sayakpaul (direct commit on v0.29.1-patch)\r\n* prepare for patch release by @yiyixuxu (direct commit on v0.29.1-patch)\r\n* fix warning log for Transformer SD3  by @sayakpaul in #8496\r\n* Add SD3 AutoPipeline mappings  by @Beinsezii in #8489\r\n* Add Hunyuan AutoPipe mapping  by @Beinsezii in #8505\r\n* Expand Single File support in SD3 Pipeline    by @DN6 in #8517\r\n* [Single File Loading] Handle unexpected keys in CLIP models when `accelerate` isn't installed.   by @DN6 in #8462\r\n* Fix sharding when no device_map is passed  by @SunMarc in #8531\r\n* [SD3 Inference] T5 Token limit  by @asomoza in #8506\r\n* Fix gradient checkpointing issue for Stable Diffusion 3  by @Carolinabanana in #8542\r\n* Support SD3 ControlNet and Multi-ControlNet.  by @wangqixun in #8566\r\n* fix from_single_file for checkpoints with t5  by @yiyixuxu in #8631\r\n* [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None)  by @Dalanke in #8558\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @wangqixun\r\n    * Support SD3 ControlNet and Multi-ControlNet. (#8566)\r\n","publishedAt":"2024-06-21T01:50:52.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.29.1","media":[]},{"id":"rel_dELGZpmNkqr7DMeT0vhgb","version":"v0.29.0","title":"v0.29.0: Stable Diffusion 3","summary":"This release emphasizes Stable Diffusion 3, Stability AI’s latest iteration of the Stable Diffusion family of models. It was introduced in [Scaling Re...","content":"This release emphasizes Stable Diffusion 3, Stability AI’s latest iteration of the Stable Diffusion family of models. It was introduced in [Scaling Rectified Flow Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2403.03206) by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach. \r\n\r\nAs the model is gated, before using it with `diffusers`, you first need to go to the [Stable Diffusion 3 Medium Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate.\r\n\r\n```bash\r\nhuggingface-cli login\r\n```\r\n\r\nThe code below shows how to perform text-to-image generation with SD3:\r\n\r\n```python\r\nimport torch\r\nfrom diffusers import StableDiffusion3Pipeline\r\n\r\npipe = StableDiffusion3Pipeline.from_pretrained(\"stabilityai/stable-diffusion-3-medium-diffusers\", torch_dtype=torch.float16)\r\npipe = pipe.to(\"cuda\")\r\n\r\nimage = pipe(\r\n    \"A cat holding a sign that says hello world\",\r\n    negative_prompt=\"\",\r\n    num_inference_steps=28,\r\n    guidance_scale=7.0,\r\n).images[0]\r\nimage\r\n```\r\n\r\n![image](https://github.com/huggingface/diffusers/assets/22957388/30917935-6649-447e-8bf2-c4c9378562de)\r\n\r\nRefer to [our documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) for learning all the optimizations you can apply to SD3 as well as the image-to-image pipeline. \r\n\r\nAdditionally, we support DreamBooth + LoRA fine-tuning of Stable Diffusion 3 through rectified flow. Check out [this directory](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sd3.md) for more details. \r\n","publishedAt":"2024-06-12T20:14:03.000Z","url":"https://github.com/huggingface/diffusers/releases/tag/v0.29.0","media":[]}],"pagination":{"page":1,"pageSize":20,"totalPages":5,"totalItems":90},"summaries":{"rolling":{"windowDays":90,"summary":"Diffusers shifted toward compositional pipeline architecture with Modular Diffusers, letting developers build custom workflows by mixing reusable blocks instead of writing monolithic pipelines from scratch. The release added support for new image and video generation models including Z Image Omni Base, while subsequent patches fixed type hint handling in modular pipelines and LoRA loading for Flux Klein.","releaseCount":2,"generatedAt":"2026-04-07T17:27:30.394Z"},"monthly":[{"year":2026,"month":3,"summary":"Modular Diffusers shipped as the month's centerpiece, introducing composable pipeline building blocks as an alternative to monolithic `DiffusionPipeline` implementations. Alongside this architectural addition, the release brought new image and video pipelines including Z Image Omni Base, expanded core library improvements, and follow-up fixes for type hints in modular pipelines and Flux Klein LoRA loading.","releaseCount":2,"generatedAt":"2026-04-07T17:27:33.943Z"}]}}