{"id":"src_IUEmstpqhDfKSwUnkELwV","slug":"transformers","name":"Transformers","type":"github","url":"https://github.com/huggingface/transformers","orgId":"org_GDdYeYynEgCEBNBwy-m6s","org":{"slug":"hugging-face","name":"Hugging Face"},"isPrimary":false,"metadata":"{\"evaluatedMethod\":\"github\",\"evaluatedAt\":\"2026-04-07T17:19:13.059Z\",\"changelogDetectedAt\":\"2026-04-07T17:27:13.693Z\"}","releaseCount":104,"releasesLast30Days":6,"avgReleasesPerWeek":0.9,"latestVersion":"v5.5.4","latestDate":"2026-04-13T16:58:06.000Z","changelogUrl":null,"hasChangelogFile":false,"lastFetchedAt":"2026-04-19T03:01:57.719Z","trackingSince":"2024-04-23T22:01:20.000Z","releases":[{"id":"rel_JoQaDjz-aM6aKFvyCAL71","version":"v5.5.4","title":"Patch release v5.5.4","summary":"# Patch release v5.5.4\r\n\r\nThis is mostly some fixes that are good to have asap, mostly for tokenizers;\r\n** Fix Kimi-K2.5 tokenizer regression and _pat...","content":"# Patch release v5.5.4\r\n\r\nThis is mostly some fixes that are good to have asap, mostly for tokenizers;\r\n** Fix Kimi-K2.5 tokenizer regression and _patch_mistral_regex Attribute… (#45305) by ArthurZucker\r\n\r\nFor training:\r\n** Fix #45305 + add regression test GAS (#45349) by florian6973, SunMarc\r\n** Fix IndexError with DeepSpeed ZeRO-3 when kernels rotary is active (#…) by ArthurZucker\r\n\r\nAnd for Qwen2.5-VL :\r\n** Fix Qwen2.5-VL temporal RoPE scaling applied to still images (#45330) by Kash6, zucchini-nlp","publishedAt":"2026-04-13T16:58:06.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.5.4","media":[]},{"id":"rel_pn6XCnvJ5am5LE5HwbTZk","version":"v5.5.3","title":"Patch release: v5.5.3","summary":"Small patch release to fix `device_map` support for Gemma4! It contains the following commit:\r\n\r\n- [gemma4] Fix device map auto (#45347) by @Cyrilvall...","content":"Small patch release to fix `device_map` support for Gemma4! 
It contains the following commit:\r\n\r\n- [gemma4] Fix device map auto (#45347) by @Cyrilvallez ","publishedAt":"2026-04-09T15:53:11.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.5.3","media":[]},{"id":"rel_q371LXouKNf6h3rN0c09v","version":"v5.5.2","title":"Patch release: v5.5.2","summary":"Small patch dedicated to optimizing gemma4, fixing inference with `use_cache=False` due to k/v states sharing between layers, as well as conversion ma...","content":"Small patch dedicated to optimizing gemma4, fixing inference with `use_cache=False` due to k/v states sharing between layers, as well as conversion mappings for some models that would inconsistently serialize their weight names. It contains the following PRs:\r\n\r\n- Add MoE to Gemma4 TP plan (#45219) by @sywangyi and @Cyrilvallez\r\n- [gemma4] Dissociate kv states sharing from the Cache (#45312) by @Cyrilvallez\r\n- [gemma4] Remove all shared weights, and silently skip them during loading (#45336) by @Cyrilvallez\r\n- Fix conversion mappings for vlms (#45340) by @Cyrilvallez","publishedAt":"2026-04-09T14:05:16.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.5.2","media":[]},{"id":"rel_zo_AQBxxofaZRZpRZfonO","version":"v5.5.1","title":"Patch release v5.5.1","summary":"# Patch release v5.5.1\r\n\r\nThis patch is very small and focuses on vLLM and Gemma4! \r\n\r\n** Fix export for gemma4 and add Integration tests (#45285) by ...","content":"# Patch release v5.5.1\r\n\r\nThis patch is very small and focuses on vLLM and Gemma4! 
\r\n\r\n** Fix export for gemma4 and add Integration tests (#45285) by @Cyrilvallez \r\n** Fix vllm cis (#45139) by @ArthurZucker ","publishedAt":"2026-04-09T05:53:03.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.5.1","media":[]},{"id":"rel_HdyELjk8F7P3GXSsJhBYf","version":"v5.5.0","title":"Release v5.5.0","summary":"# Release v5.5.0\r\n\r\n<img width=\"2786\" height=\"1504\" alt=\"image\" src=\"https://github.com/user-attachments/assets/6c8c878f-042b-4858-9f64-73fd9ccd7e4b\" ...","content":"# Release v5.5.0\r\n\r\n<img width=\"2786\" height=\"1504\" alt=\"image\" src=\"https://github.com/user-attachments/assets/6c8c878f-042b-4858-9f64-73fd9ccd7e4b\" />\r\n\r\n## New Model additions\r\n\r\n### Gemma4\r\n\r\n[Gemma 4](INSET_PAPER_LINK) is a multimodal model with pretrained and instruction-tuned variants, available in 1B, 13B, and 27B parameters. The architecture is mostly the same as the previous Gemma versions. The key differences are a vision processor that can output images of fixed token budget and a spatial 2D RoPE to encode vision-specific information across height and width axis.\r\n\r\n<img width=\"1478\" height=\"1374\" alt=\"image\" src=\"https://github.com/user-attachments/assets/9d88bd1b-02ea-4829-b7d0-fac0e347d436\" />\r\n\r\n\r\nYou can find all the original Gemma 4 checkpoints under the [Gemma 4](https://huggingface.co/collections/google/gemma-4-release-67c6c6f89c4f76621268bb6d) release.\r\n\r\nThe key difference from previous Gemma releases is the new design to process **images of different sizes** using a **fixed-budget number of tokens**. Unlike many models that squash every image into a fixed square (like 224×224), Gemma 4 keeps the image's natural aspect ratio while making it the right size. 
There are a couple of constraints to follow:\r\n- The total number of pixels must fit within a patch budget\r\n- Both height and width must be divisible by **48** (= patch size 16 × pooling kernel 3)\r\n\r\n> [!IMPORTANT]\r\n> Gemma 4 does **not** apply the standard ImageNet mean/std normalization that many other vision models use. The model's own patch embedding layer handles the final scaling internally (shifting values to the [-1, 1] range).\r\n\r\nThe number of \"soft tokens\" (aka vision tokens) an image processor can produce is configurable. The supported options are outlined below and the default is **280 soft tokens** per image.\r\n\r\n\r\n| Soft Tokens | Patches (before pooling) | Approx. Image Area |\r\n|:-----------:|:------------------------:|:-------------------:|\r\n| 70          | 630                      | ~161K pixels        |\r\n| 140         | 1,260                    | ~323K pixels        |\r\n| **280**     | **2,520**                | **~645K pixels**    |\r\n| 560         | 5,040                    | ~1.3M pixels        |\r\n| 1,120       | 10,080                   | ~2.6M pixels        |\r\n\r\n\r\nTo encode positional information for each patch in the image, Gemma 4 uses a learned 2D position embedding table. The position table stores up to 10,240 positions per axis, which allows the model to handle very large images. Each position is a learned vector of the same dimension as the patch embedding. The 2D RoPE that Gemma 4 uses independently rotates half the attention head dimensions for the x-axis and the other half for the y-axis. This allows the model to understand spatial relationships like \"above,\" \"below,\" \"left of,\" and \"right of.\"\r\n\r\n### NomicBERT\r\n\r\nNomicBERT is a BERT-inspired encoder model that applies Rotary Position Embeddings (RoPE) to create reproducible long context text embeddings. 
It is the first fully reproducible, open-source text embedding model with 8192 context length that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on short-context MTEB and long context LoCo benchmarks. The model generates dense vector embeddings for various tasks including search, clustering, and classification using specific instruction prefixes.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/nomic_bert) | [Paper](https://arxiv.org/abs/2402.01613)\r\n* Internalise the NomicBERT model (#43067) by @ed22699 in [#43067](https://github.com/huggingface/transformers/pull/43067)\r\n\r\n### MusicFlamingo\r\n\r\nMusic Flamingo is a fully open large audio–language model designed for robust understanding and reasoning over music. It builds upon the Audio Flamingo 3 architecture by including Rotary Time Embeddings (RoTE), which injects temporal position information to enable the model to handle audio sequences up to 20 minutes. The model features a unified audio encoder across speech, sound, and music with special sound boundary tokens for improved audio sequence modeling.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/musicflamingo) | [Paper](https://huggingface.co/papers/2511.10289)\r\n* Add Music Flamingo (#43538) by @lashahub in [#43538](https://github.com/huggingface/transformers/pull/43538)\r\n\r\n\r\n\r\n## Breaking changes\r\n\r\nMamba and hybrid model caches are now first-class native citizens in the library, so users working with Mamba-based or hybrid (Mamba + attention) models should update their code to use the new native cache classes instead of any previous workarounds.\r\n* 🚨 [Cache] Native mamba & hybrid cache (#44950) by @Cyrilvallez\r\n\r\nRemote code execution support has been removed from the native `LightGlue` integration, so users who were loading `LightGlue` with `trust_remote_code=True` must remove that argument and use the model directly 
through the standard native API.\r\n* :rotating_light: [`LightGlue`] Remove remote code execution (#45122) by @vasqu\r\n\r\n\r\n\r\n## Vision\r\n\r\nSeveral vision-related bugs were fixed in this release, including correcting the Gemma vision mask to support video inputs, resolving a dependency issue that incorrectly required torchvision for PIL-based image processors, and patching bugs in the Janus image generation model and image loading. Local code resolution for tokenizers and image processors was also corrected.\r\n\r\n\r\n* Generalize gemma vision mask to videos (#45185) by @zucchini-nlp in [#45185]\r\n* Fix explicit local code resolution for tokenizers and image processors (#45169) by @hmellor in [#45169]\r\n* fix bug for janus model image generation (#45044) by @kaixuanliu in [#45044]\r\n* [Bugfix] Remove incorrect torchvision requirement from PIL backend image processors (#45045) by @Lidang-Jiang in [#45045]\r\n* Avoid `Image.open` failure (#44645) by @sywangyi in [#44645]\r\n\r\n\r\n## Cache\r\n\r\nImproved the performance of repository checks (`check-repo`) by introducing file-level and AST-level disk caching, achieving up to a 27x speedup (from ~46s to ~1.6s with a warm cache), and fixed the mlinter cache location in `.gitignore`.\r\n\r\n\r\n* refactoring: speedup static checks with disk cache (#44992) by @tarekziade in [#44992]\r\n* refactor: added cache in check_repo (#45012) by @tarekziade in [#45012]\r\n* chore: Fix mlinter cache location (#45052) by @tarekziade in [#45052]\r\n\r\n\r\n## Bugfixes and improvements\r\n\r\n* Fix resized LM head weights being overwritten by post_init (#45079) by @javierdejesusda in [#45079]\r\n* [Qwen3.5 MoE] Add _tp_plan to ForConditionalGeneration (#45124) by @danielquintas8 in [#45124]\r\n* fix(models): Fix dtype mismatch in SwitchTransformers and TimmWrapperModel (#45074) by @harshaljanjani in [#45074]\r\n* [misc] fix qwen35 tests: correct the text model type and skip reverse_mapping (#45173) by @JJJYmmm in 
[#45173]\r\n* 🔒 Pin GitHub Actions to commit SHAs (#45180) by @paulinebm in [#45180]\r\n* Use doc-builder runnable example for GLM-ASR (#44277) by @tarekziade in [#44277]\r\n* CI] Small T5 expectations updated (#45138) by @Abdennacer-Badaoui in [#45138]\r\n* fix: correct type annotations across config classes for @strict validation (#45007) by @Krishnachaitanyakc in [#45007]\r\n* Fix T5Attention shape mismatch under Tensor Parallelism (#45109) by @aws-zhanxun in [#45109]\r\n* [refactor] Serving into proper modules (#44796) by @SunMarc in [#44796]\r\n* Re-add regex substitutions to the response parsing spec (#45166) by @Rocketknight1 in [#45166]\r\n* Fix incorrect TrainingArguments example in training.md (#45150) by @maanas1234 in [#45150]\r\n* Add parse_response to Processor, make it a bit more official (#45143) by @Rocketknight1 in [#45143]\r\n* DeepGEMM (#44832) by @IlyasMoutawwakil in [#44832]\r\n* fix: prefer registered config over remote code in AutoConfig.from_pretrained (#45094) by @HanFa in [#45094]\r\n* [serving] Fix continuous batching JSON response serialization (#45057) by @NathanHB in [#45057]\r\n* Fix stupid test fetcher (#45140) by @ydshieh in [#45140]\r\n* [CB] Add warmup feature (#45112) by @remi-or in [#45112]\r\n* feature: added import complexity checker (#45013) by @tarekziade in [#45013]\r\n* Fix tests for `janus` model (#44739) by @kaixuanliu in [#44739]\r\n* CB improvements for serving  (#45063) by @SunMarc in [#45063]\r\n* [docs] continuous batching (#44896) by @stevhliu in [#44896]\r\n* Fix few issues in Qwen_3_Omni_Moe (#44848) by @Sai-Suraj-27 in [#44848]\r\n* Fix TypeError in rope validation when ignore_keys is a list (#45069) by @Fr0do in [#45069]\r\n* Remove unused TensorFlow env var (#45065) by @Sai-Suraj-27 in [#45065]\r\n* fix: add identity reverse_op to dequantize ops for save_pretrained (#44983) by @Hyungkeun-Park-Nota in [#44983]\r\n* Fix when RoPE params are in kwargs (#45049) by @zucchini-nlp in [#45049]\r\n* chore: update 
update_metdata.yml (#45054) by @hf-security-analysis[bot] in [#45054]\r\n* [`FA`] Fix BC support for a few versions + add deprecation cycle (#45061) by @vasqu in [#45061]\r\n* fix(testing): Fix Parakeet, Evolla, Pi0, and Phi-3 test failures on main CI (#45004) by @harshaljanjani in [#45004]\r\n* Allow advanced users to override `model_type` in `AutoConfig.from_pretrained` (#45058) by @hmellor in [#45058]\r\n* Fix failing `SmolLM3IntegrationTest` (#45048) by @Sai-Suraj-27 in [#45048]\r\n* chore: remove old extras (#45024) by @tarekziade in [#45024]\r\n* Embedding VLMs don't need a head (#45000) by @zucchini-nlp in [#45000]\r\n* Fix GraniteConfig type hints to accept int for multiplier fields (#45019) by @javierdejesusda in [#45019]\r\n* fix: preserve rotary_pct across save/load cycle in GPTNeoX configs (#44985) by @Krishnachaitanyakc in [#44985]\r\n\r\n\r\n\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @ed22699\r\n    * Internalise the NomicBERT model (#43067)\r\n* @tarekziade\r\n    * Use doc-builder runnable example for GLM-ASR (#44277)\r\n    * refactoring: speedup static checks with disk cache (#44992)\r\n    * feature: added import complexity checker (#45013)\r\n    * refactor: added cache in check_repo (#45012)\r\n    * chore: remove old extras (#45024)\r\n    * chore: Fix mlinter cache location (#45052)\r\n    * refactor: speed up docstring checker (#45009)\r\n* @Krishnachaitanyakc\r\n    * fix: correct type annotations across config classes for @strict validation (#45007)\r\n    * fix: preserve rotary_pct across save/load cycle in GPTNeoX configs (#44985)\r\n* @lashahub\r\n    * Add Music Flamingo (#43538)\r\n* @Lidang-Jiang\r\n    * [Bugfix] Remove incorrect torchvision requirement from PIL backend image processors 
(#45045)\r\n","publishedAt":"2026-04-02T16:15:33.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.5.0","media":[]},{"id":"rel_0KtMxzjUWjYA3a2jSH5qi","version":"v5.4.0","title":"Release v5.4.0: PaddlePaddle models 🙌, Mistral 4, PI0, VidEoMT, UVDoc, SLANeXt, Jina Embeddings v3","summary":"## New Model additions\r\n\r\n### VidEoMT\r\n\r\n<img width=\"1480\" height=\"460\" alt=\"image\" src=\"https://github.com/user-attachments/assets/bec6fc25-b0ab-4227...","content":"## New Model additions\r\n\r\n### VidEoMT\r\n\r\n<img width=\"1480\" height=\"460\" alt=\"image\" src=\"https://github.com/user-attachments/assets/bec6fc25-b0ab-4227-8c2b-a838554f37f3\" />\r\n\r\nVideo Encoder-only Mask Transformer (VidEoMT) is a lightweight encoder-only model for online video segmentation built on a plain Vision Transformer (ViT). It eliminates the need for dedicated tracking modules by introducing a lightweight query propagation mechanism that carries information across frames and employs a query fusion strategy that combines propagated queries with temporally-agnostic learned queries. VidEoMT achieves competitive accuracy while being 5x-10x faster than existing approaches, running at up to 160 FPS with a ViT-L backbone.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/videomt) | [Paper](https://huggingface.co/papers/2602.17807)\r\n* Add VidEoMT (#44285) by @NielsRogge in [#44285](https://github.com/huggingface/transformers/pull/44285)\r\n\r\n### UVDoc\r\n\r\n<img width=\"1765\" height=\"875\" alt=\"image\" src=\"https://github.com/user-attachments/assets/365e510e-8fb8-46cb-8f4b-e8b7082f0ae2\" />\r\n\r\nUVDoc is a machine learning model designed for document image rectification and correction. The main purpose of this model is to carry out geometric transformation on images to correct document distortion, inclination, perspective deformation and other problems in document images. 
It provides both single input and batched inference capabilities for processing distorted document images.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/uvdoc)\r\n* [Model] Add UVDoc Model Support (#43385) by @XingweiDeng in [#43385](https://github.com/huggingface/transformers/pull/43385)\r\n\r\n### Jina Embeddings v3\r\n\r\n<img width=\"595\" height=\"513\" alt=\"image\" src=\"https://github.com/user-attachments/assets/2aee0692-8286-4c6b-98db-847b95ab2d40\" />\r\n\r\nThe Jina-Embeddings-v3 is a multilingual, multi-task text embedding model designed for a variety of NLP applications. Based on the XLM-RoBERTa architecture, this model supports Rotary Position Embeddings (RoPE) replacing absolute position embeddings to support long input sequences up to 8192 tokens. Additionally, it features 5 built-in Task-Specific LoRA Adapters that allow the model to generate task-specific embeddings (e.g., for retrieval vs. classification) without increasing inference latency significantly.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/jina_embeddings_v3) | [Paper](https://huggingface.co/papers/2409.10173)\r\n* Add `Jina-Embeddings-V3` Model (#44251) by @Sai-Suraj-27 in [#44251](https://github.com/huggingface/transformers/pull/44251)\r\n\r\n### Mistral4\r\n\r\n<img width=\"2429\" height=\"1787\" alt=\"image\" src=\"https://github.com/user-attachments/assets/a6feb0da-8504-4eab-be65-22d6c676336f\" />\r\n\r\nMistral 4 is a powerful hybrid model with the capability of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three different model families - Instruct, Reasoning (previously called Magistral), and Devstral - into a single, unified model. 
The model features a MoE architecture with 128 experts and 4 active, 119B parameters with 6.5B activated per token, 256k context length, and supports multimodal input with both text and image processing capabilities.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mistral4)\r\n* Add Mistral 4 (#44760) by @juliendenize in [#44760](https://github.com/huggingface/transformers/pull/44760)\r\n\r\n### PI0\r\n\r\nPI0 is a vision-language-action model for robotics manipulation that jointly processes visual observations and language instructions to generate robot actions. It uses a novel flow matching architecture built on top of a pre-trained vision-language model to inherit Internet-scale semantic knowledge. The model can perform complex dexterous tasks like laundry folding, table cleaning, and assembling boxes across multiple robot platforms including single-arm robots, dual-arm robots, and mobile manipulators.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/pi0) | [Paper](https://huggingface.co/papers/2410.24164)\r\n* Add model lerobot PI0 to transformers (#44160) by @molbap in [#44160](https://github.com/huggingface/transformers/pull/44160)\r\n\r\n### SLANeXt\r\n\r\nSLANeXt is a series of dedicated lightweight models for table structure recognition, focusing on accurately recognizing table structures in documents and natural scenes. The SLANeXt series is a new generation of table structure recognition models independently developed by the Baidu PaddlePaddle Vision Team, with dedicated weights trained separately for wired and wireless tables. 
The recognition ability for all types of tables has been significantly improved, especially for wired tables.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/slanext)\r\n* [Model] Add SLANeXt Model Support (#43707) by @liu-jiaxuan in [#43707](https://github.com/huggingface/transformers/pull/43707)\r\n\r\n### PP-OCRv5_mobile_rec\r\n\r\nPP-OCRv5_mobile_rec is a dedicated lightweight model for text recognition, focusing specifically on efficient recognition and understanding of text elements in multi-language documents and natural scenes. It is designed to efficiently and accurately support the recognition of Simplified Chinese, Traditional Chinese, English, Japanese, as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters with a single model. While maintaining recognition performance, it also balances inference speed and model robustness, providing efficient and accurate technical support for document understanding in various scenarios.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/pp_ocrv5_mobile_rec)\r\n* [Model] Add PP-OCRv5_server_rec and  PP-OCRv5_mobile_rec models Support (#44808) by @zhang-prog in [#44808](https://github.com/huggingface/transformers/pull/44808)\r\n\r\n### PP-OCRv5_server_rec\r\n\r\nPP-OCRv5_server_rec is a dedicated lightweight model for text recognition, focusing specifically on efficient recognition and understanding of text elements in multi-language documents and natural scenes. It is designed to efficiently and accurately support the recognition of Simplified Chinese, Traditional Chinese, English, Japanese, as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters with a single model. 
While maintaining recognition performance, it also balances inference speed and model robustness, providing efficient and accurate technical support for document understanding in various scenarios.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/pp_ocrv5_server_rec)\r\n* [Model] Add PP-OCRv5_server_rec and  PP-OCRv5_mobile_rec models Support (#44808) by @zhang-prog in [#44808](https://github.com/huggingface/transformers/pull/44808)\r\n\r\n### PP-OCRv5_mobile_det\r\n\r\nPP-OCRv5_mobile_det is a dedicated lightweight model for text detection, focusing specifically on efficient detection and understanding of text elements in multi-language documents and natural scenes. It is part of the latest generation of text detection models developed by the PaddleOCR team that efficiently and accurately supports the detection of text in diverse scenarios—including handwriting, vertical, rotated, and curved text—across multiple languages such as Simplified Chinese, Traditional Chinese, English, and Japanese. The model features robust handling of complex layouts, varying text sizes, and challenging backgrounds, making it suitable for practical applications like document analysis, license plate recognition, and scene text detection.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/pp_ocrv5_mobile_det)\r\n* [Model] Add PP-OCRV5_mobile_det Model Support  (#43247) by @XingweiDeng in [#43247](https://github.com/huggingface/transformers/pull/43247)\r\n\r\n### PPLCNet\r\n\r\nPP-LCNet is a family of efficient, lightweight convolutional neural networks designed for real-world document understanding and OCR tasks. It balances accuracy, speed, and model size, making it ideal for both server-side and edge deployment. 
The model has three main variants optimized for specific tasks: document image orientation classification, table classification, and text line orientation classification.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/pp_lcnet)\r\n* [Model] Add PP-OCRV5_mobile_det Model Support  (#43247) by @XingweiDeng in [#43247](https://github.com/huggingface/transformers/pull/43247)\r\n\r\n### PPLCNetV3\r\n\r\nPPLCNetV3 is a lightweight CPU-optimized convolutional backbone designed for efficient image classification and downstream vision tasks. It builds on the PP-LCNet architecture with improved training strategies and structural refinements for better accuracy-latency tradeoffs on CPU hardware.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/pp_lcnet_v3) | [Paper](https://huggingface.co/papers/2109.15099)\r\n* [Model] Add PP-OCRV5_mobile_det Model Support  (#43247) by @XingweiDeng in [#43247](https://github.com/huggingface/transformers/pull/43247)\r\n\r\n### PP-OCRv5_server_det\r\n\r\nPP-OCRv5_server_det is a high-performance text detection model optimized for server-side applications, focusing on accurate detection of multi-language text in documents and natural scenes. It supports the detection of text in diverse scenarios—including handwriting, vertical, rotated, and curved text—across multiple languages such as Simplified Chinese, Traditional Chinese, English, and Japanese. 
The model features robust handling of complex layouts, varying text sizes, and challenging backgrounds, making it suitable for practical applications like document analysis, license plate recognition, and scene text detection.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/pp_ocrv5_server_det)\r\n* [Model] Add PP-OCRV5_server_det Model Support (#43274) by @XingweiDeng in [#43274](https://github.com/huggingface/transformers/pull/43274)\r\n\r\n### CHMv2\r\n\r\nCHMv2 is a global, meter-resolution canopy height mapping model that uses DINOv3 to estimate forest canopy heights from high-resolution optical satellite imagery. Building on the original canopy height maps released in 2024, CHMv2 delivers substantial improvements in accuracy, detail, and global consistency by leveraging Meta's self-supervised vision model. The model is trained against airborne laser scanning data and provides essential information for quantifying forest carbon, monitoring restoration and degradation, and assessing habitat structure.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/chmv2) | [Paper](https://huggingface.co/papers/2603.06382) | [Blog Post](http://ai.meta.com/blog/world-resources-institute-dino-canopy-height-maps-v2)\r\n* Add CHMv2 (#44595) by @yonigozlan in [#44595](https://github.com/huggingface/transformers/pull/44595)\r\n\r\n## Breaking changes\r\n\r\nThe dual `BaseImageProcessor`/`BaseImageProcessorFast` design has been replaced with a unified backend architecture, and the `image_processing_utils_fast` module has been removed — users should migrate to the new unified `image_processing_utils` module.\r\n\r\n* 🚨🚨 Refactor Image Processors to support different backends (#43514) by @yonigozlan\r\n\r\n`PreTrainedConfig` and model config classes have been refactored to use `@dataclass` and no longer accept positional arguments — users must update any config instantiation calls to use keyword 
arguments only.\r\n\r\n* :rotating_light: Validate config attributes (#41250) by @zucchini-nlp\r\n\r\nFlash Attention 2 (FA2) support now requires version 2.3.3 or newer, and initial Flash Attention 4 (FA4) support has been added — users on older FA2 versions must upgrade to at least 2.3.3.\r\n\r\n* :rotating_light: [`FA4`] Initial support (#42435) by @vasqu\r\n\r\nWeight tying behavior has changed so that weights are now tied even when both keys are already present in a checkpoint — users relying on the previous behavior (e.g., with `.bin` checkpoints containing duplicate keys) should verify their models load as expected.\r\n\r\n* [tie weights] 🚨 If both weights are present with same weights, still tie them (#44497) by @Cyrilvallez\r\n\r\nThe `cache_position` argument has been removed from the forward signatures of most major models — users passing `cache_position` directly to these models should remove it, as it is now handled internally by `generate`.\r\n\r\n* [core] 🚨 Completely remove cache positions (#44181) by @Cyrilvallez\r\n\r\n## Parallelization\r\n\r\nSeveral bug fixes and improvements were made to pipeline parallel (PP) and tensor parallel (TP) support, including fixing `supports_tp/pp_plan` detection, resolving attribute errors in PP for Qwen2VL-based models, correcting FSDP loading with meta devices, and ensuring TP weight sharding properly updates parent module attributes (e.g., `in_features`/`out_features`) to improve compatibility with libraries like PEFT.\r\n\r\n* Fix several based models' pipeline parallel support (#44699) by @hmellor in [#44699]\r\n* [Model] Add PP-Chart2Table Model Support (#43767) by @XingweiDeng in [#43767]\r\n* enable tp for benchmark (#43750) by @sywangyi in [#43750]\r\n* Fix `supports_{tp/pp}_plan` (#44696) by @hmellor in [#44696]\r\n* Allow to disable stdout hiding for TP (#44608) by @michaelbenayoun in [#44608]\r\n* fix FSDP loading with meta devices (#44473) by @winglian in [#44473]\r\n* Fix: Conditionally import 
`torch.distributed.fsdp` in `trainer_seq2seq.py` (#44507) by @0xDELUXA in [#44507]\r\n* Supplement skip logic for XPU in the CPU-only tp tests (#44536) by @YangKai0616 in [#44536]\r\n* Update parent module attributes when sharding with TP (#44421) by @michaelbenayoun in [#44421]\r\n* trigger tensor parallel utils test in the CI (#44460) by @3outeille in [#44460]\r\n\r\n## Quantization\r\n\r\nQuantization support was improved with up to 30x faster FP8 grouped and batched matmuls, static FP8 expert support for multi-GPU setups, and a torchao minimum version bump to 0.15.0. Additionally, MXFP4 dependency error messages were made more actionable, and AWQ tests were updated to align with the GPTQModel migration.\r\n\r\n* fix: split MXFP4 dependency checks for specific error messages (#44930) by @javierdejesusda in [#44930]\r\n* Add static FP8 expert support  (#44895) by @SunMarc in [#44895]\r\n* Bump torchao >=0.15 and fix quantization CI (#44604) by @SunMarc in [#44604]\r\n* Fix AWQ tests for GPTQModel migration (#44654) by @jiqing-feng in [#44654]\r\n* [Performance] FP8 Grouped and Batched Matmuls (#44231) by @IlyasMoutawwakil in [#44231]\r\n* Fix PR comment CI for quantization job (#44579) by @ydshieh in [#44579]\r\n\r\n\r\n## Tokenization\r\n\r\nSeveral performance improvements were made to tokenizer loading and saving, including eliminating redundant file parsing and unnecessary deep copies of large vocabularies that caused significant overhead. 
Additionally, bug fixes were applied for incorrect tokenizer class names on the Hub (DeepSeek V2/V3, ModernBERT), a `clean_up_tokenization_spaces` misconfiguration in Llama 3 tokenizer conversion, and a string replacement issue in `AutoTokenizer` class name resolution.\r\n\r\n\r\n* fix: improve processor loading performance by avoiding redundant tokenizer parsing (#44927) by @ydshieh in [#44927]\r\n* fix `processing_utils.py`: avoid deepcopying tokenizer in `ProcessorMixin` to improve performance (#44894) by @ydshieh in [#44894]\r\n* fix: set `clean_up_tokenization_spaces=False` in Llama 3 tokenizer conversion (#44914) by @maxsloef-goodfire in [#44914]\r\n* deepseek_v2, deepseek_v3, and modernbert fix for having incorrect tokenizer class on the hub (#44801) by @itazap in [#44801]\r\n* Add XPU Expectations for vibe voice acoustic tokenizer tests (#44428) by @kaixuanliu in [#44428]\r\n* fix(tokenizer): Only strip Fast from class names in AutoTokenizer if used as a suffix (#44443) by @harshaljanjani in [#44443]\r\n\r\n\r\n## Kernels\r\n\r\nKernel support has been expanded with Flash Attention 4 fallback integration, a `paged_attention` kernel for continuous batching, and Neuron device support for custom kernels. 
Several stability fixes were also made, including bumping the kernels version dependency to prevent crashes and correcting the LFM2 kernel path.\r\n\r\n\r\n* [`FA4`] Add kernels fallback (#44797) by @vasqu in [#44797]\r\n* Bump kernels version dependency to avoid crashes (#44887) by @Cyrilvallez in [#44887]\r\n* Fix lfm2 kernel path (#44634) by @Cyrilvallez in [#44634]\r\n* [CB] Add paged_attention kernel (#44379) by @remi-or in [#44379]\r\n* Neuron kernels integration (#44417) by @michaelbenayoun in [#44417]\r\n\r\n\r\n## Cache\r\n\r\nSeveral cache-related fixes and improvements were made, including aligning LFM2's cache implementation with other Mamba caches, fixing a tensor indexing crash in KV cache continuation for the `transformers serve` streaming endpoint, and resolving a generation bug in Idefics3 when using `use_cache=False`. A caching layer was also added to the model linter to skip unchanged valid files and improve build performance.\r\n\r\n\r\n* Align lfm2 cache to other mamba caches (#44866) by @Cyrilvallez in [#44866]\r\n* feat: added cache to the model linter (#44790) by @tarekziade in [#44790]\r\n* Fix tensor indexing crash in serve generate_response KV cache continuation (#44735) by @mango766 in [#44735]\r\n* Idefics3 without cache fix (#44607) by @gabe-l-hart in [#44607]\r\n\r\n\r\n## Vision\r\n\r\nFixed backward compatibility for full-path imports of Fast Image Processors and resolved a Llama4 vision rotary embedding initialization error where `freqs_ci` was not registered as a buffer, causing failures when loading models with `device_map=\"auto\"`.\r\n\r\n\r\n* Fix backward compatibility for full path imports of Fast Image Processors (#44926) by @yonigozlan in [#44926]\r\n* fix(models, testing): Fix Llama4 vision rotary meta tensor initialization and MyT5 get_tokenizer signature (#44581) by @harshaljanjani in [#44581]\r\n* Fix AMD Docker image build timeout by pinning Flash Attention commit (#44546) by @Abdennacer-Badaoui in 
[#44546]\r\n\r\n\r\n## Generation\r\n\r\nThe `cache_position` argument has been fully removed from the generation pipeline, as all models have been updated to no longer use it (with a backward-compatibility path retained for remote code models). Additionally, integration tests for LASR with chunked decoding were added, and outdated references to deprecated pipeline tasks were cleaned up.\r\n\r\n\r\n* [generate] Never use `cache_position` anymore in generation (#44816) by @Cyrilvallez in [#44816]\r\n* Add an integration test for LASR using pipe and chunked decoding (#42823) by @kho in [#42823]\r\n* Fix: Remove references to `text2text-generation`, `summarization` and `translation` pipeline tasks (#44510) by @math-hiyoko in [#44510]\r\n\r\n\r\n## Bugfixes and improvements\r\n\r\n* Dynamic weight conversion is recursive (#44300) by @zucchini-nlp in [#44300]\r\n* Don't run `tests_hub` if no tests found (#45014) by @ydshieh in [#45014]\r\n* Fix type hint for `attention_chunk_size` in `Llama4TextConfig` (#45002) by @hmellor in [#45002]\r\n* Fix AutoProcessor.from_pretrained silently dropping hub kwargs (#44710) by @he-yufeng in [#44710]\r\n* Fix `maybe_autocast` crashing on meta device tensors (#44984) by @Butanium in [#44984]\r\n* fix: remove Copied from comments between @torch.jit.script and def for Python 3.13 compat (#44986) by @Krishnachaitanyakc in [#44986]\r\n* More small vllm fixes (#44990) by @ArthurZucker in [#44990]\r\n* fix(models): Fix Perceiver interpolate_pos_encoding interpolating to the source size (#44899) by @harshaljanjani in [#44899]\r\n* Allow `mm_token_type` be non-padded lists  (#44563) by @zucchini-nlp in [#44563]\r\n* Fix CPU 16 bytes alignment issue using equivalent fallback (#44970) by @IlyasMoutawwakil in [#44970]\r\n* refactor: unify QA calls (#44879) by @tarekziade in [#44879]\r\n* Fix tie_word_embedding issues with `Qwen2VL` (#44976) by @hmellor in [#44976]\r\n* Support Modular (!!) 
+ Configs in `check_auto_docstrings` (#44803) by @yonigozlan in [#44803]\r\n* [ `vllm x v5`] nit (#44971) by @ArthurZucker in [#44971]\r\n* LwDetrImageLoss: Fix dtype casting to prevent crash when using amp on cuda device (#44886) by @m-matthias in [#44886]\r\n* [AMD CI] Gemma3/Gemma3n Expectations (#44972) by @Abdennacer-Badaoui in [#44972]\r\n* Officially launch parse_response (#44674) by @Rocketknight1 in [#44674]\r\n* fix load_best_model_checkpoint_at_end do not load the best model chec… (#44583) by @wilnn in [#44583]\r\n* Fix failing `T5ModelIntegrationTest` (#44934) by @Sai-Suraj-27 in [#44934]\r\n* Config kwargs (#44953) by @zucchini-nlp in [#44953]\r\n* [CB] [Minor] Simplify test suite (#44858) by @remi-or in [#44858]\r\n* Allow arbitrary template kwargs in processors (#44881) by @zucchini-nlp in [#44881]\r\n* Fix missing post_processor in DebertaV2Tokenizer causing no special t… (#44570) by @umbilnm in [#44570]\r\n* incorrect model list update (#44880) by @itazap in [#44880]\r\n* refactor: mlinter as its own package (#44939) by @tarekziade in [#44939]\r\n* [CB] Add an option to return logprobs (#44835) by @remi-or in [#44835]\r\n* [docs] peft (#44804) by @stevhliu in [#44804]\r\n* Continuous batching thread safety (#44924) by @Qubitium in [#44924]\r\n* Fix variable shadowing in pipeline example and typo in BART docs (BERT → BART) (#44935) by @VanshikaSohal in [#44935]\r\n* Fix failing job `Update Transformers metadata` after #43514 (#44941) by @ydshieh in [#44941]\r\n* Clearer type hints and fix rope validation in configs (#44943) by @zucchini-nlp in [#44943]\r\n* Correct docstrings for `from_pretrained` (url input deprecated) (#44946) by @BSchilperoort in [#44946]\r\n* fix(i18n): replace broken relative links to awesome-transformers.md with absolute URLs (#44905) by @NicoleRobin in [#44905]\r\n* chore(typing): added rule 11 (#44865) by @tarekziade in [#44865]\r\n* fix(camembert): add tie_word_embeddings=True to CamembertConfig (#44931) by @r266-tech in 
[#44931]\r\n* Support SizeDict import in get_size_dict (#44903) by @yonigozlan in [#44903]\r\n* Add big angry code agent warnings! (#44890) by @Rocketknight1 in [#44890]\r\n* [docs] model cards (#44837) by @stevhliu in [#44837]\r\n* Add backward compatibility for direct imports from legacy `image_processing_utils_fast` (#44897) by @yonigozlan in [#44897]\r\n* Fix core dumped when `NemotronH` is torch compiled (#44854) by @ydshieh in [#44854]\r\n* fix(testing): Fix PaliGemma 2 and PaddleOCR-VL test failures on main (#44765) by @harshaljanjani in [#44765]\r\n* Fix dtype guessing from state dict (#44883) by @Cyrilvallez in [#44883]\r\n* Add missing dunder methods to `SizeDict` (#44884) by @hmellor in [#44884]\r\n* Fix VL model rope_deltas batch size mismatch in online RL training (#44873) by @sergiopaniego in [#44873]\r\n* Fix `layer_types` type hint for `AFMoE` and `Llama4` (#44874) by @hmellor in [#44874]\r\n* Fix nemotron config docstrings (#44878) by @Cyrilvallez in [#44878]\r\n* Fix nemotron_h modular (#44876) by @Cyrilvallez in [#44876]\r\n* [Mistral] Fix query scaling for Mistral4 and Ministral3 (#44860) by @Cyrilvallez in [#44860]\r\n* Update some type hints (#44851) by @zucchini-nlp in [#44851]\r\n* Fix glm dsa (#44564) by @ArthurZucker in [#44564]\r\n* Update AFMoE architecture to use v5-style MoE impl (#44063) by @AutumnAurelium in [#44063]\r\n* Fix KeyError in convert_to_native_format for dict vocab (#44452) by @<NOT FOUND> in [#44452]\r\n* fix: XLNet: relative_positional_encoding computes on CPU every forward (#44782) by @JiwaniZakir in [#44782]\r\n* Fix annotations reader for python 3.14 in `PreTrainedModel` (#44672) by @neo in [#44672]\r\n* [CB] Better parametrization for compile (#44578) by @remi-or in [#44578]\r\n* Fix `KeyError` when patching mistral regex (#43376) by @LeonardoEmili in [#43376]\r\n* Correct code block formatting in weightconverter.md (#44839) by @zhulinchng in [#44839]\r\n* feat(ci): added a network debug report (#44636) by 
@tarekziade in [#44636]\r\n* Add GreedyLR adaptive learning rate scheduler (#44271) by @balak4 in [#44271]\r\n* Fix unexpected `position_ids` keys when loading OwlViT models (#44508) by @KartikPawade in [#44508]\r\n* Update more modular examples (#44834) by @Cyrilvallez in [#44834]\r\n* Fix and re-run modular converter on examples (#44833) by @Cyrilvallez in [#44833]\r\n* Remove cache_position in more models (4 and last one) (#44828) by @Cyrilvallez in [#44828]\r\n* Fix loading issue in Sam3 (#44831) by @zucchini-nlp in [#44831]\r\n* feat(integration): Add KubeflowCallback to enable automatic progress … (#44487) by @abhijeet-dhumal in [#44487]\r\n* Add GGUF support for MiniMax-M2.1 model (#44526) by @JoursBleu in [#44526]\r\n* Centralize AI agent templates in `.ai` (#44489) by @tarekziade in [#44489]\r\n* support xxxFast alias in v5 tokenizers (#44766) by @itazap in [#44766]\r\n* Remove cache_position in more models (3) (#44759) by @Cyrilvallez in [#44759]\r\n* [CI] Temporarily skip Mistral4 tests as they almost all fail (#44825) by @Cyrilvallez in [#44825]\r\n* [Gemma] Update conversion scripts for Transformers v5 Comaptibility (#44631) by @RyanMullins in [#44631]\r\n* fix bug embedding_size mismatch with hidden_size in electra model test (#44657) by @kaixuanliu in [#44657]\r\n* Fix pegasus conversion (#44571)  by @ArthurZucker in [#44571]\r\n* Fix repo-check bot (#44812) by @ydshieh in [#44812]\r\n* [docs] is_causal feature (#44777) by @stevhliu in [#44777]\r\n* docs(tasks): remove references to removed question-answering pipeline (#44787) by @<NOT FOUND> in [#44787]\r\n* Fix configs with `@strict` (#44770) by @zucchini-nlp in [#44770]\r\n* [AMD CI] Fix test failures across important models  (#44632) by @Abdennacer-Badaoui in [#44632]\r\n* Move VLM conversions to the main mapping (#44627) by @zucchini-nlp in [#44627]\r\n* Fix config loading issues (type issues) (#44789) by @ydshieh in [#44789]\r\n* Remove `is_causal` from `EuroBertConfig` (#44774) by @ydshieh in 
[#44774]\r\n* model-linter: Added rule 10 (#44761) by @tarekziade in [#44761]\r\n* [fix] mistral 4 docs (#44776) by @stevhliu in [#44776]\r\n* Fix: Eurobert model was missing @strict decorator and invalid test kwargs (#44767) by @tarekziade in [#44767]\r\n* fix: sig lip import (#44764) by @tarekziade in [#44764]\r\n* Disable async loading when quantizing on the fly (#44576) by @SunMarc in [#44576]\r\n* [MistralCommonBackend] Upgrade mistral-common to v1.10.0 (#44656) by @juliendenize in [#44656]\r\n* Fix `mlcd` auto config/model/mapping issues (#44730) by @ydshieh in [#44730]\r\n* Fix bug and add XPU Expectations for qwen2 and jamba tests (#44733) by @kaixuanliu in [#44733]\r\n* [medasr] doc update (#44633) by @eustlb in [#44633]\r\n* Fix missing / incorrect `config` class in some model class definitions (#44715) by @ydshieh in [#44715]\r\n* Update Nvidia CI docker file to use torch 2.10 (#44712) by @ydshieh in [#44712]\r\n* [`FA`] Fix fa detection (#44703) by @vasqu in [#44703]\r\n* Fix `set_encoder` (#44698) by @hmellor in [#44698]\r\n* [docs] cb config (#44675) by @stevhliu in [#44675]\r\n* Fix more model tester missing `parent` issue (#44685) by @ydshieh in [#44685]\r\n* Add register method for `ParallelInterface` (#44640) by @michaelbenayoun in [#44640]\r\n* [CB] [Bug] Fix crashes when running without cuda (#44673) by @remi-or in [#44673]\r\n* Another (small) set of fixes required for tiny model creation (#44666) by @ydshieh in [#44666]\r\n* Fix CookieCutter (#44334) by @NielsRogge in [#44334]\r\n* pipelines do not have modelcard (#44621) by @KoichiYasuoka in [#44621]\r\n* [`Chmv2`] Fix conversion after capture refactor (#44665) by @vasqu in [#44665]\r\n* [CB] Add dedicated config (#44434) by @remi-or in [#44434]\r\n* fix(models): Forward timm model kwargs to timm.create_model for OmDet-Turbo (#44611) by @harshaljanjani in [#44611]\r\n* Ensure same `dtype` for subconfig when `_from_config` (#44629) by @zucchini-nlp in [#44629]\r\n* Remove `cache_position` in 
more models (2) (#44602) by @Cyrilvallez in [#44602]\r\n* fix: cast to proper dtype in EmbeddingParallel (#44612) by @michaelbenayoun in [#44612]\r\n* Remove many output_attentions and other traced outputs on 100+ models  (#43590) by @molbap in [#43590]\r\n* fix: raise error if mm_token_type_ids not supplied  (#44433) by @leopold-tzafon in [#44433]\r\n* Fix output capturing for Backbones (#44638) by @Cyrilvallez in [#44638]\r\n* Fix for `VibeVoiceAcousticTokenizer` (#44628) by @ydshieh in [#44628]\r\n* Fix off-by-one in decode_spans boundary check (#44584) by @mvanhorn in [#44584]\r\n* Fix more wrong HF hub checkpoint names (#44624) by @ydshieh in [#44624]\r\n* Update agentic contributions guidelines in AGENTS.md to force yielding. (#44411) by @burtenshaw in [#44411]\r\n* Expand model-structure lint rules with a fast AST-based, ruff-like framework (#44174) by @tarekziade in [#44174]\r\n* feat: add neuron in tensor parallelism initialization (#44498) by @michaelbenayoun in [#44498]\r\n* [WIP] FIX Make Mixtral LoRA loading work (#44478) by @BenjaminBossan in [#44478]\r\n* Fix Llava tests for torch too! 
(#44476) by @Rocketknight1 in [#44476]\r\n* Fix training ci and clean some tests (#44491) by @SunMarc in [#44491]\r\n* Remove useless identity assignment (#44600) by @Cyrilvallez in [#44600]\r\n* Add Yoni to run-slow workflow (#44598) by @vasqu in [#44598]\r\n* Add shared VLM tests (#42964) by @Rocketknight1 in [#42964]\r\n* Fix wrong (non-existing) checkpoints (#44549) by @ydshieh in [#44549]\r\n* Remove `cache_position` in more models (#44330) by @Cyrilvallez in [#44330]\r\n* Fix CircleCI summary report not showing due to missing dependency (#44597) by @ydshieh in [#44597]\r\n* Fix typos in add_new_model_like docstrings (#43544) by @Olexandr88 in [#43544]\r\n* Fix UnboundLocalError for tp_plan_alt when tp_plan is empty (#44540) by @YangKai0616 in [#44540]\r\n* FIX Multiple PEFT errors after v5 transition (#44592) by @BenjaminBossan in [#44592]\r\n* Fix missing BPE token conversion step in Chameleon (#44582) by @yonigozlan in [#44582]\r\n* Make paligemma embed tokens standard (#44432) by @zucchini-nlp in [#44432]\r\n* chore(typing): Add type checking to `src/transformers/quantizers` (#44412) by @tarekziade in [#44412]\r\n* Fix: AQLM quantizer to match updated replace_with_aqlm_linear signature (#44577) by @tarekziade in [#44577]\r\n* [device_map] Fix device_map computation by correctly adjusting memory available (#44565) by @Cyrilvallez in [#44565]\r\n* Fix error message label and docstring default in load_sharded_checkpoint (#44523) by @jnMetaCode in [#44523]\r\n* Correct Tapas initialization (#44575) by @Rocketknight1 in [#44575]\r\n* [`fix`] Prevent crash with Apertus without xielu installed (#44567) by @tomaarsen in [#44567]\r\n* Fix failing `MusicgenStereo` integration tests (#44527) by @Sai-Suraj-27 in [#44527]\r\n* Fix zamba2 rotary embedding call when use_mem_rope is False (#44551) by @echarlaix in [#44551]\r\n* [Bugfix] fix video inference of qwen3vl and qwen3.5 series (#44474) by @JJJYmmm in [#44474]\r\n* add XPU Expectations for `higgs_audio_v2` tests 
(#44482) by @kaixuanliu in [#44482]\r\n* chameleon added to MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS (#44475) by @itazap in [#44475]\r\n* Revert \"test merge queue 1\" (#44552) by @ydshieh in [#44552]\r\n* test merge queue 1 (#44529) by @ydshieh2 in [#44529]\r\n* fix(testing): Fix MoonshineEncoder UnboundLocalError and Florence2VisionBackbone dtype mismatch (#44503) by @harshaljanjani in [#44503]\r\n* Fix: Remove references to transformers run command (#44513) by @math-hiyoko in [#44513]\r\n* [LW-DETR] Fix training (#44441) by @NielsRogge in [#44441]\r\n* Make `_prepare_input_fn` and `_prepare_output_fn` instance methods (#44499) by @michaelbenayoun in [#44499]\r\n* Fix ShieldGemma2 non-reproducible outputs by adding _tied_weights_keys (#44358) by @hardikmeisheri in [#44358]\r\n* Tensor Parallelism and `mps` device (#44506) by @michaelbenayoun in [#44506]\r\n* Fix failing `GPTNeoModelLanguageGenerationTest` (#44515) by @Sai-Suraj-27 in [#44515]\r\n* Fix failing `MarianIntegrationTests` (#44519) by @Sai-Suraj-27 in [#44519]\r\n* fix pin_memory for contiguous batching (#44455) by @jiqing-feng in [#44455]\r\n* Fix continuous batching for multimodal models (#44436) by @jw9603 in [#44436]\r\n* Fix KeyError in _parse_type_hint when Union contains Any (#44525) by @jnMetaCode in [#44525]\r\n* Fix AssistantTracker.is_active() returning False after activation with empty lists (#44524) by @jnMetaCode in [#44524]\r\n* Fix and re-enable extra_state tests (#43510) by @pstjohn in [#43510]\r\n* Fix ansi codes in loading reports when not connected to terminal (#44544) by @Cyrilvallez in [#44544]\r\n* Follow-up typing checking fixes (#44500) by @tarekziade in [#44500]\r\n* Fix backend dependency (#44542) by @Cyrilvallez in [#44542]\r\n* Add a new job in `build_pr_documentation.yml` (will be the new required job) (#44538) by @ydshieh in [#44538]\r\n* Update `build_pr_documentation` workflow for `merge_group` event (#44532) by @ydshieh in [#44532]\r\n* Fixed typo in 
docs/source/en/kv_cache.md (#44501) by @frogNotToad in [#44501]\r\n* Docs: fix SigLIP2 usage examples (#43641) by @KOKOSde in [#43641]\r\n* Fix type checker (#44502) by @Cyrilvallez in [#44502]\r\n* Add MLU bf16 support to is_torch_bf16_gpu_available (#44381) by @carcel-yu in [#44381]\r\n* fix model parallelism bug for eurobert model (#44490) by @kaixuanliu in [#44490]\r\n* Update `ty` to 0.0.20 (#44494) by @tarekziade in [#44494]\r\n* Add auto-docstring on configs (#44296) by @zucchini-nlp in [#44296]\r\n* Fix failed unit tests for moonshine_streaming model (#43936) by @kaixuanliu in [#43936]\r\n* Update distributed tests (#44338) by @SunMarc in [#44338]\r\n* Add `diffusers` to CI docker file (#44480) by @ydshieh in [#44480]\r\n* Replace placeholder tokens as specified in added_tokens_decoder (#44468) by @itazap in [#44468]\r\n* [vLLM] Fix backward compatibility with hardcoded subprocessors classes in processors (#44447) by @yonigozlan in [#44447]\r\n* [remote code/vllm] Fix incorrect tied weights (#44469) by @Cyrilvallez in [#44469]\r\n* Integrate the Neuron device to TrainingArguments (#44302) by @michaelbenayoun in [#44302]\r\n* Fix failing `DepthProModelIntegrationTest` (#44456) by @Sai-Suraj-27 in [#44456]\r\n* [timesfm2_5] fix loss scaling (#44465) by @kashif in [#44465]\r\n* Fix failing `ProphetNetModelIntegrationTest` (#44439) by @Sai-Suraj-27 in [#44439]\r\n* [Trainer] fix SP loss (#44461) by @kashif in [#44461]\r\n* skip 1 invalid test case for higgs_audio_v2 (#44350) by @kaixuanliu in [#44350]\r\n* Fix position_ids typo in Qwen3_5TextModel forward pass (#44399) by @<NOT FOUND> in [#44399]\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @ydshieh\r\n    * Don't run `tests_hub` if no tests found (#45014)\r\n    * Fix failing job `Update Transformers metadata` after #43514 (#44941)\r\n    * fix: improve processor loading performance by avoiding 
redundant tokenizer parsing (#44927)\r\n    * fix `processing_utils.py`: avoid deepcopying tokenizer in `ProcessorMixin` to improve performance (#44894)\r\n    * Fix core dumped when `NemotronH` is torch compiled (#44854)\r\n    * Fix repo-check bot (#44812)\r\n    * Fix config loading issues (type issues) (#44789)\r\n    * Remove `is_causal` from `EuroBertConfig` (#44774)\r\n    * Fix `mlcd` auto config/model/mapping issues (#44730)\r\n    * Fix missing / incorrect `config` class in some model class definitions (#44715)\r\n    * Update Nvidia CI docker file to use torch 2.10 (#44712)\r\n    * Fix more model tester missing `parent` issue (#44685)\r\n    * Another (small) set of fixes required for tiny model creation (#44666)\r\n    * Fix for `VibeVoiceAcousticTokenizer` (#44628)\r\n    * Fix more wrong HF hub checkpoint names (#44624)\r\n    * Fix wrong (non-existing) checkpoints (#44549)\r\n    * Fix CircleCI summary report not showing due to missing dependency (#44597)\r\n    * Fix PR comment CI for quantization job (#44579)\r\n    * Revert \"test merge queue 1\" (#44552)\r\n    * Add a new job in `build_pr_documentation.yml` (will be the new required job) (#44538)\r\n    * Update `build_pr_documentation` workflow for `merge_group` event (#44532)\r\n    * Add `diffusers` to CI docker file (#44480)\r\n* @NielsRogge\r\n    * Add VidEoMT (#44285)\r\n    * Fix CookieCutter (#44334)\r\n    * [LW-DETR] Fix training (#44441)\r\n* @tarekziade\r\n    * refactor: unify QA calls (#44879)\r\n    * refactor: mlinter as its own package (#44939)\r\n    * chore(typing): added rule 11 (#44865)\r\n    * feat: added cache to the model linter (#44790)\r\n    * feat(ci): added a network debug report (#44636)\r\n    * Centralize AI agent templates in `.ai` (#44489)\r\n    * model-linter: Added rule 10 (#44761)\r\n    * Fix: Eurobert model was missing @strict decorator and invalid test kwargs (#44767)\r\n    * fix: sig lip import (#44764)\r\n    * Expand model-structure lint rules with 
a fast AST-based, ruff-like framework (#44174)\r\n    * chore(typing): Add type checking to `src/transformers/quantizers` (#44412)\r\n    * Fix: AQLM quantizer to match updated replace_with_aqlm_linear signature (#44577)\r\n    * Follow-up typing checking fixes (#44500)\r\n    * Update `ty` to 0.0.20 (#44494)\r\n* @Sai-Suraj-27\r\n    * Fix failing `T5ModelIntegrationTest` (#44934)\r\n    * Add `Jina-Embeddings-V3` Model (#44251)\r\n    * Fix failing `MusicgenStereo` integration tests (#44527)\r\n    * Fix failing `GPTNeoModelLanguageGenerationTest` (#44515)\r\n    * Fix failing `MarianIntegrationTests` (#44519)\r\n    * Fix failing `DepthProModelIntegrationTest` (#44456)\r\n    * Fix failing `ProphetNetModelIntegrationTest` (#44439)\r\n* @remi-or\r\n    * [CB] [Minor] Simplify test suite (#44858)\r\n    * [CB] Add an option to return logprobs (#44835)\r\n    * [CB] Better parametrization for compile (#44578)\r\n    * [CB] [Bug] Fix crashes when running without cuda (#44673)\r\n    * [CB] Add dedicated config (#44434)\r\n    * [CB] Add paged_attention kernel (#44379)\r\n* @XingweiDeng\r\n    * [Model] Add UVDoc Model Support (#43385)\r\n    * [Model] Add PP-Chart2Table Model Support (#43767)\r\n    * [Model] Add PP-OCRV5_mobile_det Model Support  (#43247)\r\n    * [Model] Add PP-OCRV5_server_det Model Support (#43274)\r\n* @vasqu\r\n    * [`FA4`] Add kernels fallback (#44797)\r\n    * [`FA`] Fix fa detection (#44703)\r\n    * :rotating_light: [`FA4`] Initial support (#42435)\r\n    * [`Chmv2`] Fix conversion after capture refactor (#44665)\r\n    * Add Yoni to run-slow workflow (#44598)\r\n* @liu-jiaxuan\r\n    * [Model] Add SLANeXt Model Support (#43707)\r\n* @zhang-prog\r\n    * [Model] Add PP-OCRv5_server_rec and  PP-OCRv5_mobile_rec models Support (#44808)\r\n* @balak4\r\n    * Add GreedyLR adaptive learning rate scheduler (#44271)\r\n* @kaixuanliu\r\n    * fix bug embedding_size mismatch with hidden_size in electra model test (#44657)\r\n    * Fix bug and add 
XPU Expectations for qwen2 and jamba tests (#44733)\r\n    * Add XPU Expectations for vibe voice acoustic tokenizer tests (#44428)\r\n    * add XPU Expectations for `higgs_audio_v2` tests (#44482)\r\n    * fix model parallelism bug for eurobert model (#44490)\r\n    * Fix failed unit tests for moonshine_streaming model (#43936)\r\n    * skip 1 invalid test case for higgs_audio_v2 (#44350)\r\n* @juliendenize\r\n    * Add Mistral 4 (#44760)\r\n    * [MistralCommonBackend] Upgrade mistral-common to v1.10.0 (#44656)\r\n* @molbap\r\n    * Add model lerobot PI0 to transformers (#44160)\r\n    * Remove many output_attentions and other traced outputs on 100+ models  (#43590)\r\n* @JJJYmmm\r\n    * [Bugfix] fix video inference of qwen3vl and qwen3.5 series (#44474)\r\n* @math-hiyoko\r\n    * Fix: Remove references to `text2text-generation`, `summarization` and `translation` pipeline tasks (#44510)\r\n    * Fix: Remove references to transformers run command (#44513)\r\n","publishedAt":"2026-03-27T00:33:02.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.4.0","media":[]},{"id":"rel_cYoD1L0MoVA7oLUxDhCLA","version":"v5.3.0","title":"v5.3.0: EuroBERT, VibeVoice ASR, TimesFM2.5, PP-DocLayoutV2, OlmoHybrid, ModernVBert, Higgs Audio V2","summary":"## New Model additions\r\n\r\n### EuroBERT\r\n\r\n<img width=\"1080\" height=\"1080\" alt=\"image\" src=\"https://github.com/user-attachments/assets/33603f42-5435-42...","content":"## New Model additions\r\n\r\n### EuroBERT\r\n\r\n<img width=\"1080\" height=\"1080\" alt=\"image\" src=\"https://github.com/user-attachments/assets/33603f42-5435-421a-9641-baf72faacb22\" />\r\n\r\nEuroBERT is a multilingual encoder model based on a refreshed transformer architecture, akin to Llama but with bidirectional attention. 
It supports a mixture of European and widely spoken languages, with sequences of up to 8192 tokens.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/eurobert) | [Paper](https://huggingface.co/papers/2503.05500) | [Blog Post](https://huggingface.co/blog/EuroBERT/release)\r\n* Add eurobert (#39455) by @ArthurZucker in [#39455](https://github.com/huggingface/transformers/pull/39455)\r\n\r\n### VibeVoice ASR\r\n\r\n<img width=\"673\" height=\"464\" alt=\"image\" src=\"https://github.com/user-attachments/assets/e4093a6b-fc6e-4136-a15d-2fcd7b27a69e\" />\r\n\r\nVibeVoice ASR is an automatic speech recognition model from Microsoft that combines acoustic and semantic audio tokenizers with a causal language model for robust speech-to-text transcription. The model uses VibeVoice's acoustic and semantic tokenizers that process audio at 24kHz, paired with a Qwen2-based language decoder for generating transcriptions. It can process up to 60 minutes of continuous audio input, supports customized hotwords, performs joint ASR/diarization/timestamping, and handles over 50 languages with code-switching support.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/vibevoice_asr) | [Paper](https://huggingface.co/papers/2601.18184)\r\n* Add VibeVoice ASR (#43625) by @ebezzam in [#43625](https://github.com/huggingface/transformers/pull/43625)\r\n\r\n### TimesFM2.5\r\n\r\n<img width=\"799\" height=\"497\" alt=\"image\" src=\"https://github.com/user-attachments/assets/1e486803-1b68-496b-aa67-4c3f2055fbeb\" />\r\n\r\nTimesFM 2.5 is a pretrained time-series foundation model that uses a decoder-only attention architecture with input patching for forecasting. The model is designed to provide accurate zero-shot forecasts across different domains, forecasting horizons and temporal granularities without requiring dataset-specific training. 
It builds on the original TimesFM architecture with enhancements including rotary attention, QK normalization, per-dimension attention scaling, and continuous quantile prediction.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/timesfm2_5) | [Paper](https://huggingface.co/papers/2310.10688)\r\n* Timesfm 2.5 (#41763) by @kashif in [#41763](https://github.com/huggingface/transformers/pull/41763)\r\n\r\n### PP-DocLayoutV2\r\n\r\n<img width=\"1440\" height=\"436\" alt=\"image\" src=\"https://github.com/user-attachments/assets/31d6609b-ef42-4f15-8c34-eeb2c0d679a9\" />\r\n\r\nPP-DocLayoutV2 is a dedicated lightweight model for layout analysis, focusing specifically on element detection, classification, and reading order prediction. The model is composed of two sequentially connected networks: an RT-DETR-based detection model that performs layout element detection and classification, followed by a pointer network that orders these layout elements. It is designed to analyze document layouts by identifying and organizing various layout components in their proper reading sequence.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/pp_doclayout_v2)\r\n* [Model] Add PP-DocLayoutV2 Model Support (#43018) by @zhang-prog in [#43018](https://github.com/huggingface/transformers/pull/43018)\r\n\r\n### OlmoHybrid\r\n\r\nOLMo Hybrid is a hybrid architecture model from Ai2 that combines standard transformer attention layers with linear attention layers using the Gated Deltanet. This hybrid approach aims to improve efficiency while maintaining model quality by interleaving full attention layers with linear attention layers. 
The model uses a custom cache system that handles both KV cache for attention layers and recurrent state for linear attention layers.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/olmo_hybrid)\r\n* Add OLMo Hybrid model (#43358) by @yanhong-lbh in [#43358](https://github.com/huggingface/transformers/pull/43358)\r\n\r\n### ModernVBert\r\n\r\n<img width=\"332\" height=\"343\" alt=\"image\" src=\"https://github.com/user-attachments/assets/23e1e140-9ad2-4144-b5d6-8b8c1e3414c9\" />\r\n\r\nModernVBert is a Vision-Language encoder that combines ModernBert with a SigLIP vision encoder. It is optimized for visual document understanding and retrieval tasks, making it suitable for processing documents that contain both text and visual elements.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/modernvbert) | [Paper](https://huggingface.co/papers/2510.01149)\r\n* Add ModernVBERT models (#42504) by @paultltc in [#42504](https://github.com/huggingface/transformers/pull/42504)\r\n\r\n### ColModernVBert\r\n\r\nColModernVBert is a model for efficient visual document retrieval that leverages ModernVBert to construct multi-vector embeddings directly from document images, following the ColPali approach. 
The model enables retrieval and scoring of visual documents by processing both text queries and document images to generate embeddings that can be compared for relevance scoring.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/colmodernvbert) | [Paper](https://huggingface.co/papers/2510.01149)\r\n* Add ModernVBERT models (#42504) by @paultltc in [#42504](https://github.com/huggingface/transformers/pull/42504)\r\n\r\n### Higgs Audio V2\r\n\r\n<img width=\"3065\" height=\"1464\" alt=\"image\" src=\"https://github.com/user-attachments/assets/94ad4db1-3c10-43d1-b1f7-2ce01329c8a4\" />\r\n\r\nHiggs Audio V2 is a powerful audio foundation model developed by Boson AI that was pretrained on over 10 million hours of audio data and diverse text data. Despite having no post-training or fine-tuning, the model excels in expressive audio generation thanks to its deep language and acoustic understanding. The model supports various audio generation tasks including single-speaker and multi-speaker smart voice, zero-shot voice cloning, and multi-speaker voice cloning.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/higgs_audio_v2)\r\n* Add Higgs Audio V2 Model (#40294) by @szhengac in [#40294](https://github.com/huggingface/transformers/pull/40294)\r\n\r\n### Higgs Audio V2 Tokenizer\r\n\r\nThe Higgs Audio V2 Tokenizer is an audio tokenization model that operates at a low frame rate of 25 fps while maintaining high audio quality, effectively halving the frame rate of many baseline models. It uses unified 24 kHz training that mixes speech, music, and sound-event clips in one model to capture both semantic and acoustic details, facilitating the training of audio language models. 
The model enables fast inference by avoiding diffusion steps, with an encoder/decoder architecture that processes batches quickly for real-time or large-scale tasks.\r\n\r\n**Links:** [Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/higgs_audio_v2_tokenizer)\r\n* Add Higgs Audio V2 Model (#40294) by @szhengac in [#40294](https://github.com/huggingface/transformers/pull/40294)\r\n\r\n\r\n## Breaking changes\r\n\r\nTensor parallelism (TP) support for dense and MoE decoder-only models has been fixed and stabilized, requiring users to update their TP configurations and conversion mappings accordingly.\r\n* 🚨 fix + tests dense & MoE TP all reduce (decoder only) (#43722) by @3outeille\r\n\r\nThe `Ernie4.5 VL MoE` model class and configuration names have been renamed to align with vLLM/SGLang conventions, requiring users to update any references to the old model names in their code.\r\n* :rotating_light: [`Ernie 4.5 VL Moe`] Fix up namings to vllm/sglang convention (#44299) by @vasqu\r\n\r\nSeveral pipeline tasks have been removed or updated in the V5 cleanup (including `question-answering`, `visual-question-answering`, and `image-to-image`), requiring users to migrate to the replacement pipelines or updated task names.\r\n* 🚨 More V5 pipeline cleanup (#43325) by @Rocketknight1\r\n\r\n3D position IDs for vision-language models have been unified under a common interface (sourced from `qwen2-vl`), requiring users of affected VLMs (e.g., Ernie, GLM4V) to update their processors and any code that manually constructs position IDs.\r\n* :rotating_light: Unify 3D position ids (#43972) by @zucchini-nlp\r\n\r\n## 🚨 Tokenizer x vLLM fixes 🚨 : \r\nUnigram tokenizers were missing the `spm` precompiled charsmap support. 
We ran an overall v4 vs v5 regression test and fixed what we had missed.\r\n\r\nThis was done in:\r\n* [vllm + v5 fix] handle TokenizersBackend fallback properly for v5 (#44255) by @itazap \r\n\r\n## Generation\r\n\r\nGeneration input preparation was significantly refactored to stop relying on `cache_position` and instead pass pre-sliced `input_ids`/`inputs_embeds` directly to `prepare_inputs_for_generation`, simplifying the generation loop and laying groundwork for broader `cache_position` removal. Several bug fixes were also applied, including correct sampling for HiggsAudioV2, flaky cache-equality test stabilization for Idefics, and restored generation integration tests.\r\n\r\n\r\n* [higgs-audio-v2] fix sampling (#44386) by @eustlb in [#44386]\r\n* fix(flaky): idefics generate cache flake (#44180) by @tarekziade in [#44180]\r\n* Fix generation integration tests (#44225) by @zucchini-nlp in [#44225]\r\n* [generate] Always pass full input_ids in `prepare_inputs_for_generation` (#44226) by @Cyrilvallez in [#44226]\r\n* fix: HiggsAudioV2 cached decode inputs in compiled generation (#44201) by @tarekziade in [#44201]\r\n* [generate] Completely stop relying on `cache_position` to prepare inputs (#44130) by @Cyrilvallez in [#44130]\r\n* Simplify input preparation in generate (#44126) by @Cyrilvallez in [#44126]\r\n\r\n\r\n## Tokenization\r\n\r\nSeveral tokenization bugs were fixed in this release, including resolving an `AttributeError` in `MLukeTokenizer` caused by the v5 rename of `additional_special_tokens`, correcting the Fuyu tokenizer class mapping, fixing `LayoutXLM` tokenization test failures from the slow tokenizer removal refactor, and adding `olmo_hybrid` to the auto-tokenizer mapping. 
The tokenizer documentation was also updated to reflect the new unified v5 backend architecture and reorganized for clarity.\r\n\r\n\r\n* [tiny] Add olmo_hybrid to tokenizer auto-mapping (#44416) by @tyler-romero in [#44416]\r\n* fix(tokenizer): Fix MLukeTokenizer AttributeError post-v5 refactor (#44362) by @harshaljanjani in [#44362]\r\n* update fuyu tokenizer class (#44235) by @itazap in [#44235]\r\n* fix(testing): Fix LayoutXLM tokenization test and LightOnOCR SDPA flash test failures on main CI (#43988) by @harshaljanjani in [#43988]\r\n* [docs] tokenizer summary (#43965) by @stevhliu in [#43965]\r\n* [docs] refactor tokenizer docs (#43900) by @stevhliu in [#43900]\r\n\r\n\r\n## Kernels\r\n\r\nFixed several kernel-related issues including a security vulnerability, corrected Mamba kernel loading to handle incompatible import structures, ensured Liger Kernel is properly enabled during hyperparameter search, and expanded Flash Attention to support multiple compatible implementations.\r\n\r\n\r\n* Fix kernels security issue (#44395) by @Cyrilvallez in [#44395]\r\n* Enable Liger Kernel when doing hyperparameter search. (#44329) by @linfeng-du in [#44329]\r\n* [`Mamba`] Fix kernel loading (#44176) by @vasqu in [#44176]\r\n* [`Flash Attn`] Enable compatible implementations (#44177) by @vasqu in [#44177]\r\n* Fix percentage formatting in help messages for gradient checkpointing, Liger Kernel, and empty cache steps (#44100) by @qgallouedec in [#44100]\r\n\r\n\r\n## Quantization\r\n\r\nThis release adds several new quantization backends and fixes, including MLX quantization support for MPS devices, Four Over Six (4/6) NVFP4 quantization integration for NVIDIA Blackwell GPUs, and CPU support for MXFP4 models, alongside a bug fix for MXFP4 model saving using `reverse_op`.\r\n\r\n\r\n* [Quantization] Fixing mxfp4 saving using reverse_op (#43148) by @MekkCyber in [#43148]\r\n* [Quantization] Add metal quantization for MPS devices! 
(#43934) by @MekkCyber in [#43934]\r\n* Enable mxfp4 model on CPU (#43512) by @jiqing-feng in [#43512]\r\n* Add Four Over Six quantization integration (#43970) by @jackcook in [#43970]\r\n\r\n\r\n## Vision\r\n\r\nFixed backward compatibility for image processors loaded from older remote code that lack `valid_kwargs` definitions, and resolved test failures in AMD ROCm CI by adding the missing `timm` dependency to the Docker image.\r\n\r\n\r\n* [AMD CI] Add missing timm dependency to ROCm Docker image (#44389) by @Abdennacer-Badaoui in [#44389]\r\n* update glm image model expected out for tests (#43907) by @kaixuanliu in [#43907]\r\n* Fix image processors `from_dict` backward compatibility with old remote code (#44245) by @yonigozlan in [#44245]\r\n\r\n\r\n## Bugfixes and improvements\r\n\r\n* Update PR template (#44415) by @SunMarc in [#44415]\r\n* Add Qwen3.5 support for sequence classification (#44406) by @medhakimbedhief in [#44406]\r\n* update the expected output for qwen2_5_vl w/ pytorch 2.10 XPU (#44426) by @kaixuanliu in [#44426]\r\n* add support for nemotron_3 (#44390) by @liding-nv in [#44390]\r\n* [ Dynamic weight loader] fix remote code when format matches (#44396) by @ArthurZucker in [#44396]\r\n* [timesfm2_5] fix timesfm2.5 loss (#44331) by @kashif in [#44331]\r\n* Fix peft conversion mappings (#44413) by @Cyrilvallez in [#44413]\r\n* Reduce tqdm verbosity during model loading (#44414) by @Cyrilvallez in [#44414]\r\n* docs: Add NeMo Automodel community integration docs (#44304) by @adil-a in [#44304]\r\n* [CB] Small fixes (#44227) by @remi-or in [#44227]\r\n* Support non-gated experts (#44319) by @IlyasMoutawwakil in [#44319]\r\n* [Bugfix] fix qwen3.5 no split module (#44382) by @JJJYmmm in [#44382]\r\n* Fix mutable default arguments and resource leaks (#44287) by @jashshah999 in [#44287]\r\n* skip 2 invalid test cases for voxtral_realtime model (#44321) by @kaixuanliu in [#44321]\r\n* Mamba-1/-2 init weights in mixer class (#43778) by @kevinli573 in 
[#43778]\r\n* add expectations for xpu for olmo_hybrid model (#44353) by @kaixuanliu in [#44353]\r\n* [VITS] Add `speaking_rate` as an optionl forward argument (#43283) by @gau-nernst in [#43283]\r\n* Strict export cleanup (#44293) by @IlyasMoutawwakil in [#44293]\r\n* [docs] kernelconfig fix (#44337) by @stevhliu in [#44337]\r\n* Add `ProcessingKwargs` `ImagesKwargs` etc. to docs (#44269) by @yonigozlan in [#44269]\r\n* Fix typos in comments and docstrings (#44332) by @tysoncung in [#44332]\r\n* Add testing guide for agents for trainer tests (#44328) by @SunMarc in [#44328]\r\n* Update common tests Trainer (#44260) by @SunMarc in [#44260]\r\n* [timesfm2_5] fix timesfm mlp bias (#44325) by @kashif in [#44325]\r\n* fix zero3 init config (#44236) by @SunMarc in [#44236]\r\n* Update expected output for Jais2 model tests (#43910) by @kaixuanliu in [#43910]\r\n* Improve `has_similar_generate_outputs` assertions (#44166) by @tarekziade in [#44166]\r\n* Fix failed test case for exaone_moe model (#43938) by @kaixuanliu in [#43938]\r\n* fix(modeling_attn_mask_utils): remove FutureWarning from logger.warning_once() (#44307) by @imstevenpmwork in [#44307]\r\n* Remove remaining vestiges of the TranslationPipeline (#43869) by @Rocketknight1 in [#43869]\r\n* XPU now supports backward for the FA2 fixed path (#43905) by @YangKai0616 in [#43905]\r\n* Fix: use `TokenizersBackend` for Olmo3 to preserve custom `pre_tokenizer` (#44294) by @mario-sanz in [#44294]\r\n* Fix special token maps BC (#44281) by @ArthurZucker in [#44281]\r\n* [`Modular`] Fix file type regression (#44283) by @vasqu in [#44283]\r\n* [auto_docstring] Improve typing parsing and add tests (#43748) by @yonigozlan in [#43748]\r\n* Restore response_schema saving-loading (#44282) by @Rocketknight1 in [#44282]\r\n* Use associative scan HOP mamba recurrentgemma (#43737) by @riccardofelluga in [#43737]\r\n* chore: fixes in `Trainer` class docs (`compute_loss` & `hyperparameter_search`) (#44268) by @ethanknights in 
[#44268]\r\n* fix(trainer): pass optim_args to SGD, Adagrad, and RMSprop optimizers (#44203) by @nightcityblade in [#44203]\r\n* fix(utils): Make torch_compilable_check compatible with torch.export strict mode (#44266) by @harshaljanjani in [#44266]\r\n* Fix TypeError in convert_rope_params_to_dict when ignore_keys is a list (#44272) by @hangjun-ezra in [#44272]\r\n* [docs] callbacks and collators (#44239) by @stevhliu in [#44239]\r\n* [docs] trainer part 1 (#44185) by @stevhliu in [#44185]\r\n* Remove refs to grouped_entities (#44182) by @Rocketknight1 in [#44182]\r\n* [mimi] nit (#44237) by @eustlb in [#44237]\r\n* Fix local dataset loading priority in run_image_classification_no_tra… (#44199) by @gowthamr-tech in [#44199]\r\n* chore: added CLAUDE.md alias (#44232) by @tarekziade in [#44232]\r\n* fix: add missing return type annotations to type-checking utilities in generic.py (#44241) by @yushiran in [#44241]\r\n* Fix return value - fixes #44238 (#44240) by @tarekziade in [#44240]\r\n* fix regression report_to \"all\" (#44250) by @SunMarc in [#44250]\r\n* [`fix`] Set input_modalities on various architectures that aren't just text (#44078) by @tomaarsen in [#44078]\r\n* Add processing tests for phi4 multimodal (#44234) by @yonigozlan in [#44234]\r\n* fix: `VersionComparison.from_string` return type mismatch (#43709) by @tarekziade in [#43709]\r\n* refactor _inner_training_loop to smaller methods (#44041) by @winglian in [#44041]\r\n* [docs] fix broken chat_templating links in tasks docs (#44115) by @Deep-unlearning in [#44115]\r\n* Add missing backtick in `AnyToAnyPipeline.__call__` docstring (#44229) by @alvarobartt in [#44229]\r\n* Docs(it): fix typo in sentencepiece install command (#44218) by @matisgagneux21 in [#44218]\r\n* Docs(it): fix typo in docstring wording (#44219) by @matisgagneux21 in [#44219]\r\n* fix bug with position_ids on qwen3-vl models, such that position_ids include text position (#44158) by @leopold-tzafon in [#44158]\r\n* Update 404ing 
BillSum dataset URL on Summarization Task guide (#44212) by @alexandercarruthers in [#44212]\r\n* fix(models): Fix LayoutLMv2 NER crash and broken batched truncation/padding (#44187) by @harshaljanjani in [#44187]\r\n* [CB] [Major] Asynchronous batching (#43960) by @remi-or in [#43960]\r\n* Fix LASR feature extractor regression from invalid center argument (#44207) by @ainergiz in [#44207]\r\n* Models with incorrect tokenizer_class in tokenization_config.json tha… (#44179) by @itazap in [#44179]\r\n* chore(typing): initial ty integration (#44167) by @tarekziade in [#44167]\r\n* fix(flaky): `test_generate_with_and_without_position_ids` in GLM ORC (#44173) by @tarekziade in [#44173]\r\n* [docs] Add Chinese translations for common NLP task tutorials (#44144) by @TinderZ in [#44144]\r\n* [Mimi] Calibrate to ensure encoder streaming performs correctly (#43971) by @caffeinism in [#43971]\r\n* ESM2 attention_mask and token_dropout fix (#44163) by @lhallee in [#44163]\r\n* bring back our demons: clean_up_tokenization_spaces (#44035) by @ArthurZucker in [#44035]\r\n* Fix `Seq2SeqTrainingArguments` documentation (#35258) by @qgallouedec in [#35258]\r\n* AutoGrad support for grouped_mm fallback (#44152) by @IlyasMoutawwakil in [#44152]\r\n* Patch `__setitem__` on `ModelOutput` even if the parameter was previously `None` (#44080) by @tomaarsen in [#44080]\r\n* [`simple`] Fix up `__repr__` whitespace/brackets (#44048) by @tomaarsen in [#44048]\r\n* [`chore`] Fix incorrect forward type hint for Gemma3n (#44051) by @tomaarsen in [#44051]\r\n* Raise informative error when loading video processors (#44125) by @zucchini-nlp in [#44125]\r\n* fix(flaky): Different approach to make sure loss exists (#43804) by @tarekziade in [#43804]\r\n* [voxtral] fix voxtral proc (#44132) by @eustlb in [#44132]\r\n* [docs] Fix typos in GenerationConfig docstring (#44143) by @nightcityblade in [#44143]\r\n* Fix gemma3n `get_audio_features` (#44040) by @zucchini-nlp in [#44040]\r\n* Fix 
UMT5EncoderModel embedding weights not being tied after loading (#43880) by @jiqing-feng in [#43880]\r\n* fix(testing): Update stale device override test in GraniteSpeech (#44113) by @harshaljanjani in [#44113]\r\n* [Misc][vlms] Use text_config when initializing the fine-grained FP8Expert (#44032) by @JJJYmmm in [#44032]\r\n* docs: fix typo 'AuoQuant' → 'AutoQuant' and clarify FINEGRAINED_FP8 library column (#44131) by @cluster2600 in [#44131]\r\n* Update post proc (#44090) by @itazap in [#44090]\r\n* Fix: flaky `Kosmos2ModelTest` test (#44061) by @tarekziade in [#44061]\r\n* AutoTokenizer ignores config when model_type is None (#44127) by @itazap in [#44127]\r\n* Migrate GPT2 to standardized output capture decorators (#43983) by @Aki-07 in [#43983]\r\n* `grouped_mm` fallback (#44043) by @IlyasMoutawwakil in [#44043]\r\n* Bump dev version (#44099) by @qgallouedec in [#44099]\r\n* Fix loading logic issue (#44095) by @Cyrilvallez in [#44095]\r\n* [docs] customizing tokenizers (#43929) by @stevhliu in [#43929]\r\n* Merge test_keep_in_fp32_modules and test_keep_in_fp32_modules_strict (#44097) by @Rocketknight1 in [#44097]\r\n* [voxtral-realtime] update runner expected values  (#44096) by @eustlb in [#44096]\r\n* Use torch.isfinite (#44069) by @cyyever in [#44069]\r\n* add default flash impl (#44081) by @ArthurZucker in [#44081]\r\n* Remove unused dependencies (#43904) by @cyyever in [#43904]\r\n* Fix patchtsmixer call to post_init (#44082) by @Cyrilvallez in [#44082]\r\n* Fix false positive right-padding warning for decoder-only models in pipeline (#44021) by @<NOT FOUND> in [#44021]\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @ArthurZucker\r\n    * Add eurobert (#39455)\r\n    * [ Dynamic weight loader] fix remote code when format matches (#44396)\r\n    * Fix special token maps BC (#44281)\r\n    * bring back our demons: clean_up_tokenization_spaces 
(#44035)\r\n    * add default flash impl (#44081)\r\n* @liding-nv\r\n    * add support for nemotron_3 (#44390)\r\n* @kashif\r\n    * [timesfm2_5] fix timesfm2.5 loss (#44331)\r\n    * [timesfm2_5] fix timesfm mlp bias (#44325)\r\n    * Timesfm 2.5 (#41763)\r\n* @remi-or\r\n    * [CB] Small fixes (#44227)\r\n    * [CB] [Major] Asynchronous batching (#43960)\r\n* @ebezzam\r\n    * [VibeVoice ASR] Use updated padding cache for ASR model. (#44392)\r\n    * Add VibeVoice ASR (#43625)\r\n* @MekkCyber\r\n    * [Quantization] Fixing mxfp4 saving using reverse_op (#43148)\r\n    * [Quantization] Add metal quantization for MPS devices! (#43934)\r\n* @tarekziade\r\n    * perf: Optimize SynthID logits processor batch index construction (#44172)\r\n    * Improve `has_similar_generate_outputs` assertions (#44166)\r\n    * fix(flaky): idefics generate cache flake (#44180)\r\n    * chore: added CLAUDE.md alias (#44232)\r\n    * Fix return value - fixes #44238 (#44240)\r\n    * fix: `VersionComparison.from_string` return type mismatch (#43709)\r\n    * fix: HiggsAudioV2 cached decode inputs in compiled generation (#44201)\r\n    * chore(typing): initial ty integration (#44167)\r\n    * fix(flaky): `test_generate_with_and_without_position_ids` in GLM ORC (#44173)\r\n    * fix(flaky): Different approach to make sure loss exists (#43804)\r\n    * Fix: flaky `Kosmos2ModelTest` test (#44061)\r\n* @zhang-prog\r\n    * [Model] Add PP-DocLayoutV2 Model Support (#43018)\r\n* @yanhong-lbh\r\n    * Add OLMo Hybrid model (#43358)\r\n* @vasqu\r\n    * :rotating_light: [`Ernie 4.5 VL Moe`] Fix up namings to vllm/sglang convention (#44299)\r\n    * [`Modular`] Fix file type regression (#44283)\r\n    * [`Mamba`] Fix kernel loading (#44176)\r\n    * [`Flash Attn`] Enable compatible implementations (#44177)\r\n* @jackcook\r\n    * Add Four Over Six quantization integration (#43970)\r\n* @winglian\r\n    * refactor _inner_training_loop to smaller methods (#44041)\r\n* @paultltc\r\n    * Add 
ModernVBERT models (#42504)\r\n* @TinderZ\r\n    * [docs] Add Chinese translations for common NLP task tutorials (#44144)\r\n* @szhengac\r\n    * Add Higgs Audio V2 Model (#40294)","publishedAt":"2026-03-04T17:42:16.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.3.0","media":[]},{"id":"rel_vplRtEKLnhfbC5Ie6rbLq","version":"v5.2.0","title":"v5.2.0: GLM-5, Qwen3.5, Voxtral Realtime, VibeVoice Acoustic Tokenizer","summary":"## New Model additions\r\n\r\n### VoxtralRealtime\r\n\r\n<img width=\"1920\" height=\"1080\" alt=\"image\" src=\"https://github.com/user-attachments/assets/80e37670-...","content":"## New Model additions\r\n\r\n### VoxtralRealtime\r\n\r\n<img width=\"1920\" height=\"1080\" alt=\"image\" src=\"https://github.com/user-attachments/assets/80e37670-6d70-402b-8c8e-ccfb8c32df2d\" />\r\n\r\nVoxtralRealtime is a streaming speech-to-text model from [Mistral AI](https://mistral.ai), designed for real-time automatic speech recognition (ASR). Unlike the offline [Voxtral](./voxtral) model which processes complete audio files, VoxtralRealtime is architected for low-latency, incremental transcription by processing audio in chunks as they arrive.\r\n\r\nThe model combines an audio encoder with a Mistral-based language model decoder, using time conditioning embeddings and causal convolutions with padding caches to enable efficient streaming inference.\r\n\r\n* Add Voxtral Realtime (#43769) by @eustlb\r\n\r\n### GLM-5 - GlmMoeDsa\r\n\r\n<img width=\"947\" height=\"638\" alt=\"image\" src=\"https://github.com/user-attachments/assets/4c4fff37-7f40-4e86-b4a0-db718f45c93b\" />\r\n\r\nThe zAI team launches GLM-5, and introduces it as such:\r\n\r\n> GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). 
Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.\r\n> \r\n> Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is a challenge due to the RL training inefficiency. To this end, we developed [slime](https://github.com/THUDM/slime), a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvement compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models in the world on reasoning, coding, and agentic tasks, closing the gap with frontier models.\r\n\r\n* Add GlmMoeDsa (#43858) by @Cyrilvallez\r\n\r\n### Qwen3.5, Qwen3.5 Moe\r\n\r\n<img width=\"1920\" height=\"1080\" alt=\"image\" src=\"https://github.com/user-attachments/assets/b56dcaca-80e7-4b22-80a5-2f767bb65095\" />\r\n\r\nThe Qwen team launches Qwen 3.5, and introduces it as such:\r\n\r\n> We are delighted to announce the official release of Qwen3.5, introducing the open-weight of the first model in the Qwen3.5 series, namely Qwen3.5-397B-A17B. As a native vision-language model, Qwen3.5-397B-A17B demonstrates outstanding results across a full range of benchmark evaluations, including reasoning, coding, agent capabilities, and multimodal understanding, empowering developers and enterprises to achieve significantly greater productivity. 
Built on an innovative hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts, the model attains remarkable inference efficiency: although it comprises 397 billion total parameters, just 17 billion are activated per forward pass, optimizing both speed and cost without sacrificing capability. We have also expanded our language and dialect support from 119 to 201, providing broader accessibility and enhanced support to users around the world.\r\n\r\n\r\n* Adding Support for Qwen3.5 (#43830) by @bozheng-hit\r\n\r\n### VibeVoice Acoustic Tokenizer\r\n\r\n<img width=\"821\" height=\"349\" alt=\"image\" src=\"https://github.com/user-attachments/assets/b1433597-b43b-4d2d-a2c7-216d7792b8c9\" />\r\n\r\n[VibeVoice](https://huggingface.co/papers/2508.19205) is a novel framework for synthesizing high-fidelity, long-form speech with multiple speakers by employing a next-token diffusion approach within a Large Language Model (LLM) structure. It's designed to capture the authentic conversational \"vibe\" and is particularly suited for generating audio content like podcasts and multi-participant audiobooks.\r\n\r\nOne key feature of VibeVoice is the use of two continuous audio tokenizers, one for extracting acoustic features and another for semantic features.\r\n\r\n* Add VibeVoice Acoustic Tokenizer (#43400) by @ebezzam\r\n\r\n## Breaking changes\r\n\r\n* :rotating_light: [`Attn`] New attn mask interface everywhere (#42848)\r\n* :rotating_light: Modify ModernBERT's default attention implementation to stop using FA (#43764)\r\n\r\n:rotating_light: This one is quite breaking for very old models: :rotating_light: :rotating_light:\r\n* fix: Prevent AutoTokenizer type mismatch from directory name substrin… (#43791)\r\nIf the config does not have a `model_type` field, we no longer fall back to checking the name of the folder, as was previously done for configs such as https://huggingface.co/prajjwal1/bert-tiny/blob/main/config.json\r\n\r\n## Bugfixes and
improvements\r\n\r\n* [docs] deploying (#43241) by @stevhliu\r\n* [Trainer] Move NEFTune impl to standalone functions (#43714) by @SunMarc\r\n* Fix `convert_rope_params_to_dict` so it uses `rope_theta` from the config (#43766) by @hmellor\r\n* Bump dev version (#43777) by @qgallouedec\r\n* Improved `AGENTS.md` (#43763) by @tarekziade\r\n* Fix-release-ubild (#43773) by @ArthurZucker\r\n* unpin torch for CircleCI (#43790) by @ydshieh\r\n* [`Modular Dependencies`] Fixup qwen rms norms (#43772) by @vasqu\r\n* fix(testing): Fix BLOOM tokenizer, CLAP audio features, and CLVP text tester usage in tests (#43798) by @harshaljanjani\r\n* Remove unconditional train_batch_size assignment (#43770) by @lordaarush\r\n* [`Repo Consistency`] Fix rms norm (#43803) by @vasqu\r\n* fix: Prevent AutoTokenizer type mismatch from directory name substrin… (#43791) by @tarekziade\r\n* Refactor trainer data_collator and callbacks tests (#43776) by @SunMarc\r\n* [core] Faster and thread-safe `check_model_inputs` implementation (#43765) by @Cyrilvallez\r\n* [Trainer] use deepspeed SP process group when Accelerate doesn’t build a mesh (#43799) by @kashif\r\n* fix(flaky): enforce manual seed to reduce flakiness (#43794) by @tarekziade\r\n* Add TRL CI bot workflow to trigger tests on PR comments (#43809) by @qgallouedec\r\n* Fix DeepSpeed model preparation logic in Trainer class (#43780) by @qgallouedec\r\n* [docs] reveal more in toctree (#43808) by @stevhliu\r\n* Fix markdown documentation (#43076) by @cyyever\r\n* Fix slack-report workflow file (#43851) by @ydshieh\r\n* add `do_sample=False` to qwen2_5_vl model tests to stablize the output (#43728) by @kaixuanliu\r\n* Fix incorrect timestamp calculation in Qwen3VL Processor (#43659) by @jonathan-fulton\r\n* Remove GPU tracking from TrackioCallback and remove env var support (#43371) by @qgallouedec\r\n* Add id and resume support to SwanLab integration (#43719) by @i-pj\r\n* fix gptoss crash in tp (#43853) by @sywangyi\r\n* Delete batch_split 
from EncoderDecoderCache (#43814) by @cyyever\r\n* delete unnecessary code to make moe compatible to full graph compile (#43855) by @kaixuanliu\r\n* Update ModelType for Unigram tokenizer (#43860) by @pavel-esir\r\n* [docs] Remove pipeline() examples from summarization/translation tasks (#43831) by @Mr-Neutr0n\r\n* Fix video interpolation in pe_audio_video (#43811) by @Rocketknight1\r\n* Look for the pad_token_id in the right place for Llama4 (#43539) by @Rocketknight1\r\n* Fix cardinality error for DETR models without explicit background class (#43513) by @heathdutton\r\n* docs: Add Switch Transformers docstring notes and update spectrogram comment (#43336) by @harshaljanjani\r\n* [xLSTM] Fix bugs preventing small model training (#43209) by @Anri-Lombard\r\n* docs: correct typo 'neccessary' to 'necessary' (#43868) by @thecaptain789\r\n* Improve PR comment CI feedback  (#43852) by @ydshieh\r\n* Fix init weights in remote code (#43768) by @zucchini-nlp\r\n* Fix GlmMoeDsaConfig default mlp_layer_types in modular conversion (#43876) by @OiPunk\r\n* [MistralCommonBackend] fix loading proc (#43887) by @eustlb\r\n* [`Jamba`] Fallback to slow path and warn instead of error out (#43889) by @vasqu\r\n* Fix SwanLab callback to forward resume init args (#43848) by @OiPunk\r\n* Fix old tech stack in doc (#43879) by @cyyever\r\n* Update TrainingArguments (#43806) by @SunMarc\r\n* Remove unnecessary code or checks for PT 2.4+ (#43787) by @cyyever\r\n* Make it possible to evaluate when using sequence parallel in HF Trainer (#43517) by @jp1924\r\n* [Trainer] Move optimizer cls init to trainer_optimizer.py (#43738) by @SunMarc\r\n* fix the error of tests/quantization/fbgemm_fp8/test_fbgemm_fp8.py::Fb… (#43547) by @sywangyi\r\n* fix fbgemm fp8 multi-device load failure. 
(#43581) by @sywangyi\r\n* Refactor trainer init (#43807) by @SunMarc\r\n* [`fix`] Use `last_hidden_state` key from `get_image_features` for llama4 (#43882) by @tomaarsen\r\n* [Docs] Add docs for GLM-OCR and fix EomT-DINOv3 (#43710) by @NielsRogge\r\n* Update hub metadata (#43892) by @zucchini-nlp\r\n* [fix] DAC model: Apply STE in Dac.from_latents to match the forward pass (#43820) by @harshaljanjani\r\n* Separate `check_model_inputs` into `capture_outputs` and `merge_with_config_defaults` + ensure correctness (#43862) by @Cyrilvallez\r\n* Remove mask slicing in all eager attentions (#42186) by @Cyrilvallez\r\n* Fix expected DAC outputs due to (old) change in CI settings. (#43896) by @ebezzam\r\n* Minor changes trainer (#43744) by @SunMarc\r\n* adding BC for custom toks accessing slow tok attrs deprecated in v5 (#43898) by @itazap\r\n* Fix typo in quantization_operations in PEFT integrations (#43821) by @redpanda1995\r\n* Update KERNELS_MIN_VERSION to 0.10.2 to be the same as setup.py (#43753) by @cyyever\r\n* Decorate cache updates with no_grad, just in case (#43897) by @Rocketknight1\r\n* revert place_model_on_device to property (#43895) by @SunMarc\r\n* Train sampler unification (#43138) by @jiosephlee\r\n* fix(moe): Handle dtype mismatch in torch._grouped_mm with autocast (#43839) by @Mr-Neutr0n\r\n* Fix missing fast image patch counter in Glm46V (#43877) by @OiPunk\r\n* Fix old tech stack in doc (#43902) by @cyyever\r\n* Move `_keys_to_ignore_on_load_missing` for now (#43893) by @ArthurZucker\r\n* Changes to cache_utils should trigger all tests all the time (#43920) by @Cyrilvallez\r\n* Ernie4 5 vl moe (#43755) by @kaixuanliu\r\n* Harmonize `input_embeds` to `inputs_embeds` everywhere (#43916) by @Cyrilvallez\r\n* fix: TextClassificationPipeline docs mentioning deprecated return_all_scores (#43903) by @math-hiyoko\r\n* Revert #43897 (#43923) by @Rocketknight1\r\n* Fix AttributeError in OwlViT conversion script for Python 3.10+ (#43922) by 
@DimiChatzipavlis\r\n* add openAI style `image_url` content support in `apply_chat_template` (#43786) by @kaixuanliu\r\n* Prepare and keep track of position ids in `generate` (#43734) by @zucchini-nlp\r\n* Fix lifted_tensor in Gemma3n export which dynamo can't reason about (#43801) by @robell\r\n* Fix bark test (#43942) by @Cyrilvallez\r\n* Fix docker files (#43946) by @ydshieh\r\n* Fix flaky test for multimodal LLMs (#43944) by @Rocketknight1\r\n* Add explicit utf-8 encoding to CircleCI scripts for Windows compatibility (#43925) by @<NOT FOUND>\r\n* Modernize string formatting (f-strings) in conversion scripts (#43943) by @<NOT FOUND>\r\n* Fix weight decay exclusions in `run_*_no‑trainer.py` examples (#42769) by @casinca\r\n* fix: Better weight decay exclusion in `run_*_no‑trainer.py` examples (#43947) by @casinca\r\n* Timm backbone saves and loads `out_features` (#43886) by @zucchini-nlp\r\n* Fix qwen-vl position ids when generating several times (#43952) by @zucchini-nlp\r\n* Fix `get_number_of_image_tokens` (#43948) by @zucchini-nlp\r\n* Fix typos in docstrings, comments, and error messages (#43949) by @<NOT FOUND>\r\n* Fix LASR test layerdrop issue (#43954) by @Rocketknight1\r\n* [kernels] fix kernel versions  (#43955) by @MekkCyber\r\n* [Doc tests] Fix bug (#43729) by @NielsRogge\r\n* fix(models): Preserve custom token IDs through DiaConfig save and load (#43928) by @harshaljanjani\r\n* update somes audio models (#43865) by @Deep-unlearning\r\n* Improve memory allocator during loading (#43945) by @Cyrilvallez\r\n* Inclusion of process_group in the gather_full_tensor function in tensor_parallel.py (#43932) by @quic-meetkuma\r\n* Fix sync gradient (#43919) by @SunMarc\r\n* Reorder Trainer methods (#43914) by @SunMarc\r\n* Fix TypeError in dot_natural_key when state_dict keys have mixed types at same position (#43966) by @shtse8\r\n* Enhance JSON schema generation to support instance, static, and class methods (#43968) by @qgallouedec\r\n* Remove unused squeeze 
from VJEPA2 embeddings rotation (#43984) by @materight\r\n* Improve new failing test analysis for PR comment CI (#44033) by @ydshieh\r\n* Remove `other_workflow_run_ids` for `issue_comment` in `utils/notification_service.py` (#44036) by @ydshieh\r\n* stable grouped_mm API (#43977) by @IlyasMoutawwakil\r\n* create .git-blame-ignore-revs file  (#43982) by @SunMarc\r\n* docs: fix typos across documentation files (#43993) by @saurav0369\r\n* update python requirement to 3.10+ to match codebase (#44009) by @mariam851\r\n* Improve use of torch.is_autocast_enabled (#43930) by @cyyever\r\n* Use torch.xlogy  (#44006) by @cyyever\r\n* [Deespeed] fix WeightConverter.convert() use (#43926) by @kashif\r\n* Reduce reduce CUDA sync (#44005) by @cyyever\r\n* split out accelerator args builder method (#43987) by @winglian\r\n* SINQ quantization strategy integration (adapted for Transformers V5) (#43112) by @ChiaraBoretti\r\n* fix(models): Unpack BitNet packed weights to fix CI failure (#43721) by @harshaljanjani\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @ChiaraBoretti\r\n    * SINQ quantization strategy integration (adapted for Transformers V5) (#43112)\r\n* @cyyever\r\n    * Reduce reduce CUDA sync (#44005)\r\n    * Use torch.xlogy  (#44006)\r\n    * Improve use of torch.is_autocast_enabled (#43930)\r\n    * Fix old tech stack in doc (#43902)\r\n    * Update KERNELS_MIN_VERSION to 0.10.2 to be the same as setup.py (#43753)\r\n    * Remove unnecessary code or checks for PT 2.4+ (#43787)\r\n    * Fix old tech stack in doc (#43879)\r\n    * Delete batch_split from EncoderDecoderCache (#43814)\r\n    * Fix markdown documentation (#43076)\r\n* @eustlb\r\n    * Add Voxtral Realtime (#43769)\r\n    * [MistralCommonBackend] fix loading proc (#43887)\r\n* @ebezzam\r\n    * Fix expected DAC outputs due to (old) change in CI settings. 
(#43896)\r\n    * Add VibeVoice Acoustic Tokenizer (#43400)\r\n* @vasqu\r\n    * [`Jamba`] Fallback to slow path and warn instead of error out (#43889)\r\n    * :rotating_light: [`Attn`] New attn mask interface everywhere (#42848)\r\n    * [`Repo Consistency`] Fix rms norm (#43803)\r\n    * [`Modular Dependencies`] Fixup qwen rms norms (#43772)\r\n* @bozheng-hit\r\n    * Adding Support for Qwen3.5 (#43830)\r\n","publishedAt":"2026-02-16T18:55:53.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.2.0","media":[]},{"id":"rel_xkATNHgqbHPHam2ieUhto","version":"v5.1.0","title":"v5.1.0: EXAONE-MoE, PP-DocLayoutV3, Youtu-LLM, GLM-OCR","summary":"## New Model additions\r\n\r\n### EXAONE-MoE\r\n\r\n<img width=\"2278\" height=\"1142\" alt=\"image\" src=\"https://github.com/user-attachments/assets/0c3d5341-0483-...","content":"## New Model additions\r\n\r\n### EXAONE-MoE\r\n\r\n<img width=\"2278\" height=\"1142\" alt=\"image\" src=\"https://github.com/user-attachments/assets/0c3d5341-0483-49c3-8467-f9784ec94b37\" />\r\n\r\nK-EXAONE is a large-scale multilingual language model developed by LG AI Research. Built using a Mixture-of-Experts architecture, K-EXAONE features 236 billion total parameters, with 23 billion active during inference. Performance evaluations across various benchmarks demonstrate that K-EXAONE excels in reasoning, agentic capabilities, general knowledge, multilingual understanding, and long-context processing.\r\n\r\n* Add EXAONE-MoE implementations (#43080) by @nuxlear\r\n\r\n### PP-DocLayoutV3\r\n\r\n<img width=\"6252\" height=\"1892\" alt=\"image\" src=\"https://github.com/user-attachments/assets/b2e58244-8ed3-42c6-80d7-e32842977ddb\" />\r\n\r\n**PP-DocLayoutV3** is a unified and high-efficiency model designed for comprehensive layout analysis. 
It addresses the challenges of complex physical distortions, such as skewing, curving, and adverse lighting, by integrating instance segmentation and reading order prediction into a single, end-to-end framework.\r\n\r\n* [Model] Add PP-DocLayoutV3 Model Support (#43098) by @zhang-prog\r\n\r\n### Youtu-LLM\r\n\r\n<img width=\"564\" height=\"352\" alt=\"image\" src=\"https://github.com/user-attachments/assets/864372be-4ecb-41fd-8c92-f3515be040d3\" />\r\n\r\nYoutu-LLM is a new, small, yet powerful LLM that contains only 1.96B parameters, supports 128k long context, and has native agentic capabilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size in terms of Commonsense, STEM, Coding and Long Context capabilities; in agent-related testing, Youtu-LLM surpasses larger-sized leaders and is truly capable of completing multiple end2end agent tasks. \r\n\r\n  * Add Youtu-LLM model (#43166) by @LuJunru\r\n\r\n### GLM-OCR\r\n\r\n<img width=\"3972\" height=\"2352\" alt=\"image\" src=\"https://github.com/user-attachments/assets/a7ddfb4f-42ea-4dc6-bc73-aefb0f750c4e\" />\r\n\r\nGLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.\r\n\r\n* [GLM-OCR] GLM-OCR Support (#43391) by @zRzRzRzRzRzRzR\r\n\r\n## Breaking changes\r\n\r\n* 🚨 T5Gemma2 model structure (#43633) - Makes sure that the attn implementation is set to all sub-configs. 
The config.encoder.text_config was not getting its attn implementation set because we aren't passing it to PreTrainedModel.__init__. We can't change the model structure without breaking, so I manually re-added a call to self.adjust_attn_implemetation in modeling code.\r\n\r\n* 🚨 Generation cache preparation (#43679) - Refactors cache initialization in generation to ensure sliding window configurations are now properly respected. Previously, some models (like Afmoe) created caches without passing the model config, causing sliding window limits to be ignored. This is breaking because models with sliding window attention will now enforce their window size limits during generation, which may change generation behavior or require adjusting sequence lengths in existing code.\r\n\r\n* 🚨 Delete duplicate code in backbone utils (#43323) - This PR cleans up backbone utilities. Specifically, we currently have 5 different config attributes to decide which backbone to load, most of which seem redundant and can be merged into one.\r\nAfter this PR, we'll have only one config.backbone_config as a single source of truth. The models will load the backbone from_config and load pretrained weights only if the checkpoint has any weights saved. The overall idea is the same as in other composite models. A few config arguments are removed as a result.\r\n\r\n* 🚨 Refactor DETR to updated standards (#41549) - standardizes the DETR model to be closer to other vision models in the library.\r\n\r\n* 🚨 Fix floating-point precision in JanusImageProcessor resize (#43187) - replaces an `int()` with `round()`; expect slight numerical differences \r\n\r\n* 🚨 Remove deprecated AnnotionFormat (#42983) - removes a misnamed class in favour of `AnnotationFormat`. 
\r\n\r\n## Bugfixes and improvements\r\n\r\n* fix(models): Migrate legacy segmentation_indices to out_indices in BeitConfig (#43505) by @harshaljanjani\r\n* [docs] Update torch version (#42135) by @stevhliu\r\n* Remove SDPA workarounds for torch 2.4+ (#43754) by @cyyever\r\n* add use_deterministic to guarantee the consistency for youtu-llm model (#43759) by @kaixuanliu\r\n* fix: add compatible_model_types to suppress model type mismatch warnings (#43495) by @leoneperdigao\r\n* Fix T5 v1.1 detection (#43681) by @githubnemo\r\n* Add moonshine streaming (#43702) by @eustlb\r\n* Allow bi-directional attention for all models (#43705) by @Cyrilvallez\r\n* Docs: fix Training step by removing tokenizer from trainer initialization (#43733) by @nesjett\r\n* Fix scheduler initialization order (#43711) by @SunMarc\r\n* Fix accelerate integration import  (#43732) by @SunMarc\r\n* Update torch minimum version to 2.4 (#41307) by @cyyever\r\n* Fix dtype in image-text-to-text pipe (#43731) by @zucchini-nlp\r\n* Preventing initialization of siglip's lecun_normal_, default_flax_embed_init in ZeRO3 (#43574) by @jp1924\r\n* fix: AttributeError for Qwen3_omni_moe (#43593) by @Vallabh-1504\r\n* Improve typing/explanations for general model properties (#43712) by @Cyrilvallez\r\n* [Kernels] kernel migration updates for activation kernels (#43518) by @ariG23498\r\n* [`feat`] Allow loading T5Gemma2Encoder with AutoModel (#43559) by @tomaarsen\r\n* Added S110 - try-except-pass rule (#43687) by @tarekziade\r\n* [docs] benchmarks (#43694) by @stevhliu\r\n* fix norm_eps dtype (#43669) by @fschlatt\r\n* Llava onevision: output align for tests and add `image_sizes` input param (#43678) by @kaixuanliu\r\n* Fix CLIPOutput attentions not being returned (#43657) by @jonathan-fulton\r\n* [`Attn`] Fixup interface usage after refactor (#43706) by @vasqu\r\n* Fix model/processor mismatch in SigLIP2 quantization example (#43652) by @jonathan-fulton\r\n* Fix crash of custom models in Notebook or Repl 
(#43690) by @Cyrilvallez\r\n* Simplify TrainingArguments docstring (#43568) by @SunMarc\r\n* Composite model inherit automatically all important properties from their children (#43691) by @Cyrilvallez\r\n* Update configuration_qwen3.py (#43703) by @francesco-bertolotti\r\n* fix gptoss tp crash (#43695) by @sywangyi\r\n* [CB] Keep order of incoming requests (#43626) by @remi-or\r\n* Fix Apertus model loading (NotImplementedError: Cannot copy out of meta tensor; no data!) (#43473) by @xenova\r\n* Remove `num_frames` in ASR pipeline (#43546) by @jiqing-feng\r\n* remove ipex and ccl for xpu and cpu (#42852) by @yao-matrix\r\n* update guide with new attr name for toks (#43689) by @itazap\r\n* Docs: fix typos in Get started (index, quicktour) (#43666) by @CodeByKodi\r\n* the cache class is deprecated by @vasqu (direct commit on main)\r\n* custom tok init fix (#43591) by @itazap\r\n* More export friendly rewrites and skipping the failing ones (#43436) by @IlyasMoutawwakil\r\n* Cast byte_count to int in caching_allocator_warmup for MPS compatibility (#43608) by @tobyliu2004\r\n* [Docs] Complete missing Llama4 configuration docs (#43460) by @udaymehta\r\n* Fix t5 failures (#43374) by @Abdennacer-Badaoui\r\n* Add EoMT with DINOv3 backbone (#41212) by @NielsRogge\r\n* Update DBRX docs to reference re-uploaded checkpoint (#43196) by @qgallouedec\r\n* [loading] Fix forced upcasting to fp32 (#43683) by @Cyrilvallez\r\n* Fix FP8Expert for Qwen (#43670) by @yiliu30\r\n* Simplify loading structure (#43589) by @Cyrilvallez\r\n* [CB] Refactor logic for inputs and outputs outside of the main API (#43569) by @remi-or\r\n* Make sure hub errors are surfaced in `PreTrainedTokenizerBase` (#43675) by @tarekziade\r\n* Fix `FP8Expert` for DeepSeek R1 (#43616) by @yiliu30\r\n* Use correct sampling rate in chat template (#43674) by @zucchini-nlp\r\n* [`HunYuan`] Fix RoPE init (#43411) by @vasqu\r\n* XPU now supports MoE kernel(MegaBlocks) implementation (#43435) by @YangKai0616\r\n* [`Sam`] 
Fixup training flags (#43567) by @vasqu\r\n* remove torchao.autoquant from transformers (#43561) by @vkuzo\r\n* [DeepSpeed] properly handle MoE weight conversion (#43524) by @kashif\r\n* Tie zamba weights correctly (#43623) by @zucchini-nlp\r\n* [kernels] Centralize kernels tests (#42819) by @MekkCyber\r\n* Fix `process_bad_commit_report.py`: avoid items to appear in `null` author in the report (#43662) by @ydshieh\r\n* Fix `KeyError` in `check_bad_commit.py` (#43655) by @ydshieh\r\n* [Benchmark] Minor fix for benchmark: kernel is not correctly called (#43428) by @sywangyi\r\n* Add explicit commit info to PR comment CI feedback  (#43635) by @ydshieh\r\n* Better new failures reporting for PR comment CI (#43629) by @ydshieh\r\n* [docs] serving (#42853) by @stevhliu\r\n* add XPU expected output for MixedInt8GPT2Test (#43615) by @kaixuanliu\r\n* Don't modify mappings in tests (#43634) by @Rocketknight1\r\n* Allow Attention and Experts to be used as standalone modules (#43622) by @Cyrilvallez\r\n* Don't modify `tied_weight_keys` in-place (#43619) by @zucchini-nlp\r\n* [`Rope`] Revert #43410 and make inheritance implicit again (#43620) by @vasqu\r\n* [vllm compat] Separate renaming from conversion ops (#43621) by @Cyrilvallez\r\n* refactor + robusts tests for Tensor Parallel  (#42809) by @3outeille\r\n* add contiguous operation for diffllama model for xpu to enable compile mode. 
(#43614) by @kaixuanliu\r\n* add xpu expectation for lw_detr model (#43339) by @kaixuanliu\r\n* minimax_m2: fix failed test case for XPU (#43324) by @kaixuanliu\r\n* Improve new failures reporting (#43628) by @ydshieh\r\n* Fix extras on all supported Python versions (#43490) by @tarekziade\r\n* fix(models): Fix suno/bark-small CPU offload device mismatch causing CI failures (#43607) by @harshaljanjani\r\n* [CB] [Serve] Fix broken serve tests (#43594) by @remi-or\r\n* Docs: fix typo in weight converter guide (#43610) by @KOKOSde\r\n* [MoE] Use int input for histc on CUDA to support deterministic algorithms (#43583) by @YangKai0616\r\n* Fixes configuration default values (#43592) by @zucchini-nlp\r\n* Fix `make_batched_video` with 5D arrays (#43486) by @zucchini-nlp\r\n* Operation Green CI II (#43537) by @Rocketknight1\r\n* enable cpu paged cache (#42869) by @jiqing-feng\r\n* Qwen3 omni - fix get video features (#43588) by @zucchini-nlp\r\n* [GLM-Image] Add batch > 1 support and fix configuration defaults (#43342) by @JaredforReal\r\n* [Model] Refactor modernbert with the attention interface (#43030) by @YangKai0616\r\n* Regex post processing in loading (#43585) by @Cyrilvallez\r\n* simplify extra tokens logic in base (#43230) by @itazap\r\n* Add XPU support to the tests for solar_open (#43579) by @YangKai0616\r\n* remove FbgemmFp8LinearTest (#43545) by @sywangyi\r\n* Increase default ReadTimeout in tests (#43586) by @Wauplin\r\n* Fix mistral checkpoint loading in `utils/fetch_hub_objects_for_ci.py`: avoid too many requests and/or timeout (#43584) by @ydshieh\r\n* [CI][AMD] Fix Pipeline CI  (#43178) by @Abdennacer-Badaoui\r\n* fix(converter): speed up `MistralConverter.extract_vocab_merges_from_model` (#43557) by @tarekziade\r\n* Improve GPU monitoring: switch to multiprocessing and use amdsmi for AMD GPUs (#43552) by @Abdennacer-Badaoui\r\n* Update test of Youtu-LLM to pr-aligned repos (#43578) by @LuJunru\r\n* Rework dependencies and extras + Remove outdated 
`templates` folder (#43536) by @Cyrilvallez\r\n* Fix repo. consistency bot (push permission issue) (#43570) by @ydshieh\r\n* Fix Wav2vec and a few others (#43566) by @Cyrilvallez\r\n* [`Modular`] Allow to add new bases that are not present in the inherited class (#43556) by @vasqu\r\n* add an option to disable Sam3VideoModel progress bar (#43564) by @ndeybach\r\n* check/fix repo. check bot workflow (#43565) by @ydshieh\r\n* Increase timeout when preparing CI (#43560) by @Rocketknight1\r\n* 43054: Add Siglip2Tokenizer to enforce training-time text preprocessing defaults (#43101) by @vaibhav-research\r\n* check PR bot permission - part 3 (try content attribute) (#43555) by @ydshieh\r\n* check PR bot permission - part 2 (style only) (#43554) by @ydshieh\r\n* check PR bot permission - part 1 (#43553) by @ydshieh\r\n* Fix failing tests due to no attribute `pad_token_id` (#43453) by @Sai-Suraj-27\r\n* fix: GPT OSS Conversion Script Enhancements (#42901) by @KyleMylonakisProtopia\r\n* [Quantization] Fix triton_kernels name after being renamed to gpt-oss-triton-kernels (#43528) by @MekkCyber\r\n* [Quantization] Add cutlass kernel for FP8 (#43304) by @MekkCyber\r\n* [CB] Minor perf improvements and ty compatibility (#43521) by @remi-or\r\n* Fix tiles mixing for batched input, add tie_word_embeddings to LFM2VL config (#43379) by @ankke\r\n* fix: return labels instead of label in reduce_label method in BeitImageProcessorFast (#43527) by @sbucaille\r\n* [`RoPE`] Make explicit inheritance (#43410) by @vasqu\r\n* Fix for #43530 (#43535) by @Rocketknight1\r\n* Operation Green CI (#43530) by @Rocketknight1\r\n* Tie the weights even if initializing from a config on meta device (#43523) by @Cyrilvallez\r\n* [kernels] Update cv_utils name (#43529) by @MekkCyber\r\n* add trackio to training notebooks (#43442) by @merveenoyan\r\n* Mark test_prompt_lookup_decoding as flaky (#42184) by @Rocketknight1\r\n* Fix some MoE routers (#43445) by @IlyasMoutawwakil\r\n* batched_mm is slow on cpu 
(#43438) by @IlyasMoutawwakil\r\n* fix: initialize BatchNorm2d buffers only when needed (#43520) by @tarekziade\r\n* Fix loading of Qwen3 FP8 (#43494) by @githubnemo\r\n* fix `ShieldGemma2IntegrationTest::test_model` (#43343) by @sywangyi\r\n* Update `SamHQModelIntegrationTest::test_inference_mask_generation_batched_points_batched_images` for `XPU` (#43511) by @sywangyi\r\n* Revert utils files changes from PR #42845 (#43507) by @ydshieh\r\n* Move hardcoded time_step params to config for Bamba, FalconH1, GraniteMoeHybrid (#43461) by @raimbekovm\r\n* Prepare inputs for generation is called from `super()` (#43280) by @zucchini-nlp\r\n* Enhance repo. consistency bot (#43503) by @ydshieh\r\n* Add `pytest-random-order` for reproducible test randomization (#43483) by @tarekziade\r\n* Add missing GPURawMetrics.from_dict() method in benchmark_v2 (#43499) by @Abdennacer-Badaoui\r\n* push dev version 5.0.1.dev0 by @ArthurZucker (direct commit on main)\r\n* Fix failing `markuplm` & `perception_lm` integration tests (#43464) by @Sai-Suraj-27\r\n* fix(Phi4Multimodal): Fix incorrect default vision/audio config initialization in Phi4MultimodalConfig (#43480) by @charlieJ107\r\n* handle 1D position_ids for modeling_flash_attention_utils as well (#43403) by @kaixuanliu\r\n* Remove stale TODO comments in UDOP tied weights (#43477) by @raimbekovm\r\n* Fix Mxfp4 dequantize (#43326) by @Cyrilvallez\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @cyyever\r\n    * Remove SDPA workarounds for torch 2.4+ (#43754)\r\n    * Update torch minimum version to 2.4 (#41307)\r\n    * 🚨 Remove deprecated AnnotionFormat (#42983)\r\n* @eustlb\r\n    * Add moonshine streaming (#43702)\r\n* @tarekziade\r\n    * Added S110 - try-except-pass rule (#43687)\r\n    * Make sure hub errors are surfaced in `PreTrainedTokenizerBase` (#43675)\r\n    * Fix extras on all supported Python versions 
(#43490)\r\n    * fix(converter): speed up `MistralConverter.extract_vocab_merges_from_model` (#43557)\r\n    * fix: initialize BatchNorm2d buffers only when needed (#43520)\r\n    * Add `pytest-random-order` for reproducible test randomization (#43483)\r\n* @nuxlear\r\n    * Add EXAONE-MoE implementations (#43080)\r\n* @vasqu\r\n    * [`Attn`] Fixup interface usage after refactor (#43706)\r\n    * the cache class is deprecated\r\n    * [`HunYuan`] Fix RoPE init (#43411)\r\n    * [`Sam`] Fixup training flags (#43567)\r\n    * [`Rope`] Revert #43410 and make inheritance implicit again (#43620)\r\n    * [`Modular`] Allow to add new bases that are not present in the inherited class (#43556)\r\n    * [`RoPE`] Make explicit inheritance (#43410)\r\n* @remi-or\r\n    * [CB] Keep order of incoming requests (#43626)\r\n    * [CB] Refactor logic for inputs and outputs outside of the main API (#43569)\r\n    * [CB] [Serve] Fix broken serve tests (#43594)\r\n    * [CB] Minor perf improvements and ty compatibility (#43521)\r\n* @NielsRogge\r\n    * Add EoMT with DINOv3 backbone (#41212)\r\n* @YangKai0616\r\n    * XPU now supports MoE kernel(MegaBlocks) implementation (#43435)\r\n    * [MoE] Use int input for histc on CUDA to support deterministic algorithms (#43583)\r\n    * [Model] Refactor modernbert with the attention interface (#43030)\r\n    * Add XPU support to the tests for solar_open (#43579)\r\n* @ydshieh\r\n    * Fix `process_bad_commit_report.py`: avoid items to appear in `null` author in the report (#43662)\r\n    * Fix `KeyError` in `check_bad_commit.py` (#43655)\r\n    * Add explicit commit info to PR comment CI feedback  (#43635)\r\n    * Better new failures reporting for PR comment CI (#43629)\r\n    * Improve new failures reporting (#43628)\r\n    * Fix mistral checkpoint loading in `utils/fetch_hub_objects_for_ci.py`: avoid too many requests and/or timeout (#43584)\r\n    * Fix repo. consistency bot (push permission issue) (#43570)\r\n    * check/fix repo. 
check bot workflow (#43565)\r\n    * check PR bot permission - part 3 (try content attribute) (#43555)\r\n    * check PR bot permission - part 2 (style only) (#43554)\r\n    * check PR bot permission - part 1 (#43553)\r\n    * Revert utils files changes from PR #42845 (#43507)\r\n    * Enhance repo. consistency bot (#43503)\r\n* @JaredforReal\r\n    * [GLM-Image] Add batch > 1 support and fix configuration defaults (#43342)\r\n* @zhang-prog\r\n    * [Model] Add PP-DocLayoutV3 Model Support (#43098)\r\n* @LuJunru\r\n    * Update test of Youtu-LLM to pr-aligned repos (#43578)\r\n    * Add Youtu-LLM model (#43166)\r\n* @zRzRzRzRzRzRzR\r\n    * [GLM-OCR] GLM-OCR Support (#43391)","publishedAt":"2026-02-05T15:44:54.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.1.0","media":[]},{"id":"rel_-YuASOjVEqazYkInAbZvD","version":"v5.0.0","title":"Transformers v5","summary":"## Transformers v5 release notes\r\n\r\n<img width=\"1800\" height=\"1013\" alt=\"image\" src=\"https://github.com/user-attachments/assets/7b5187d7-6945-4108-a54...","content":"## Transformers v5 release notes\r\n\r\n<img width=\"1800\" height=\"1013\" alt=\"image\" src=\"https://github.com/user-attachments/assets/7b5187d7-6945-4108-a546-6d1d7bfb55e3\" />\r\n\r\n- Highlights\r\n- Significant API changes: dynamic weight loading, tokenization\r\n- Backwards Incompatible Changes\r\n- Bugfixes and improvements\r\n\r\nWe have a migration guide that will be continuously updated available on the `main` branch, please check it out in case you're facing issues: [migration guide](https://github.com/huggingface/transformers/blob/main/MIGRATION_GUIDE_V5.md).\r\n\r\n## Highlights\r\n\r\nWe are excited to announce the initial release of Transformers v5. This is the first major release in five years, and the release is significant: 1200 commits have been pushed to `main` since the latest minor release. 
This release removes a lot of long-due deprecations, introduces several refactors that significantly simplify our APIs and internals, and comes with a large number of bug fixes.\r\n\r\nWe give an overview of our focus for this release in the [following blogpost](https://huggingface.co/blog/transformers-v5). In these release notes, we'll focus directly on the refactors and new APIs coming with v5.\r\n\r\nThis release is the full V5 release. **It sets in motion something bigger: going forward, starting with v5, we'll now release minor releases every week, rather than every 5 weeks. Expect v5.1 to follow next week, then v5.2 the week that follows, etc.**\r\n\r\nWe're moving forward with this change to ensure you have access to models as soon as they're supported in the library, rather than a few weeks after.\r\n\r\nIn order to install this release, please do so with the following:\r\n\r\n```shell\r\npip install transformers\r\n```\r\n\r\nFor us to deliver the best package possible, it is imperative that we have feedback on how the toolkit is currently working for you. Please try it out, and [open an issue](https://github.com/huggingface/transformers/issues/) in case you're facing something inconsistent/a bug.\r\n\r\nTransformers version 5 is a community endeavor, and we couldn't have shipped such a massive release without the help of the entire community.\r\n\r\n## Significant API changes\r\n\r\n### Dynamic weight loading\r\n\r\nWe introduce a new weight loading API in `transformers`, which significantly improves on the previous API. This\r\nweight loading API is designed to apply operations to the checkpoints loaded by transformers.\r\n\r\nInstead of loading the checkpoint exactly as it is serialized within the model, these operations can reshape, merge,\r\nand split the layers according to how they're defined in this new API. 
These operations are often a necessity when\r\nworking with quantization or parallelism algorithms.\r\n\r\nThis new API is centered around the new `WeightConverter` class:\r\n\r\n```python\r\nclass WeightConverter(WeightTransform):\r\n    operations: list[ConversionOps]\r\n    source_keys: Union[str, list[str]]\r\n    target_keys: Union[str, list[str]]\r\n```\r\n\r\nThe weight converter is designed to apply a list of operations on the source keys, resulting in target keys. A common\r\noperation done on the attention layers is to fuse the query, key, values layers. Doing so with this API would amount\r\nto defining the following conversion:\r\n\r\n```python\r\nconversion = WeightConverter(\r\n    [\"self_attn.q_proj\", \"self_attn.k_proj\", \"self_attn.v_proj\"],  # The input layers\r\n    \"self_attn.qkv_proj\",  # The single layer as output\r\n    operations=[Concatenate(dim=0)],\r\n)\r\n```\r\n\r\nIn this situation, we apply the `Concatenate` operation, which accepts a list of layers as input and returns a single \r\nlayer. \r\n\r\nThis allows us to define a mapping from architecture to a list of weight conversions. Applying those weight conversions\r\ncan apply arbitrary transformations to the layers themselves. 
This significantly simplified the `from_pretrained` method\r\nand helped us remove a lot of technical debt that we accumulated over the past few years.\r\n\r\nThis results in several improvements:\r\n- Much cleaner definition of transformations applied to the checkpoint\r\n- Reversible transformations, so loading and saving a checkpoint should result in the same checkpoint\r\n- Faster model loading thanks to scheduling of tensor materialization\r\n- Enables complex mix of transformations that wouldn't otherwise be possible (such as quantization + MoEs, or TP + MoEs)\r\n\r\nLinked PR: https://github.com/huggingface/transformers/pull/41580\r\n\r\n### Tokenization\r\n\r\nJust as we moved towards a single backend library for model definition, we want our tokenizers, and the `Tokenizer` object to be a lot more intuitive. With v5, tokenizer definition is much simpler; one can now initialize an empty `LlamaTokenizer` and train it directly on your corpus.\r\n\r\nDefining a new tokenizer object should be as simple as this:\r\n\r\n```python\r\nfrom transformers import TokenizersBackend, generate_merges\r\nfrom tokenizers import pre_tokenizers, Tokenizer\r\nfrom tokenizers.model import BPE\r\n\r\nclass Llama5Tokenizer(TokenizersBackend):\r\n    def __init__(self, unk_token=\"<unk>\",bos_token=\"<s>\", eos_token=\"</s>\", vocab=None, merges=None ):\r\n        if vocab is None:\r\n            self._vocab = {\r\n                str(unk_token): 0,\r\n                str(bos_token): 1,\r\n                str(eos_token): 2,\r\n            }\r\n\r\n        else:\r\n            self._vocab = vocab\r\n\r\n            self._merges = merges\r\n\r\n        self._tokenizer = Tokenizer(\r\n            BPE(vocab=self._vocab, merges=self._merges, fuse_unk=True)\r\n        )\r\n        self._tokenizer.pre_tokenizer = pre_tokenizers.Metaspace(\r\n            replacement=\"▁\", prepend_scheme=_get_prepend_scheme(self.add_prefix_space, self), split=False\r\n        )\r\n        
super().__init__(\r\n            tokenizer_object=self._tokenizer,\r\n            unk_token=unk_token,\r\n            bos_token=bos_token,\r\n            eos_token=eos_token,\r\n        )\r\n```\r\n\r\nOnce the tokenizer is defined as above, you can load it with the following: `Llama5Tokenizer()`. This returns an empty, trainable tokenizer that follows the definition of the authors of `Llama5` (it does not exist yet :wink:).\r\n\r\nThe above is the main motivation towards refactoring tokenization: we want tokenizers to behave similarly to models: trained or empty, and with exactly what is defined in their class definition.\r\n\r\n### Backend Architecture Changes: moving away from the slow/fast tokenizer separation\r\n\r\nUp to now, transformers maintained two parallel implementations for many tokenizers:\r\n- \"Slow\" tokenizers (`tokenization_<model>.py`) - Python-based implementations, often using [SentencePiece](https://github.com/google/sentencepiece) as the backend.\r\n- \"Fast\" tokenizers (`tokenization_<model>_fast.py`) - Rust-based implementations using the 🤗 [tokenizers](https://github.com/huggingface/tokenizers) library.\r\n\r\nIn v5, we consolidate to a single tokenizer file per model: `tokenization_<model>.py`. This file will use the most appropriate backend available:\r\n\r\n1. **TokenizersBackend** (preferred): Rust-based tokenizers from the 🤗 [tokenizers](https://github.com/huggingface/tokenizers) library. In general it provides optimal performance, but it also offers a lot more features that are commonly adopted across the ecosystem:\r\n  - handling additional tokens\r\n  - a full Python API for setting and updating \r\n  - automatic parallelization\r\n  - automatic offsets\r\n  - customization\r\n  - training\r\n2. **SentencePieceBackend**: for tokenizers requiring the `sentencepiece` library. It inherits from `PythonBackend`. \r\n3. **PythonBackend**: a Python implementation of the features provided by `tokenizers`. 
It essentially allows adding tokens.\r\n4. **MistralCommonBackend**: relies on `MistralCommon`'s tokenization library. (Previously known as the `MistralCommonTokenizer`)\r\n\r\nThe `AutoTokenizer` automatically selects the appropriate backend based on available files and dependencies. This is transparent: you continue to use `AutoTokenizer.from_pretrained()` as before. This allows transformers to be future-proof and modular to easily support future backends.\r\n\r\n### Defining a tokenizer outside of the existing backends\r\n\r\nWe enable users and tokenizer builders to define their own tokenizers from top to bottom. Tokenizers are usually defined using a backend such as `tokenizers`, `sentencepiece` or `mistral-common`, but we offer the possibility to design the tokenizer at a higher level, without relying on those backends.\r\n\r\nTo do so, you can import the `PythonBackend` (which was previously known as `PreTrainedTokenizer`). This class encapsulates all the logic related to added tokens, encoding, and decoding.\r\n\r\nIf you want something even higher up the stack, then `PreTrainedTokenizerBase` is what `PythonBackend` inherits from. It contains the very basic tokenizer API features: \r\n- `encode`\r\n- `decode`\r\n- `vocab_size`\r\n- `get_vocab`\r\n- `convert_tokens_to_ids`\r\n- `convert_ids_to_tokens`\r\n- `from_pretrained`\r\n- `save_pretrained`\r\n- among a few others\r\n\r\n### API Changes\r\n\r\n#### 1. Direct tokenizer initialization with vocab and merges\r\n\r\nStarting with v5, you can initialize blank, untrained `tokenizers`-backed tokenizers:\r\n\r\n```py\r\nfrom transformers import LlamaTokenizer\r\n\r\ntokenizer = LlamaTokenizer()\r\n```\r\n\r\nThis tokenizer will therefore follow the `LlamaTokenizer` class definition. 
It can then be trained on a corpus as can be seen in [the `tokenizers` documentation](https://huggingface.co/docs/tokenizers/training_from_memory).\r\n\r\nThese tokenizers can also be initialized from vocab and merges (if necessary), like the previous \"slow\" tokenizers:\r\n\r\n```py\r\nfrom transformers import LlamaTokenizer\r\n\r\nvocab = {\"<unk>\": 0, \"<s>\": 1, \"</s>\": 2, \"hello\": 3, \"world\": 4}\r\nmerges = [(\"h\", \"e\"), (\"l\", \"l\"), (\"o\", \" \")]\r\n\r\ntokenizer = LlamaTokenizer(vocab=vocab, merges=merges)\r\n```\r\n\r\nThis tokenizer will behave as a Llama-like tokenizer, with an updated vocabulary. This allows comparing different tokenizer classes with the same vocab; therefore enabling the comparison of different pre-tokenizers, normalizers, etc.\r\n\r\n⚠️ The `vocab_file` (as in, a path towards a file containing the vocabulary) cannot be used to initialize the `LlamaTokenizer` as loading from files is reserved to the `from_pretrained` method.\r\n\r\n#### 2. Simplified decoding API\r\n\r\nThe `batch_decode` and `decode` methods have been unified to reflect behavior of the `encode` method. Both single and batch decoding now use the same `decode` method. See an example of the new behavior below:\r\n\r\n```python\r\nfrom transformers import AutoTokenizer\r\ntokenizer = AutoTokenizer.from_pretrained(\"t5-small\") \r\ninputs = [\"hey how are you?\", \"fine\"]\r\ntokenizer.decode(tokenizer.encode(inputs))\r\n```\r\n\r\nGives:\r\n```diff\r\n- 'hey how are you?</s> fine</s>'\r\n+ ['hey how are you?</s>', 'fine</s>']\r\n```\r\n\r\nWe expect `encode` and `decode` to behave, as two sides of the same coin: `encode`, `process`, `decode`,  should work. \r\n\r\n> [!NOTE]\r\n> A common use-case would be: `encode`, `model.generate`, `decode`.  However, using `generate` would return `list[list[int]]`, which would then be incompatible with `decode`.\r\n\r\n#### 3. 
Unified encoding API\r\n\r\nThe `encode_plus` method is deprecated in favor of the single `__call__` method.\r\n\r\n#### 4. `apply_chat_template` returns `BatchEncoding`\r\n\r\nPreviously, `apply_chat_template` returned `input_ids` for backward compatibility. Starting with v5, it now consistently returns a `BatchEncoding` dict like other tokenizer methods.\r\n\r\n```python\r\n# v5\r\nmessages = [\r\n    {\"role\": \"user\", \"content\": \"Hello!\"},\r\n    {\"role\": \"assistant\", \"content\": \"Hi there!\"}\r\n]\r\n\r\n# Now returns BatchEncoding with input_ids, attention_mask, etc.\r\noutputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\")\r\nprint(outputs.keys())  # dict_keys(['input_ids', 'attention_mask'])\r\n```\r\n\r\n#### 5. Removed legacy configuration file saving:\r\n\r\nWe simplify the serialization of tokenization attributes:\r\n\r\n- `special_tokens_map.json` - special tokens are now stored in `tokenizer_config.json`.\r\n- `added_tokens.json` - added tokens are now stored in `tokenizer.json`.\r\n- `added_tokens_decoder` is only stored when there is no `tokenizer.json`.\r\n\r\nWhen loading older tokenizers, these files are still read for backward compatibility, but new saves use the consolidated format. We're gradually moving towards consolidating attributes to fewer files so that other libraries and implementations may depend on them more reliably.\r\n\r\n#### 6. Model-Specific Changes\r\n\r\nSeveral models that had identical tokenizers now import from their base implementation:\r\n\r\n- **LayoutLM** → uses BertTokenizer\r\n- **LED** → uses BartTokenizer  \r\n- **Longformer** → uses RobertaTokenizer\r\n- **LXMert** → uses BertTokenizer\r\n- **MT5** → uses T5Tokenizer\r\n- **MVP** → uses BartTokenizer\r\n\r\nThese modules will eventually be removed altogether.\r\n\r\n**Removed T5-specific workarounds**\r\n\r\nThe internal `_eventually_correct_t5_max_length` method has been removed. 
T5 tokenizers now handle max length consistently with other models.\r\n\r\n### Testing Changes\r\n\r\nA few testing changes specific to tokenizers have been applied:\r\n- Model-specific tokenization test files now focus on integration tests.\r\n- Common tokenization API tests (e.g., `add_tokens`, `encode`, `decode`) are now centralized and automatically applied across all tokenizers. This reduces test duplication and ensures consistent behavior\r\n\r\nFor legacy implementations, the original BERT Python tokenizer code (including `WhitespaceTokenizer`, `BasicTokenizer`, etc.) is preserved in `bert_legacy.py` for reference purposes.\r\n\r\n#### 7. Deprecated / Modified Features\r\n\r\n**Special Tokens Structure:**\r\n- `SpecialTokensMixin`: Merged into `PreTrainedTokenizerBase` to simplify the tokenizer architecture.\r\n- `special_tokens_map`: Now only stores named special token attributes (e.g., `bos_token`, `eos_token`). Use `extra_special_tokens` for additional special tokens (formerly `additional_special_tokens`). `all_special_tokens` includes both named and extra tokens.\r\n\r\n```python\r\n# v4\r\ntokenizer.special_tokens_map  # Included 'additional_special_tokens'\r\n\r\n# v5\r\ntokenizer.special_tokens_map  # Only named tokens\r\ntokenizer.extra_special_tokens  # Additional tokens\r\n```\r\n\r\n- `special_tokens_map_extended` and `all_special_tokens_extended`: Removed. 
Access `AddedToken` objects directly from `_special_tokens_map` or `_extra_special_tokens` if needed.\r\n- `additional_special_tokens`: Still accepted for backward compatibility but is automatically converted to `extra_special_tokens`.\r\n\r\n**Deprecated Methods:**\r\n- `sanitize_special_tokens()`: Already deprecated in v4, removed in v5.\r\n- `prepare_seq2seq_batch()`: Deprecated; use `__call__()` with `text_target` parameter instead.\r\n\r\n```python\r\n# v4\r\nmodel_inputs = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, max_length=128)\r\n\r\n# v5\r\nmodel_inputs = tokenizer(src_texts, text_target=tgt_texts, max_length=128, return_tensors=\"pt\")\r\nmodel_inputs[\"labels\"] = model_inputs.pop(\"input_ids_target\")\r\n```\r\n\r\n- `BatchEncoding.words()`: Deprecated; use `word_ids()` instead.\r\n\r\n**Removed Methods:**\r\n- `create_token_type_ids_from_sequences()`: Removed from base class. Subclasses that need custom token type ID creation should implement this method directly.\r\n- `prepare_for_model()`, `build_inputs_with_special_tokens()`, `truncate_sequences()`: Moved from `tokenization_utils_base.py` to `tokenization_python.py` for `PythonBackend` tokenizers. `TokenizersBackend` provides model-ready input via `tokenize()` and `encode()`, so these methods are no longer needed in the base class.\r\n- `_switch_to_input_mode()`, `_switch_to_target_mode()`, `as_target_tokenizer()`: Removed from base class. Use `__call__()` with `text_target` parameter instead.\r\n\r\n```python\r\n# v4\r\nwith tokenizer.as_target_tokenizer():\r\n    labels = tokenizer(tgt_texts, ...)\r\n\r\n# v5\r\nlabels = tokenizer(text_target=tgt_texts, ...)\r\n```\r\n\r\n- `parse_response()`: Removed from base class.\r\n\r\n### Performance\r\n\r\n#### MoE Performance\r\n\r\nThe v5 release significantly improves the performance of the MoE models, as can be seen in the graphs below. 
We improve and optimize MoE performance through batched and grouped experts implementations, and we optimize them for decoding using `batched_mm`.\r\n\r\n<img width=\"2048\" height=\"1451\" alt=\"image\" src=\"https://github.com/user-attachments/assets/c3f2e59f-3026-4f56-9a56-36e4eb0fcf73\" />\r\n\r\n#### Core performance\r\n\r\nWe focus on improving the performance of loading weights on device (which gives speedups up to 6x in tensor parallel situations); this is preliminary work that we'll continue to work on in the coming weeks. Some notable improvements:\r\n\r\n- [saving] Simplify general logic by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42766\r\n- Do not rely on config for inferring model dtype by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42838\r\n- Improve BatchFeature: stack list and lists of torch tensors by @yonigozlan in https://github.com/huggingface/transformers/pull/42750\r\n- Remove tied weights from internal attribute if they are not tied by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42871\r\n- Enforce call to post_init and fix all of them by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42873\r\n- Simplify tie weights logic by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42895\r\n- Add buffers to _init_weights for ALL models by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42309\r\n- [loading] Really initialize on meta device for huge perf gains by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42941\r\n- Do not use accelerate hooks if the device_map has only 1 device by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43019\r\n- Move missing weights and non-persistent buffers to correct device earlier by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43021\r\n\r\n## Library-wide changes with lesser impact\r\n\r\n### Default `dtype` update\r\n\r\nWe have updated the default 
`dtype` for all models loaded with `from_pretrained` to be `auto`. This will lead to model instantiations respecting the `dtype` in which the model was saved, rather than forcing it to load in float32.\r\n\r\nYou can, of course, still specify the `dtype` in which you want to load your model by specifying it as an argument to the `from_pretrained` method.\r\n\r\n### Shard size\r\n\r\nThe Hugging Face Hub infrastructure has gradually moved to a XET backend. This will significantly simplify uploads and downloads, with higher download and upload speeds, partial uploads, and, most notably, a higher threshold for accepted file sizes on the Hugging Face Hub.\r\n\r\nTo reflect this, we're increasing the default shard size of models serialized on the Hub to 50GB (up from 5GB).\r\n\r\n### `use_auth_token`\r\n\r\nThe `use_auth_token` argument/parameter is deprecated in favor of `token` everywhere.\r\nYou should be able to search and replace `use_auth_token` with `token` and get the same logic.\r\n\r\nLinked PR: https://github.com/huggingface/transformers/pull/41666\r\n\r\n### Attention-related features\r\n\r\nWe decided to remove some features for the upcoming v5 as they are currently only supported in a few old models and no longer integrated in current model additions. It's recommended to stick to v4.x in case you need them. The following features are affected:\r\n- No more head masking, see [#41076](https://github.com/huggingface/transformers/pull/41076). This feature allowed turning off certain heads during the attention calculation and only worked with eager attention.\r\n- No more relative positional biases in Bert-like models, see [#41170](https://github.com/huggingface/transformers/pull/41170). This feature was introduced to allow relative position scores within attention calculations (similar to T5). However, this feature is barely used in official models and adds a lot of complexity. 
It also only worked with eager.\r\n- No more head pruning, see [#41417](https://github.com/huggingface/transformers/pull/41417) by @gante. As the name suggests, it allowed to prune heads within your attention layers.\r\n\r\n### Updates to supported torch APIs\r\n\r\nWe dropped support for two torch APIs:\r\n- `torchscript` in https://github.com/huggingface/transformers/pull/41688\r\n- `torch.fx` in https://github.com/huggingface/transformers/pull/41683\r\n\r\nThose APIs were deprecated by the PyTorch team, and we're instead focusing on the supported APIs `dynamo` and `export`.\r\n\r\n## Quantization changes\r\n\r\nWe clean up the quantization API in transformers, and significantly refactor the weight loading as highlighted\r\nabove.\r\n\r\nWe drop support for two quantization arguments that have been deprecated for some time:\r\n- `load_in_4bit`\r\n- `load_in_8bit`\r\n\r\nWe remove them in favor of the `quantization_config` argument which is much more complete. As an example, here is how\r\nyou would load a 4-bit bitsandbytes model using this argument:\r\n\r\n```python\r\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\r\n\r\nquantization_config = BitsAndBytesConfig(load_in_4bit=True)\r\n\r\nmodel_4bit = AutoModelForCausalLM.from_pretrained(\r\n    \"meta-llama/Llama-3.2-3B\",\r\n    device_map=\"auto\",\r\n    quantization_config=quantization_config\r\n)\r\n```\r\n\r\n## Configuration\r\n\r\n- Methods to init a nested config such as `from_xxx_config` are deleted. Configs can be init from the `__init__` method in the same way. See [#41314](https://github.com/huggingface/transformers/pull/41314).\r\n- It is no longer possible to load a config class from a URL file. Configs must be loaded from either a local path or a repo on the Hub. 
See [#42383](https://github.com/huggingface/transformers/pull/42383).\r\n- All parameters for configuring a model's rotary embedding are now stored under `config.rope_parameters`, including the `rope_theta` and `rope_type`. A model's `config.rope_parameters` is a simple dictionary in most cases, and can also be a nested dict in special cases (e.g. Gemma3 and ModernBert) with different rope parameterization for each layer type. Trying to get `config.rope_theta` will throw an attribute error from now on. See [#39847](https://github.com/huggingface/transformers/pull/39847) and [#42255](https://github.com/huggingface/transformers/pull/42255)\r\n- Qwen-VL family configuration is in a nested format and trying to access keys directly will throw an error (e.g. `config.vocab_size`). Users are expected to access keys from their respective sub-configs (`config.text_config.vocab_size`).\r\n- Configurations of non-generative models (any model that doesn't call `model.generate()`) will no longer have a `generation_config` and `model.config.generation_config` will throw an attribute error.\r\n\r\n## Processing\r\n\r\n### Tokenization\r\n\r\n- Slow tokenizer files (`tokenization_<model>.py`) will be removed in favor of the fast tokenizer files (`tokenization_<model>_fast.py`), which will be renamed to `tokenization_<model>.py`. As fast tokenizers are backed by :hugs:`tokenizers`, they include a wider range of features that are maintainable and reliable.\r\n- Other backends (sentence piece, tokenizers, etc.) will be supported with a light layer if loading a fast tokenizer fails\r\n- Remove legacy files like `special_tokens_map.json` and `added_tokens.json`\r\n- Remove `_eventually_correct_t5_max_length`\r\n- `encode_plus` --> `__call__`\r\n- `batch_decode` --> `decode`\r\n\r\n`apply_chat_template` used to return bare `input_ids` by default rather than a `BatchEncoding` dict. 
\r\nThis was inconvenient - it should return a `BatchEncoding` dict like `tokenizer.__call__()`, but we were stuck with \r\nit for backward compatibility. The method now returns a `BatchEncoding`.\r\n\r\nLinked PRs: \r\n- https://github.com/huggingface/transformers/issues/40938\r\n- https://github.com/huggingface/transformers/pull/40936\r\n- https://github.com/huggingface/transformers/pull/41626\r\n\r\n### Processing classes\r\n\r\n- In processing classes each attribute will be serialized under `processor_config.json` as a nested dict, instead of serializing attributes in their own config files. Loading will be supported for all old format processors (https://github.com/huggingface/transformers/pull/41474)\r\n- `XXXFeatureExtractors` classes are completely removed in favor of `XXXImageProcessor` class for all vision models (https://github.com/huggingface/transformers/pull/41174)\r\n- Minor change: `XXXFastImageProcessorKwargs` is removed in favor of `XXXImageProcessorKwargs` which will be shared between fast and slow processors (https://github.com/huggingface/transformers/pull/40931)\r\n\r\n## Modeling\r\n\r\n- Some `RotaryEmbeddings` layers will start returning a dict of tuples, in case the model uses several RoPE configurations (Gemma2, ModernBert). Each value will be a tuple of \"cos, sin\" per RoPE type.\r\n- Config attribute for `RotaryEmbeddings` layer will be unified and accessed via `config.rope_parameters`. Config attr for `rope_theta` might not be accessible anymore for some models, and instead will be in `config.rope_parameters['rope_theta']`. BC will be supported for a while as much as possible, and in the near future we'll gradually move to the new RoPE format  (https://github.com/huggingface/transformers/pull/39847)\r\n- Vision Language models will not have a shortcut access to its language and vision component from the generative model via `model.language_model`. 
It is recommended to access the module with either `model.model.language_model` or `model.get_decoder()`. See [#42156](https://github.com/huggingface/transformers/pull/42156/)\r\n- All models now accept `kwargs` in their forward methods\r\n\r\n### Generate\r\n\r\n- Old, deprecated output type aliases were removed (e.g. `GreedySearchEncoderDecoderOutput`). We now only have 4 output classes built from the following matrix: decoder-only vs encoder-decoder, uses beams vs doesn't use beams (https://github.com/huggingface/transformers/pull/40998)\r\n- Removed deprecated classes regarding decoding methods that were moved to the Hub due to low usage (constraints and beam scores) (https://github.com/huggingface/transformers/pull/41223)\r\n- If `generate` doesn't receive any KV Cache argument, the default cache class used is now defined by the model (as opposed to always being `DynamicCache`) (https://github.com/huggingface/transformers/pull/41505)\r\n- Generation parameters are no longer accessible via the model's config. If generation parameters are serialized in `config.json` for any old model, they will be loaded back into the model's generation config. Users are expected to access or modify generation parameters only through `model.generation_config` (e.g. `model.generation_config.do_sample = True`). \r\n\r\n## Trainer\r\n\r\n### New Features\r\n\r\n* **ALST/Ulysses Sequence Parallelism Integration**\r\n  - Added sequence parallelism support via HF Accelerate for training with longer sequences. 
Enables splitting sequences across devices using ALST (Arctic Long Sequence Training) and Ulysses algorithms with DeepSpeed.\r\n* **Improved `compute_loss_func` Handling**\r\n  - `compute_loss_func` now always takes priority over the model's built-in loss computation, giving users consistent control over custom loss functions.\r\n* **`num_items_in_batch` in Prediction Step**\r\n  - The `num_items_in_batch` argument is now passed to `compute_loss` during `prediction_step`, enabling proper loss scaling during evaluation.\r\n\r\n### Breaking Changes\r\n\r\n* **`report_to` now defaults to `\"none\"`**\r\n  - Logging integrations are no longer auto-detected by default; users must explicitly specify which reporting backends to use.\r\n\r\n### Removing arguments without deprecation cycle in `TrainingArguments` due to low usage\r\n\r\n- `mp_parameters` -> legacy param that was later on added to the Sagemaker trainer\r\n- `_n_gpu` -> not intended for users to set, we will initialize it correctly instead of putting it in the `TrainingArguments`\r\n- `overwrite_output_dir` -> replaced by `resume_from_checkpoint`, and it was only used in the examples script, no impact on Trainer.\r\n- `logging_dir` -> only used for tensorboard, set `TENSORBOARD_LOGGING_DIR` env var instead\r\n- `jit_mode_eval` -> use `use_torch_compile` instead, as torchscript is not recommended anymore\r\n- `tpu_num_cores` -> It is actually better to remove it, as it is not recommended to set the number of cores. By default, all TPU cores are used. Set `TPU_NUM_CORES` env var instead\r\n- `past_index` -> it was only used for a very small number of models that have a special architecture like Transformer-XL, and it was not documented at all how to train those models\r\n- `ray_scope` -> only a minor arg for the Ray integration. Set `RAY_SCOPE` env var instead\r\n- `warmup_ratio` -> use `warmup_steps` instead. We combined both args together by allowing passing float values in `warmup_steps`. 
\r\n\r\n### Removing deprecated arguments in `TrainingArguments`\r\n\r\n- `fsdp_min_num_params` and `fsdp_transformer_layer_cls_to_wrap` -> use `fsdp_config`\r\n- `tpu_metrics_debug` -> `debug`\r\n- `push_to_hub_token` -> `hub_token`\r\n- `push_to_hub_model_id` and `push_to_hub_organization` -> `hub_model_id`\r\n- `include_inputs_for_metrics` -> `include_for_metrics`\r\n- `per_gpu_train_batch_size` -> `per_device_train_batch_size`\r\n- `per_gpu_eval_batch_size` -> `per_device_eval_batch_size`\r\n- `use_mps_device` -> mps will be used by default if detected\r\n- `fp16_backend` and `half_precision_backend` -> we will only rely on `torch.amp` as everything has been upstreamed to torch\r\n- `no_cuda` -> `use_cpu`\r\n- `include_tokens_per_second` -> `include_num_input_tokens_seen`\r\n- `use_legacy_prediction_loop` -> we only use `evaluation_loop` function from now on\r\n\r\n### Removing deprecated arguments in `Trainer`\r\n\r\n- `tokenizer` in initialization -> `processing_class`\r\n- `model_path` in `train()` -> `resume_from_checkpoint`\r\n\r\n### Removed features for `Trainer`\r\n\r\n- SigOpt integration for hyperparameter search was removed as the library was archived and the API stopped working\r\n- drop support for sagemaker API <1.10\r\n- bump accelerate minimum version to 1.1.0\r\n- bump peft minimum version to 0.18.0\r\n- bump bitsandbytes minimum version to 0.46.1\r\n\r\n### New defaults for `Trainer`\r\n\r\n- `use_cache` in the model config will be set to `False`. You can still change the cache value through the `use_cache` argument of `TrainingArguments` if needed. \r\n\r\n## Pipeline\r\n\r\n- Image text to text pipelines will no longer accept images as a separate argument along with conversation chats. Image data has to be embedded in the chat's \"content\" field. See [#42359](https://github.com/huggingface/transformers/pull/42359)\r\n\r\n## PushToHubMixin\r\n\r\n- removed deprecated `organization` and `repo_url` from `PushToHubMixin`. 
You must pass a `repo_id` instead.\r\n- removed `ignore_metadata_errors` from `PushToHubMixin`. In practice if we ignore errors while loading the model card, we won't be able to push the card back to the Hub so it's better to fail early and not provide the option to fail later.\r\n- `push_to_hub` does not accept `**kwargs` anymore. All accepted parameters are explicitly documented.\r\n- arguments of `push_to_hub` are now keyword-only to avoid confusion. Only `repo_id` can be positional since it's the main arg.\r\n- removed `use_temp_dir` argument from `push_to_hub`. We now use a tmp dir in all cases.\r\n\r\nLinked PR: https://github.com/huggingface/transformers/pull/42391.\r\n\r\n## CLI\r\n\r\nThe deprecated `transformers-cli ...` command has been removed; `transformers ...` is now the only CLI entry point.\r\n\r\nThe `transformers` CLI has been migrated to `Typer`, making it easier to maintain and adding some nice features out of \r\nthe box (improved `--help` section, autocompletion).\r\n\r\nThe biggest breaking change is in `transformers chat`. This command starts a terminal UI to interact with a chat model. \r\nIt used to also be able to start a Chat Completion server powered by `transformers` and chat with it. In this revamped \r\nversion, this feature has been removed in favor of `transformers serve`. The goal of splitting `transformers chat` \r\nand `transformers serve` is to define clear boundaries between client and server code. It helps with maintenance \r\nbut also makes the commands less bloated. 
The new signature of `transformers chat` is:\r\n\r\n```\r\nUsage: transformers chat [OPTIONS] BASE_URL MODEL_ID [GENERATE_FLAGS]...\r\n\r\nChat with a model from the command line.\r\n```\r\n\r\nIt works hand in hand with `transformers serve`, which means that if `transformers serve` is running on its default endpoint, `transformers chat` can be launched as follows:\r\n\r\n```sh\r\ntransformers chat HuggingFaceTB/SmolLM3-3B\r\n```\r\n\r\nIt can however use any OpenAI API compatible HTTP endpoint:\r\n\r\n```sh\r\ntransformers chat HuggingFaceTB/SmolLM3-3B https://router.huggingface.co/v1\r\n```\r\n\r\nLinked PRs: \r\n- https://github.com/huggingface/transformers/pull/40997\r\n- https://github.com/huggingface/transformers/pull/41487\r\n\r\n### Removal of the `run` method\r\n\r\nThe `transformers run` command (previously `transformers-cli run`) is an artefact of the past, was neither documented nor tested,\r\nand isn't part of any public documentation. We're removing it for now and ask you to please let us know in case\r\nthis is a method you are using; in which case we should bring it back with better support.\r\n\r\nLinked PR: https://github.com/huggingface/transformers/pull/42447\r\n\r\n## Environment variables\r\n\r\n- Legacy environment variables like `TRANSFORMERS_CACHE`, `PYTORCH_TRANSFORMERS_CACHE`, and `PYTORCH_PRETRAINED_BERT_CACHE` have been removed. Please use `HF_HOME` instead.\r\n- Constants `HUGGINGFACE_CO_EXAMPLES_TELEMETRY`, `HUGGINGFACE_CO_PREFIX`, and `HUGGINGFACE_CO_RESOLVE_ENDPOINT` have been removed. Please use `huggingface_hub.constants.ENDPOINT` instead.\r\n\r\nLinked PR: https://github.com/huggingface/transformers/pull/42391.\r\n\r\n## Requirements update\r\n\r\n`transformers` v5 requires `huggingface_hub` version `>=1.0.0`. See this [migration guide](https://huggingface.co/docs/huggingface_hub/concepts/migration) to learn more about this major release. 
Here are the main aspects to know about:\r\n- switched the HTTP backend from `requests` to `httpx`. This change was made to improve performance and to support both synchronous and asynchronous requests the same way. If you are currently catching `requests.HTTPError` errors in your codebase, you'll need to switch to `httpx.HTTPError`.\r\n- related to the previous point, it is no longer possible to set proxies from your script. To handle proxies, you must set the `HTTP_PROXY` / `HTTPS_PROXY` environment variables\r\n- `hf_transfer` and therefore `HF_HUB_ENABLE_HF_TRANSFER` have been completely dropped in favor of `hf_xet`. This should be transparent for most users. Please let us know if you notice any downside!\r\n\r\n`typer-slim` has been added as a required dependency, used to implement both `hf` and `transformers` CLIs.\r\n\r\n## New model additions in v5\r\n\r\n### CWM\r\n\r\n<img width=\"809\" height=\"471\" alt=\"image\" src=\"https://github.com/user-attachments/assets/58bb9c70-d481-48ed-ab8f-6553be7c240f\" />\r\n\r\nThe Code World Model (CWM) was proposed in [CWM: An Open-Weights LLM for Research on Code Generation with World Models](https://ai.facebook.com/research/publications/cwm) by the Meta FAIR CodeGen Team. CWM is an LLM for code generation and reasoning about code that has, in particular, been trained to better represent and reason about how code and commands affect the state of a program or system. Specifically, we mid-trained CWM on a large number of observation-action trajectories from Python execution traces and agentic interactions in containerized environments. 
We post-trained with extensive multi-task RL in verifiable coding, math, and multi-turn software engineering environments.\r\n\r\n* Add Code World Model (CWM)  by @jacobkahn in #41199\r\n\r\n### SAM3\r\n\r\n<img width=\"1505\" height=\"915\" alt=\"image\" src=\"https://github.com/user-attachments/assets/eec48633-f02b-464a-ae5c-c65473387e53\" />\r\n\r\nSAM3 (Segment Anything Model 3) was introduced in [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3-segment-anything-with-concepts/).\r\n\r\nThe SAM3 addition adds four new architectures:\r\n- Sam3\r\n- Sam3Tracker\r\n- Sam3TrackerVideo\r\n- Sam3Video\r\n\r\nSAM3 performs Promptable Concept Segmentation (PCS) on images. PCS takes text and/or image exemplars as input (e.g., \"yellow school bus\"), and predicts instance and semantic masks for every single object matching the concept.\r\n\r\nSam3Tracker and Sam3TrackerVideo perform Promptable Visual Segmentation (PVS) on images. PVS takes interactive visual prompts (points, boxes, masks) or text inputs to segment a specific object instance per prompt. This is the task that SAM 1 and SAM 2 focused on, and SAM 3 improves upon it. Sam3Tracker and Sam3TrackerVideo are updated versions of SAM2 Video that maintain the same API while providing improved performance and capabilities.\r\n\r\nSAM3 Video performs Promptable Concept Segmentation (PCS) on videos. PCS takes text as input (e.g., \"yellow school bus\"), and predicts instance and semantic masks for every single object matching the concept, while preserving object identities across video frames. 
The model combines a detection module (SAM3) with a tracking module (SAM2-style tracker) to enable robust object tracking across video frames using text prompts.\r\n\r\n* Add SAM3 to 🤗 Transformers  by @yonigozlan in #42285\r\n\r\n### LFM2 MoE\r\n\r\n<img width=\"1080\" height=\"849\" alt=\"image\" src=\"https://github.com/user-attachments/assets/a9fa1b81-114d-4054-9699-5083ac69d830\" />\r\n\r\nLFM2-MoE is a Mixture-of-Experts (MoE) variant of [LFM2](https://huggingface.co/collections/LiquidAI/lfm2-686d721927015b2ad73eaa38). The LFM2 family is optimized for on-device inference by combining short‑range, input‑aware gated convolutions with grouped‑query attention (GQA) in a layout tuned to maximize quality under strict speed and memory constraints.\r\n\r\nLFM2‑MoE keeps this fast backbone and introduces sparse MoE feed‑forward networks to add representational capacity without significantly increasing the active compute path. The first LFM2-MoE release is LFM2-8B-A1B, with 8.3B total parameters and 1.5B active parameters. The model excels in quality (comparable to 3-4B dense models) and speed (faster than other 1.5B class models).\r\n\r\n* [Model] Lfm2Moe  by @paulpak58 in #41401\r\n\r\n### VideoLlama 3\r\n\r\n<img width=\"812\" height=\"366\" alt=\"image\" src=\"https://github.com/user-attachments/assets/21c82c6e-cf0a-4d6c-a707-b9e57663ca85\" />\r\n\r\nThe [VideoLLaMA3](https://huggingface.co/papers/2501.13106) model is a major update to [VideoLLaMA2](https://huggingface.co/papers/2406.07476) from Alibaba DAMO Academy.\r\n\r\n* [model] Add VideoLLaMA3 implementation  by @lkhl in #40499\r\n\r\n### AudioFlamingo 3\r\n\r\n<img width=\"621\" height=\"475\" alt=\"image\" src=\"https://github.com/user-attachments/assets/c9616758-b3aa-41d0-bd58-695966ba146d\" />\r\n\r\nAudio Flamingo 3 (AF3) is a fully open large audio–language model designed for robust understanding and reasoning over speech, environmental sounds, and music. 
AF3 pairs a Whisper-style audio encoder with a causal language model and performs replace-in-place audio–text fusion: the processor aligns post-pool audio frames to a dedicated placeholder token and the model replaces those token slots with projected audio embeddings during the forward pass.\r\n\r\nThe model checkpoint is available at: [nvidia/audio-flamingo-3-hf](https://huggingface.co/nvidia/audio-flamingo-3-hf)\r\n\r\nHighlights:\r\n\r\n- Unified audio encoder across speech, sound, and music.\r\n- Long-audio support via windowing and post-pool alignment (up to 10 minutes maximum). The model processes audio in 30-second windows with a hard limit of 20 windows (10 minutes total). Audio longer than 10 minutes will be truncated.\r\n- Deterministic fusion that preserves sequence length by replacing audio placeholder tokens with audio embeddings.\r\n\r\n* [models] Add AudioFlamingo3 integration  by @lashahub in #40290\r\n\r\n### Nanochat\r\n\r\n[NanoChat](https://huggingface.co/karpathy/nanochat-d32) is a compact decoder-only transformer model designed for educational purposes and efficient training. The model features several fundamental architectural innovations which are common in modern transformer models. Therefore, it is a good model to use as a starting point to understand the principles of modern transformer models. NanoChat is a variant of the [Llama](https://huggingface.co/docs/transformers/en/model_doc/llama) architecture, with simplified attention mechanism and normalization layers.\r\n\r\n* [MODEL] Nanochat implementation  by @burtenshaw in #41634\r\n\r\n### FastVLM\r\n\r\n<img width=\"868\" height=\"331\" alt=\"image\" src=\"https://github.com/user-attachments/assets/cd8b82cf-10de-49b0-af2a-28ffbeac6fa7\" />\r\n\r\nFastVLM is an open-source vision-language model featuring a novel hybrid vision encoder, FastViTHD. 
Leveraging reparameterizable convolutional layers, scaled input resolution, and a reduced number of visual tokens, FastVLM delivers high accuracy with exceptional efficiency. Its optimized architecture enables deployment even on edge devices, achieving ultra-low TTFT (time to first token) without sacrificing performance.\r\n\r\n* Add FastVLM by @camilla-deckard in https://github.com/huggingface/transformers/pull/41112\r\n\r\n### PaddleOCR-VL\r\n\r\n<img width=\"3840\" height=\"2160\" alt=\"image\" src=\"https://github.com/user-attachments/assets/712fbe57-a4b6-4bf1-acf6-3ef803f75f0e\" />\r\n\r\nPaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. 
These strengths make it highly suitable for practical deployment in real-world scenarios.\r\n\r\n* [Model] Add PaddleOCR-VL Model Support by @zhang-prog in https://github.com/huggingface/transformers/pull/42178\r\n\r\n### SAM: Perception Encoder Audiovisual\r\n\r\n<img width=\"719\" height=\"541\" alt=\"image\" src=\"https://github.com/user-attachments/assets/2fc5ee26-3c15-451f-bc7f-c779c2a78919\" />\r\n\r\nPE Audio (Perception Encoder Audio) is a state-of-the-art multimodal model that embeds audio and text into a shared (joint) embedding space.\r\nThe model enables cross-modal retrieval and understanding between audio and text.\r\n\r\n**Text input**\r\n- Produces a single embedding representing the full text.\r\n\r\n**Audio input**\r\n- **PeAudioFrameLevelModel**\r\n  - Produces a sequence of embeddings, one every 40 ms of audio.\r\n  - Suitable for audio event localization and fine-grained temporal analysis.\r\n- **PeAudioModel**\r\n  - Produces a single embedding for the entire audio clip.\r\n  - Suitable for global audio-text retrieval tasks.\r\n\r\n**The resulting embeddings can be used for:**\r\n- Audio event localization\r\n- Cross-modal (audio–text) retrieval and matching\r\n\r\n* Sam: Perception Encoder Audiovisual by @eustlb in https://github.com/huggingface/transformers/pull/42905\r\n\r\n### Jais2\r\n\r\n<img width=\"2100\" height=\"1154\" alt=\"image\" src=\"https://github.com/user-attachments/assets/a9343e81-903b-4445-ba0c-61c87830776a\" />\r\n\r\nJais2 is a next-generation Arabic open-weight LLM trained on the richest Arabic-first dataset to date. Built from the ground up with 8B and 70B parameters, Jais2 understands Arabic the way it's truly spoken across dialects, culture, and modern expression. 
It is developed by MBZUAI, Inception and Cerebras Systems and is based on the transformer architecture with modifications including:\r\n- LayerNorm instead of RMSNorm\r\n- ReLU² activation function\r\n- Rotary Position Embeddings (RoPE)\r\n\r\n* adds jais2 model support by @sarathc-cerebras in https://github.com/huggingface/transformers/pull/42684\r\n\r\n### Pixio\r\n\r\n<img width=\"5478\" height=\"2102\" alt=\"image\" src=\"https://github.com/user-attachments/assets/abbd93d4-8c4c-4fd9-8fab-58d969d8b296\" />\r\n\r\n[Pixio](https://github.com/LiheYoung/transformers/blob/f69542c52874fa566f0d55687f4d6e189fa77e33/docs/source/en/model_doc) is a vision foundation model that uses [ViT](https://github.com/LiheYoung/transformers/blob/f69542c52874fa566f0d55687f4d6e189fa77e33/docs/source/en/model_doc/vit) as a feature extractor for multiple downstream tasks like depth estimation, semantic segmentation, feed-forward 3D reconstruction, robotics, and image classification. It is built on the Masked Autoencoder (MAE) pre-training framework, with four minimal yet critical updates: 1) deeper decoder, 2) larger masking granularity, 3) more class tokens, and 4) web-scale curated training data.\r\n\r\n* Add Pixio pre-trained models by @LiheYoung in https://github.com/huggingface/transformers/pull/42795\r\n\r\n### Ernie 4.5 VL MoE\r\n\r\n<img width=\"848\" height=\"853\" alt=\"image\" src=\"https://github.com/user-attachments/assets/02817ead-7560-4eea-8f2b-0b959553a3cd\" />\r\n\r\nThe Ernie 4.5 VL MoE model was released in the [Ernie 4.5 Model Family](https://ernie.baidu.com/blog/posts/ernie4.5/) release by Baidu. This family of models contains multiple different architectures and model sizes. The Vision-Language series in particular is composed of a novel multimodal heterogeneous structure, sharing parameters across modalities and dedicating parameters to specific modalities. 
This becomes especially apparent in the Mixture of Experts (MoE), which is composed of\r\n- Dedicated Text Experts\r\n- Dedicated Vision Experts\r\n- Shared Experts\r\n\r\nThis architecture has the advantage of enhancing multimodal understanding without compromising (and even improving) performance on text-related tasks. A more detailed breakdown is given in the [Technical Report](https://ernie.baidu.com/blog/publication/ERNIE_Technical_Report.pdf).\r\n\r\n* [`Ernie 4.5`] Ernie VL models by @vasqu in https://github.com/huggingface/transformers/pull/39585\r\n\r\n### GLM-ASR\r\n\r\n<img width=\"1600\" height=\"1029\" alt=\"image\" src=\"https://github.com/user-attachments/assets/d630a900-9ef5-467c-93ab-34f064501b8d\" />\r\n\r\n**GLM-ASR-Nano-2512** is a robust, open-source speech recognition model with **1.5B parameters**. Designed for\r\nreal-world complexity, it outperforms OpenAI Whisper V3 on multiple benchmarks while maintaining a compact size.\r\n\r\nKey capabilities include:\r\n\r\n* **Exceptional Dialect Support**\r\n  Beyond standard Mandarin and English, the model is highly optimized for **Cantonese (粤语)** and other dialects,\r\n  effectively bridging the gap in dialectal speech recognition.\r\n\r\n* **Low-Volume Speech Robustness**\r\n  Specifically trained for **\"Whisper/Quiet Speech\"** scenarios. 
It captures and accurately transcribes extremely\r\n  low-volume audio that traditional models often miss.\r\n\r\n* **SOTA Performance**\r\n  Achieves the **lowest average error rate (4.10)** among comparable open-source models, showing significant advantages\r\n  in Chinese benchmarks (Wenet Meeting, Aishell-1, etc.).\r\n\r\nThis model was contributed by [Eustache Le Bihan](https://huggingface.co/eustlb) and [Yuxuan Zhang](https://huggingface.co/ZHANGYUXUAN-zR).\r\nYou can check the [model card](https://huggingface.co/zai-org/GLM-ASR-Nano-2512) and our \r\n[GitHub repo](https://github.com/zai-org/GLM-ASR) for more details.\r\n\r\n* GLM-ASR  Support by @zRzRzRzRzRzRzR in https://github.com/huggingface/transformers/pull/42875\r\n\r\n### GLM 4.7 Flash\r\n\r\n<img width=\"5038\" height=\"5860\" alt=\"image\" src=\"https://github.com/user-attachments/assets/af0f2821-3831-4afd-a7dc-26a91f17afd0\" />\r\n\r\nGLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.\r\n\r\n* [GLM-4.7] GLM-Lite Support by @zRzRzRzRzRzRzR in https://github.com/huggingface/transformers/pull/43031\r\n\r\n### GLM Image\r\n\r\n<img width=\"10101\" height=\"7371\" alt=\"image\" src=\"https://github.com/user-attachments/assets/e2de76ab-35a9-4e42-9252-6b3dbf75e991\" />\r\n\r\nWe present GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. 
We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive, achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. We further introduce the GLM-4.6V series, open-source multimodal models with native tool use and a 128K context window. Code, models and more information are released at https://github.com/zai-org/GLM-V\r\n\r\n* [GLM-Image] AR Model Support for GLM-Image by @zRzRzRzRzRzRzR in https://github.com/huggingface/transformers/pull/43100\r\n\r\n### LWDetr\r\n\r\n<img width=\"706\" height=\"242\" alt=\"image\" src=\"https://github.com/user-attachments/assets/2c4770f4-5d48-4d87-a454-b574b648e5ae\" />\r\n\r\n[LW-DETR](https://huggingface.co/papers/2407.17140) proposes a light-weight Detection Transformer (DETR) architecture designed to compete with and surpass the dominant YOLO series for real-time object detection. It achieves a new state-of-the-art balance between speed (latency) and accuracy (mAP) by combining recent transformer advances with efficient design choices.\r\n\r\nThe LW-DETR architecture is characterized by its simple and efficient structure: a plain ViT Encoder, a Projector, and a shallow DETR Decoder. 
It enhances the DETR architecture for efficiency and speed using the following core modifications:\r\n\r\n- **Efficient ViT Encoder**: Uses a plain ViT with interleaved window/global attention and a window-major organization to drastically reduce attention complexity and latency.\r\n- **Richer Input**: Aggregates multi-level features from the encoder and uses a C2f Projector (YOLOv8) to pass two-scale features (1/8 and 1/32).\r\n- **Faster Decoder**: Employs a shallow 3-layer DETR decoder with deformable cross-attention for lower latency and faster convergence.\r\n- **Optimized Queries**: Uses a mixed-query scheme combining learnable content queries and generated spatial queries.\r\n\r\n* Add LWDetr model by @sbucaille in https://github.com/huggingface/transformers/pull/40991\r\n\r\n### LightOnOCR\r\n\r\n<img width=\"1172\" height=\"661\" alt=\"image\" src=\"https://github.com/user-attachments/assets/cee98e82-b3d0-42a6-b820-3061752ad4a8\" />\r\n\r\nLightOnOCR combines a Vision Transformer encoder (Pixtral-based) with a lightweight text decoder (Qwen3-based) distilled from high-quality open VLMs. 
It is optimized for document parsing tasks, producing accurate, layout-aware text extraction from high-resolution pages.\r\n\r\n* Add LightOnOCR model implementation by @baptiste-aubertin in https://github.com/huggingface/transformers/pull/41621\r\n\r\n## Bugfixes and improvements\r\n\r\n* `JetMoe` Fix jetmoe after #40132  by @ArthurZucker in #41324\r\n* Fixed tiny incorrect import in `gemma3`  by @Sai-Suraj-27 in #41354\r\n* Rope for Qwen2--5-vl  by @zucchini-nlp in #41173\r\n* 🚨 Bump to Python 3.10 and rework how we check 3rd-party libraries existence  by @Cyrilvallez in #41268\r\n* Standardize `PretrainedConfig` to `PreTrainedConfig`  by @Cyrilvallez in #41300\r\n* Fix trainer for py3.9  by @SunMarc in #41359\r\n* Check model inputs - hidden states  by @zucchini-nlp in #40994\r\n* [`ModularChecker`] QOL for the modular checker  by @ArthurZucker in #41361\r\n* Fixing a typo for BLT model  by @Narsil in #41325\r\n* :rotating_light: [`v5`] Remove relative position embeddings (for bert like models)  by @vasqu in #41170\r\n* Fix typo in model proposal template  by @Ombucha in #41352\r\n* Better typehints for `apply_chat_template`  by @Samoed in #41355\r\n* 🚨 Remove BetterTransformer  by @Cyrilvallez in #41367\r\n* [testing] update `test_longcat_generation_cpu`  by @ydshieh in #41368\r\n* Fix flash_attention.py: wrong argument passing for attn_implementation  by @TKONIY in #41347\r\n* Use canonical get_size_with_aspect_ratio (with max_size) from transformers.image_transforms to fix #37939  by @sonianuj287 in #41284\r\n* Fixes in check_model_inputs, GPTBigCodeModel and ImageGPTModel  by @IlyasMoutawwakil in #40811\r\n* Remove unnecessary list comprehension  by @cyyever in #41305\r\n* make some ut cases pass on xpu w/ latest torch  by @yao-matrix in #41337\r\n* Remove unused function patameters  by @cyyever in #41358\r\n* [`CB`] Refactors the way we access paged  by @ArthurZucker in #41370\r\n* serve: add non-streaming mode to /v1/responses; stream event parity; remove 
placeholder logprobs  by @antznette1 in #41353\r\n* Update from pretrained error when loading  by @ArthurZucker in #33380\r\n* [`v5`] Sync Bert and Bart eager attention  by @vasqu in #41248\r\n* fix asr ut failures  by @yao-matrix in #41332\r\n* fix resample in asr pipeline  by @yhzx233 in #41298\r\n* Correct numerical regression in vision embeddings  by @i3hz in #41374\r\n* [kernels] Kernel Config   by @MekkCyber in #41232\r\n* [Cache] lfm2 cache: allocate empty kv layers during init  by @paulpak58 in #41396\r\n* Fix test for model with dotted name and relative imports  by @st81 in #41343\r\n* Prefer raising `TypeError` exception for invalid type  by @Sai-Suraj-27 in #41346\r\n* [v5] Bump accelerate to 1.1.0   by @SunMarc in #41234\r\n* Fix incorrect assignment in `update_device_map` for GPTQ quantizer  by @Sai-Suraj-27 in #41328\r\n* [v5] Delete left traces of feature extractor  by @zucchini-nlp in #41321\r\n* Remove deprecation warning  by @Cyrilvallez in #41425\r\n* Fix overriding common_kwargs defaults in processor calls  by @yonigozlan in #41381\r\n* v5 dev version  by @LysandreJik in #41436\r\n* Tiny Cleanup - Removed duplicate class field definition's  by @Sai-Suraj-27 in #41293\r\n* 🚨🚨 Remove all traces of legacy cache format  by @Cyrilvallez in #41378\r\n* 🚨 [v5] Prune `prune_heads`  by @gante in #41417\r\n* [v5] Bump min version of bitsandbytes to 0.46.1   by @SunMarc in #41283\r\n* Fixing comments in __init__ file  by @MekkCyber in #41414\r\n* Use accelerator API to free device memory  by @cyyever in #41195\r\n* enable new model uts to xpu and fix some failures on xpu  by @yao-matrix in #41386\r\n* [torchao] Add regex support for ModuleFqnToConfig  by @jerryzh168 in #41242\r\n* :facepalm: CB nit!   
by @ArthurZucker in #41413\r\n* Remove Python 3.9 classifier  by @cyyever in #41410\r\n* [`JetMoe`] Fix KV head repetition and padding free  by @vasqu in #41423\r\n* [testing] Fix `JetMoeIntegrationTest`  by @ydshieh in #41377\r\n* Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation  by @ErfanBaghaei in #40837\r\n* Validate processing kwargs with @strict from huggingface_hub   by @zucchini-nlp in #40793\r\n* Update hqq.md  by @prathamesh-chavan-22 in #41452\r\n* enable some falcon-mamba uts on xpu  by @yao-matrix in #41428\r\n* Fix generate outputs and simplify cache tests  by @Cyrilvallez in #41440\r\n* Fix doc  by @Cyrilvallez in #41457\r\n* 🚨 [v5] Rename left traces of `past_key_value` in BERT-like models  by @zucchini-nlp in #41448\r\n* Subconfig is a class attribute  by @zucchini-nlp in #41308\r\n* [v5] rm `utils/tf_ops/`  by @gante in #41402\r\n* Update GLM-4.1V MMRope implementation  by @zRzRzRzRzRzRzR in #41182\r\n* [kernels] Cleanup deta kernel  by @MekkCyber in #41470\r\n* 🚨 [v5] Rendundant code in nested configs  by @zucchini-nlp in #41314\r\n* Remove KERAS_NLP_IMPORT_ERROR  by @cyyever in #41468\r\n* Fix auto model configuration for encoder of perceptionlm  by @fschlatt in #41464\r\n* Fix tests fsdp  by @SunMarc in #41422\r\n* Import Callable from collections.abc  by @cyyever in #41130\r\n* Pickle - part 2  by @ydshieh in #41476\r\n* Remove infer_device  by @cyyever in #41088\r\n* Change RT-Detr docs to reflect fixed 640x640 input size  by @konstantinos-p in #41364\r\n* Cleaning hub kernels   by @MekkCyber in #41477\r\n* [v5] remove load_in_4bit and load_in_8bit  by @SunMarc in #41287\r\n* :rotating_light: [`Attention Masks`] Bidirectional masks for encoder and encoder-decoder models  by @vasqu in #41265\r\n* [Fix] Fix test file error  by @YangKai0616 in #40973\r\n* enhance patched_tearDown to support python 3.11+  by @yao-matrix in #41429\r\n* RT-Detr correct 2d positional embeddings for non-square images  by 
@konstantinos-p in #41380\r\n* Fix bnb fsdp loading for pre-quantized checkpoint  by @SunMarc in #41415\r\n* Remove SigOpt  by @SunMarc in #41479\r\n* Remove `past_index`  by @SunMarc in #41384\r\n* Remove deprecated args in Trainer for v5  by @SunMarc in #41404\r\n* Update GLM-4.6 doc  by @zRzRzRzRzRzRzR in #41471\r\n* `report_to` default changed to \"none\" + cleaning deprecated env var  by @SunMarc in #41375\r\n* deprecate `overwrite_output_dir`  by @SunMarc in #41323\r\n* [`CI`] Fix copies on main  by @vasqu in #41486\r\n* [Trainer] deprecate ray scope  by @SunMarc in #41403\r\n* deprecate `jit_mode_eval`  by @SunMarc in #41376\r\n* Remove `local_rank` arg from `TrainingArguments`  by @SunMarc in #41382\r\n* Update philosophy  by @molbap in #41438\r\n* Remove DISABLE_KERNEL_MAPPING flag  by @MekkCyber in #41475\r\n* Streaming should be handled at the request-level rather than at the istance level  by @LysandreJik in #41444\r\n* fix bnb model loading  by @jiqing-feng in #41499\r\n* [kernels] Remove RWKV kernel finally !  
by @MekkCyber in #41493\r\n* [kernels] rm yoso kernel  by @MekkCyber in #41495\r\n* Try to remove `pickle` - `BloomTokenizerFast`  by @ydshieh in #41466\r\n* Fixed tiny incorrect imports in `glm4v`  by @Sai-Suraj-27 in #41483\r\n* [Parakeet] unnecessary warning & auto mapping  by @eustlb in #41412\r\n* [causallm tester] automate pipeline mappings + bloom tests  by @gante in #41318\r\n* Fix some tests  by @Cyrilvallez in #41503\r\n* fix gemma3n case failure  by @yao-matrix in #41426\r\n* [voxtral] language detection + skipping lang:xx  by @eustlb in #41225\r\n* Set `truncation` to `False` in Qwen3Omni to avoid default truncation  by @BakerBunker in #41473\r\n* [QoL] modular conversion shows LoC saved  by @molbap in #41500\r\n* More trainer cleaning   by @SunMarc in #41489\r\n* Bump to hfh 1.0.0.rc5 to fix test  by @Wauplin in #41508\r\n* Revert `local_rank` deletion and some cleaning  by @SunMarc in #41504\r\n* Fix detectron2 import  by @Cyrilvallez in #41510\r\n* add Trainer import to .md in appropriate cell block for training.ipynb transformers_doc  by @benkeene in #41484\r\n* Remove outdated flags  by @Cyrilvallez in #41512\r\n* remove `tpu_num_cores`  by @SunMarc in #41383\r\n* Allow optuna's catch kwargs passthrough  by @nicha-api in #41496\r\n* Fix Latex typesetting in documentation  by @cyyever in #41177\r\n* [testing] reduce runtime of `HunYuanMoEV1IntegrationTest:test_model_generation`  by @ydshieh in #41373\r\n* [Qwen3VL] fix: hidden_states in place modification error  by @HollowMan6 in #41535\r\n* Add MLlama fast image processor  by @yonigozlan in #41391\r\n* Fixed Type-hints in function defintions  by @Sai-Suraj-27 in #41525\r\n* [SAM] Fix typing hints   by @zucchini-nlp in #41506\r\n* Restore cuda graphs to continuous batching  by @remi-or in #41421\r\n* Add AMD developer cloud support  by @fan-amd in #41126\r\n* Enable modular files from other libraries  by @regisss in #41372\r\n* 🚨 [v5] `generate` delegates default cache initialization to the model  
by @gante in #41505\r\n* Fixed typos and formatting  by @julian-st in #34215\r\n* Add VideoMAE video processor   by @Aki-07 in #41534\r\n* [`from_pretrained`] Small refactor `from_pretrained`: move around unrelated stuff  by @ArthurZucker in #41445\r\n* Remove references to AutoModelForVision2Seq  by @Rocketknight1 in #41513\r\n* [Qwen3VL] fix device mismatch error for FSDP2 training  by @HollowMan6 in #41536\r\n* Patch MistralCommonTokenizer  by @juliendenize in #41439\r\n* Fix an import error with PreTrainModel  by @remi-or in #41571\r\n* [Qwen3VLMoe] Fixed: Expected self.dtype to be equal to src.dtype - routing_weights casting  by @danielquintas8 in #41420\r\n* [kernels] rm mra kernels  by @MekkCyber in #41507\r\n* delete some tokenizer tests using pickle  by @ydshieh in #41514\r\n* Add DINOv3Backbone for ConvNext variant  by @merveenoyan in #40651\r\n* Add conditional checks to _check_and_adjust_attn_implementation()  by @zheliuyu in #41542\r\n* add rmsnorm kernels support for Intel XPU  by @kaixuanliu in #41563\r\n* Revert \"add rmsnorm kernels support for Intel XPU\"  by @MekkCyber in #41579\r\n* [VisionEncoderDecoderModel] Update loss function  by @NielsRogge in #40863\r\n* Add __iter__ to DynamicCache  by @remi-or in #41569\r\n* Revert some breaking changes bnb  by @SunMarc in #41581\r\n* Fix typsetting and content of llm_tutorial_optimization.md  by @cyyever in #41172\r\n* Gemma3 fixes  by @remi-or in #41572\r\n* Benchmark overhaul  by @remi-or in #41408\r\n* Enable non-streaming mode in `transformers serve`  by @LysandreJik in #41446\r\n* [device_map] Accelerate loading by computing device_map much faster  by @Cyrilvallez in #41548\r\n* Add `logits_to_keep` to many older CausalLM models  by @philiproeleveld in #41335\r\n* fix some case failures lead by \"`torch.compile` recompiled part of th…  by @sywangyi in #41558\r\n* remove ray_scope and check_quantized_param  by @SunMarc in #41587\r\n* Update issue template   by @SunMarc in #41573\r\n* [`Docs`] Fix 
changed references  by @vasqu in #41614\r\n* Import `expand_device_map` instead of redefining it  by @Cyrilvallez in #41608\r\n* Fix trainer simple tests  by @SunMarc in #41449\r\n* More markdown file fixes  by @cyyever in #41599\r\n* torch 2.9 don't ❤️ torchcodec 💔   by @ydshieh in #41610\r\n* Update a dataset reop link  by @ydshieh in #41618\r\n* Add fast path for bidirectional mask creation to fix regression  by @i3hz in #41586\r\n* enable sdpa enable gqa logic for Ascend NPU  by @FightingZhen in #41601\r\n* Fix video processing channel format  by @zucchini-nlp in #41603\r\n* [chat template] update when \"push_to_hub\"  by @zucchini-nlp in #39815\r\n* Remove the head masking block in some vision models  by @ydshieh in #41620\r\n* Remove deprecated code  by @SunMarc in #41616\r\n* Fix quantization base class   by @SunMarc in #41613\r\n* [docs] Duplicate entry  by @stevhliu in #41591\r\n* Update executorch.md  by @jackzhxng in #41582\r\n* Add Backbone API fine-tuning tutorial  by @merveenoyan in #41590\r\n* 🚨 [v5] Toggle the serialization format in processors  by @zucchini-nlp in #41474\r\n* Add aux loss for GLM-4.5V  by @zRzRzRzRzRzRzR in #41564\r\n* Allow passing `tp_plan` in `from_pretrained` directly  by @Cyrilvallez in #41435\r\n* Fix tokenization test  by @Cyrilvallez in #41649\r\n* Remove randomly added script  by @Cyrilvallez in #41650\r\n* Add missing dates to docs  by @yonigozlan in #41576\r\n* Migrate transformers cli to Typer  by @Wauplin in #41487\r\n* Fix FP-Quant quantization fallback CPU dispatch.  
by @BlackSamorez in #41619\r\n* fix check inputs for text2text pipeline  by @jiqing-feng in #41556\r\n* [`Executorch`] Simplify for encoder models  by @vasqu in #41627\r\n* [`Ernie 4.5 Moe`] Fix Moe and offloading  by @vasqu in #41385\r\n* [CI] Build translated docs  by @stevhliu in #41632\r\n* Fix fp32_ln for various models  by @remi-or in #41605\r\n* Adjust device logging level and add minor fixes  by @mario-koddenbrock in #41636\r\n* Fix EncoderDecoder cache  by @remi-or in #41612\r\n* Format MarkDown documentation and tiny fixes  by @cyyever in #41638\r\n* Fix typos in documentation  by @cyyever in #41641\r\n* Fix confusing cls assignment  by @cyyever in #41642\r\n* Double router compute?  by @molbap in #41653\r\n* [kernels] refactor function kernel calling  by @MekkCyber in #41577\r\n* [Fix] Deepseek V3 expert bias routing  by @fjosw in #41647\r\n* purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet  by @Vaibhavs10 in #41656\r\n* [`Masks`] Fix mask handling in eager for vision models  by @vasqu in #41625\r\n* Use | for Optional and Union typing  by @cyyever in #41646\r\n* Switch to CB if cache_implementation == paged  by @remi-or in #41655\r\n* Add in-out modalities as class attribute per model  by @zucchini-nlp in #41366\r\n* Fix dtype casting with quantization  by @Cyrilvallez in #41665\r\n* Fix serving continuous batching  by @SunMarc in #41624\r\n* Small changes to benchmarking script  by @remi-or in #41662\r\n* Improve package version check  by @Cyrilvallez in #41661\r\n* improve `utils/check_bad_commit.py`  by @ydshieh in #41658\r\n* Erroring when KernelConfig is passed without use_kernels = True  by @MekkCyber in #41657\r\n* [Trainer] [Breaking change] `use_cache` default to `False`  by @SunMarc in #41585\r\n* 🌐 [i18n-KO] Translated `chat_extras.md` to Korean  by @Judy-Choi in #39863\r\n* 🌐 [i18n-KO] Translated sam_hq.md to Korean  by @HyunZ118 in #41340\r\n* [i18n-KO] Translated `big_bird.md` to Korean  by @ssum21 in #40445\r\n* 🌐 [i18n-KO] Translated 
`code_llama.md` to Korean  by @Judy-Choi in #40558\r\n* 🌐 [i18n-KO] Translated llama4.md to Korean  by @TaskerJang in #40396\r\n* :globe_with_meridians: [i18n-KO] Translated `ko-LFM2.md` to Korean  by @ssum21 in #41502\r\n* Adding superglue fast image processing  by @AlphaOrOmega in #41394\r\n* Fix ckpt in docs  by @zucchini-nlp in #41659\r\n* torch 2.9 still don't ❤️ torchcodec 0.8 💔  by @ydshieh in #41686\r\n* Remove deprecated `use_auth_token` parameter  by @Wauplin in #41666\r\n* Remove  require_torch_bf16_gpu  by @cyyever in #40979\r\n* path validation for security reason  by @ydshieh in #41256\r\n* 🚨 Remove torchscript support  by @Cyrilvallez in #41688\r\n* Fix MarkDown syntax  by @cyyever in #41676\r\n* Use | for Optional and Union typing   by @cyyever in #41675\r\n* 🚨 [v5] Refactor RoPE for layer types  by @zucchini-nlp in #39847\r\n* Enable faiss-cpu on Windows  by @cyyever in #41678\r\n* Fix Pylint warnings  by @cyyever in #41644\r\n* 🚨 Remove torch.fx support  by @Cyrilvallez in #41683\r\n* Remove skipped tests without parents  by @Cyrilvallez in #41691\r\n* Enable  FURB rules in ruff  by @cyyever in #41395\r\n* Remove upper version bound of pandas  by @cyyever in #41677\r\n* [`Attn`] Allow dynamic causality in SDPA via Kwargs  by @vasqu in #41692\r\n* Simplify GQA conditions in sdpa_attention.py  by @justinchuby in #41699\r\n* [docs] Manual tp-plan  by @stevhliu in #41674\r\n* 🌐 [i18n-KO] Translated gemma3n.md to Korean  by @HyunZ118 in #40873\r\n* pin torchcodec on CI docker image  by @ydshieh in #41703\r\n* Update `run_name` docs in TrainingArguments  by @tobiasofsn in #41705\r\n* further improve `utils/check_bad_commit.py`  by @ydshieh in #41658) \r\n* feat: add benchmark v2 ci with results pushed to dataset  by @McPatate in #41672\r\n* Gemma3 conversion script maintenance  by @RyanMullins in #41704\r\n* Fix Qwen3-Omni inference when mixing video and image inputs in one batch  by @BakerBunker in #41741\r\n* Fix typo in LFM-VL  by @zucchini-nlp in 
#41742\r\n* Revert \"Remove upper version bound of pandas\"  by @ydshieh in #41744\r\n* [doc] remove broken notebooks on AMD Dev Cloud  by @pagezyhf in #41743\r\n* Update type hints in tokenization_utils.py to use | syntax  by @faizan842 in #41713\r\n* Fix documentation issues  by @cyyever in #41726\r\n* Apply RUFF PIE rules  by @cyyever in #41727\r\n* Small Fix for imports   by @MekkCyber in #41411\r\n* Docs(zh-hans): Refine wording for professionalism in README  by @Ri-Nai in #40943\r\n* Add vision contribution guide  by @molbap in #41456\r\n* upgrade xpu docker file to torch 2.8  by @yao-matrix in #41551\r\n* [v5] Delete `videos` from image processing classes   by @zucchini-nlp in #41607\r\n* Fixed incorrect model_type for qwen2vl and qwen2.5vl when config is saved and loaded again  by @i3hz in #41758\r\n* [kernels] Add version to function mapping  by @MekkCyber in #41685\r\n* Reduce warning noise caused by Tensor.new_tensor  by @st81 in #41748\r\n* Fix graphormer model compilation with Cython 3.1.4  by @alexmalyshev in #41671\r\n* Update type hints in modeling_rope_utils.py to use | syntax  by @faizan842 in #41714\r\n* [v5] Remove deprecated tranformers.onnx  by @echarlaix in #41700\r\n* Modernize CLIP modeling code   by @molbap in #41546\r\n* Simplify pipeline padding logic  by @Rocketknight1 in #41667\r\n* Chat response parsing  by @Rocketknight1 in #40894\r\n* Add LightGlue fast image processor  by @yonigozlan in #41670\r\n* Fix bark after #41445  by @ydshieh in #41645\r\n* Remove invalid `@staticmethod` from module-level get_device_and_memory_breakdown  by @albertvillanova in #41747\r\n* Fix CUDA index out of bounds for q_idx in VLM token type masking for Gemma3, PaliGemma, and example modular  by @albertvillanova in #41757\r\n* fix: Gemma 3 weights conversion vision and multimodal projector paths  by @RyanMullins in #41767\r\n* [v5] Delete legacy chat template saving  by @zucchini-nlp in #41648\r\n* [quantization] fix compressed_tensors tests  by 
@MekkCyber in #41780\r\n* [quantization] Skip Fp8 tests when hardware capability < 8.9  by @MekkCyber in #41785\r\n* Swap columns and rows of the grid layout in LFM2-VL  by @ankke in #41755\r\n* fix type annotation typo in docstring  by @johntheprime in #41788\r\n* Fix chat schema tests  by @Rocketknight1 in #41793\r\n* Fix attention mask in mamba layers  by @zucchini-nlp in #41790\r\n* [quantization] fix torchao tests after 0.14.0 release  by @MekkCyber in #41777\r\n* [`Onnx docs`] Remove some traces  by @vasqu in #41791\r\n* flash attn pytest marker  by @ydshieh in #41781\r\n* Bump AMD docker  by @remi-or in #41792\r\n* make apollo test case pass  by @yao-matrix in #41805\r\n* Add a safeguard around a flaky test in gemma2  by @remi-or in #41811\r\n* Fix Qwen3Next dtype API usage  by @SrijanUpadhyay in #41735\r\n* [Trainer] remove env vars   by @SunMarc in #41697\r\n* Fixed grammar mistakes  by @FrogWarlord in #41799\r\n* Fixed some grammar mistakes  by @FrogWarlord in #41802\r\n* transformers cli default flag fix  by @ArjunPimpale in #41761\r\n* Deprecate warmup_ratio  by @SunMarc in #41326\r\n* transformers serve quantization docs + some api fixes for bitsandbytes  by @SunMarc in #41253\r\n* [Parakeet] add output_attention_mask  by @eustlb in #41694\r\n* unpin torch/torchcodec for CircleCI  by @ydshieh in #41839\r\n* extend bitnet cases to xpu, all 8 cases pass  by @yao-matrix in #41831\r\n* extend 2 trainer test cases to xpu  by @yao-matrix in #41829\r\n* extend 2 blip2 and falcon_h1 test cases to xpu  by @yao-matrix in #41825\r\n* further reducing flakiness in `utils/check_bad_commit.py`  by @ydshieh in #41658)  \r\n* Remove redundant code from Qwen3VLProcessor  by @Xqle in #41836\r\n* Fix MXFP4 quantizer to support variable num_local_experts and hidden_size  by @marksverdhei in #41795\r\n* Fix Qwen2Audio flash attention mask format for generation  by @Abdennacer-Badaoui in #41843\r\n* Fix const parsing for dict inputs in chat schemas  by @Rocketknight1 in 
#41824\r\n* Share embedding modules in BART, not only weights  by @githubnemo in #41821\r\n* Fix TypeError: find_adapter_config_file() got an unexpected keyword argument '_adapter_model_path'  by @albertvillanova in #41604\r\n* :rotating_light: [`Clip`] Fix masking and enable flash attention on all model types  by @vasqu in #41750\r\n* CI workflow for Flash Attn  by @ydshieh in #41857\r\n* Fix torch.no_grad decorator in VLMS  by @yaswanth19 in #41888\r\n* Fix installation cmds in docs  by @yaswanth19 in #41887\r\n* revert changes in _is_package_available  by @MekkCyber in #41891\r\n* make lfm2_moe integration test pass on XPU  by @yao-matrix in #41796\r\n* Fix: avoid duplicate token in maybe_load_adapters  by @luaenrique in #41903\r\n* speed up loading checkpoints for zero stage 3  by @ri938 in #41850\r\n* evaluate>=0.4.6 is needed  by @stas00 in #41920\r\n* Add 6 huggingface notebooks on AMD dev cloud  by @fan-amd in #41883\r\n* Fix invalid examples in QwenVL model docstrings and add Qwen3VL example  by @Xqle in #41812\r\n* Allow parse_response to accept token IDs  by @Rocketknight1 in #41849\r\n* Fix Florence2 conversion script model_type KeyError  by @i3hz in #41866\r\n* Update some workflow files  by @ydshieh in #41892\r\n* fix some ut failures on XPU w/ torch 2.9  by @yao-matrix in #41923\r\n* Cache latest pytorch amd image locally on mi325 CI runner cluster  by @jitesh-gupta in #41926\r\n* Minor fix in docker image build workflow  by @ydshieh in #41949\r\n* fix some ut failures on XPU w/ torch 2.9  by @yao-matrix in #41941\r\n* Fix rope_parameters for gemma3 weights conversion script  by @douglas-reid in #41922\r\n* Fix: Gemma3TextConfig rope scaling assignments  by @RyanMullins in #41934\r\n* fix prepare_config_and_inputs_for_common bug in llava test  by @yao-matrix in #41942\r\n* Fix: prevent .gitignore truncation in run_clm_no_trainer.py  by @luaenrique in #41957\r\n* V4.57.1 training ci: Refactor `test_tensor_parallel.py`  by @3outeille in #41918\r\n* 
[v5] Return a BatchEncoding dict from apply_chat_template by default  by @Rocketknight1 in #41626\r\n* make recurrent_gemma and voxtral cases pass on xpu  by @yao-matrix in #41958\r\n* Fix typo in image_processing_lfm2_vl_fast  by @yonigozlan in #41940\r\n* Run slow v2  by @ydshieh in #41914\r\n* Fix `detectron2` installation in docker files  by @ydshieh in #41975\r\n* Fix `autoawq[kernels]` installation in quantization docker file  by @ydshieh in #41978\r\n* add support for saving encoder only so any parakeet model can be loaded for inference  by @nithinraok in #41969\r\n* Use indices as position_ids in modernebert  by @remi-or in #41789\r\n* test tensor parallel: make tests for dense model more robust  by @3outeille in #41968\r\n* fix: dict[RopeParameters] to dict[str, RopeParameters]  by @RyanMullins in #41963\r\n* docs: add continuous batching page  by @McPatate in #41847\r\n* Fix `torchcodec` version in quantization docker file  by @ydshieh in #41988\r\n* [kernels] Add Tests & CI for kernels  by @MekkCyber in #41765\r\n* Move the Mi355 to regular docker  by @remi-or in #41989\r\n* More data in benchmarking  by @remi-or in #41848\r\n* fix (CI): Refactor SSH runners  by @glegendre01 in #41991\r\n* fix 3 failed test cases for video_llama_3 model on Intel XPU  by @kaixuanliu in #41931\r\n* Integrate colqwen2.5 using colqwen2 modelling code  by @sahil-kabir in #40600\r\n* Fixed wrong padding value in OWLv2  by @gjamesgoenawan in #41938\r\n* Fix `run slow v2`: empty report when there is only one model  by @ydshieh in #42002\r\n* [kernels] change import time in KernelConfig  by @MekkCyber in #42004\r\n* DOC Fix typo in argument name: pseudoquant  by @BenjaminBossan in #41994\r\n* Fix `torch+deepspeed` docker file  by @ydshieh in #41985\r\n* Correct syntax error in trainer.md  by @Yacklin in #42001\r\n* Reduce the number of benchmark in the CI  by @remi-or in #42008\r\n* Fix continuous batching tests  by @Rocketknight1 in #42012\r\n* add back `logging_dir`  by 
@SunMarc in #42013\r\n* Fix issue with from pretrained and kwargs in image processors  by @yonigozlan in #41997\r\n* Fix default image_rows and image_cols initialization in Idefics3 and SmolVLM processors  by @MilkClouds in #41871\r\n* Add GLPNImageProcessorFast   by @Aravind-11 in #41725\r\n* add fuyu fast image processors  by @DeXtAr47-oss in #41817\r\n* [kernels] Fix XPU layernorm kernel  by @MekkCyber in #41583\r\n* [v5] Deprecate Text2Text and related pipelines  by @Rocketknight1 in #41996\r\n* [FPQuant] MXFP8 and MXFP4 backwards support  by @BlackSamorez in #41897\r\n* fix `deeepspeed` in AMD docker file  by @ydshieh in #42025\r\n* CodeQL workflow for security analysis  by @paulinebm in #42015\r\n* [tests] Add Context-parallel CI tests  by @kashif in #41860\r\n* extend fp_quant cases to xpu  by @yao-matrix in #41833\r\n* Change trigger time for AMD CI  by @ydshieh in #42034\r\n* Fix the order of methods in processor loading  by @zucchini-nlp in #42031\r\n* 🔴  Isolate prefill from generation loops  by @manueldeprada in #40652\r\n* update `huggingface_hub` dependency version  by @hanouticelina in #42033\r\n* Remove some custom datasets defined in codebase  by @ydshieh in #41511\r\n* Cleanup workflow - part 1  by @ydshieh in #42023\r\n* Fix `pr_slow_ci_suggestion.yml` after #42023  by @ydshieh in #42049\r\n* Fix AutoImageProcessor.register and documentation in auto processing modules  by @MilkClouds in #41864\r\n* Fix Qwen3-Omni RoPE  by @zucchini-nlp in #41778\r\n* Avoid explicit checkout in workflow  by @ydshieh in #42057\r\n* Annoying typo in attention error message  by @manueldeprada in #42037\r\n* Be careful at explicit checkout actions  by @ydshieh in #42060\r\n* Fix another `Argument list too long` in `pr_slow_ci_suggestion.yml`  by @ydshieh in #42061\r\n* Fix KeyError in GPT-OSS weight conversion script  by @Aznix07 in #42007\r\n* Fix KeyError in _is_package_available for packages with dotted names  by @yashwantbezawada in #42050\r\n* Revert back to use 
GitHub context   by @ydshieh in #42066\r\n* Fix missing arg in check_docstring  by @yonigozlan in #42054\r\n* [deepspeed tests fixes]  by @stas00 in #41925\r\n* Fix logic in setting self.fsdp when it is False  by @roychan in #41974\r\n* fix tensor device placement issue of 2 UT cases  by @yao-matrix in #41921\r\n* add workflow to check permissions and advise a set of permissions req…  by @paulinebm in #42071\r\n* Fix security issue 5  by @paulinebm in #42072\r\n* Fix inconsistency of commit sha during the workflow run  by @ydshieh in #42074\r\n* QwenVL: add skipped keys in `setattr` as well  by @zucchini-nlp in #41808\r\n* permissions worflows fix  by @paulinebm in #42080\r\n* 4.1V Model and GLM-4.5V Model Conversion Code Updates  by @zRzRzRzRzRzRzR in #41784\r\n* feat(ci): add continuous batching to benchmarks  by @McPatate in #41916\r\n* Fix modular docstring for Mixtral  by @diegoakel in #42041\r\n* Fix Auto classes to support dynamically registered processors  by @MilkClouds in #41865\r\n* Reinstate self.scaling in Gemma3nTextAttention  by @RyanMullins in #41751\r\n* [v5] 🚨Refactor subprocessors handling in processors  by @yonigozlan in #41633\r\n* add xpu support in test_modeling_janus.py::JanusIntegrationTest::test…  by @sywangyi in #41986\r\n* Revert \"permissions worflows fix\"  by @ydshieh in #42110\r\n* Fix return metadata checking logic  by @Xqle in #42108\r\n* Correctly handle unbatched audio inputs in Gemma3nAudioFeatureExtractor  by @kho in #42076\r\n* [Bugfix] fix qwen3vl expand generation with video  by @JJJYmmm in #42089\r\n* Fix base model prefix in VLMs  by @zucchini-nlp in #42059\r\n* fix continuous batching issues, extend ut cases to xpu  by @yao-matrix in #41830\r\n* 📝 docs(smolvlm): fix variable name in batch inference example  by @gorkachea in #42123\r\n* fix qwen2vl/qwen3vl video processor temporal padding when num_frames%temporal_patch_size!=1  by @yaogang2060 in #42083\r\n* [`Attn Masks`] Non-vmap default for attention masks  by @vasqu in 
#41852\r\n* Fix GPT-2 Flash Attention 2 generation with left-padding  by @Abdennacer-Badaoui in #41966\r\n* Fix model name test for compressed tensors   by @SunMarc in #42128\r\n* Fix MaskFormer/Mask2Former fast image processors  by @yonigozlan in #41393\r\n* Remove unused functions in `image_transforms.py`  by @yaswanth19 in #42044\r\n* update deps table  by @ArthurZucker in #42120\r\n* fix: improve video processing fps assignment logic  by @Xqle in #42009\r\n* Fix T5Gemma module structure  by @Cyrilvallez in #42145\r\n* DataCollatorForLanguageModeling warning error fixed  by @mjaliz in #42144\r\n* Bugfix/remove emojis from print  by @7amim in #42091\r\n* Avoid mutating user-provided arguments in preprocessing utils  by @LeonardoEmili in #42126\r\n* Enforce check_auto_docstring  by @yonigozlan in #41635\r\n* Add dinov3 autobackbone  by @vijayabhaskar-ev in #41276\r\n* Fix logic error in `prepare_inputs_for_generation` cache slicing condition  by @albertvillanova in #41764\r\n* :rotating_light: Fix gradient checkpointing for several models and improve test robustness    by @githubnemo in #41818\r\n* [`T5Gemma`] Fix cross attention cache  by @vasqu in #41890\r\n* T5 migration to new masking interface  by @Aravind-11 in #41804\r\n* fix: improve visibility of ValueError root causes in model config loading  by @scottzh8 in #41972\r\n* add xpu to valid hardware for torch.compile  by @sywangyi in #42079\r\n* extend test_beam_search_early_stop_heuristic case to other device  by @sywangyi in #42078\r\n* fix failure of tests/models/shieldgemma2/test_modeling_shieldgemma2.p…  by @sywangyi in #42022\r\n* Fixes Flash Attention implementation for models   by @i3hz in #42149\r\n* fix test failure of speculative_generation on xpu  by @sywangyi in #42052\r\n* add rmsnorm kernels support for npu  by @zheliuyu in #42106\r\n* update torchao doc  by @jiqing-feng in #42139\r\n* feat(kernels): add opt-out flag to disable kernels hub usage through the lib  by @mfuntowicz in #41990\r\n* 
handle inputs from Siglip/Siglip2 non-automapped encoder layers  by @molbap in #41930\r\n* Add slow to some examples tests   by @SunMarc in #42164\r\n* fix(ci): unexpected keyword argument `streaming`  by @McPatate in #42102\r\n* pin `pytest<9` for now  by @ydshieh in #42162\r\n* Docs/i18n updates  by @lilin-1 in #42006\r\n* Fix in-place modification of user-input in SAM2 embed boxes  by @xenova in #42173\r\n* [`Pop2Piano`] Fix cache usage  by @vasqu in #42170\r\n* Fix helper fn for new processor config format  by @zucchini-nlp in #42085\r\n* Remove unnecessary slicing in sdpa_attention_forward  by @justinchuby in #41900\r\n* [`PEFT`] Fix prefix tuning  by @vasqu in #41696\r\n* [typo] fix mrope-interleave annotation to avoid ambiguity  by @JJJYmmm in #42177\r\n* Update transformers to support `FqnToConfig`  by @jcaip in #41894\r\n* [`PEFT`] Fix the general test for prefix tuning  by @vasqu in #42185\r\n* [TP] Fix parameter detection issue and some invalid TP-plans  by @Cyrilvallez in #42129\r\n* Refactor weight loading  by @ArthurZucker in #41580\r\n* 🚨 Delete deprecations with end-cycle in v4.xx and v5.0  by @zucchini-nlp in #41681\r\n* Add AutoTokenizer mapping for mistral3 and ministral  by @patrickvonplaten in #42198\r\n* Fix checkpoint loading with DeepSpeed ZeRO3  by @tohtana in #42201\r\n* [`Pop2Piano`] Fix tied weights  by @vasqu in #42193\r\n* New docker from AMD  by @remi-or in #42208\r\n* Add cross links for model contribution  by @zucchini-nlp in #42207\r\n* Stop inheriting tests!  
by @Rocketknight1 in #42192\r\n* Refactor check_auto_docstring using AST  by @yonigozlan in #41432\r\n* [`BLT`] Fix cache usage  by @vasqu in #42188\r\n* Update `test_dynamic_cache_exportability_multiple_run` (failing on torch 2.10 nightly)  by @ydshieh in #42212\r\n* Much more efficient and clear weight initialization and tie weights  by @Cyrilvallez in #42191\r\n* GLM-V update with new processor  by @zRzRzRzRzRzRzR in #42122\r\n* Fix initialization guard for pytest  by @Cyrilvallez in #42234\r\n* Fix TP plans for MoE models  by @Cyrilvallez in #42236\r\n* Add prefix sharing to continuous batching  by @remi-or in #42094\r\n* Loading optimization  by @Cyrilvallez in #42239\r\n* calls `AttentionMaskConverter._unmask_unattended` for xpu device before  by @kaixuanliu in #42230\r\n* FIX Broken PEFT adapter loading  by @BenjaminBossan in #42187\r\n* Fix processor test for glm  by @molbap in #42233\r\n* Fix UnboundLocalError in RT-DETR loss computation  by @yashwantbezawada in #42224\r\n* Stop inheriting tests (again)  by @Rocketknight1 in #42247\r\n* [loading] Fix device when source and target are different  by @Cyrilvallez in #42246\r\n* Reduce timing on CircleCI - part 1 (Use @slow for IntegrationTests)  by @ydshieh in #42206\r\n* 🚨 Delete generation params from model config  by @zucchini-nlp in #41695\r\n* Allow VLMs to have a correct `base_model`  by @zucchini-nlp in #41589\r\n* Make tests run in less time by reducing `batch_size`  by @ydshieh in #42213\r\n* Revert \"Make tests run in less time by reducing `batch_size`\"  by @ydshieh in #42258\r\n* Cleanup reference to TFBertTokenizer and TFGPT2Tokenizer  by @Rocketknight1 in #42182\r\n* delete already deprecated models  by @ydshieh in #42235\r\n* Fix bnb for the weights refactor  by @SunMarc in #42043\r\n* Fix looping in torch guard decorator  by @Cyrilvallez in #42260\r\n* 🚨  Generalize `get_decoder()` for multimodal and delete redundant code 🔪   by @zucchini-nlp in #42156\r\n* Audio Flamingo3 - fix attention 
masking  by @zucchini-nlp in #42278\r\n* Add support for torch device objects in device validator  by @yonigozlan in #42267\r\n* Remove doc files of other langs for deleted models  by @ydshieh in #42276\r\n* [testing] fix `cwm`  by @ydshieh in #42261\r\n* fix a typo: pbd -> pdb  by @jaeminoh in #42268\r\n* Enable glm46v UTs on XPU  by @YangKai0616 in #42274\r\n* [testing] fix some cases in xpu  by @sywangyi in #42273\r\n* Remove random flag  by @Cyrilvallez in #42282\r\n* Fix accelerate integration  by @Cyrilvallez in #42264\r\n* Fix validation checks order in benchmark_v2  by @Abdennacer-Badaoui in #42280\r\n* Update torchcodec to match torchaudio version  by @remi-or in #42288\r\n* Use `torch.get_autocast_dtype` instead of `torch.get_autocast_gpu_dtype`  by @qgallouedec in #42055\r\n* perf: Optimization for Min-p sampling implementation  by @casinca in #42248\r\n* Fix device_map computation part 2  by @Cyrilvallez in #42290\r\n* Fixed the docstring for `WhisperFeatureExtractor`  by @TopCoder2K in #42286\r\n* avoiding conditional indexing in positionalencoding to avoid possibil…  by @ppadjinTT in #42090\r\n* ENH: Add support for LoRA hotswapping  by @BenjaminBossan in #41297\r\n* Fix Break change of AWQ FusedModules due to Attention Refactor  by @fanqiNO1 in #41909\r\n* Remove error string test that was failing  by @Rocketknight1 in #42301\r\n* Properly protect the is_compiling checks  by @Cyrilvallez in #42304\r\n* Remove outdated methods in modeling_utils.py  by @Cyrilvallez in #42302\r\n* Fix Mac mps dataloader_num_workers > 1 causes RuntimeError: _share_filename_: only available on CPU  by @AmitMY in #38819\r\n* Fix the init_weights for the MoE models  by @Cyrilvallez in #42306\r\n* Update link to generation strategies documentation  by @omkar-334 in #42252\r\n* Update conversion mapping to separate renaming from converting  by @ArthurZucker in #42254\r\n* fix(granitemoe*): Only create block_sparse_moe if num_local_experts > 0  by @gabe-l-hart in #42036\r\n* 
[SAM3 Video] Add support for multi prompts   by @yonigozlan in #42293\r\n* Add Pix2Struct fast image processor  by @yonigozlan in #42020\r\n* Fix post processing methods in  keypoints matching models  by @yonigozlan in #42018\r\n* fix tests/models/xcodec/test_modeling_xcodec.py::XcodecIntegrationTest  by @sywangyi in #42272\r\n* [loading] Fix device detection  by @Cyrilvallez in #42323\r\n* Fix typo from side_dict to size_dict  by @nihui in #42319\r\n* HF Trainer: ALST/Ulysses sequence parallelism integration via HF Accelerate  by @stas00 in #41832\r\n* Fix gpt2 modeling tests  by @Abdennacer-Badaoui in #42321\r\n* [loading] Use fewer threads by default for much better performances  by @Cyrilvallez in #42324\r\n* Allow LayoutLMV3Processor to accept rescale_factor  by @Rocketknight1 in #42305\r\n* Correctly create tied key mapping in post_init, and dynamic tie weight  by @Cyrilvallez in #42270\r\n* [`CI`] Skip `EfficientLoFTR` test  by @vasqu in #42327\r\n* [XPU] Add flash_attn2 support for XPU  by @YangKai0616 in #41956\r\n* [`Attn Masks`] Lift bidirectional mask restriction on eager  by @vasqu in #42325\r\n* fix bug when gemma3n model run on multiple device  by @kaixuanliu in #42303\r\n* Fix ChineseCLIPModel.get_text_features  by @JiangJQ2000 in #42351\r\n* Gemma3 hybrid fix  by @remi-or in #42287\r\n* fix(benchmarks): correct sdpa_backend inconsistency and attn_implementation for continuous batching  by @engmohamedsalah in #42339\r\n* Auto convert tekken.json  by @ArthurZucker in #42299\r\n* [loading] Re-add and improve disk offloading support  by @Cyrilvallez in #42242\r\n* Fix typo - indentation in JSON dump example  by @anthropikos in #42332\r\n* Fix tied weight for Bart (for BC)  by @Cyrilvallez in #42355\r\n* Fix reference to yelp dataset  by @JuanFKurucz in #42349\r\n* Fix documentation reference to pytorch max memory allocated  by @JuanFKurucz in #42350\r\n* Fix reference to imagenet 1k dataset  by @JuanFKurucz in #42348\r\n* Fix typos  by @omahs in 
#42354\r\n* Protect `torch.distributed` imports  by @Cyrilvallez in #42361\r\n* Expand npu device for KernelConfig  by @zheliuyu in #42358\r\n* Replace Optional and Union typing with | in some source files  by @cyyever in #42294\r\n* Fix code examples to load gpt 1 openai community model  by @JuanFKurucz in #42347\r\n* fix tekken pattern matching  by @ArthurZucker in #42363\r\n* Fixed-wrong-ZeRO3-json-snippet-found-in-deepspeed-markdown-file  by @Yacklin in #42346\r\n* Make benchmarking lighter: clean-up result files and remove non-needed arguments  by @remi-or in #42357\r\n* Add image processor fast vitpose  by @yonigozlan in #42021\r\n* Small tp fix  by @ArthurZucker in #42366\r\n* Remove test inheritance for EfficientLoftr, rename KeypointMatchingOutput to model specific name  by @yonigozlan in #42365\r\n* Tiny doc fix  by @molbap in #42296\r\n* Fix TimesFM patch normalization instability  by @AnMakc in #42099\r\n* [core] Fix torchao   by @MekkCyber in #42289\r\n* Fix tp  by @ArthurZucker in #42368\r\n* [`Attn Masks`] Add skip option for non-packed sequences  by @vasqu in #42367\r\n* 📚 docs(granite-speech): add comprehensive usage examples  by @gorkachea in #42125\r\n* Xcodec fix  by @eustlb in #42095\r\n* Replace Optional and Union typing with | in some source files  by @cyyever in #42372\r\n* [`Mistral Tokenizers`] Fix tokenizer detection  by @vasqu in #42389\r\n* misc don't recreate it  by @ArthurZucker in #42394\r\n* [SAM3] Fix precompute vision_embeds or text_embeds for inference  by @yonigozlan in #42407\r\n* 🚨 Image-text pipeline expects correctly formatted chat  by @zucchini-nlp in #42359\r\n* Many small fixes for the CI  by @remi-or in #42364\r\n* [core] fix mxfp4  by @MekkCyber in #42382\r\n* fixed json syntax error for zero2 configuration file found in deepspeed.md  by @Yacklin in #42406\r\n* GLM4V - delete duplicate config attribute  by @zucchini-nlp in #42416\r\n* 🚨 Remove generic output_attentions warning  by @Aravind-11 in #42334\r\n* Bart config 
doesn't need generation parameters  by @zucchini-nlp in #42337\r\n* Simplify and standardize processor tests  by @yonigozlan in #41773\r\n* Clean bnb integration using weight converter  by @SunMarc in #42426\r\n* Any to any pipeline and auto-mapping  by @zucchini-nlp in #40884\r\n* Fix processor usage + add chat_template support to TTS pipeline, and shift common chat template logic to base class.  by @ebezzam in #42326\r\n* [fp8] fix scales param name  by @MekkCyber in #42434\r\n* Fix an edge case for `get_encoder()`  by @zucchini-nlp in #42295\r\n* Disable loss rounding in training stats log  by @AnMakc in #42104\r\n* Benchmark simplification  by @remi-or in #42408\r\n* Future annotations break FastAPI  by @LysandreJik in #42450\r\n* [cleanup] Don't use Repository in create_dummy_models.py script  by @Wauplin in #42380\r\n* [cleanup] Remove deprecated load config from file  by @Wauplin in #42383\r\n* [`FA`] Cleanup loading logic  by @vasqu in #41427\r\n* tiny fix for deepseekocr support [vllm]  by @molbap in #42423\r\n* fix: Restore explicit .keys() calls for TensorDict compatibility  by @pankajbaid567 in #42373\r\n* Transformers serve -> list all generative models from the cache   by @LysandreJik in #42146\r\n* 🚨 [v5][PEFT] Bump min version requirement of PEFT to  0.18.0  by @BenjaminBossan in #41889\r\n* [cleanup] Offline mode and cache dir from `huggingface_hub` constants + cleanup in `PushToHubMixin`  by @Wauplin in #42391\r\n* Correctly return finish reason length when finished  by @LysandreJik in #42157\r\n* FIX: Minimal fix for loading PEFT weights  by @BenjaminBossan in #42387\r\n* Let's break Qwen-VL 🚨    by @zucchini-nlp in #42420\r\n* [`CI`] Add to run slow  by @vasqu in #42459\r\n* Fix the \"test_offline\" test  by @LysandreJik in #42458\r\n* `transformers chat` launched without base_url has a direct tie to localhost:8000  by @LysandreJik in #42463\r\n* update with more recent tts models  by @Deep-unlearning in #42328\r\n* rm slow tokenizers  by 
@itazap in #40936\r\n* [loading/saving] Reverse all loading operations when saving  by @Cyrilvallez in #42396\r\n* Fix T5 tests: use generation_config for generation parameters  by @Abdennacer-Badaoui in #42419\r\n* remove reference to TF models from docs  by @zucchini-nlp in #42443\r\n* [Trainer] use output.loss when using liger-kernel  by @kashif in #42444\r\n* replace source_keys and target_keys  by @SunMarc in #42471\r\n* Update migration guide - generation config  by @zucchini-nlp in #42470\r\n* 🚨 Move `rotary_partial_emb` to RopeParams and delete unnecessary code 🔪   by @zucchini-nlp in #42255\r\n* Fix doc builds  by @Rocketknight1 in #42478\r\n* extend CwmIntegrationTest to xpu  by @sywangyi in #42314\r\n* add require_deterministic_for_xpu to make the case pass in xpu  by @sywangyi in #42439\r\n* Skip failing irrelevant test for ColQwen2  by @Rocketknight1 in #42480\r\n* [quantization] make torchao tests slow  by @MekkCyber in #42482\r\n* Fix gpt2 tokenizer `add_prefix_space` default value   by @SunMarc in #42481\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @ArthurZucker\r\n    * `JetMoe` Fix jetmoe after #40132 (#41324)\r\n    * [`ModularChecker`] QOL for the modular checker (#41361)\r\n    * [`CB`] Refactors the way we access paged (#41370)\r\n    * Update from pretrained error when loading (#33380)\r\n    * :facepalm: CB nit!  
(#41413)\r\n    * [`from_pretrained`] Small refactor `from_pretrained`: move around unrelated stuff (#41445)\r\n    * update deps table (#42120)\r\n    * Refactor weight loading (#41580)\r\n    * Update conversion mapping to separate renaming from converting (#42254)\r\n    * Auto convert tekken.json (#42299)\r\n    * fix tekken pattern matching (#42363)\r\n    * Small tp fix (#42366)\r\n    * Fix tp (#42368)\r\n    * misc don't recreate it (#42394)\r\n* @vasqu\r\n    * :rotating_light: [`v5`] Remove relative position embeddings (for bert like models) (#41170)\r\n    * [`v5`] Sync Bert and Bart eager attention (#41248)\r\n    * [`JetMoe`] Fix KV head repetition and padding free (#41423)\r\n    * :rotating_light: [`Attention Masks`] Bidirectional masks for encoder and encoder-decoder models (#41265)\r\n    * [`CI`] Fix copies on main (#41486)\r\n    * [`Docs`] Fix changed references (#41614)\r\n    * [`Executorch`] Simplify for encoder models (#41627)\r\n    * [`Ernie 4.5 Moe`] Fix Moe and offloading (#41385)\r\n    * [`Masks`] Fix mask handling in eager for vision models (#41625)\r\n    * [`Attn`] Allow dynamic causality in SDPA via Kwargs (#41692)\r\n    * [`Onnx docs`] Remove some traces (#41791)\r\n    * :rotating_light: [`Clip`] Fix masking and enable flash attention on all model types (#41750)\r\n    * [`Attn Masks`] Non-vmap default for attention masks (#41852)\r\n    * [`T5Gemma`] Fix cross attention cache (#41890)\r\n    * [`Pop2Piano`] Fix cache usage (#42170)\r\n    * [`PEFT`] Fix prefix tuning (#41696)\r\n    * [`PEFT`] Fix the general test for prefix tuning (#42185)\r\n    * [`Pop2Piano`] Fix tied weights (#42193)\r\n    * [`BLT`] Fix cache usage (#42188)\r\n    * [`CI`] Skip `EfficientLoFTR` test (#42327)\r\n    * [`Attn Masks`] Lift bidirectional mask restriction on eager (#42325)\r\n    * [`Attn Masks`] Add skip option for non-packed sequences (#42367)\r\n    * [`Mistral Tokenizers`] Fix tokenizer detection (#42389)\r\n    * [`FA`] Cleanup loading 
logic (#41427)\r\n    * [`CI`] Add to run slow (#42459)\r\n* @ydshieh\r\n    * [testing] update `test_longcat_generation_cpu` (#41368)\r\n    * [testing] Fix `JetMoeIntegrationTest` (#41377)\r\n    * Pickle - part 2 (#41476)\r\n    * Try to remove `pickle` - `BloomTokenizerFast` (#41466)\r\n    * [testing] reduce runtime of `HunYuanMoEV1IntegrationTest:test_model_generation` (#41373)\r\n    * delete some tokenizer tests using pickle (#41514)\r\n    * torch 2.9 don't ❤️ torchcodec 💔  (#41610)\r\n    * Update a dataset reop link (#41618)\r\n    * Remove the head masking block in some vision models (#41620)\r\n    * improve `utils/check_bad_commit.py` (#41658)\r\n    * torch 2.9 still don't ❤️ torchcodec 0.8 💔 (#41686)\r\n    * path validation for security reason (#41256)\r\n    * pin torchcodec on CI docker image (#41703)\r\n    * further improve `utils/check_bad_commit.py` (#41658) (#41690)\r\n    * Revert \"Remove upper version bound of pandas\" (#41744)\r\n    * Fix bark after #41445 (#41645)\r\n    * flash attn pytest marker (#41781)\r\n    * unpin torch/torchcodec for CircleCI (#41839)\r\n    * further reducing flakiness in `utils/check_bad_commit.py` (#41658)  (#41815)\r\n    * CI workflow for Flash Attn (#41857)\r\n    * Update some workflow files (#41892)\r\n    * Minor fix in docker image build workflow (#41949)\r\n    * Run slow v2 (#41914)\r\n    * Fix `detectron2` installation in docker files (#41975)\r\n    * Fix `autoawq[kernels]` installation in quantization docker file (#41978)\r\n    * Fix `torchcodec` version in quantization docker file (#41988)\r\n    * Fix `run slow v2`: empty report when there is only one model (#42002)\r\n    * Fix `torch+deepspeed` docker file (#41985)\r\n    * fix `deeepspeed` in AMD docker file (#42025)\r\n    * Change trigger time for AMD CI (#42034)\r\n    * Remove some custom datasets defined in codebase (#41511)\r\n    * Cleanup workflow - part 1 (#42023)\r\n    * Fix `pr_slow_ci_suggestion.yml` after #42023 (#42049)\r\n  
  * Avoid explicit checkout in workflow (#42057)\r\n    * Be careful at explicit checkout actions (#42060)\r\n    * Fix another `Argument list too long` in `pr_slow_ci_suggestion.yml` (#42061)\r\n    * Revert back to use GitHub context  (#42066)\r\n    * Fix inconsistency of commit sha during the workflow run (#42074)\r\n    * Revert \"permissions worflows fix\" (#42110)\r\n    * pin `pytest<9` for now (#42162)\r\n    * Update `test_dynamic_cache_exportability_multiple_run` (failing on torch 2.10 nightly) (#42212)\r\n    * Reduce timing on CircleCI - part 1 (Use @slow for IntegrationTests) (#42206)\r\n    * Make tests run in less time by reducing `batch_size` (#42213)\r\n    * Revert \"Make tests run in less time by reducing `batch_size`\" (#42258)\r\n    * delete already deprecated models (#42235)\r\n    * Remove doc files of other langs for deleted models (#42276)\r\n    * [testing] fix `cwm` (#42261)\r\n* @cyyever\r\n    * Remove unnecessary list comprehension (#41305)\r\n    * Remove unused function patameters (#41358)\r\n    * Use accelerator API to free device memory (#41195)\r\n    * Remove Python 3.9 classifier (#41410)\r\n    * Remove KERAS_NLP_IMPORT_ERROR (#41468)\r\n    * Import Callable from collections.abc (#41130)\r\n    * Remove infer_device (#41088)\r\n    * Fix Latex typesetting in documentation (#41177)\r\n    * Fix typsetting and content of llm_tutorial_optimization.md (#41172)\r\n    * More markdown file fixes (#41599)\r\n    * Format MarkDown documentation and tiny fixes (#41638)\r\n    * Fix typos in documentation (#41641)\r\n    * Fix confusing cls assignment (#41642)\r\n    * Use | for Optional and Union typing (#41646)\r\n    * Remove  require_torch_bf16_gpu (#40979)\r\n    * Fix MarkDown syntax (#41676)\r\n    * Use | for Optional and Union typing  (#41675)\r\n    * Enable faiss-cpu on Windows (#41678)\r\n    * Fix Pylint warnings (#41644)\r\n    * Enable  FURB rules in ruff (#41395)\r\n    * Remove upper version bound of pandas 
(#41677)\r\n    * Fix documentation issues (#41726)\r\n    * Apply RUFF PIE rules (#41727)\r\n    * Replace Optional and Union typing with | in some source files (#42294)\r\n    * Replace Optional and Union typing with | in some source files (#42372)\r\n* @yao-matrix\r\n    * make some ut cases pass on xpu w/ latest torch (#41337)\r\n    * fix asr ut failures (#41332)\r\n    * enable new model uts to xpu and fix some failures on xpu (#41386)\r\n    * enable some falcon-mamba uts on xpu (#41428)\r\n    * enhance patched_tearDown to support python 3.11+ (#41429)\r\n    * fix gemma3n case failure (#41426)\r\n    * upgrade xpu docker file to torch 2.8 (#41551)\r\n    * make apollo test case pass (#41805)\r\n    * extend bitnet cases to xpu, all 8 cases pass (#41831)\r\n    * extend 2 trainer test cases to xpu (#41829)\r\n    * extend 2 blip2 and falcon_h1 test cases to xpu (#41825)\r\n    * make lfm2_moe integration test pass on XPU (#41796)\r\n    * fix some ut failures on XPU w/ torch 2.9 (#41923)\r\n    * fix some ut failures on XPU w/ torch 2.9 (#41941)\r\n    * fix prepare_config_and_inputs_for_common bug in llava test (#41942)\r\n    * make recurrent_gemma and voxtral cases pass on xpu (#41958)\r\n    * extend fp_quant cases to xpu (#41833)\r\n    * fix tensor device placement issue of 2 UT cases (#41921)\r\n    * fix continuous batching issues, extend ut cases to xpu (#41830)\r\n* @MekkCyber\r\n    * [kernels] Kernel Config  (#41232)\r\n    * Fixing comments in __init__ file (#41414)\r\n    * [kernels] Cleanup deta kernel (#41470)\r\n    * Cleaning hub kernels  (#41477)\r\n    * Remove DISABLE_KERNEL_MAPPING flag (#41475)\r\n    * [kernels] Remove RWKV kernel finally ! 
(#41493)\r\n    * [kernels] rm yoso kernel (#41495)\r\n    * [kernels] rm mra kernels (#41507)\r\n    * Revert \"add rmsnorm kernels support for Intel XPU\" (#41579)\r\n    * [kernels] refactor function kernel calling (#41577)\r\n    * Erroring when KernelConfig is passed without use_kernels = True (#41657)\r\n    * Small Fix for imports  (#41411)\r\n    * [kernels] Add version to function mapping (#41685)\r\n    * [quantization] fix compressed_tensors tests (#41780)\r\n    * [quantization] Skip Fp8 tests when hardware capability < 8.9 (#41785)\r\n    * [quantization] fix torchao tests after 0.14.0 release (#41777)\r\n    * revert changes in _is_package_available (#41891)\r\n    * [kernels] Add Tests & CI for kernels (#41765)\r\n    * [kernels] change import time in KernelConfig (#42004)\r\n    * [kernels] Fix XPU layernorm kernel (#41583)\r\n    * [core] Fix torchao  (#42289)\r\n    * [core] fix mxfp4 (#42382)\r\n    * [fp8] fix scales param name (#42434)\r\n    * [quantization] make torchao tests slow (#42482)\r\n* @paulpak58\r\n    * [Cache] lfm2 cache: allocate empty kv layers during init (#41396)\r\n    * [Model] Lfm2Moe (#41401)\r\n* @gante\r\n    * 🚨 [v5] Prune `prune_heads` (#41417)\r\n    * [v5] rm `utils/tf_ops/` (#41402)\r\n    * [causallm tester] automate pipeline mappings + bloom tests (#41318)\r\n    * 🚨 [v5] `generate` delegates default cache initialization to the model (#41505)\r\n* @zRzRzRzRzRzRzR\r\n    * Update GLM-4.1V MMRope implementation (#41182)\r\n    * Update GLM-4.6 doc (#41471)\r\n    * Add aux loss for GLM-4.5V (#41564)\r\n    * 4.1V Model and GLM-4.5V Model Conversion Code Updates (#41784)\r\n    * GLM-V update with new processor (#42122)\r\n* @jacobkahn\r\n    * Add Code World Model (CWM) (#41199)\r\n* @molbap\r\n    * Update philosophy (#41438)\r\n    * [QoL] modular conversion shows LoC saved (#41500)\r\n    * Double router compute? 
(#41653)\r\n    * Add vision contribution guide (#41456)\r\n    * Modernize CLIP modeling code  (#41546)\r\n    * handle inputs from Siglip/Siglip2 non-automapped encoder layers (#41930)\r\n    * Fix processor test for glm (#42233)\r\n    * Tiny doc fix (#42296)\r\n    * tiny fix for deepseekocr support [vllm] (#42423)\r\n* @Wauplin\r\n    * Bump to hfh 1.0.0.rc5 to fix test (#41508)\r\n    * Migrate transformers cli to Typer (#41487)\r\n    * Remove deprecated `use_auth_token` parameter (#41666)\r\n    * added more breaking changes\r\n    * [cleanup] Don't use Repository in create_dummy_models.py script (#42380)\r\n    * [cleanup] Remove deprecated load config from file (#42383)\r\n    * [cleanup] Offline mode and cache dir from `huggingface_hub` constants + cleanup in `PushToHubMixin` (#42391)\r\n* @remi-or\r\n    * Restore cuda graphs to continuous batching (#41421)\r\n    * Fix an import error with PreTrainModel (#41571)\r\n    * Add __iter__ to DynamicCache (#41569)\r\n    * Gemma3 fixes (#41572)\r\n    * Benchmark overhaul (#41408)\r\n    * Fix fp32_ln for various models (#41605)\r\n    * Fix EncoderDecoder cache (#41612)\r\n    * Switch to CB if cache_implementation == paged (#41655)\r\n    * Small changes to benchmarking script (#41662)\r\n    * Bump AMD docker (#41792)\r\n    * Add a safeguard around a flaky test in gemma2 (#41811)\r\n    * Use indices as position_ids in modernebert (#41789)\r\n    * Move the Mi355 to regular docker (#41989)\r\n    * More data in benchmarking (#41848)\r\n    * Reduce the number of benchmark in the CI (#42008)\r\n    * New docker from AMD (#42208)\r\n    * Add prefix sharing to continuous batching (#42094)\r\n    * Update torchcodec to match torchaudio version (#42288)\r\n    * Gemma3 hybrid fix (#42287)\r\n    * Make benchmarking lighter: clean-up result files and remove non-needed arguments (#42357)\r\n    * Many small fixes for the CI (#42364)\r\n    * Benchmark simplification (#42408)\r\n* @lkhl\r\n    * [model] Add 
VideoLLaMA3 implementation (#40499)\r\n* @philiproeleveld\r\n    * Add `logits_to_keep` to many older CausalLM models (#41335)\r\n* @AlphaOrOmega\r\n    * Adding superglue fast image processing (#41394)\r\n* @echarlaix\r\n    * [v5] Remove deprecated tranformers.onnx (#41700)\r\n* @Aravind-11\r\n    * Add GLPNImageProcessorFast  (#41725)\r\n    * T5 migration to new masking interface (#41804)\r\n    * 🚨 Remove generic output_attentions warning (#42334)\r\n* @DeXtAr47-oss\r\n    * add fuyu fast image processors (#41817)\r\n* @lashahub\r\n    * [models] Add AudioFlamingo3 integration (#40290)\r\n* @lilin-1\r\n    * Docs/i18n updates (#42006)\r\n* @burtenshaw\r\n    * [MODEL] Nanochat implementation (#41634)\r\n* @itazap\r\n    * rm slow tokenizers (#40936)\r\n","publishedAt":"2026-01-26T10:17:10.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.0.0","media":[]},{"id":"rel_YsOzmPcE-wv9XBHtDZ10u","version":"v5.0.0rc3","title":"Release candidate v5.0.0rc3","summary":"# Release candidate v5.0.0rc3\r\n\r\n## New models:\r\n\r\n* [GLM-4.7] GLM-Lite Supoort by @zRzRzRzRzRzRzR in https://github.com/huggingface/transformers/pull...","content":"# Release candidate v5.0.0rc3\r\n\r\n## New models:\r\n\r\n* [GLM-4.7] GLM-Lite Supoort by @zRzRzRzRzRzRzR in https://github.com/huggingface/transformers/pull/43031\r\n* [GLM-Image] AR Model Support for GLM-Image by @zRzRzRzRzRzRzR in https://github.com/huggingface/transformers/pull/43100\r\n* Add LWDetr model by @sbucaille in https://github.com/huggingface/transformers/pull/40991\r\n* Add LightOnOCR model implementation by @baptiste-aubertin in https://github.com/huggingface/transformers/pull/41621\r\n\r\n## What's Changed\r\n\r\nWe are getting closer and closer to the official release! 
\r\nThis RC is focused on removing more of the deprecated stuff, fixing some minor issues, and updating the docs.\r\n\r\n* Update Japanese README to match English version by @lilin-1 in https://github.com/huggingface/transformers/pull/43069\r\n* [docs] Deploying by @stevhliu in https://github.com/huggingface/transformers/pull/42263\r\n* [docs] inference engines by @stevhliu in https://github.com/huggingface/transformers/pull/42932\r\n* Fix typos: Remove duplicate duplicate words words by @efeecllk in https://github.com/huggingface/transformers/pull/43040\r\n* [style] Rework ruff rules and update all files by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43144\r\n* [CB] Minor fix in kwargs by @remi-or in https://github.com/huggingface/transformers/pull/43147\r\n* [Bug] qwen2_5_omni: cap generation length to be less than the max_position_embedding in DiT by @sniper35 in https://github.com/huggingface/transformers/pull/43068\r\n* Fix some deprecated practices in torch 2.9 by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43167\r\n* Fix Fuyu processor width dimension bug in `_get_num_multimodal_tokens` by @Abhinavexists in https://github.com/huggingface/transformers/pull/43137\r\n* Inherit from PreTrainedTokenizerBase by @juliendenize in https://github.com/huggingface/transformers/pull/43143\r\n* Generation config boolean defaults by @zucchini-nlp in https://github.com/huggingface/transformers/pull/43000\r\n* Fix failing `BartModelIntegrationTest` by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43160\r\n* fix failure of llava/pixtral by @sywangyi in https://github.com/huggingface/transformers/pull/42985\r\n* GemmaTokenizer: remove redundant whitespace pre-tokenizer by @vaibhav-research in https://github.com/huggingface/transformers/pull/43106\r\n* Support `auto_doctring` in Processors by @yonigozlan in https://github.com/huggingface/transformers/pull/42101\r\n* Fix failing `BitModelIntegrationTest` by @Sai-Suraj-27 in 
https://github.com/huggingface/transformers/pull/43164\r\n* [`Fp8`] Fix experts by @vasqu in https://github.com/huggingface/transformers/pull/43154\r\n* Docs: improve wording for documentation build instructions by @Sailnagale in https://github.com/huggingface/transformers/pull/43007\r\n* [makefile] Cleanup and improve the rules by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43171\r\n* Some new models added stuff that was already removed by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43179\r\n* Fixes and compilation warning in torchao docs by @merveenoyan in https://github.com/huggingface/transformers/pull/42909\r\n* [cache] Remove all deprecated classes by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43168\r\n* Bump huggingface_hub minimal version by @Wauplin in https://github.com/huggingface/transformers/pull/43188\r\n* Rework check_config_attributes.py by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43191\r\n* Fix generation config validation by @zucchini-nlp in https://github.com/huggingface/transformers/pull/43175\r\n* [style] Use 'x | y' syntax for processors as well by @Wauplin in https://github.com/huggingface/transformers/pull/43189\r\n* Remove deprecated objects by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43170\r\n* fix chunked prefill implementation issue-43082 by @marcndo in https://github.com/huggingface/transformers/pull/43132\r\n* Reduce add_dates verbosity by @yonigozlan in https://github.com/huggingface/transformers/pull/43184\r\n* Add support for MiniMax-M2 by @rogeryoungh in https://github.com/huggingface/transformers/pull/42028\r\n* Fix failing `salesforce-ctrl`, `xlm` & `gpt-neo` model generation tests by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43180\r\n* Less verbose library helpers by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43197\r\n* run all test files on CircleCI by @ydshieh in 
https://github.com/huggingface/transformers/pull/43146\r\n* Clamp temperature to >=1.0 for Dia generation by @Haseebasif7 in https://github.com/huggingface/transformers/pull/43029\r\n* Fix spelling typos in comments and code by @raimbekovm in https://github.com/huggingface/transformers/pull/43046\r\n* [docs] llama.cpp by @stevhliu in https://github.com/huggingface/transformers/pull/43185\r\n* [docs] gptq formatting fix by @victorywwong in https://github.com/huggingface/transformers/pull/43216\r\n* Grouped beam search from config params by @zucchini-nlp in https://github.com/huggingface/transformers/pull/42472\r\n* [`Generate`] Allow custom config values in generate config by @vasqu in https://github.com/huggingface/transformers/pull/43181\r\n* Fix failing `Pix2StructIntegrationTest` by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43229\r\n* Fix missing UTF-8 encoding in check_repo.py for Windows compatibility by @aarushisingh04 in https://github.com/huggingface/transformers/pull/43123\r\n* [Tokenizer] Change default value of return_dict to True in doc string for apply_chat_template by @kashif in https://github.com/huggingface/transformers/pull/43223\r\n* Fix failing `PhiIntegrationTests` by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43214\r\n* Use `HF_TOKEN` directly and remove `require_read_token`  by @ydshieh in https://github.com/huggingface/transformers/pull/43233\r\n* Fix failing `Owlv2ModelIntegrationTest` & `OwlViTModelIntegrationTest` by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43182\r\n* Fix flashattn wrt quantized models by @SunMarc in https://github.com/huggingface/transformers/pull/43145\r\n* Remove unused imports by @cyyever in https://github.com/huggingface/transformers/pull/43078\r\n* Fix unsafe torch.load() in _load_rng_state allowing arbitrary code execution by @ColeMurray in https://github.com/huggingface/transformers/pull/43140\r\n* Reapply modular to examples by @Cyrilvallez in 
https://github.com/huggingface/transformers/pull/43234\r\n* More robust diff checks in `add_dates` by @yonigozlan in https://github.com/huggingface/transformers/pull/43199\r\n* docs: fix grammatical error in README.md by @davidfertube in https://github.com/huggingface/transformers/pull/43236\r\n* Fix typo: seperately → separately in lw_detr converter by @skyvanguard in https://github.com/huggingface/transformers/pull/43235\r\n* Qwen-VL video processor accepts min/max pixels by @zucchini-nlp in https://github.com/huggingface/transformers/pull/43228\r\n* Deprecate dtype per sub config by @zucchini-nlp in https://github.com/huggingface/transformers/pull/42990\r\n* Remove more deprecated objects/args by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43195\r\n* [CB] Soft-reset offloading by @remi-or in https://github.com/huggingface/transformers/pull/43150\r\n* Make benchmark-v2 to be device agnostic, to support more torch built-in devices like xpu by @yao-matrix in https://github.com/huggingface/transformers/pull/43153\r\n* Fix benchmark script by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43253\r\n* Adding to run slow by @IlyasMoutawwakil in https://github.com/huggingface/transformers/pull/43250\r\n* Fix failing `Vip-llava` model integration test by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43252\r\n* Remove deprecated and unused `position_ids` in all `apply_rotary_pos_emb` by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43255\r\n* fix `_get_test_info` in `testing_utils.py` by @ydshieh in https://github.com/huggingface/transformers/pull/43259\r\n* Fix failing `Hiera`, `SwiftFormer` & `LED` Model integration tests by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43225\r\n* [style] Fix init isort and align makefile and CI by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43260\r\n* [docs] tensorrt-llm by @stevhliu in 
https://github.com/huggingface/transformers/pull/43176\r\n* [consistency] Ensure models are added to the `_toctree.yml` by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43264\r\n* Fix failing  `PegasusX`, `Mvp` & `LED` model integration tests by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43245\r\n* [CB] Ensure parallel decoding test passes using FA by @remi-or in https://github.com/huggingface/transformers/pull/43277\r\n* fix crash in when running FSDP2+TP by @sywangyi in https://github.com/huggingface/transformers/pull/43226\r\n* [ci] Fixing some failing tests for important models by @Abdennacer-Badaoui in https://github.com/huggingface/transformers/pull/43231\r\n\r\n## New Contributors\r\n* @efeecllk made their first contribution in https://github.com/huggingface/transformers/pull/43040\r\n* @sniper35 made their first contribution in https://github.com/huggingface/transformers/pull/43068\r\n* @Abhinavexists made their first contribution in https://github.com/huggingface/transformers/pull/43137\r\n* @vaibhav-research made their first contribution in https://github.com/huggingface/transformers/pull/43106\r\n* @Sailnagale made their first contribution in https://github.com/huggingface/transformers/pull/43007\r\n* @rogeryoungh made their first contribution in https://github.com/huggingface/transformers/pull/42028\r\n* @Haseebasif7 made their first contribution in https://github.com/huggingface/transformers/pull/43029\r\n* @victorywwong made their first contribution in https://github.com/huggingface/transformers/pull/43216\r\n* @aarushisingh04 made their first contribution in https://github.com/huggingface/transformers/pull/43123\r\n* @ColeMurray made their first contribution in https://github.com/huggingface/transformers/pull/43140\r\n* @davidfertube made their first contribution in https://github.com/huggingface/transformers/pull/43236\r\n* @skyvanguard made their first contribution in 
https://github.com/huggingface/transformers/pull/43235\r\n* @baptiste-aubertin made their first contribution in https://github.com/huggingface/transformers/pull/41621\r\n\r\n**Full Changelog**: https://github.com/huggingface/transformers/compare/v5.0.0rc2...v5.0.0rc3","publishedAt":"2026-01-26T10:02:55.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.0.0rc3","media":[]},{"id":"rel_lQt0-zVYecRkL8i4JrkFw","version":"v4.57.6","title":"Patch release v4.57.6","summary":"## What's Changed\r\nAnother fix for qwen vl models that prevented correctly loading the associated model type - this works together with https://github...","content":"## What's Changed\r\nAnother fix for qwen vl models that prevented correctly loading the associated model type - this works together with https://github.com/huggingface/transformers/pull/41808 of the previous patch release.\r\n\r\n* Fixed incorrect model_type for qwen2vl and qwen2.5vl when config is saved and loaded again by @i3hz in https://github.com/huggingface/transformers/pull/41758\r\n\r\n**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.57.5...v4.57.6","publishedAt":"2026-01-16T10:40:02.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v4.57.6","media":[]},{"id":"rel_d5_OKBLsH3-bSieB4UuAQ","version":"v4.57.5","title":"Patch release v4.57.5","summary":"## What's Changed\r\nShould not have said last patch :wink: These should be the last remaining fixes that got lost in between patches and the transition...","content":"## What's Changed\r\nShould not have said last patch :wink: These should be the last remaining fixes that got lost in between patches and the transition to v5. 
\r\n\r\n* QwenVL: add skipped keys in setattr as well by @zucchini-nlp in https://github.com/huggingface/transformers/pull/41808\r\n* Fix lr_scheduler_parsing by @SunMarc in https://github.com/huggingface/transformers/pull/41322\r\n\r\n**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.57.4...v4.57.5","publishedAt":"2026-01-13T13:29:13.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v4.57.5","media":[]},{"id":"rel_l-0__i1h9IqxBpf3-cO5V","version":"v4.57.4","title":"Patch release v4.57.4","summary":"## What's Changed\r\nLast patch release for v4: We have a few small fixes for remote generation methods (e.g. group beam search), vLLM, and an offline t...","content":"## What's Changed\r\nLast patch release for v4: We have a few small fixes for remote generation methods (e.g. group beam search), vLLM, and an offline tokenizer fix (if it's already been cached).\r\n\r\n* Grouped beam search from config params by @zucchini-nlp in https://github.com/huggingface/transformers/pull/42472\r\n* Handle decorator with optional arguments better @hmellor in https://github.com/huggingface/transformers/pull/42512\r\n* fix: make mistral base check conditional to fix offline loading by @Killusions in https://github.com/huggingface/transformers/pull/42880\r\n\r\n## New Contributors\r\n* @Killusions made their first contribution in https://github.com/huggingface/transformers/pull/42880\r\n\r\n**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.57.3...v4.57.4","publishedAt":"2026-01-13T11:07:40.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v4.57.4","media":[]},{"id":"rel_VbrdvI4olNsbMC8adYSmc","version":"v5.0.0rc2","title":"Release candidate 5.0.0rc2","summary":"## What's Changed\r\n\r\nThis release candidate is focused on fixing `AutoTokenizer`, expanding the dynamic weight loading support, and improving performa...","content":"## What's Changed\r\n\r\nThis release candidate is focused on fixing 
`AutoTokenizer`, expanding the dynamic weight loading support, and improving performances with MoEs!\r\n\r\n## MoEs and performances:\r\n<img width=\"2048\" height=\"1451\" alt=\"image\" src=\"https://github.com/user-attachments/assets/3ed2508e-3eb1-4f13-8717-cd9027d12a39\" />\r\n\r\n* batched and grouped experts implementations by @IlyasMoutawwakil in https://github.com/huggingface/transformers/pull/42697\r\n* Optimize MoEs for decoding using batched_mm by @IlyasMoutawwakil in https://github.com/huggingface/transformers/pull/43126\r\n\r\n## Tokenization:\r\nThe main issue with the tokenization refactor is that `tokenizer_class` are now \"enforced\" when in most cases they are wrong. This took a while to properly isolate and now we try to use `TokenizersBackend` whenever we can. #42894 has a much more detailed description of the big changes!\r\n\r\n\r\n* use `TokenizersBackend` by @ArthurZucker in https://github.com/huggingface/transformers/pull/42894\r\n* Fix convert_tekken_tokenizer by @juliendenize in https://github.com/huggingface/transformers/pull/42592\r\n* refactor more tokenizers - v5 guide update by @itazap in https://github.com/huggingface/transformers/pull/42768\r\n* [`Tokenizers`] Change treatment of special tokens by @vasqu in https://github.com/huggingface/transformers/pull/42903\r\n\r\n\r\n## Core\r\nHere we focused on boosting the performances of loading weights on device! 
\r\n* [saving] Simplify general logic by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42766\r\n* Do not rely on config for inferring model dtype by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42838\r\n* Improve BatchFeature: stack list and lists of torch tensors by @yonigozlan in https://github.com/huggingface/transformers/pull/42750\r\n* Remove tied weights from internal attribute if they are not tied by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42871\r\n* Enforce call to `post_init` and fix all of them by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42873\r\n* Simplify tie weights logic by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42895\r\n* Add buffers to `_init_weights` for ALL models by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42309\r\n* [loading] Really initialize on meta device for huge perf gains by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42941\r\n* Do not use accelerate hooks if the device_map has only 1 device by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43019\r\n* Move missing weights and non-persistent buffers to correct device earlier by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43021\r\n\r\n## New models\r\n* Sam: Perception Encoder Audiovisual by @eustlb in https://github.com/huggingface/transformers/pull/42905\r\n* adds jais2 model support by @sarathc-cerebras in https://github.com/huggingface/transformers/pull/42684\r\n* Add Pixio pre-trained models by @LiheYoung in https://github.com/huggingface/transformers/pull/42795\r\n* [`Ernie 4.5`] Ernie VL models by @vasqu in https://github.com/huggingface/transformers/pull/39585\r\n* [loading][TP] Fix device placement at loading-time, and simplify sharding primitives by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43003\r\n* GLM-ASR  Support by @zRzRzRzRzRzRzR in 
https://github.com/huggingface/transformers/pull/42875\r\n\r\n## Quantization\r\n* [Devstral] Make sure FP8 conversion works correctly by @patrickvonplaten in https://github.com/huggingface/transformers/pull/42715\r\n* Fp8 dq by @SunMarc in https://github.com/huggingface/transformers/pull/42926\r\n* [Quantization] Removing misleading int8 quantization in Finegrained FP8 by @MekkCyber in https://github.com/huggingface/transformers/pull/42945\r\n* Fix deepspeed + quantization by @SunMarc in https://github.com/huggingface/transformers/pull/43006\r\n\r\n## Breaking changes\r\nMostly around processors!\r\n* 🚨 Fix ConvNeXt image processor default interpolation to BICUBIC by @lukepayyapilli in https://github.com/huggingface/transformers/pull/42934\r\n* 🚨 Fix EfficientNet image processor default interpolation to BICUBIC by @lukepayyapilli in https://github.com/huggingface/transformers/pull/42956\r\n* Add fast version of `convert_segmentation_map_to_binary_masks` to EoMT by @simonreise in https://github.com/huggingface/transformers/pull/43073\r\n* 🚨Fix MobileViT image processor default interpolation to BICUBIC by @lukepayyapilli in https://github.com/huggingface/transformers/pull/43024\r\n\r\nThanks again to everyone ! 
\r\n## New Contributors\r\n* @ZX-ModelCloud made their first contribution in https://github.com/huggingface/transformers/pull/42833\r\n* @AYou0207 made their first contribution in https://github.com/huggingface/transformers/pull/42863\r\n* @wasertech made their first contribution in https://github.com/huggingface/transformers/pull/42864\r\n* @preetam1407 made their first contribution in https://github.com/huggingface/transformers/pull/42685\r\n* @Taise228 made their first contribution in https://github.com/huggingface/transformers/pull/41416\r\n* @CandiedCode made their first contribution in https://github.com/huggingface/transformers/pull/42885\r\n* @sarathc-cerebras made their first contribution in https://github.com/huggingface/transformers/pull/42684\r\n* @nandan2003 made their first contribution in https://github.com/huggingface/transformers/pull/42318\r\n* @LiheYoung made their first contribution in https://github.com/huggingface/transformers/pull/42795\r\n* @majiayu000 made their first contribution in https://github.com/huggingface/transformers/pull/42928\r\n* @lukepayyapilli made their first contribution in https://github.com/huggingface/transformers/pull/42934\r\n* @leaderofARS made their first contribution in https://github.com/huggingface/transformers/pull/42966\r\n* @qianyue76 made their first contribution in https://github.com/huggingface/transformers/pull/43095\r\n* @stefgina made their first contribution in https://github.com/huggingface/transformers/pull/43033\r\n* @HuiyingLi made their first contribution in https://github.com/huggingface/transformers/pull/43084\r\n* @raimbekovm made their first contribution in https://github.com/huggingface/transformers/pull/43038\r\n* @PredictiveManish made their first contribution in https://github.com/huggingface/transformers/pull/43053\r\n* @pushkar-hue made their first contribution in https://github.com/huggingface/transformers/pull/42736\r\n* @vykhovanets made their first contribution in 
https://github.com/huggingface/transformers/pull/43042\r\n* @tanmay2004 made their first contribution in https://github.com/huggingface/transformers/pull/42737\r\n* @atultw made their first contribution in https://github.com/huggingface/transformers/pull/43061\r\n\r\n**Full Changelog**: https://github.com/huggingface/transformers/compare/v5.0.0rc1...v5.0.0rc2","publishedAt":"2026-01-08T10:33:33.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.0.0rc2","media":[]},{"id":"rel_vJEc6FQiTpzQAd4WkHDDc","version":"v5.0.0rc1","title":"Release candidate 5.0.0rc1","summary":"## What's Changed\r\n\r\nThis release candidate was focused mostly on `quantization` support with the new dynamic weight loader, and a few notable 🚨 brea...","content":"## What's Changed\r\n\r\nThis release candidate was focused mostly on `quantization` support with the new dynamic weight loader, and a few notable 🚨 breaking changes🚨:\r\n\r\n1. Default dtype for any model when using `from_pretrained` is now `auto`! \r\n* Default auto 🚨 🚨  by @ArthurZucker in https://github.com/huggingface/transformers/pull/42805\r\n2. Default shard size when saving a model is now 50GB:\r\n* 🚨🚨 [saving] Default to 50GB shards, and remove non-safe serialization by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42734\r\nThis is now as fast as before thanks to xet, and is just more convenient on the hub.\r\n3. Kwargs. 
They are fundamental to enable integration with vllm and other tools:\r\n* Every model forward() should have **kwargs by @Rocketknight1 in https://github.com/huggingface/transformers/pull/42603\r\n\r\n\r\n### Dynamic weight loader updates:\r\nMostly QOL improvements and fixes, plus restored support for CPU offloading.\r\n* mark params as _is_hf_initialized with DS Zero3 from weight conversion by @winglian in https://github.com/huggingface/transformers/pull/42626\r\n* [loading] Allow loading to happen without threading by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42619\r\n* [loading] Correctly load params during offloading & careful memory considerations by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42632\r\n* allow registration of custom checkpoint conversion mappings by @winglian in https://github.com/huggingface/transformers/pull/42634\r\n\r\n### New models:\r\n* Add FastVLM by @camilla-deckard in https://github.com/huggingface/transformers/pull/41112\r\n* Lasr model by @eustlb in https://github.com/huggingface/transformers/pull/42648\r\n* [Model] Add PaddleOCR-VL Model Support by @zhang-prog in https://github.com/huggingface/transformers/pull/42178\r\n\r\n\r\n### Some notable quantization fixes:\r\nMostly added support for `fbgemm` and `quanto`:\r\n* Fix fp8 + some enhancement by @SunMarc in https://github.com/huggingface/transformers/pull/42455\r\n* Fix eetq quanto quant methods by @SunMarc in https://github.com/huggingface/transformers/pull/42557\r\n* [Quantization] per tensor quantization kernel by @MekkCyber in https://github.com/huggingface/transformers/pull/42560\r\n* [Quantization] fix fbgemm by @MekkCyber in https://github.com/huggingface/transformers/pull/42561\r\n* [Quantization] Fix FP8 experts replacing by @MekkCyber in https://github.com/huggingface/transformers/pull/42654\r\n* [Quantization] Fix Static FP8 Quantization by @MekkCyber in https://github.com/huggingface/transformers/pull/42775\r\n* [core] fix fp-quant by @MekkCyber 
in https://github.com/huggingface/transformers/pull/42613\r\n\r\n### Peft:\r\nThe dynamic weight loader broke a few small things; this adds glue for all models except MoEs.\r\n* FIX Error when trying to load non-LoRA PEFT by @BenjaminBossan in https://github.com/huggingface/transformers/pull/42663\r\n* Fix PEFT integration with new weight loader by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42701\r\n\r\n### Misc\r\nTokenization needed more refactoring; this time it's a lot cleaner!\r\n* Refactor-tokenization-more by @ArthurZucker in https://github.com/huggingface/transformers/pull/42563\r\n* Only default `rope_parameters` to empty `dict` if there is something to put in it by @hmellor in https://github.com/huggingface/transformers/pull/42651\r\n\r\nWe omitted a lot of other commits for clarity, but thanks to everyone and the new contributors! \r\n\r\n## New Contributors\r\n* @camilla-deckard made their first contribution in https://github.com/huggingface/transformers/pull/41112\r\n* @Aaraviitkgp made their first contribution in https://github.com/huggingface/transformers/pull/42466\r\n* @ngazagna-qc made their first contribution in https://github.com/huggingface/transformers/pull/40691\r\n* @arrdel made their first contribution in https://github.com/huggingface/transformers/pull/42577\r\n* @marconaguib made their first contribution in https://github.com/huggingface/transformers/pull/42587\r\n* @Xiao-Chenguang made their first contribution in https://github.com/huggingface/transformers/pull/42436\r\n* @Furkan-rgb made their first contribution in https://github.com/huggingface/transformers/pull/42465\r\n* @mertunsall made their first contribution in https://github.com/huggingface/transformers/pull/42615\r\n* @anranlee99 made their first contribution in https://github.com/huggingface/transformers/pull/42438\r\n* @UserChen666 made their first contribution in https://github.com/huggingface/transformers/pull/42335\r\n* @efazal made their first contribution in 
https://github.com/huggingface/transformers/pull/41723\r\n* @Harrisonyong made their first contribution in https://github.com/huggingface/transformers/pull/36416\r\n* @hawon223 made their first contribution in https://github.com/huggingface/transformers/pull/42384\r\n* @Bissmella made their first contribution in https://github.com/huggingface/transformers/pull/42647\r\n* @AgainstEntropy made their first contribution in https://github.com/huggingface/transformers/pull/42689\r\n* @dongluw made their first contribution in https://github.com/huggingface/transformers/pull/42642\r\n* @hqkqn32 made their first contribution in https://github.com/huggingface/transformers/pull/42620\r\n* @zhang-prog made their first contribution in https://github.com/huggingface/transformers/pull/42178\r\n\r\n**Full Changelog**: https://github.com/huggingface/transformers/compare/v5.0.0rc0...v5.0.0rc1","publishedAt":"2026-01-08T10:15:16.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.0.0rc1","media":[]},{"id":"rel_VWVdVNclnJYaCoPhu5gCS","version":"v5.0.0rc0","title":"Transformers v5.0.0rc0","summary":"## Transformers v5 release notes\r\n\r\n<img width=\"1800\" height=\"1013\" alt=\"image\" src=\"https://github.com/user-attachments/assets/7b5187d7-6945-4108-a54...","content":"## Transformers v5 release notes\r\n\r\n<img width=\"1800\" height=\"1013\" alt=\"image\" src=\"https://github.com/user-attachments/assets/7b5187d7-6945-4108-a546-6d1d7bfb55e3\" />\r\n\r\n- Highlights\r\n- Significant API changes: dynamic weight loading, tokenization\r\n- Backwards Incompatible Changes\r\n- Bugfixes and improvements\r\n\r\n## Highlights\r\n\r\nWe are excited to announce the initial release of Transformers v5. This is the first major release in five years, and the release is significant: 800 commits have been pushed to `main` since the latest minor release. 
This release removes a lot of long-due deprecations, introduces several refactors that significantly simplify our APIs and internals, and comes with a large number of bug fixes.\r\n\r\nWe give an overview of our focus for this release in the [following blogpost](https://huggingface.co/blog/transformers-v5). In these release notes, we'll focus directly on the refactors and new APIs coming with v5.\r\n\r\nThis release is a release candidate (RC). It is not the final v5 release, and we will push on pypi as a pre-release. This means that the current release is purely opt-in, as installing `transformers` without specifying this exact release will install the latest version instead (v4.57.3 as of writing).\r\n\r\nIn order to install this release, please do so with the following:\r\n\r\n```shell\r\npip install transformers --pre\r\n```\r\n\r\nFor us to deliver the best package possible, it is imperative that we have feedback on how the toolkit is currently working for you. Please try it out, and [open an issue](https://github.com/huggingface/transformers/issues/) in case you're facing something inconsistent/a bug.\r\n\r\nTransformers version 5 is a community endeavor, and this is the last mile. Let's ship this together!\r\n\r\n## Significant API changes\r\n\r\n> [!NOTE]\r\n> 👀 Nothing is final and things are still actively in movement. We have a section dedicated to what is planned for future release candidates, yet is known not to work in the RC0. Look for \"Disclaimers for the RC0\".\r\n> \r\n> We'll be eagerly awaiting your feedback in our GitHub issues!\r\n\r\n### Dynamic weight loading\r\n\r\nWe introduce a new weight loading API in `transformers`, which significantly improves on the previous API. 
This\r\nweight loading API is designed to apply operations to the checkpoints loaded by transformers.\r\n\r\nInstead of loading the checkpoint exactly as it is serialized within the model, these operations can reshape, merge,\r\nand split the layers according to how they're defined in this new API. These operations are often a necessity when\r\nworking with quantization or parallelism algorithms.\r\n\r\nThis new API is centered around the new `WeightConverter` class:\r\n\r\n```python\r\nclass WeightConverter(WeightTransform):\r\n    operations: list[ConversionOps]\r\n    source_keys: Union[str, list[str]]\r\n    target_keys: Union[str, list[str]]\r\n```\r\n\r\nThe weight converter is designed to apply a list of operations on the source keys, resulting in target keys. A common\r\noperation done on the attention layers is to fuse the query, key, values layers. Doing so with this API would amount\r\nto defining the following conversion:\r\n\r\n```python\r\nconversion = WeightConverter(\r\n    [\"self_attn.q_proj\", \"self_attn.k_proj\", \"self_attn.v_proj\"],  # The input layers\r\n    \"self_attn.qkv_proj\",  # The single layer as output\r\n    operations=[Concatenate(dim=0)],\r\n)\r\n```\r\n\r\nIn this situation, we apply the `Concatenate` operation, which accepts a list of layers as input and returns a single \r\nlayer. \r\n\r\nThis allows us to define a mapping from architecture to a list of weight conversions. Applying those weight conversions\r\ncan apply arbitrary transformations to the layers themselves. 
This significantly simplified the `from_pretrained` method\r\nand helped us remove a lot of technical debt that we accumulated over the past few years.\r\n\r\nThis results in several improvements:\r\n- Much cleaner definition of transformations applied to the checkpoint\r\n- Reversible transformations, so loading and saving a checkpoint should result in the same checkpoint\r\n- Faster model loading thanks to scheduling of tensor materialization\r\n- Enables complex mixes of transformations that wouldn't otherwise be possible (such as quantization + MoEs, or TP + MoEs)\r\n\r\nWhile this is being implemented, expect varying levels of support across different release candidates.\r\n\r\nLinked PR: https://github.com/huggingface/transformers/pull/41580\r\n\r\n\r\n### Tokenization\r\n\r\nJust as we moved towards a single backend library for model definition, we want our tokenizers and the `Tokenizer` object to be a lot more intuitive. With v5, tokenizer definition is much simpler; you can now initialize an empty `LlamaTokenizer` and train it directly on your corpus.\r\n\r\nDefining a new tokenizer object should be as simple as this:\r\n\r\n```python\r\nfrom transformers import TokenizersBackend, generate_merges\r\nfrom tokenizers import pre_tokenizers, Tokenizer\r\nfrom tokenizers.models import BPE\r\n\r\nclass Llama5Tokenizer(TokenizersBackend):\r\n    def __init__(self, unk_token=\"<unk>\", bos_token=\"<s>\", eos_token=\"</s>\", vocab=None, merges=None):\r\n        if vocab is None:\r\n            self._vocab = {\r\n                str(unk_token): 0,\r\n                str(bos_token): 1,\r\n                str(eos_token): 2,\r\n            }\r\n\r\n        else:\r\n            self._vocab = vocab\r\n\r\n        if merges is not None:\r\n            self._merges = merges\r\n        else:\r\n            self._merges = generate_merges(filtered_vocab)\r\n\r\n        self._tokenizer = Tokenizer(\r\n            BPE(vocab=self._vocab, merges=self._merges, fuse_unk=True)\r\n     
   )\r\n        self._tokenizer.pre_tokenizer = pre_tokenizers.Metaspace(\r\n            replacement=\"▁\", prepend_scheme=_get_prepend_scheme(self.add_prefix_space, self), split=False\r\n        )\r\n        super().__init__(\r\n            tokenizer_object=self._tokenizer,\r\n            unk_token=unk_token,\r\n            bos_token=bos_token,\r\n            eos_token=eos_token,\r\n        )\r\n```\r\n\r\nOnce the tokenizer is defined as above, you can load it with the following: `Llama5Tokenizer()`. Doing this returns you an empty, trainable tokenizer that follows the definition of the authors of `Llama5` (it does not exist yet :wink:).\r\n\r\nThe above is the main motivation towards refactoring tokenization: we want tokenizers to behave similarly to models: trained or empty, and with exactly what is defined in their class definition.\r\n\r\n### Backend Architecture Changes: moving away from the slow/fast tokenizer separation\r\n\r\nUp to now, transformers maintained two parallel implementations for many tokenizers:\r\n- \"Slow\" tokenizers (`tokenization_<model>.py`) - Python-based implementations, often using [SentencePiece](https://github.com/google/sentencepiece) as the backend.\r\n- \"Fast\" tokenizers (`tokenization_<model>_fast.py`) - Rust-based implementations using the 🤗 [tokenizers](https://github.com/huggingface/tokenizers) library.\r\n\r\nIn v5, we consolidate to a single tokenizer file per model: `tokenization_<model>.py`. This file will use the most appropriate backend available:\r\n\r\n1. **TokenizersBackend** (preferred): Rust-based tokenizers from the 🤗 [tokenizers](https://github.com/huggingface/tokenizers) library. In general it provides optimal performance, but it also offers a lot more features that are commonly adopted across the ecosystem:\r\n  - handling additional tokens\r\n  - a full python API for setting and updating \r\n  - automatic parallelization,\r\n  - automatic offsets\r\n  - customization\r\n  - training\r\n2. 
**SentencePieceBackend**: for tokenizers requiring the `sentencepiece` library. It inherits from `PythonBackend`. \r\n3. **PythonBackend**: a Python implementation of the features provided by `tokenizers`. Basically allows adding tokens.\r\n4. **MistralCommonBackend**: relies on `MistralCommon`'s tokenization library. (Previously known as the `MistralCommonTokenizer`)\r\n\r\nThe `AutoTokenizer` automatically selects the appropriate backend based on available files and dependencies. This is transparent: you continue to use `AutoTokenizer.from_pretrained()` as before. It also keeps transformers modular, so future backends can be supported easily.\r\n\r\n### Defining a tokenizer outside of the existing backends\r\n\r\nWe enable users and tokenizer builders to define their own tokenizers from top to bottom. Tokenizers are usually defined using a backend such as `tokenizers`, `sentencepiece` or `mistral-common`, but we offer the possibility to design the tokenizer at a higher level, without relying on those backends.\r\n\r\nTo do so, you can import the `PythonBackend` (which was previously known as `PreTrainedTokenizer`). This class encapsulates all the logic related to added tokens, encoding, and decoding.\r\n\r\nIf you want something even higher up the stack, then `PreTrainedTokenizerBase` is what `PythonBackend` inherits from. It contains the very basic tokenizer API features: \r\n- `encode`\r\n- `decode`\r\n- `vocab_size`\r\n- `get_vocab`\r\n- `convert_tokens_to_ids`\r\n- `convert_ids_to_tokens`\r\n- `from_pretrained`\r\n- `save_pretrained`\r\n- among a few others\r\n\r\n### API Changes\r\n\r\n#### 1. 
Direct tokenizer initialization with vocab and merges\r\n\r\nStarting with v5, we now enable initializing blank, untrained `tokenizers`-backed tokenizers:\r\n\r\n```py\r\nfrom transformers import LlamaTokenizer\r\n\r\ntokenizer = LlamaTokenizer()\r\n```\r\n\r\nThis tokenizer will therefore follow the definition of the `LlamaTokenizer` as defined in its class definition. It can then be trained on a corpus as can be seen in [the `tokenizers` documentation](https://huggingface.co/docs/tokenizers/training_from_memory).\r\n\r\nThese tokenizers can also be initialized from vocab and merges (if necessary), like the previous \"slow\" tokenizers:\r\n\r\n```py\r\nfrom transformers import LlamaTokenizer\r\n\r\nvocab = {\"<unk>\": 0, \"<s>\": 1, \"</s>\": 2, \"hello\": 3, \"world\": 4}\r\nmerges = [(\"h\", \"e\"), (\"l\", \"l\"), (\"o\", \" \")]\r\n\r\ntokenizer = LlamaTokenizer(vocab=vocab, merges=merges)\r\n```\r\n\r\nThis tokenizer will behave as a Llama-like tokenizer, with an updated vocabulary. This allows comparing different tokenizer classes with the same vocab; therefore enabling the comparison of different pre-tokenizers, normalizers, etc.\r\n\r\n⚠️ The `vocab_file` (as in, a path towards a file containing the vocabulary) cannot be used to initialize the `LlamaTokenizer` as loading from files is reserved to the `from_pretrained` method.\r\n\r\n#### 2. Simplified decoding API\r\n\r\nThe `batch_decode` and `decode` methods have been unified to reflect behavior of the `encode` method. Both single and batch decoding now use the same `decode` method. 
See an example of the new behavior below:\r\n\r\n```python\r\nfrom transformers import AutoTokenizer\r\ntokenizer = AutoTokenizer.from_pretrained(\"t5-small\")\r\ninputs = [\"hey how are you?\", \"fine\"]\r\ntokenizer.decode(tokenizer.encode(inputs))\r\n```\r\n\r\nGives:\r\n```diff\r\n- 'hey how are you?</s> fine</s>'\r\n+ ['hey how are you?</s>', 'fine</s>']\r\n```\r\n\r\nWe expect `encode` and `decode` to behave as two sides of the same coin: the sequence `encode`, `process`, `decode` should work. \r\n\r\n> [!NOTE]\r\n> A common use-case would be: `encode`, `model.generate`, `decode`.  However, using `generate` would return `list[list[int]]`, which would then be incompatible with `decode`.\r\n\r\n#### 3. Unified encoding API\r\n\r\nThe `encode_plus` method is deprecated in favor of the single `__call__` method.\r\n\r\n#### 4. `apply_chat_template` returns `BatchEncoding`\r\n\r\nPreviously, `apply_chat_template` returned `input_ids` for backward compatibility. Starting with v5, it consistently returns a `BatchEncoding` dict like other tokenizer methods.\r\n\r\n```python\r\n# v5\r\nmessages = [\r\n    {\"role\": \"user\", \"content\": \"Hello!\"},\r\n    {\"role\": \"assistant\", \"content\": \"Hi there!\"}\r\n]\r\n\r\n# Now returns BatchEncoding with input_ids, attention_mask, etc.\r\noutputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\")\r\nprint(outputs.keys())  # dict_keys(['input_ids', 'attention_mask'])\r\n```\r\n\r\n#### 5. Removed legacy configuration file saving:\r\n\r\nWe simplify the serialization of tokenization attributes:\r\n\r\n- `special_tokens_map.json` - special tokens are now stored in `tokenizer_config.json`.\r\n- `added_tokens.json` - added tokens are now stored in `tokenizer.json`.\r\n- `added_tokens_decoder` is only stored when there is no `tokenizer.json`.\r\n\r\nWhen loading older tokenizers, these files are still read for backward compatibility, but new saves use the consolidated format. 
We're gradually moving towards consolidating attributes to fewer files so that other libraries and implementations may depend on them more reliably.\r\n\r\n#### 6. Model-Specific Changes\r\n\r\nSeveral models that had identical tokenizers now import from their base implementation:\r\n\r\n- **LayoutLM** → uses BertTokenizer\r\n- **LED** → uses BartTokenizer  \r\n- **Longformer** → uses RobertaTokenizer\r\n- **LXMert** → uses BertTokenizer\r\n- **MT5** → uses T5Tokenizer\r\n- **MVP** → uses BartTokenizer\r\n\r\nThese modules will eventually be removed altogether.\r\n\r\n**Removed T5-specific workarounds**\r\n\r\nThe internal `_eventually_correct_t5_max_length` method has been removed. T5 tokenizers now handle max length consistently with other models.\r\n\r\n### Testing Changes\r\n\r\nA few testing changes specific to tokenizers have been applied:\r\n- Model-specific tokenization test files now focus on integration tests.\r\n- Common tokenization API tests (e.g., `add_tokens`, `encode`, `decode`) are now centralized and automatically applied across all tokenizers. This reduces test duplication and ensures consistent behavior\r\n\r\nFor legacy implementations, the original BERT Python tokenizer code (including `WhitespaceTokenizer`, `BasicTokenizer`, etc.) is preserved in `bert_legacy.py` for reference purposes.\r\n\r\n#### 7. Deprecated / Modified Features\r\n\r\n**Special Tokens Structure:**\r\n- `SpecialTokensMixin`: Merged into `PreTrainedTokenizerBase` to simplify the tokenizer architecture.\r\n- `special_tokens_map`: Now only stores named special token attributes (e.g., `bos_token`, `eos_token`). Use `extra_special_tokens` for additional special tokens (formerly `additional_special_tokens`). 
`all_special_tokens` includes both named and extra tokens.\r\n\r\n```python\r\n# v4\r\ntokenizer.special_tokens_map  # Included 'additional_special_tokens'\r\n\r\n# v5\r\ntokenizer.special_tokens_map  # Only named tokens\r\ntokenizer.extra_special_tokens  # Additional tokens\r\n```\r\n\r\n- `special_tokens_map_extended` and `all_special_tokens_extended`: Removed. Access `AddedToken` objects directly from `_special_tokens_map` or `_extra_special_tokens` if needed.\r\n- `additional_special_tokens`: Still accepted for backward compatibility but is automatically converted to `extra_special_tokens`.\r\n\r\n**Deprecated Methods:**\r\n- `sanitize_special_tokens()`: Already deprecated in v4, removed in v5.\r\n- `prepare_seq2seq_batch()`: Deprecated; use `__call__()` with `text_target` parameter instead.\r\n\r\n```python\r\n# v4\r\nmodel_inputs = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, max_length=128)\r\n\r\n# v5\r\nmodel_inputs = tokenizer(src_texts, text_target=tgt_texts, max_length=128, return_tensors=\"pt\")\r\nmodel_inputs[\"labels\"] = model_inputs.pop(\"input_ids_target\")\r\n```\r\n\r\n- `BatchEncoding.words()`: Deprecated; use `word_ids()` instead.\r\n\r\n**Removed Methods:**\r\n- `create_token_type_ids_from_sequences()`: Removed from base class. Subclasses that need custom token type ID creation should implement this method directly.\r\n- `clean_up_tokenization()`: Removed from base class. Now defined at model class level for models that need it (e.g., PLBart, CLVP, Wav2Vec2).\r\n- `prepare_for_model()`, `build_inputs_with_special_tokens()`, `truncate_sequences()`: Moved from `tokenization_utils_base.py` to `tokenization_python.py` for `PythonBackend` tokenizers. `TokenizersBackend` provides model-ready input via `tokenize()` and `encode()`, so these methods are no longer needed in the base class.\r\n- `_switch_to_input_mode()`, `_switch_to_target_mode()`, `as_target_tokenizer()`: Removed from base class. 
Use `__call__()` with `text_target` parameter instead.\r\n\r\n```python\r\n# v4\r\nwith tokenizer.as_target_tokenizer():\r\n    labels = tokenizer(tgt_texts, ...)\r\n\r\n# v5\r\nlabels = tokenizer(text_target=tgt_texts, ...)\r\n```\r\n\r\n- `parse_response()`: Removed from base class.\r\n\r\n## Disclaimers for the RC0\r\n\r\n### PEFT + MoE:\r\n\r\nBecause we are switching from the naive MOE (`nn.ModuleList` for experts) we currently have an issue with MoEs that have adapters. For more details see https://github.com/huggingface/transformers/issues/42491#issuecomment-3591485649. \r\n\r\n_We aim for this to be fixed and released in a following release candidate in the week that follows RC0._\r\n\r\n### Tensor parallel and Expert parallel + MoE\r\n\r\nWe are streamlining the MoE support with vLLM; while this is being implemented, tensor parallelism and expert parallelism aren't working as expected.\r\nThis is known and actively being worked on.\r\n\r\n_We aim for this to be fixed and released in a following release candidate in the week that follows RC0._\r\n\r\n### Custom pretrained models:\r\nFor anyone inheriting from a `transformers` `PreTrainedModel`, the weights are automatically initialized with the common scheme: \r\n```python\r\n\r\n    @torch.no_grad()\r\n    def _init_weights(self, module):\r\n        \"\"\"\r\n        Initialize the weights. This is quite general on purpose, in the spirit of what we usually do. For more complex\r\n        initialization scheme, it should be overridden by the derived `PreTrainedModel` class. 
In case a model adds an explicit\r\n        `nn.Parameter`, this method should also be overridden in order to initialize it correctly.\r\n        \"\"\"\r\n        if hasattr(self.config, \"initializer_range\"):\r\n            std = self.config.initializer_range or 0.02\r\n        elif hasattr(self.config, \"init_std\"):\r\n            std = self.config.init_std\r\n        elif hasattr(self.config, \"initializer_factor\"):\r\n            std = self.config.initializer_factor\r\n        else:\r\n            # 0.02 is the standard default value across the library\r\n            std = getattr(self.config.get_text_config(), \"initializer_range\", 0.02)\r\n\r\n        if isinstance(module, (nn.Linear, nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.ConvTranspose1d, nn.ConvTranspose2d)):\r\n            if getattr(module, \"weight\", None) is not None:\r\n                init.normal_(module.weight, mean=0.0, std=std)\r\n            if getattr(module, \"bias\", None) is not None:\r\n                init.zeros_(module.bias)\r\n        elif isinstance(module, nn.Embedding):\r\n            if getattr(module, \"weight\", None) is not None:\r\n                init.normal_(module.weight, mean=0.0, std=std)\r\n                # Here we need the check explicitly, as we slice the weight in the `zeros_` call, so it looses the flag\r\n                if module.padding_idx is not None and not getattr(module.weight, \"_is_hf_initialized\", False):\r\n                    init.zeros_(module.weight[module.padding_idx])\r\n        elif isinstance(module, nn.MultiheadAttention):\r\n            # This uses torch's original init\r\n            module._reset_parameters()\r\n        # We cannot use `isinstance` on the RMSNorms or LayerNorms, as they usually are custom modules which change names\r\n        # between modelings (because they are prefixed with the model name)\r\n        elif (\r\n            isinstance(module, (nn.GroupNorm, nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))\r\n            or 
\"LayerNorm\" in module.__class__.__name__\r\n            or \"RMSNorm\" in module.__class__.__name__\r\n        ):\r\n            # Norms can exist without weights (in which case they are None from torch primitives)\r\n            if hasattr(module, \"weight\") and module.weight is not None:\r\n                init.ones_(module.weight)\r\n            if hasattr(module, \"bias\") and module.bias is not None:\r\n                init.zeros_(module.bias)\r\n```\r\n\r\nIf you want to avoid that, for now you should just do:\r\n\r\n```python\r\nclass CustomModel(Qwen3VLForConditionalGeneration):\r\n    def __init__(self, *args, **kwargs):\r\n        super().__init__(*args, **kwargs)\r\n        self.action_head = nn.Linear(1024, 7)\r\n        self.positional_embedding = nn.Parameter(torch.randn(16, 1152))\r\n        self.post_init()\r\n    \r\n    def _init_weights(self, module):\r\n        pass \r\n\r\n```\r\nThere is a tracker for that here: https://github.com/huggingface/transformers/issues/42418.\r\n\r\n## Library-wide changes with lesser impact\r\n\r\n### `use_auth_token`\r\n\r\nThe `use_auth_token` argument/parameter is deprecated in favor of `token` everywhere.\r\nYou should be able to search and replace `use_auth_token` with `token` and get the same logic.\r\n\r\nLinked PR: https://github.com/huggingface/transformers/pull/41666\r\n\r\n### Attention-related features\r\n\r\nWe decided to remove some features for the upcoming v5 as they are currently only supported in a few old models and no longer integrated in current model additions. It's recommended to stick to v4.x in case you need them. Following features are affected:\r\n- No more head masking, see [#41076](https://github.com/huggingface/transformers/pull/41076). This feature allowed to turn off certain heads during the attention calculation and only worked for eager.\r\n- No more relative positional biases in Bert-like models, see [#41170](https://github.com/huggingface/transformers/pull/41170). 
This feature was introduced to allow relative position scores within attention calculations (similar to T5). However, this feature is barely used in official models and adds a lot of complexity. It also only worked with eager.\r\n- No more head pruning, see [#41417](https://github.com/huggingface/transformers/pull/41417) by @gante. As the name suggests, it allowed pruning heads within your attention layers.\r\n\r\n### Updates to supported torch APIs\r\n\r\nWe dropped support for two torch APIs:\r\n- `torchscript` in https://github.com/huggingface/transformers/pull/41688\r\n- `torch.fx` in https://github.com/huggingface/transformers/pull/41683\r\n\r\nThose APIs were deprecated by the PyTorch team, and we're instead focusing on the supported APIs `dynamo` and `export`.\r\n\r\n## Quantization changes\r\n\r\nWe clean up the quantization API in transformers, and significantly refactor the weight loading as highlighted\r\nabove.\r\n\r\nWe drop support for two quantization arguments that have been deprecated for some time:\r\n- `load_in_4bit`\r\n- `load_in_8bit`\r\n\r\nWe remove them in favor of the `quantization_config` argument, which is much more complete. As an example, here is how\r\nyou would load a 4-bit bitsandbytes model using this argument:\r\n\r\n```python\r\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\r\n\r\nquantization_config = BitsAndBytesConfig(load_in_4bit=True)\r\n\r\nmodel_4bit = AutoModelForCausalLM.from_pretrained(\r\n    \"meta-llama/Llama-3.2-3B\",\r\n    device_map=\"auto\",\r\n    quantization_config=quantization_config\r\n)\r\n```\r\n\r\n\r\n## Configuration\r\n\r\n- Methods that initialize a nested config, such as `from_xxx_config`, have been removed. Configs can be initialized through the `__init__` method in the same way. See [#41314](https://github.com/huggingface/transformers/pull/41314).\r\n- It is no longer possible to load a config class from a URL file. Configs must be loaded from either a local path or a repo on the Hub. 
See [#42383](https://github.com/huggingface/transformers/pull/42383).\r\n- All parameters for configuring a model's rotary embedding are now stored under `config.rope_parameters`, including `rope_theta` and `rope_type`. A model's `config.rope_parameters` is a simple dictionary in most cases, and can also be a nested dict in special cases (e.g. Gemma3 and ModernBert) with a different RoPE parameterization for each layer type. Trying to get `config.rope_theta` will throw an attribute error from now on. See [#39847](https://github.com/huggingface/transformers/pull/39847) and [#42255](https://github.com/huggingface/transformers/pull/42255)\r\n- Qwen-VL family configuration is in a nested format and trying to access keys directly will throw an error (e.g. `config.vocab_size`). Users are expected to access keys from their respective sub-configs (`config.text_config.vocab_size`).\r\n- Configurations of non-generative models (any model that doesn't call `model.generate()`) will no longer have a `generation_config` and `model.config.generation_config` will throw an attribute error.\r\n\r\n## Processing\r\n\r\n### Tokenization\r\n\r\n- Slow tokenizer files (aka: `tokenization_<model>.py`) will be removed in favor of the fast tokenizer files `tokenization_<model>_fast.py`, which will be renamed to `tokenization_<model>.py`.  As fast tokenizers are backed by :hugs: `tokenizers`, they include a wider range of features that are maintainable and reliable. \r\n- Other backends (sentencepiece, tokenizers, etc.) will be supported with a light layer if loading a fast tokenizer fails\r\n- Remove legacy files like `special_tokens_map.json` and `added_tokens.json`\r\n- Remove `_eventually_correct_t5_max_length`\r\n- `encode_plus` --> `__call__`\r\n- `batch_decode` --> `decode`\r\n\r\nPreviously, `apply_chat_template` returned naked `input_ids` by default rather than a `BatchEncoding` dict. 
\r\nThis was inconvenient - it should return a `BatchEncoding` dict like `tokenizer.__call__()`, but we were stuck with \r\nit for backward compatibility. The method now returns a `BatchEncoding`.\r\n\r\nLinked PRs: \r\n- https://github.com/huggingface/transformers/issues/40938\r\n- https://github.com/huggingface/transformers/pull/40936\r\n- https://github.com/huggingface/transformers/pull/41626\r\n\r\n### Processing classes\r\n\r\n- In processing classes each attribute will be serialized under `processor_config.json` as a nested dict, instead of serializing attributes in their own config files. Loading will be supported for all old format processors (https://github.com/huggingface/transformers/pull/41474)\r\n- `XXXFeatureExtractors` classes are completely removed in favor of the `XXXImageProcessor` classes for all vision models (https://github.com/huggingface/transformers/pull/41174)\r\n- Minor change: `XXXFastImageProcessorKwargs` is removed in favor of `XXXImageProcessorKwargs`, which will be shared between fast and slow processors (https://github.com/huggingface/transformers/pull/40931)\r\n\r\n\r\n## Modeling\r\n\r\n- Some `RotaryEmbeddings` layers will start returning a dict of tuples, in case the model uses several RoPE configurations (Gemma2, ModernBert). Each value will be a tuple of \"cos, sin\" per RoPE type.\r\n- Config attribute for `RotaryEmbeddings` layer will be unified and accessed via `config.rope_parameters`. Config attr for `rope_theta` might not be accessible anymore for some models, and instead will be in `config.rope_parameters['rope_theta']`. Backward compatibility will be supported for a while as much as possible, and in the near future we'll gradually move to the new RoPE format  (https://github.com/huggingface/transformers/pull/39847)\r\n- Vision-language models no longer have shortcut access to their language and vision components from the generative model via `model.language_model`. 
It is recommended to access the module with either `model.model.language_model` or `model.get_decoder()`. See [#42156](https://github.com/huggingface/transformers/pull/42156/)\r\n\r\n### Generate\r\n\r\n- Old, deprecated output type aliases were removed (e.g. `GreedySearchEncoderDecoderOutput`). We now only have 4 output classes built from the following matrix: decoder-only vs encoder-decoder, uses beams vs doesn't use beams (https://github.com/huggingface/transformers/pull/40998)\r\n- Removed deprecated classes regarding decoding methods that were moved to the Hub due to low usage (constraints and beam scorers) (https://github.com/huggingface/transformers/pull/41223)\r\n- If `generate` doesn't receive any KV Cache argument, the default cache class used is now defined by the model (as opposed to always being `DynamicCache`) (https://github.com/huggingface/transformers/pull/41505)\r\n- Generation parameters are no longer accessible via the model's config. If generation parameters are serialized in `config.json` for any old model, they will be loaded back into the model's generation config. Users are expected to access or modify generation parameters only through `model.generation_config` (e.g. `model.generation_config.do_sample = True`). \r\n\r\n## Trainer\r\n\r\n### New Features\r\n\r\n* **ALST/Ulysses Sequence Parallelism Integration**\r\n  - Added sequence parallelism support via HF Accelerate for training with longer sequences. 
Enables splitting sequences across devices using ALST (Arctic Long Sequence Training) and Ulysses algorithms with DeepSpeed.\r\n* **Improved `compute_loss_func` Handling**\r\n  - `compute_loss_func` now always takes priority over the model's built-in loss computation, giving users consistent control over custom loss functions.\r\n* **`num_items_in_batch` in Prediction Step**\r\n  - The `num_items_in_batch` argument is now passed to `compute_loss` during `prediction_step`, enabling proper loss scaling during evaluation.\r\n\r\n### Breaking Changes\r\n\r\n* **`report_to` now defaults to `\"none\"`**\r\n  - Logging integrations are no longer auto-detected by default; users must explicitly specify which reporting backends to use.\r\n\r\n### Removing arguments without deprecation cycle in `TrainingArguments` due to low usage\r\n\r\n- `mp_parameters` -> legacy param that was later on added to the Sagemaker trainer\r\n- `_n_gpu` -> not intended for users to set, we will initialize it correctly instead of putting it in the `TrainingArguments`\r\n- `overwrite_output_dir` -> replaced by `resume_from_checkpoint`, and it was only used in the examples script, no impact on Trainer. \r\n- `logging_dir` -> only used for tensorboard, set `TENSORBOARD_LOGGING_DIR` env var instead\r\n- `jit_mode_eval` -> use `use_torch_compile` instead, as torchscript is not recommended anymore\r\n- `tpu_num_cores` -> It is actually better to remove it, as it is not recommended to set the number of cores. By default, all TPU cores are used. Set `TPU_NUM_CORES` env var instead\r\n- `past_index` -> it was only used for a very small number of models that have special architecture like Transformer-XL + it was not documented at all how to train those models\r\n- `ray_scope` -> only a minor arg for the ray integration. Set `RAY_SCOPE` env var instead \r\n- `warmup_ratio` -> use `warmup_steps` instead. We combined both args together by allowing passing float values in `warmup_steps`. 
\r\n\r\n### Removing deprecated arguments in `TrainingArguments`\r\n\r\n- `fsdp_min_num_params` and `fsdp_transformer_layer_cls_to_wrap` -> use `fsdp_config`\r\n- `tpu_metrics_debug` -> `debug` \r\n- `push_to_hub_token` -> `hub_token`\r\n- `push_to_hub_model_id` and `push_to_hub_organization` -> `hub_model_id`\r\n- `include_inputs_for_metrics` -> `include_for_metrics`\r\n- `per_gpu_train_batch_size` -> `per_device_train_batch_size`\r\n- `per_gpu_eval_batch_size` -> `per_device_eval_batch_size`\r\n- `use_mps_device` -> mps will be used by default if detected\r\n- `fp16_backend` and `half_precision_backend` -> we will only rely on `torch.amp` as everything has been upstreamed to torch\r\n- `no_cuda` -> `use_cpu`\r\n- `include_tokens_per_second` -> `include_num_input_tokens_seen`\r\n- `use_legacy_prediction_loop` -> we only use the `evaluation_loop` function from now on\r\n\r\n### Removing deprecated arguments in `Trainer`\r\n\r\n- `tokenizer` in initialization -> `processing_class`\r\n- `model_path` in `train()` -> `resume_from_checkpoint`\r\n\r\n### Removed features for `Trainer`\r\n\r\n- SigOpt integration for hp search was removed as the library was archived + the api stopped working\r\n- drop support for sagemaker API <1.10\r\n- bump accelerate minimum version to 1.1.0 \r\n- bump peft minimum version to 0.18.0\r\n- bump bitsandbytes minimum version to 0.46.1\r\n\r\n### New defaults for `Trainer`\r\n\r\n- `use_cache` in the model config will be set to `False`. You can still change the cache value through the `TrainingArguments` `use_cache` argument if needed. \r\n\r\n## Pipeline\r\n\r\n- Image text to text pipelines will no longer accept images as a separate argument along with conversation chats. Image data has to be embedded in the chat's \"content\" field. See [#42359](https://github.com/huggingface/transformers/pull/42359)\r\n\r\n## PushToHubMixin\r\n\r\n- removed deprecated `organization` and `repo_url` from `PushToHubMixin`. 
You must pass a `repo_id` instead.\r\n- removed `ignore_metadata_errors` from `PushToHubMixin`. In practice if we ignore errors while loading the model card, we won't be able to push the card back to the Hub so it's better to fail early and not provide the option to fail later.\r\n- `push_to_hub` does not accept `**kwargs` anymore. All accepted parameters are explicitly documented.\r\n- arguments of `push_to_hub` are now keyword-only to avoid confusion. Only `repo_id` can be positional since it's the main arg.\r\n- removed `use_temp_dir` argument from `push_to_hub`. We now use a tmp dir in all cases.\r\n\r\nLinked PR: https://github.com/huggingface/transformers/pull/42391.\r\n\r\n## CLI\r\n\r\nThe deprecated `transformers-cli ...` command has been removed; `transformers ...` is now the only CLI entry point.\r\n\r\nThe `transformers` CLI has been migrated to `Typer`, making it easier to maintain + adding some nice features out of \r\nthe box (improved `--help` section, autocompletion).\r\n\r\nThe biggest breaking change is in `transformers chat`. This command starts a terminal UI to interact with a chat model. \r\nIt used to also be able to start a Chat Completion server powered by `transformers` and chat with it. In this revamped \r\nversion, this feature has been removed in favor of `transformers serve`. The goal of splitting `transformers chat` \r\nand `transformers serve` is to define clear boundaries between client and server code. It helps with maintenance \r\nbut also makes the commands less bloated. 
The new signature of `transformers chat` is:\r\n\r\n```\r\nUsage: transformers chat [OPTIONS] BASE_URL MODEL_ID [GENERATE_FLAGS]...\r\n\r\nChat with a model from the command line.\r\n```\r\n\r\nIt works hand in hand with `transformers serve`, which means that if `transformers serve` is running on its default endpoint, `transformers chat` can be launched as follows:\r\n\r\n```sh\r\ntransformers chat HuggingFaceTB/SmolLM3-3B\r\n```\r\n\r\nIt can however use any OpenAI API compatible HTTP endpoint:\r\n\r\n```sh\r\ntransformers chat HuggingFaceTB/SmolLM3-3B https://router.huggingface.co/v1\r\n```\r\n\r\nLinked PRs: \r\n- https://github.com/huggingface/transformers/pull/40997\r\n- https://github.com/huggingface/transformers/pull/41487\r\n\r\n### Removal of the `run` method\r\n\r\nThe `transformers run` command (previously `transformers-cli run`) is an artefact of the past: it was never tested\r\nand isn't part of any public documentation. We're removing it for now and ask you to please let us know in case\r\nthis is a method you are using; in which case we should bring it back with better support.\r\n\r\nLinked PR: https://github.com/huggingface/transformers/pull/42447\r\n\r\n## Environment variables\r\n\r\n- Legacy environment variables like `TRANSFORMERS_CACHE`, `PYTORCH_TRANSFORMERS_CACHE`, and `PYTORCH_PRETRAINED_BERT_CACHE` have been removed. Please use `HF_HOME` instead.\r\n- Constants `HUGGINGFACE_CO_EXAMPLES_TELEMETRY`, `HUGGINGFACE_CO_PREFIX`, and `HUGGINGFACE_CO_RESOLVE_ENDPOINT` have been removed. Please use `huggingface_hub.constants.ENDPOINT` instead.\r\n\r\nLinked PR: https://github.com/huggingface/transformers/pull/42391.\r\n\r\n## Requirements update\r\n\r\n`transformers` v5 pins the `huggingface_hub` version to `>=1.0.0`. See this [migration guide](https://huggingface.co/docs/huggingface_hub/concepts/migration) to learn more about this major release. 
Here are the main aspects to know about:\r\n- switched the HTTP backend from `requests` to `httpx`. This change was made to improve performance and to support both synchronous and asynchronous requests the same way. If you are currently catching `requests.HTTPError` errors in your codebase, you'll need to switch to `httpx.HTTPError`.\r\n- related to the previous point, it is no longer possible to set proxies from your script. To handle proxies, you must set the `HTTP_PROXY` / `HTTPS_PROXY` environment variables\r\n- `hf_transfer` and therefore `HF_HUB_ENABLE_HF_TRANSFER` have been completely dropped in favor of `hf_xet`. This should be transparent for most users. Please let us know if you notice any downside!\r\n\r\n`typer-slim` has been added as a required dependency, used to implement both `hf` and `transformers` CLIs.\r\n\r\n## New model additions in v5\r\n\r\n### CWM\r\n\r\n<img width=\"809\" height=\"471\" alt=\"image\" src=\"https://github.com/user-attachments/assets/58bb9c70-d481-48ed-ab8f-6553be7c240f\" />\r\n\r\nThe Code World Model (CWM) was proposed in [CWM: An Open-Weights LLM for Research on Code Generation with World Models](https://ai.facebook.com/research/publications/cwm) by the Meta FAIR CodeGen Team. CWM is an LLM for code generation and reasoning about code that has, in particular, been trained to better represent and reason about how code and commands affect the state of a program or system. Specifically, we mid-trained CWM on a large number of observation-action trajectories from Python execution traces and agentic interactions in containerized environments. 
We post-trained with extensive multi-task RL in verifiable coding, math, and multi-turn software engineering environments.\r\n\r\n* Add Code World Model (CWM)  by @jacobkahn in #41199\r\n\r\n### SAM3\r\n\r\n<img width=\"1505\" height=\"915\" alt=\"image\" src=\"https://github.com/user-attachments/assets/eec48633-f02b-464a-ae5c-c65473387e53\" />\r\n\r\nSAM3 (Segment Anything Model 3) was introduced in [SAM 3: Segment Anything with Concepts](https://ai.meta.com/research/publications/sam-3-segment-anything-with-concepts/).\r\n\r\nThe SAM3 addition adds four new architectures:\r\n- Sam3\r\n- Sam3Tracker\r\n- Sam3TrackerVideo\r\n- Sam3Video\r\n\r\nSAM3 performs Promptable Concept Segmentation (PCS) on images. PCS takes text and/or image exemplars as input (e.g., \"yellow school bus\"), and predicts instance and semantic masks for every single object matching the concept.\r\n\r\nSam3Tracker and Sam3TrackerVideo perform Promptable Visual Segmentation (PVS) on images. PVS takes interactive visual prompts (points, boxes, masks) or text inputs to segment a specific object instance per prompt. This is the task that SAM 1 and SAM 2 focused on, and SAM 3 improves upon it. Sam3Tracker and Sam3TrackerVideo are updated versions of SAM2 Video that maintain the same API while providing improved performance and capabilities.\r\n\r\nSAM3 Video performs Promptable Concept Segmentation (PCS) on videos. PCS takes text as input (e.g., \"yellow school bus\"), and predicts instance and semantic masks for every single object matching the concept, while preserving object identities across video frames. 
The model combines a detection module (SAM3) with a tracking module (SAM2-style tracker) to enable robust object tracking across video frames using text prompts.\r\n\r\n* Add SAM3 to 🤗 Transformers  by @yonigozlan in #42285\r\n\r\n### LFM2 MoE\r\n\r\n<img width=\"1080\" height=\"849\" alt=\"image\" src=\"https://github.com/user-attachments/assets/a9fa1b81-114d-4054-9699-5083ac69d830\" />\r\n\r\nLFM2-MoE is a Mixture-of-Experts (MoE) variant of [LFM2](https://huggingface.co/collections/LiquidAI/lfm2-686d721927015b2ad73eaa38). The LFM2 family is optimized for on-device inference by combining short‑range, input‑aware gated convolutions with grouped‑query attention (GQA) in a layout tuned to maximize quality under strict speed and memory constraints.\r\n\r\nLFM2‑MoE keeps this fast backbone and introduces sparse MoE feed‑forward networks to add representational capacity without significantly increasing the active compute path. The first LFM2-MoE release is LFM2-8B-A1B, with 8.3B total parameters and 1.5B active parameters. The model excels in quality (comparable to 3-4B dense models) and speed (faster than other 1.5B class models).\r\n\r\n* [Model] Lfm2Moe  by @paulpak58 in #41401\r\n\r\n### VideoLlama 3\r\n\r\n<img width=\"812\" height=\"366\" alt=\"image\" src=\"https://github.com/user-attachments/assets/21c82c6e-cf0a-4d6c-a707-b9e57663ca85\" />\r\n\r\nThe [VideoLLaMA3](https://huggingface.co/papers/2501.13106) model is a major update to [VideoLLaMA2](https://huggingface.co/papers/2406.07476) from Alibaba DAMO Academy.\r\n\r\n* [model] Add VideoLLaMA3 implementation  by @lkhl in #40499\r\n\r\n### AudioFlamingo 3\r\n\r\n<img width=\"621\" height=\"475\" alt=\"image\" src=\"https://github.com/user-attachments/assets/c9616758-b3aa-41d0-bd58-695966ba146d\" />\r\n\r\nAudio Flamingo 3 (AF3) is a fully open large audio–language model designed for robust understanding and reasoning over speech, environmental sounds, and music. 
AF3 pairs a Whisper-style audio encoder with a causal language model and performs replace-in-place audio–text fusion: the processor aligns post-pool audio frames to a dedicated placeholder token and the model replaces those token slots with projected audio embeddings during the forward pass.\r\n\r\nThe model checkpoint is available at: [nvidia/audio-flamingo-3-hf](https://huggingface.co/nvidia/audio-flamingo-3-hf)\r\n\r\nHighlights:\r\n\r\n- Unified audio encoder across speech, sound, and music.\r\n- Long-audio support via windowing and post-pool alignment (up to 10 minutes maximum). The model processes audio in 30-second windows with a hard limit of 20 windows (10 minutes total). Audio longer than 10 minutes will be truncated.\r\n- Deterministic fusion that preserves sequence length by replacing audio placeholder tokens with audio embeddings.\r\n\r\n* [models] Add AudioFlamingo3 integration  by @lashahub in #40290\r\n\r\n### Nanochat\r\n\r\n[NanoChat](https://huggingface.co/karpathy/nanochat-d32) is a compact decoder-only transformer model designed for educational purposes and efficient training. The model features several fundamental architectural innovations which are common in modern transformer models. Therefore, it is a good model to use as a starting point to understand the principles of modern transformer models. 
NanoChat is a variant of the [Llama](https://huggingface.co/docs/transformers/en/model_doc/llama) architecture, with simplified attention mechanism and normalization layers.\r\n\r\n* [MODEL] Nanochat implementation  by @burtenshaw in #41634\r\n\r\n## Bugfixes and improvements\r\n\r\n* `JetMoe` Fix jetmoe after #40132  by @ArthurZucker in #41324\r\n* Fixed tiny incorrect import in `gemma3`  by @Sai-Suraj-27 in #41354\r\n* Rope for Qwen2--5-vl  by @zucchini-nlp in #41173\r\n* 🚨 Bump to Python 3.10 and rework how we check 3rd-party libraries existence  by @Cyrilvallez in #41268\r\n* Standardize `PretrainedConfig` to `PreTrainedConfig`  by @Cyrilvallez in #41300\r\n* Fix trainer for py3.9  by @SunMarc in #41359\r\n* Check model inputs - hidden states  by @zucchini-nlp in #40994\r\n* [`ModularChecker`] QOL for the modular checker  by @ArthurZucker in #41361\r\n* Fixing a typo for BLT model  by @Narsil in #41325\r\n* :rotating_light: [`v5`] Remove relative position embeddings (for bert like models)  by @vasqu in #41170\r\n* Fix typo in model proposal template  by @Ombucha in #41352\r\n* Better typehints for `apply_chat_template`  by @Samoed in #41355\r\n* 🚨 Remove BetterTransformer  by @Cyrilvallez in #41367\r\n* [testing] update `test_longcat_generation_cpu`  by @ydshieh in #41368\r\n* Fix flash_attention.py: wrong argument passing for attn_implementation  by @TKONIY in #41347\r\n* Use canonical get_size_with_aspect_ratio (with max_size) from transformers.image_transforms to fix #37939  by @sonianuj287 in #41284\r\n* Fixes in check_model_inputs, GPTBigCodeModel and ImageGPTModel  by @IlyasMoutawwakil in #40811\r\n* Remove unnecessary list comprehension  by @cyyever in #41305\r\n* make some ut cases pass on xpu w/ latest torch  by @yao-matrix in #41337\r\n* Remove unused function patameters  by @cyyever in #41358\r\n* [`CB`] Refactors the way we access paged  by @ArthurZucker in #41370\r\n* serve: add non-streaming mode to /v1/responses; stream event parity; remove 
placeholder logprobs  by @antznette1 in #41353\r\n* Update from pretrained error when loading  by @ArthurZucker in #33380\r\n* [`v5`] Sync Bert and Bart eager attention  by @vasqu in #41248\r\n* fix asr ut failures  by @yao-matrix in #41332\r\n* fix resample in asr pipeline  by @yhzx233 in #41298\r\n* Correct numerical regression in vision embeddings  by @i3hz in #41374\r\n* [kernels] Kernel Config   by @MekkCyber in #41232\r\n* [Cache] lfm2 cache: allocate empty kv layers during init  by @paulpak58 in #41396\r\n* Fix test for model with dotted name and relative imports  by @st81 in #41343\r\n* Prefer raising `TypeError` exception for invalid type  by @Sai-Suraj-27 in #41346\r\n* [v5] Bump accelerate to 1.1.0   by @SunMarc in #41234\r\n* Fix incorrect assignment in `update_device_map` for GPTQ quantizer  by @Sai-Suraj-27 in #41328\r\n* [v5] Delete left traces of feature extractor  by @zucchini-nlp in #41321\r\n* Remove deprecation warning  by @Cyrilvallez in #41425\r\n* Fix overriding common_kwargs defaults in processor calls  by @yonigozlan in #41381\r\n* v5 dev version  by @LysandreJik in #41436\r\n* Tiny Cleanup - Removed duplicate class field definition's  by @Sai-Suraj-27 in #41293\r\n* 🚨🚨 Remove all traces of legacy cache format  by @Cyrilvallez in #41378\r\n* 🚨 [v5] Prune `prune_heads`  by @gante in #41417\r\n* [v5] Bump min version of bitsandbytes to 0.46.1   by @SunMarc in #41283\r\n* Fixing comments in __init__ file  by @MekkCyber in #41414\r\n* Use accelerator API to free device memory  by @cyyever in #41195\r\n* enable new model uts to xpu and fix some failures on xpu  by @yao-matrix in #41386\r\n* [torchao] Add regex support for ModuleFqnToConfig  by @jerryzh168 in #41242\r\n* :facepalm: CB nit!   
by @ArthurZucker in #41413\r\n* Remove Python 3.9 classifier  by @cyyever in #41410\r\n* [`JetMoe`] Fix KV head repetition and padding free  by @vasqu in #41423\r\n* [testing] Fix `JetMoeIntegrationTest`  by @ydshieh in #41377\r\n* Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation  by @ErfanBaghaei in #40837\r\n* Validate processing kwargs with @strict from huggingface_hub   by @zucchini-nlp in #40793\r\n* Update hqq.md  by @prathamesh-chavan-22 in #41452\r\n* enable some falcon-mamba uts on xpu  by @yao-matrix in #41428\r\n* Fix generate outputs and simplify cache tests  by @Cyrilvallez in #41440\r\n* Fix doc  by @Cyrilvallez in #41457\r\n* 🚨 [v5] Rename left traces of `past_key_value` in BERT-like models  by @zucchini-nlp in #41448\r\n* Subconfig is a class attribute  by @zucchini-nlp in #41308\r\n* [v5] rm `utils/tf_ops/`  by @gante in #41402\r\n* Update GLM-4.1V MMRope implementation  by @zRzRzRzRzRzRzR in #41182\r\n* [kernels] Cleanup deta kernel  by @MekkCyber in #41470\r\n* 🚨 [v5] Rendundant code in nested configs  by @zucchini-nlp in #41314\r\n* Remove KERAS_NLP_IMPORT_ERROR  by @cyyever in #41468\r\n* Fix auto model configuration for encoder of perceptionlm  by @fschlatt in #41464\r\n* Fix tests fsdp  by @SunMarc in #41422\r\n* Import Callable from collections.abc  by @cyyever in #41130\r\n* Pickle - part 2  by @ydshieh in #41476\r\n* Remove infer_device  by @cyyever in #41088\r\n* Change RT-Detr docs to reflect fixed 640x640 input size  by @konstantinos-p in #41364\r\n* Cleaning hub kernels   by @MekkCyber in #41477\r\n* [v5] remove load_in_4bit and load_in_8bit  by @SunMarc in #41287\r\n* :rotating_light: [`Attention Masks`] Bidirectional masks for encoder and encoder-decoder models  by @vasqu in #41265\r\n* [Fix] Fix test file error  by @YangKai0616 in #40973\r\n* enhance patched_tearDown to support python 3.11+  by @yao-matrix in #41429\r\n* RT-Detr correct 2d positional embeddings for non-square images  by 
@konstantinos-p in #41380\r\n* Fix bnb fsdp loading for pre-quantized checkpoint  by @SunMarc in #41415\r\n* Remove SigOpt  by @SunMarc in #41479\r\n* Remove `past_index`  by @SunMarc in #41384\r\n* Remove deprecated args in Trainer for v5  by @SunMarc in #41404\r\n* Update GLM-4.6 doc  by @zRzRzRzRzRzRzR in #41471\r\n* `report_to` default changed to \"none\" + cleaning deprecated env var  by @SunMarc in #41375\r\n* deprecate `overwrite_output_dir`  by @SunMarc in #41323\r\n* [`CI`] Fix copies on main  by @vasqu in #41486\r\n* [Trainer] deprecate ray scope  by @SunMarc in #41403\r\n* deprecate `jit_mode_eval`  by @SunMarc in #41376\r\n* Remove `local_rank` arg from `TrainingArguments`  by @SunMarc in #41382\r\n* Update philosophy  by @molbap in #41438\r\n* Remove DISABLE_KERNEL_MAPPING flag  by @MekkCyber in #41475\r\n* Streaming should be handled at the request-level rather than at the istance level  by @LysandreJik in #41444\r\n* fix bnb model loading  by @jiqing-feng in #41499\r\n* [kernels] Remove RWKV kernel finally !  
by @MekkCyber in #41493\r\n* [kernels] rm yoso kernel  by @MekkCyber in #41495\r\n* Try to remove `pickle` - `BloomTokenizerFast`  by @ydshieh in #41466\r\n* Fixed tiny incorrect imports in `glm4v`  by @Sai-Suraj-27 in #41483\r\n* [Parakeet] unnecessary warning & auto mapping  by @eustlb in #41412\r\n* [causallm tester] automate pipeline mappings + bloom tests  by @gante in #41318\r\n* Fix some tests  by @Cyrilvallez in #41503\r\n* fix gemma3n case failure  by @yao-matrix in #41426\r\n* [voxtral] language detection + skipping lang:xx  by @eustlb in #41225\r\n* Set `truncation` to `False` in Qwen3Omni to avoid default truncation  by @BakerBunker in #41473\r\n* [QoL] modular conversion shows LoC saved  by @molbap in #41500\r\n* More trainer cleaning   by @SunMarc in #41489\r\n* Bump to hfh 1.0.0.rc5 to fix test  by @Wauplin in #41508\r\n* Revert `local_rank` deletion and some cleaning  by @SunMarc in #41504\r\n* Fix detectron2 import  by @Cyrilvallez in #41510\r\n* add Trainer import to .md in appropriate cell block for training.ipynb transformers_doc  by @benkeene in #41484\r\n* Remove outdated flags  by @Cyrilvallez in #41512\r\n* remove `tpu_num_cores`  by @SunMarc in #41383\r\n* Allow optuna's catch kwargs passthrough  by @nicha-api in #41496\r\n* Fix Latex typesetting in documentation  by @cyyever in #41177\r\n* [testing] reduce runtime of `HunYuanMoEV1IntegrationTest:test_model_generation`  by @ydshieh in #41373\r\n* [Qwen3VL] fix: hidden_states in place modification error  by @HollowMan6 in #41535\r\n* Add MLlama fast image processor  by @yonigozlan in #41391\r\n* Fixed Type-hints in function defintions  by @Sai-Suraj-27 in #41525\r\n* [SAM] Fix typing hints   by @zucchini-nlp in #41506\r\n* Restore cuda graphs to continuous batching  by @remi-or in #41421\r\n* Add AMD developer cloud support  by @fan-amd in #41126\r\n* Enable modular files from other libraries  by @regisss in #41372\r\n* 🚨 [v5] `generate` delegates default cache initialization to the model  
by @gante in #41505\r\n* Fixed typos and formatting  by @julian-st in #34215\r\n* Add VideoMAE video processor   by @Aki-07 in #41534\r\n* [`from_pretrained`] Small refactor `from_pretrained`: move around unrelated stuff  by @ArthurZucker in #41445\r\n* Remove references to AutoModelForVision2Seq  by @Rocketknight1 in #41513\r\n* [Qwen3VL] fix device mismatch error for FSDP2 training  by @HollowMan6 in #41536\r\n* Patch MistralCommonTokenizer  by @juliendenize in #41439\r\n* Fix an import error with PreTrainModel  by @remi-or in #41571\r\n* [Qwen3VLMoe] Fixed: Expected self.dtype to be equal to src.dtype - routing_weights casting  by @danielquintas8 in #41420\r\n* [kernels] rm mra kernels  by @MekkCyber in #41507\r\n* delete some tokenizer tests using pickle  by @ydshieh in #41514\r\n* Add DINOv3Backbone for ConvNext variant  by @merveenoyan in #40651\r\n* Add conditional checks to _check_and_adjust_attn_implementation()  by @zheliuyu in #41542\r\n* add rmsnorm kernels support for Intel XPU  by @kaixuanliu in #41563\r\n* Revert \"add rmsnorm kernels support for Intel XPU\"  by @MekkCyber in #41579\r\n* [VisionEncoderDecoderModel] Update loss function  by @NielsRogge in #40863\r\n* Add __iter__ to DynamicCache  by @remi-or in #41569\r\n* Revert some breaking changes bnb  by @SunMarc in #41581\r\n* Fix typsetting and content of llm_tutorial_optimization.md  by @cyyever in #41172\r\n* Gemma3 fixes  by @remi-or in #41572\r\n* Benchmark overhaul  by @remi-or in #41408\r\n* Enable non-streaming mode in `transformers serve`  by @LysandreJik in #41446\r\n* [device_map] Accelerate loading by computing device_map much faster  by @Cyrilvallez in #41548\r\n* Add `logits_to_keep` to many older CausalLM models  by @philiproeleveld in #41335\r\n* fix some case failures lead by \"`torch.compile` recompiled part of th…  by @sywangyi in #41558\r\n* remove ray_scope and check_quantized_param  by @SunMarc in #41587\r\n* Update issue template   by @SunMarc in #41573\r\n* [`Docs`] Fix 
changed references  by @vasqu in #41614\r\n* Import `expand_device_map` instead of redefining it  by @Cyrilvallez in #41608\r\n* Fix trainer simple tests  by @SunMarc in #41449\r\n* More markdown file fixes  by @cyyever in #41599\r\n* torch 2.9 don't ❤️ torchcodec 💔   by @ydshieh in #41610\r\n* Update a dataset reop link  by @ydshieh in #41618\r\n* Add fast path for bidirectional mask creation to fix regression  by @i3hz in #41586\r\n* enable sdpa enable gqa logic for Ascend NPU  by @FightingZhen in #41601\r\n* Fix video processing channel format  by @zucchini-nlp in #41603\r\n* [chat template] update when \"push_to_hub\"  by @zucchini-nlp in #39815\r\n* Remove the head masking block in some vision models  by @ydshieh in #41620\r\n* Remove deprecated code  by @SunMarc in #41616\r\n* Fix quantization base class   by @SunMarc in #41613\r\n* [docs] Duplicate entry  by @stevhliu in #41591\r\n* Update executorch.md  by @jackzhxng in #41582\r\n* Add Backbone API fine-tuning tutorial  by @merveenoyan in #41590\r\n* 🚨 [v5] Toggle the serialization format in processors  by @zucchini-nlp in #41474\r\n* Add aux loss for GLM-4.5V  by @zRzRzRzRzRzRzR in #41564\r\n* Allow passing `tp_plan` in `from_pretrained` directly  by @Cyrilvallez in #41435\r\n* Fix tokenization test  by @Cyrilvallez in #41649\r\n* Remove randomly added script  by @Cyrilvallez in #41650\r\n* Add missing dates to docs  by @yonigozlan in #41576\r\n* Migrate transformers cli to Typer  by @Wauplin in #41487\r\n* Fix FP-Quant quantization fallback CPU dispatch.  
by @BlackSamorez in #41619\r\n* fix check inputs for text2text pipeline  by @jiqing-feng in #41556\r\n* [`Executorch`] Simplify for encoder models  by @vasqu in #41627\r\n* [`Ernie 4.5 Moe`] Fix Moe and offloading  by @vasqu in #41385\r\n* [CI] Build translated docs  by @stevhliu in #41632\r\n* Fix fp32_ln for various models  by @remi-or in #41605\r\n* Adjust device logging level and add minor fixes  by @mario-koddenbrock in #41636\r\n* Fix EncoderDecoder cache  by @remi-or in #41612\r\n* Format MarkDown documentation and tiny fixes  by @cyyever in #41638\r\n* Fix typos in documentation  by @cyyever in #41641\r\n* Fix confusing cls assignment  by @cyyever in #41642\r\n* Double router compute?  by @molbap in #41653\r\n* [kernels] refactor function kernel calling  by @MekkCyber in #41577\r\n* [Fix] Deepseek V3 expert bias routing  by @fjosw in #41647\r\n* purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet  by @Vaibhavs10 in #41656\r\n* [`Masks`] Fix mask handling in eager for vision models  by @vasqu in #41625\r\n* Use | for Optional and Union typing  by @cyyever in #41646\r\n* Switch to CB if cache_implementation == paged  by @remi-or in #41655\r\n* Add in-out modalities as class attribute per model  by @zucchini-nlp in #41366\r\n* Fix dtype casting with quantization  by @Cyrilvallez in #41665\r\n* Fix serving continuous batching  by @SunMarc in #41624\r\n* Small changes to benchmarking script  by @remi-or in #41662\r\n* Improve package version check  by @Cyrilvallez in #41661\r\n* improve `utils/check_bad_commit.py`  by @ydshieh in #41658\r\n* Erroring when KernelConfig is passed without use_kernels = True  by @MekkCyber in #41657\r\n* [Trainer] [Breaking change] `use_cache` default to `False`  by @SunMarc in #41585\r\n* 🌐 [i18n-KO] Translated `chat_extras.md` to Korean  by @Judy-Choi in #39863\r\n* 🌐 [i18n-KO] Translated sam_hq.md to Korean  by @HyunZ118 in #41340\r\n* [i18n-KO] Translated `big_bird.md` to Korean  by @ssum21 in #40445\r\n* 🌐 [i18n-KO] Translated 
`code_llama.md` to Korean  by @Judy-Choi in #40558\r\n* 🌐 [i18n-KO] Translated llama4.md to Korean  by @TaskerJang in #40396\r\n* :globe_with_meridians: [i18n-KO] Translated `ko-LFM2.md` to Korean  by @ssum21 in #41502\r\n* Adding superglue fast image processing  by @AlphaOrOmega in #41394\r\n* Fix ckpt in docs  by @zucchini-nlp in #41659\r\n* torch 2.9 still don't ❤️ torchcodec 0.8 💔  by @ydshieh in #41686\r\n* Remove deprecated `use_auth_token` parameter  by @Wauplin in #41666\r\n* Remove  require_torch_bf16_gpu  by @cyyever in #40979\r\n* path validation for security reason  by @ydshieh in #41256\r\n* 🚨 Remove torchscript support  by @Cyrilvallez in #41688\r\n* Fix MarkDown syntax  by @cyyever in #41676\r\n* Use | for Optional and Union typing   by @cyyever in #41675\r\n* 🚨 [v5] Refactor RoPE for layer types  by @zucchini-nlp in #39847\r\n* Enable faiss-cpu on Windows  by @cyyever in #41678\r\n* Fix Pylint warnings  by @cyyever in #41644\r\n* 🚨 Remove torch.fx support  by @Cyrilvallez in #41683\r\n* Remove skipped tests without parents  by @Cyrilvallez in #41691\r\n* Enable  FURB rules in ruff  by @cyyever in #41395\r\n* Remove upper version bound of pandas  by @cyyever in #41677\r\n* [`Attn`] Allow dynamic causality in SDPA via Kwargs  by @vasqu in #41692\r\n* Simplify GQA conditions in sdpa_attention.py  by @justinchuby in #41699\r\n* [docs] Manual tp-plan  by @stevhliu in #41674\r\n* 🌐 [i18n-KO] Translated gemma3n.md to Korean  by @HyunZ118 in #40873\r\n* pin torchcodec on CI docker image  by @ydshieh in #41703\r\n* Update `run_name` docs in TrainingArguments  by @tobiasofsn in #41705\r\n* further improve `utils/check_bad_commit.py`  by @ydshieh in #41658) \r\n* feat: add benchmark v2 ci with results pushed to dataset  by @McPatate in #41672\r\n* Gemma3 conversion script maintenance  by @RyanMullins in #41704\r\n* Fix Qwen3-Omni inference when mixing video and image inputs in one batch  by @BakerBunker in #41741\r\n* Fix typo in LFM-VL  by @zucchini-nlp in 
#41742\r\n* Revert \"Remove upper version bound of pandas\"  by @ydshieh in #41744\r\n* [doc] remove broken notebooks on AMD Dev Cloud  by @pagezyhf in #41743\r\n* Update type hints in tokenization_utils.py to use | syntax  by @faizan842 in #41713\r\n* Fix documentation issues  by @cyyever in #41726\r\n* Apply RUFF PIE rules  by @cyyever in #41727\r\n* Small Fix for imports   by @MekkCyber in #41411\r\n* Docs(zh-hans): Refine wording for professionalism in README  by @Ri-Nai in #40943\r\n* Add vision contribution guide  by @molbap in #41456\r\n* upgrade xpu docker file to torch 2.8  by @yao-matrix in #41551\r\n* [v5] Delete `videos` from image processing classes   by @zucchini-nlp in #41607\r\n* Fixed incorrect model_type for qwen2vl and qwen2.5vl when config is saved and loaded again  by @i3hz in #41758\r\n* [kernels] Add version to function mapping  by @MekkCyber in #41685\r\n* Reduce warning noise caused by Tensor.new_tensor  by @st81 in #41748\r\n* Fix graphormer model compilation with Cython 3.1.4  by @alexmalyshev in #41671\r\n* Update type hints in modeling_rope_utils.py to use | syntax  by @faizan842 in #41714\r\n* [v5] Remove deprecated tranformers.onnx  by @echarlaix in #41700\r\n* Modernize CLIP modeling code   by @molbap in #41546\r\n* Simplify pipeline padding logic  by @Rocketknight1 in #41667\r\n* Chat response parsing  by @Rocketknight1 in #40894\r\n* Add LightGlue fast image processor  by @yonigozlan in #41670\r\n* Fix bark after #41445  by @ydshieh in #41645\r\n* Remove invalid `@staticmethod` from module-level get_device_and_memory_breakdown  by @albertvillanova in #41747\r\n* Fix CUDA index out of bounds for q_idx in VLM token type masking for Gemma3, PaliGemma, and example modular  by @albertvillanova in #41757\r\n* fix: Gemma 3 weights conversion vision and multimodal projector paths  by @RyanMullins in #41767\r\n* [v5] Delete legacy chat template saving  by @zucchini-nlp in #41648\r\n* [quantization] fix compressed_tensors tests  by 
@MekkCyber in #41780\r\n* [quantization] Skip Fp8 tests when hardware capability < 8.9  by @MekkCyber in #41785\r\n* Swap columns and rows of the grid layout in LFM2-VL  by @ankke in #41755\r\n* fix type annotation typo in docstring  by @johntheprime in #41788\r\n* Fix chat schema tests  by @Rocketknight1 in #41793\r\n* Fix attention mask in mamba layers  by @zucchini-nlp in #41790\r\n* [quantization] fix torchao tests after 0.14.0 release  by @MekkCyber in #41777\r\n* [`Onnx docs`] Remove some traces  by @vasqu in #41791\r\n* flash attn pytest marker  by @ydshieh in #41781\r\n* Bump AMD docker  by @remi-or in #41792\r\n* make apollo test case pass  by @yao-matrix in #41805\r\n* Add a safeguard around a flaky test in gemma2  by @remi-or in #41811\r\n* Fix Qwen3Next dtype API usage  by @SrijanUpadhyay in #41735\r\n* [Trainer] remove env vars   by @SunMarc in #41697\r\n* Fixed grammar mistakes  by @FrogWarlord in #41799\r\n* Fixed some grammar mistakes  by @FrogWarlord in #41802\r\n* transformers cli default flag fix  by @ArjunPimpale in #41761\r\n* Deprecate warmup_ratio  by @SunMarc in #41326\r\n* transformers serve quantization docs + some api fixes for bitsandbytes  by @SunMarc in #41253\r\n* [Parakeet] add output_attention_mask  by @eustlb in #41694\r\n* unpin torch/torchcodec for CircleCI  by @ydshieh in #41839\r\n* extend bitnet cases to xpu, all 8 cases pass  by @yao-matrix in #41831\r\n* extend 2 trainer test cases to xpu  by @yao-matrix in #41829\r\n* extend 2 blip2 and falcon_h1 test cases to xpu  by @yao-matrix in #41825\r\n* further reducing flakiness in `utils/check_bad_commit.py`  by @ydshieh in #41658)  \r\n* Remove redundant code from Qwen3VLProcessor  by @Xqle in #41836\r\n* Fix MXFP4 quantizer to support variable num_local_experts and hidden_size  by @marksverdhei in #41795\r\n* Fix Qwen2Audio flash attention mask format for generation  by @Abdennacer-Badaoui in #41843\r\n* Fix const parsing for dict inputs in chat schemas  by @Rocketknight1 in 
#41824\r\n* Share embedding modules in BART, not only weights  by @githubnemo in #41821\r\n* Fix TypeError: find_adapter_config_file() got an unexpected keyword argument '_adapter_model_path'  by @albertvillanova in #41604\r\n* :rotating_light: [`Clip`] Fix masking and enable flash attention on all model types  by @vasqu in #41750\r\n* CI workflow for Flash Attn  by @ydshieh in #41857\r\n* Fix torch.no_grad decorator in VLMS  by @yaswanth19 in #41888\r\n* Fix installation cmds in docs  by @yaswanth19 in #41887\r\n* revert changes in _is_package_available  by @MekkCyber in #41891\r\n* make lfm2_moe integration test pass on XPU  by @yao-matrix in #41796\r\n* Fix: avoid duplicate token in maybe_load_adapters  by @luaenrique in #41903\r\n* speed up loading checkpoints for zero stage 3  by @ri938 in #41850\r\n* evaluate>=0.4.6 is needed  by @stas00 in #41920\r\n* Add 6 huggingface notebooks on AMD dev cloud  by @fan-amd in #41883\r\n* Fix invalid examples in QwenVL model docstrings and add Qwen3VL example  by @Xqle in #41812\r\n* Allow parse_response to accept token IDs  by @Rocketknight1 in #41849\r\n* Fix Florence2 conversion script model_type KeyError  by @i3hz in #41866\r\n* Update some workflow files  by @ydshieh in #41892\r\n* fix some ut failures on XPU w/ torch 2.9  by @yao-matrix in #41923\r\n* Cache latest pytorch amd image locally on mi325 CI runner cluster  by @jitesh-gupta in #41926\r\n* Minor fix in docker image build workflow  by @ydshieh in #41949\r\n* fix some ut failures on XPU w/ torch 2.9  by @yao-matrix in #41941\r\n* Fix rope_parameters for gemma3 weights conversion script  by @douglas-reid in #41922\r\n* Fix: Gemma3TextConfig rope scaling assignments  by @RyanMullins in #41934\r\n* fix prepare_config_and_inputs_for_common bug in llava test  by @yao-matrix in #41942\r\n* Fix: prevent .gitignore truncation in run_clm_no_trainer.py  by @luaenrique in #41957\r\n* V4.57.1 training ci: Refactor `test_tensor_parallel.py`  by @3outeille in #41918\r\n* 
[v5] Return a BatchEncoding dict from apply_chat_template by default  by @Rocketknight1 in #41626\r\n* make recurrent_gemma and voxtral cases pass on xpu  by @yao-matrix in #41958\r\n* Fix typo in image_processing_lfm2_vl_fast  by @yonigozlan in #41940\r\n* Run slow v2  by @ydshieh in #41914\r\n* Fix `detectron2` installation in docker files  by @ydshieh in #41975\r\n* Fix `autoawq[kernels]` installation in quantization docker file  by @ydshieh in #41978\r\n* add support for saving encoder only so any parakeet model can be loaded for inference  by @nithinraok in #41969\r\n* Use indices as position_ids in modernebert  by @remi-or in #41789\r\n* test tensor parallel: make tests for dense model more robust  by @3outeille in #41968\r\n* fix: dict[RopeParameters] to dict[str, RopeParameters]  by @RyanMullins in #41963\r\n* docs: add continuous batching page  by @McPatate in #41847\r\n* Fix `torchcodec` version in quantization docker file  by @ydshieh in #41988\r\n* [kernels] Add Tests & CI for kernels  by @MekkCyber in #41765\r\n* Move the Mi355 to regular docker  by @remi-or in #41989\r\n* More data in benchmarking  by @remi-or in #41848\r\n* fix (CI): Refactor SSH runners  by @glegendre01 in #41991\r\n* fix 3 failed test cases for video_llama_3 model on Intel XPU  by @kaixuanliu in #41931\r\n* Integrate colqwen2.5 using colqwen2 modelling code  by @sahil-kabir in #40600\r\n* Fixed wrong padding value in OWLv2  by @gjamesgoenawan in #41938\r\n* Fix `run slow v2`: empty report when there is only one model  by @ydshieh in #42002\r\n* [kernels] change import time in KernelConfig  by @MekkCyber in #42004\r\n* DOC Fix typo in argument name: pseudoquant  by @BenjaminBossan in #41994\r\n* Fix `torch+deepspeed` docker file  by @ydshieh in #41985\r\n* Correct syntax error in trainer.md  by @Yacklin in #42001\r\n* Reduce the number of benchmark in the CI  by @remi-or in #42008\r\n* Fix continuous batching tests  by @Rocketknight1 in #42012\r\n* add back `logging_dir`  by 
@SunMarc in #42013\r\n* Fix issue with from pretrained and kwargs in image processors  by @yonigozlan in #41997\r\n* Fix default image_rows and image_cols initialization in Idefics3 and SmolVLM processors  by @MilkClouds in #41871\r\n* Add GLPNImageProcessorFast   by @Aravind-11 in #41725\r\n* add fuyu fast image processors  by @DeXtAr47-oss in #41817\r\n* [kernels] Fix XPU layernorm kernel  by @MekkCyber in #41583\r\n* [v5] Deprecate Text2Text and related pipelines  by @Rocketknight1 in #41996\r\n* [FPQuant] MXFP8 and MXFP4 backwards support  by @BlackSamorez in #41897\r\n* fix `deeepspeed` in AMD docker file  by @ydshieh in #42025\r\n* CodeQL workflow for security analysis  by @paulinebm in #42015\r\n* [tests] Add Context-parallel CI tests  by @kashif in #41860\r\n* extend fp_quant cases to xpu  by @yao-matrix in #41833\r\n* Change trigger time for AMD CI  by @ydshieh in #42034\r\n* Fix the order of methods in processor loading  by @zucchini-nlp in #42031\r\n* 🔴  Isolate prefill from generation loops  by @manueldeprada in #40652\r\n* update `huggingface_hub` dependency version  by @hanouticelina in #42033\r\n* Remove some custom datasets defined in codebase  by @ydshieh in #41511\r\n* Cleanup workflow - part 1  by @ydshieh in #42023\r\n* Fix `pr_slow_ci_suggestion.yml` after #42023  by @ydshieh in #42049\r\n* Fix AutoImageProcessor.register and documentation in auto processing modules  by @MilkClouds in #41864\r\n* Fix Qwen3-Omni RoPE  by @zucchini-nlp in #41778\r\n* Avoid explicit checkout in workflow  by @ydshieh in #42057\r\n* Annoying typo in attention error message  by @manueldeprada in #42037\r\n* Be careful at explicit checkout actions  by @ydshieh in #42060\r\n* Fix another `Argument list too long` in `pr_slow_ci_suggestion.yml`  by @ydshieh in #42061\r\n* Fix KeyError in GPT-OSS weight conversion script  by @Aznix07 in #42007\r\n* Fix KeyError in _is_package_available for packages with dotted names  by @yashwantbezawada in #42050\r\n* Revert back to use 
GitHub context   by @ydshieh in #42066\r\n* Fix missing arg in check_docstring  by @yonigozlan in #42054\r\n* [deepspeed tests fixes]  by @stas00 in #41925\r\n* Fix logic in setting self.fsdp when it is False  by @roychan in #41974\r\n* fix tensor device placement issue of 2 UT cases  by @yao-matrix in #41921\r\n* add workflow to check permissions and advise a set of permissions req…  by @paulinebm in #42071\r\n* Fix security issue 5  by @paulinebm in #42072\r\n* Fix inconsistency of commit sha during the workflow run  by @ydshieh in #42074\r\n* QwenVL: add skipped keys in `setattr` as well  by @zucchini-nlp in #41808\r\n* permissions worflows fix  by @paulinebm in #42080\r\n* 4.1V Model and GLM-4.5V Model Conversion Code Updates  by @zRzRzRzRzRzRzR in #41784\r\n* feat(ci): add continuous batching to benchmarks  by @McPatate in #41916\r\n* Fix modular docstring for Mixtral  by @diegoakel in #42041\r\n* Fix Auto classes to support dynamically registered processors  by @MilkClouds in #41865\r\n* Reinstate self.scaling in Gemma3nTextAttention  by @RyanMullins in #41751\r\n* [v5] 🚨Refactor subprocessors handling in processors  by @yonigozlan in #41633\r\n* add xpu support in test_modeling_janus.py::JanusIntegrationTest::test…  by @sywangyi in #41986\r\n* Revert \"permissions worflows fix\"  by @ydshieh in #42110\r\n* Fix return metadata checking logic  by @Xqle in #42108\r\n* Correctly handle unbatched audio inputs in Gemma3nAudioFeatureExtractor  by @kho in #42076\r\n* [Bugfix] fix qwen3vl expand generation with video  by @JJJYmmm in #42089\r\n* Fix base model prefix in VLMs  by @zucchini-nlp in #42059\r\n* fix continuous batching issues, extend ut cases to xpu  by @yao-matrix in #41830\r\n* 📝 docs(smolvlm): fix variable name in batch inference example  by @gorkachea in #42123\r\n* fix qwen2vl/qwen3vl video processor temporal padding when num_frames%temporal_patch_size!=1  by @yaogang2060 in #42083\r\n* [`Attn Masks`] Non-vmap default for attention masks  by @vasqu in 
#41852\r\n* Fix GPT-2 Flash Attention 2 generation with left-padding  by @Abdennacer-Badaoui in #41966\r\n* Fix model name test for compressed tensors   by @SunMarc in #42128\r\n* Fix MaskFormer/Mask2Former fast image processors  by @yonigozlan in #41393\r\n* Remove unused functions in `image_transforms.py`  by @yaswanth19 in #42044\r\n* update deps table  by @ArthurZucker in #42120\r\n* fix: improve video processing fps assignment logic  by @Xqle in #42009\r\n* Fix T5Gemma module structure  by @Cyrilvallez in #42145\r\n* DataCollatorForLanguageModeling warning error fixed  by @mjaliz in #42144\r\n* Bugfix/remove emojis from print  by @7amim in #42091\r\n* Avoid mutating user-provided arguments in preprocessing utils  by @LeonardoEmili in #42126\r\n* Enforce check_auto_docstring  by @yonigozlan in #41635\r\n* Add dinov3 autobackbone  by @vijayabhaskar-ev in #41276\r\n* Fix logic error in `prepare_inputs_for_generation` cache slicing condition  by @albertvillanova in #41764\r\n* :rotating_light: Fix gradient checkpointing for several models and improve test robustness    by @githubnemo in #41818\r\n* [`T5Gemma`] Fix cross attention cache  by @vasqu in #41890\r\n* T5 migration to new masking interface  by @Aravind-11 in #41804\r\n* fix: improve visibility of ValueError root causes in model config loading  by @scottzh8 in #41972\r\n* add xpu to valid hardware for torch.compile  by @sywangyi in #42079\r\n* extend test_beam_search_early_stop_heuristic case to other device  by @sywangyi in #42078\r\n* fix failure of tests/models/shieldgemma2/test_modeling_shieldgemma2.p…  by @sywangyi in #42022\r\n* Fixes Flash Attention implementation for models   by @i3hz in #42149\r\n* fix test failure of speculative_generation on xpu  by @sywangyi in #42052\r\n* add rmsnorm kernels support for npu  by @zheliuyu in #42106\r\n* update torchao doc  by @jiqing-feng in #42139\r\n* feat(kernels): add opt-out flag to disable kernels hub usage through the lib  by @mfuntowicz in #41990\r\n* 
handle inputs from Siglip/Siglip2 non-automapped encoder layers  by @molbap in #41930\r\n* Add slow to some examples tests   by @SunMarc in #42164\r\n* fix(ci): unexpected keyword argument `streaming`  by @McPatate in #42102\r\n* pin `pytest<9` for now  by @ydshieh in #42162\r\n* Docs/i18n updates  by @lilin-1 in #42006\r\n* Fix in-place modification of user-input in SAM2 embed boxes  by @xenova in #42173\r\n* [`Pop2Piano`] Fix cache usage  by @vasqu in #42170\r\n* Fix helper fn for new processor config format  by @zucchini-nlp in #42085\r\n* Remove unnecessary slicing in sdpa_attention_forward  by @justinchuby in #41900\r\n* [`PEFT`] Fix prefix tuning  by @vasqu in #41696\r\n* [typo] fix mrope-interleave annotation to avoid ambiguity  by @JJJYmmm in #42177\r\n* Update transformers to support `FqnToConfig`  by @jcaip in #41894\r\n* [`PEFT`] Fix the general test for prefix tuning  by @vasqu in #42185\r\n* [TP] Fix parameter detection issue and some invalid TP-plans  by @Cyrilvallez in #42129\r\n* Refactor weight loading  by @ArthurZucker in #41580\r\n* 🚨 Delete deprecations with end-cycle in v4.xx and v5.0  by @zucchini-nlp in #41681\r\n* Add AutoTokenizer mapping for mistral3 and ministral  by @patrickvonplaten in #42198\r\n* Fix checkpoint loading with DeepSpeed ZeRO3  by @tohtana in #42201\r\n* [`Pop2Piano`] Fix tied weights  by @vasqu in #42193\r\n* New docker from AMD  by @remi-or in #42208\r\n* Add cross links for model contribution  by @zucchini-nlp in #42207\r\n* Stop inheriting tests!  
by @Rocketknight1 in #42192\r\n* Refactor check_auto_docstring using AST  by @yonigozlan in #41432\r\n* [`BLT`] Fix cache usage  by @vasqu in #42188\r\n* Update `test_dynamic_cache_exportability_multiple_run` (failing on torch 2.10 nightly)  by @ydshieh in #42212\r\n* Much more efficient and clear weight initialization and tie weights  by @Cyrilvallez in #42191\r\n* GLM-V update with new processor  by @zRzRzRzRzRzRzR in #42122\r\n* Fix initialization guard for pytest  by @Cyrilvallez in #42234\r\n* Fix TP plans for MoE models  by @Cyrilvallez in #42236\r\n* Add prefix sharing to continuous batching  by @remi-or in #42094\r\n* Loading optimization  by @Cyrilvallez in #42239\r\n* calls `AttentionMaskConverter._unmask_unattended` for xpu device before  by @kaixuanliu in #42230\r\n* FIX Broken PEFT adapter loading  by @BenjaminBossan in #42187\r\n* Fix processor test for glm  by @molbap in #42233\r\n* Fix UnboundLocalError in RT-DETR loss computation  by @yashwantbezawada in #42224\r\n* Stop inheriting tests (again)  by @Rocketknight1 in #42247\r\n* [loading] Fix device when source and target are different  by @Cyrilvallez in #42246\r\n* Reduce timing on CircleCI - part 1 (Use @slow for IntegrationTests)  by @ydshieh in #42206\r\n* 🚨 Delete generation params from model config  by @zucchini-nlp in #41695\r\n* Allow VLMs to have a correct `base_model`  by @zucchini-nlp in #41589\r\n* Make tests run in less time by reducing `batch_size`  by @ydshieh in #42213\r\n* Revert \"Make tests run in less time by reducing `batch_size`\"  by @ydshieh in #42258\r\n* Cleanup reference to TFBertTokenizer and TFGPT2Tokenizer  by @Rocketknight1 in #42182\r\n* delete already deprecated models  by @ydshieh in #42235\r\n* Fix bnb for the weights refactor  by @SunMarc in #42043\r\n* Fix looping in torch guard decorator  by @Cyrilvallez in #42260\r\n* 🚨  Generalize `get_decoder()` for multimodal and delete redundant code 🔪   by @zucchini-nlp in #42156\r\n* Audio Flamingo3 - fix attention 
masking  by @zucchini-nlp in #42278\r\n* Add support for torch device objects in device validator  by @yonigozlan in #42267\r\n* Remove doc files of other langs for deleted models  by @ydshieh in #42276\r\n* [testing] fix `cwm`  by @ydshieh in #42261\r\n* fix a typo: pbd -> pdb  by @jaeminoh in #42268\r\n* Enable glm46v UTs on XPU  by @YangKai0616 in #42274\r\n* [testing] fix some cases in xpu  by @sywangyi in #42273\r\n* Remove random flag  by @Cyrilvallez in #42282\r\n* Fix accelerate integration  by @Cyrilvallez in #42264\r\n* Fix validation checks order in benchmark_v2  by @Abdennacer-Badaoui in #42280\r\n* Update torchcodec to match torchaudio version  by @remi-or in #42288\r\n* Use `torch.get_autocast_dtype` instead of `torch.get_autocast_gpu_dtype`  by @qgallouedec in #42055\r\n* perf: Optimization for Min-p sampling implementation  by @casinca in #42248\r\n* Fix device_map computation part 2  by @Cyrilvallez in #42290\r\n* Fixed the docstring for `WhisperFeatureExtractor`  by @TopCoder2K in #42286\r\n* avoiding conditional indexing in positionalencoding to avoid possibil…  by @ppadjinTT in #42090\r\n* ENH: Add support for LoRA hotswapping  by @BenjaminBossan in #41297\r\n* Fix Break change of AWQ FusedModules due to Attention Refactor  by @fanqiNO1 in #41909\r\n* Remove error string test that was failing  by @Rocketknight1 in #42301\r\n* Properly protect the is_compiling checks  by @Cyrilvallez in #42304\r\n* Remove outdated methods in modeling_utils.py  by @Cyrilvallez in #42302\r\n* Fix Mac mps dataloader_num_workers > 1 causes RuntimeError: _share_filename_: only available on CPU  by @AmitMY in #38819\r\n* Fix the init_weights for the MoE models  by @Cyrilvallez in #42306\r\n* Update link to generation strategies documentation  by @omkar-334 in #42252\r\n* Update conversion mapping to separate renaming from converting  by @ArthurZucker in #42254\r\n* fix(granitemoe*): Only create block_sparse_moe if num_local_experts > 0  by @gabe-l-hart in #42036\r\n* 
[SAM3 Video] Add support for multi prompts   by @yonigozlan in #42293\r\n* Add Pix2Struct fast image processor  by @yonigozlan in #42020\r\n* Fix post processing methods in  keypoints matching models  by @yonigozlan in #42018\r\n* fix tests/models/xcodec/test_modeling_xcodec.py::XcodecIntegrationTest  by @sywangyi in #42272\r\n* [loading] Fix device detection  by @Cyrilvallez in #42323\r\n* Fix typo from side_dict to size_dict  by @nihui in #42319\r\n* HF Trainer: ALST/Ulysses sequence parallelism integration via HF Accelerate  by @stas00 in #41832\r\n* Fix gpt2 modeling tests  by @Abdennacer-Badaoui in #42321\r\n* [loading] Use fewer threads by default for much better performances  by @Cyrilvallez in #42324\r\n* Allow LayoutLMV3Processor to accept rescale_factor  by @Rocketknight1 in #42305\r\n* Correctly create tied key mapping in post_init, and dynamic tie weight  by @Cyrilvallez in #42270\r\n* [`CI`] Skip `EfficientLoFTR` test  by @vasqu in #42327\r\n* [XPU] Add flash_attn2 support for XPU  by @YangKai0616 in #41956\r\n* [`Attn Masks`] Lift bidirectional mask restriction on eager  by @vasqu in #42325\r\n* fix bug when gemma3n model run on multiple device  by @kaixuanliu in #42303\r\n* Fix ChineseCLIPModel.get_text_features  by @JiangJQ2000 in #42351\r\n* Gemma3 hybrid fix  by @remi-or in #42287\r\n* fix(benchmarks): correct sdpa_backend inconsistency and attn_implementation for continuous batching  by @engmohamedsalah in #42339\r\n* Auto convert tekken.json  by @ArthurZucker in #42299\r\n* [loading] Re-add and improve disk offloading support  by @Cyrilvallez in #42242\r\n* Fix typo - indentation in JSON dump example  by @anthropikos in #42332\r\n* Fix tied weight for Bart (for BC)  by @Cyrilvallez in #42355\r\n* Fix reference to yelp dataset  by @JuanFKurucz in #42349\r\n* Fix documentation reference to pytorch max memory allocated  by @JuanFKurucz in #42350\r\n* Fix reference to imagenet 1k dataset  by @JuanFKurucz in #42348\r\n* Fix typos  by @omahs in 
#42354\r\n* Protect `torch.distributed` imports  by @Cyrilvallez in #42361\r\n* Expand npu device for KernelConfig  by @zheliuyu in #42358\r\n* Replace Optional and Union typing with | in some source files  by @cyyever in #42294\r\n* Fix code examples to load gpt 1 openai community model  by @JuanFKurucz in #42347\r\n* fix tekken pattern matching  by @ArthurZucker in #42363\r\n* Fixed-wrong-ZeRO3-json-snippet-found-in-deepspeed-markdown-file  by @Yacklin in #42346\r\n* Make benchmarking lighter: clean-up result files and remove non-needed arguments  by @remi-or in #42357\r\n* Add image processor fast vitpose  by @yonigozlan in #42021\r\n* Small tp fix  by @ArthurZucker in #42366\r\n* Remove test inheritance for EfficientLoftr, rename KeypointMatchingOutput to model specific name  by @yonigozlan in #42365\r\n* Tiny doc fix  by @molbap in #42296\r\n* Fix TimesFM patch normalization instability  by @AnMakc in #42099\r\n* [core] Fix torchao   by @MekkCyber in #42289\r\n* Fix tp  by @ArthurZucker in #42368\r\n* [`Attn Masks`] Add skip option for non-packed sequences  by @vasqu in #42367\r\n* 📚 docs(granite-speech): add comprehensive usage examples  by @gorkachea in #42125\r\n* Xcodec fix  by @eustlb in #42095\r\n* Replace Optional and Union typing with | in some source files  by @cyyever in #42372\r\n* [`Mistral Tokenizers`] Fix tokenizer detection  by @vasqu in #42389\r\n* misc don't recreate it  by @ArthurZucker in #42394\r\n* [SAM3] Fix precompute vision_embeds or text_embeds for inference  by @yonigozlan in #42407\r\n* 🚨 Image-text pipeline expects correctly formatted chat  by @zucchini-nlp in #42359\r\n* Many small fixes for the CI  by @remi-or in #42364\r\n* [core] fix mxfp4  by @MekkCyber in #42382\r\n* fixed json syntax error for zero2 configuration file found in deepspeed.md  by @Yacklin in #42406\r\n* GLM4V - delete duplicate config attribute  by @zucchini-nlp in #42416\r\n* 🚨 Remove generic output_attentions warning  by @Aravind-11 in #42334\r\n* Bart config 
doesn't need generation parameters  by @zucchini-nlp in #42337\r\n* Simplify and standardize processor tests  by @yonigozlan in #41773\r\n* Clean bnb integration using weight converter  by @SunMarc in #42426\r\n* Any to any pipeline and auto-mapping  by @zucchini-nlp in #40884\r\n* Fix processor usage + add chat_template support to TTS pipeline, and shift common chat template logic to base class.  by @ebezzam in #42326\r\n* [fp8] fix scales param name  by @MekkCyber in #42434\r\n* Fix an edge case for `get_encoder()`  by @zucchini-nlp in #42295\r\n* Disable loss rounding in training stats log  by @AnMakc in #42104\r\n* Benchmark simplification  by @remi-or in #42408\r\n* Future annotations break FastAPI  by @LysandreJik in #42450\r\n* [cleanup] Don't use Repository in create_dummy_models.py script  by @Wauplin in #42380\r\n* [cleanup] Remove deprecated load config from file  by @Wauplin in #42383\r\n* [`FA`] Cleanup loading logic  by @vasqu in #41427\r\n* tiny fix for deepseekocr support [vllm]  by @molbap in #42423\r\n* fix: Restore explicit .keys() calls for TensorDict compatibility  by @pankajbaid567 in #42373\r\n* Transformers serve -> list all generative models from the cache   by @LysandreJik in #42146\r\n* 🚨 [v5][PEFT] Bump min version requirement of PEFT to  0.18.0  by @BenjaminBossan in #41889\r\n* [cleanup] Offline mode and cache dir from `huggingface_hub` constants + cleanup in `PushToHubMixin`  by @Wauplin in #42391\r\n* Correctly return finish reason length when finished  by @LysandreJik in #42157\r\n* FIX: Minimal fix for loading PEFT weights  by @BenjaminBossan in #42387\r\n* Let's break Qwen-VL 🚨    by @zucchini-nlp in #42420\r\n* [`CI`] Add to run slow  by @vasqu in #42459\r\n* Fix the \"test_offline\" test  by @LysandreJik in #42458\r\n* `transformers chat` launched without base_url has a direct tie to localhost:8000  by @LysandreJik in #42463\r\n* update with more recent tts models  by @Deep-unlearning in #42328\r\n* rm slow tokenizers  by 
@itazap in #40936\r\n* [loading/saving] Reverse all loading operations when saving  by @Cyrilvallez in #42396\r\n* Fix T5 tests: use generation_config for generation parameters  by @Abdennacer-Badaoui in #42419\r\n* remove reference to TF models from docs  by @zucchini-nlp in #42443\r\n* [Trainer] use output.loss when using liger-kernel  by @kashif in #42444\r\n* replace source_keys and target_keys  by @SunMarc in #42471\r\n* Update migration guide - generation config  by @zucchini-nlp in #42470\r\n* 🚨 Move `rotary_partial_emb` to RopeParams and delete unnecessary code 🔪   by @zucchini-nlp in #42255\r\n* Fix doc builds  by @Rocketknight1 in #42478\r\n* extend CwmIntegrationTest to xpu  by @sywangyi in #42314\r\n* add require_deterministic_for_xpu to make the case pass in xpu  by @sywangyi in #42439\r\n* Skip failing irrelevant test for ColQwen2  by @Rocketknight1 in #42480\r\n* [quantization] make torchao tests slow  by @MekkCyber in #42482\r\n* Fix gpt2 tokenizer `add_prefix_space` default value   by @SunMarc in #42481\r\n\r\n## Significant community contributions\r\n\r\nThe following contributors have made significant changes to the library over the last release:\r\n\r\n* @ArthurZucker\r\n    * `JetMoe` Fix jetmoe after #40132 (#41324)\r\n    * [`ModularChecker`] QOL for the modular checker (#41361)\r\n    * [`CB`] Refactors the way we access paged (#41370)\r\n    * Update from pretrained error when loading (#33380)\r\n    * :facepalm: CB nit!  
(#41413)\r\n    * [`from_pretrained`] Small refactor `from_pretrained`: move around unrelated stuff (#41445)\r\n    * update deps table (#42120)\r\n    * Refactor weight loading (#41580)\r\n    * Update conversion mapping to separate renaming from converting (#42254)\r\n    * Auto convert tekken.json (#42299)\r\n    * fix tekken pattern matching (#42363)\r\n    * Small tp fix (#42366)\r\n    * Fix tp (#42368)\r\n    * misc don't recreate it (#42394)\r\n* @vasqu\r\n    * :rotating_light: [`v5`] Remove relative position embeddings (for bert like models) (#41170)\r\n    * [`v5`] Sync Bert and Bart eager attention (#41248)\r\n    * [`JetMoe`] Fix KV head repetition and padding free (#41423)\r\n    * :rotating_light: [`Attention Masks`] Bidirectional masks for encoder and encoder-decoder models (#41265)\r\n    * [`CI`] Fix copies on main (#41486)\r\n    * [`Docs`] Fix changed references (#41614)\r\n    * [`Executorch`] Simplify for encoder models (#41627)\r\n    * [`Ernie 4.5 Moe`] Fix Moe and offloading (#41385)\r\n    * [`Masks`] Fix mask handling in eager for vision models (#41625)\r\n    * [`Attn`] Allow dynamic causality in SDPA via Kwargs (#41692)\r\n    * [`Onnx docs`] Remove some traces (#41791)\r\n    * :rotating_light: [`Clip`] Fix masking and enable flash attention on all model types (#41750)\r\n    * [`Attn Masks`] Non-vmap default for attention masks (#41852)\r\n    * [`T5Gemma`] Fix cross attention cache (#41890)\r\n    * [`Pop2Piano`] Fix cache usage (#42170)\r\n    * [`PEFT`] Fix prefix tuning (#41696)\r\n    * [`PEFT`] Fix the general test for prefix tuning (#42185)\r\n    * [`Pop2Piano`] Fix tied weights (#42193)\r\n    * [`BLT`] Fix cache usage (#42188)\r\n    * [`CI`] Skip `EfficientLoFTR` test (#42327)\r\n    * [`Attn Masks`] Lift bidirectional mask restriction on eager (#42325)\r\n    * [`Attn Masks`] Add skip option for non-packed sequences (#42367)\r\n    * [`Mistral Tokenizers`] Fix tokenizer detection (#42389)\r\n    * [`FA`] Cleanup loading 
logic (#41427)\r\n    * [`CI`] Add to run slow (#42459)\r\n* @ydshieh\r\n    * [testing] update `test_longcat_generation_cpu` (#41368)\r\n    * [testing] Fix `JetMoeIntegrationTest` (#41377)\r\n    * Pickle - part 2 (#41476)\r\n    * Try to remove `pickle` - `BloomTokenizerFast` (#41466)\r\n    * [testing] reduce runtime of `HunYuanMoEV1IntegrationTest:test_model_generation` (#41373)\r\n    * delete some tokenizer tests using pickle (#41514)\r\n    * torch 2.9 don't ❤️ torchcodec 💔  (#41610)\r\n    * Update a dataset reop link (#41618)\r\n    * Remove the head masking block in some vision models (#41620)\r\n    * improve `utils/check_bad_commit.py` (#41658)\r\n    * torch 2.9 still don't ❤️ torchcodec 0.8 💔 (#41686)\r\n    * path validation for security reason (#41256)\r\n    * pin torchcodec on CI docker image (#41703)\r\n    * further improve `utils/check_bad_commit.py` (#41658) (#41690)\r\n    * Revert \"Remove upper version bound of pandas\" (#41744)\r\n    * Fix bark after #41445 (#41645)\r\n    * flash attn pytest marker (#41781)\r\n    * unpin torch/torchcodec for CircleCI (#41839)\r\n    * further reducing flakiness in `utils/check_bad_commit.py` (#41658)  (#41815)\r\n    * CI workflow for Flash Attn (#41857)\r\n    * Update some workflow files (#41892)\r\n    * Minor fix in docker image build workflow (#41949)\r\n    * Run slow v2 (#41914)\r\n    * Fix `detectron2` installation in docker files (#41975)\r\n    * Fix `autoawq[kernels]` installation in quantization docker file (#41978)\r\n    * Fix `torchcodec` version in quantization docker file (#41988)\r\n    * Fix `run slow v2`: empty report when there is only one model (#42002)\r\n    * Fix `torch+deepspeed` docker file (#41985)\r\n    * fix `deeepspeed` in AMD docker file (#42025)\r\n    * Change trigger time for AMD CI (#42034)\r\n    * Remove some custom datasets defined in codebase (#41511)\r\n    * Cleanup workflow - part 1 (#42023)\r\n    * Fix `pr_slow_ci_suggestion.yml` after #42023 (#42049)\r\n  
  * Avoid explicit checkout in workflow (#42057)\r\n    * Be careful at explicit checkout actions (#42060)\r\n    * Fix another `Argument list too long` in `pr_slow_ci_suggestion.yml` (#42061)\r\n    * Revert back to use GitHub context  (#42066)\r\n    * Fix inconsistency of commit sha during the workflow run (#42074)\r\n    * Revert \"permissions worflows fix\" (#42110)\r\n    * pin `pytest<9` for now (#42162)\r\n    * Update `test_dynamic_cache_exportability_multiple_run` (failing on torch 2.10 nightly) (#42212)\r\n    * Reduce timing on CircleCI - part 1 (Use @slow for IntegrationTests) (#42206)\r\n    * Make tests run in less time by reducing `batch_size` (#42213)\r\n    * Revert \"Make tests run in less time by reducing `batch_size`\" (#42258)\r\n    * delete already deprecated models (#42235)\r\n    * Remove doc files of other langs for deleted models (#42276)\r\n    * [testing] fix `cwm` (#42261)\r\n* @cyyever\r\n    * Remove unnecessary list comprehension (#41305)\r\n    * Remove unused function patameters (#41358)\r\n    * Use accelerator API to free device memory (#41195)\r\n    * Remove Python 3.9 classifier (#41410)\r\n    * Remove KERAS_NLP_IMPORT_ERROR (#41468)\r\n    * Import Callable from collections.abc (#41130)\r\n    * Remove infer_device (#41088)\r\n    * Fix Latex typesetting in documentation (#41177)\r\n    * Fix typsetting and content of llm_tutorial_optimization.md (#41172)\r\n    * More markdown file fixes (#41599)\r\n    * Format MarkDown documentation and tiny fixes (#41638)\r\n    * Fix typos in documentation (#41641)\r\n    * Fix confusing cls assignment (#41642)\r\n    * Use | for Optional and Union typing (#41646)\r\n    * Remove  require_torch_bf16_gpu (#40979)\r\n    * Fix MarkDown syntax (#41676)\r\n    * Use | for Optional and Union typing  (#41675)\r\n    * Enable faiss-cpu on Windows (#41678)\r\n    * Fix Pylint warnings (#41644)\r\n    * Enable  FURB rules in ruff (#41395)\r\n    * Remove upper version bound of pandas 
(#41677)\r\n    * Fix documentation issues (#41726)\r\n    * Apply RUFF PIE rules (#41727)\r\n    * Replace Optional and Union typing with | in some source files (#42294)\r\n    * Replace Optional and Union typing with | in some source files (#42372)\r\n* @yao-matrix\r\n    * make some ut cases pass on xpu w/ latest torch (#41337)\r\n    * fix asr ut failures (#41332)\r\n    * enable new model uts to xpu and fix some failures on xpu (#41386)\r\n    * enable some falcon-mamba uts on xpu (#41428)\r\n    * enhance patched_tearDown to support python 3.11+ (#41429)\r\n    * fix gemma3n case failure (#41426)\r\n    * upgrade xpu docker file to torch 2.8 (#41551)\r\n    * make apollo test case pass (#41805)\r\n    * extend bitnet cases to xpu, all 8 cases pass (#41831)\r\n    * extend 2 trainer test cases to xpu (#41829)\r\n    * extend 2 blip2 and falcon_h1 test cases to xpu (#41825)\r\n    * make lfm2_moe integration test pass on XPU (#41796)\r\n    * fix some ut failures on XPU w/ torch 2.9 (#41923)\r\n    * fix some ut failures on XPU w/ torch 2.9 (#41941)\r\n    * fix prepare_config_and_inputs_for_common bug in llava test (#41942)\r\n    * make recurrent_gemma and voxtral cases pass on xpu (#41958)\r\n    * extend fp_quant cases to xpu (#41833)\r\n    * fix tensor device placement issue of 2 UT cases (#41921)\r\n    * fix continuous batching issues, extend ut cases to xpu (#41830)\r\n* @MekkCyber\r\n    * [kernels] Kernel Config  (#41232)\r\n    * Fixing comments in __init__ file (#41414)\r\n    * [kernels] Cleanup deta kernel (#41470)\r\n    * Cleaning hub kernels  (#41477)\r\n    * Remove DISABLE_KERNEL_MAPPING flag (#41475)\r\n    * [kernels] Remove RWKV kernel finally ! 
(#41493)\r\n    * [kernels] rm yoso kernel (#41495)\r\n    * [kernels] rm mra kernels (#41507)\r\n    * Revert \"add rmsnorm kernels support for Intel XPU\" (#41579)\r\n    * [kernels] refactor function kernel calling (#41577)\r\n    * Erroring when KernelConfig is passed without use_kernels = True (#41657)\r\n    * Small Fix for imports  (#41411)\r\n    * [kernels] Add version to function mapping (#41685)\r\n    * [quantization] fix compressed_tensors tests (#41780)\r\n    * [quantization] Skip Fp8 tests when hardware capability < 8.9 (#41785)\r\n    * [quantization] fix torchao tests after 0.14.0 release (#41777)\r\n    * revert changes in _is_package_available (#41891)\r\n    * [kernels] Add Tests & CI for kernels (#41765)\r\n    * [kernels] change import time in KernelConfig (#42004)\r\n    * [kernels] Fix XPU layernorm kernel (#41583)\r\n    * [core] Fix torchao  (#42289)\r\n    * [core] fix mxfp4 (#42382)\r\n    * [fp8] fix scales param name (#42434)\r\n    * [quantization] make torchao tests slow (#42482)\r\n* @paulpak58\r\n    * [Cache] lfm2 cache: allocate empty kv layers during init (#41396)\r\n    * [Model] Lfm2Moe (#41401)\r\n* @gante\r\n    * 🚨 [v5] Prune `prune_heads` (#41417)\r\n    * [v5] rm `utils/tf_ops/` (#41402)\r\n    * [causallm tester] automate pipeline mappings + bloom tests (#41318)\r\n    * 🚨 [v5] `generate` delegates default cache initialization to the model (#41505)\r\n* @zRzRzRzRzRzRzR\r\n    * Update GLM-4.1V MMRope implementation (#41182)\r\n    * Update GLM-4.6 doc (#41471)\r\n    * Add aux loss for GLM-4.5V (#41564)\r\n    * 4.1V Model and GLM-4.5V Model Conversion Code Updates (#41784)\r\n    * GLM-V update with new processor (#42122)\r\n* @jacobkahn\r\n    * Add Code World Model (CWM) (#41199)\r\n* @molbap\r\n    * Update philosophy (#41438)\r\n    * [QoL] modular conversion shows LoC saved (#41500)\r\n    * Double router compute? 
(#41653)\r\n    * Add vision contribution guide (#41456)\r\n    * Modernize CLIP modeling code  (#41546)\r\n    * handle inputs from Siglip/Siglip2 non-automapped encoder layers (#41930)\r\n    * Fix processor test for glm (#42233)\r\n    * Tiny doc fix (#42296)\r\n    * tiny fix for deepseekocr support [vllm] (#42423)\r\n* @Wauplin\r\n    * Bump to hfh 1.0.0.rc5 to fix test (#41508)\r\n    * Migrate transformers cli to Typer (#41487)\r\n    * Remove deprecated `use_auth_token` parameter (#41666)\r\n    * added more breaking changes\r\n    * [cleanup] Don't use Repository in create_dummy_models.py script (#42380)\r\n    * [cleanup] Remove deprecated load config from file (#42383)\r\n    * [cleanup] Offline mode and cache dir from `huggingface_hub` constants + cleanup in `PushToHubMixin` (#42391)\r\n* @remi-or\r\n    * Restore cuda graphs to continuous batching (#41421)\r\n    * Fix an import error with PreTrainModel (#41571)\r\n    * Add __iter__ to DynamicCache (#41569)\r\n    * Gemma3 fixes (#41572)\r\n    * Benchmark overhaul (#41408)\r\n    * Fix fp32_ln for various models (#41605)\r\n    * Fix EncoderDecoder cache (#41612)\r\n    * Switch to CB if cache_implementation == paged (#41655)\r\n    * Small changes to benchmarking script (#41662)\r\n    * Bump AMD docker (#41792)\r\n    * Add a safeguard around a flaky test in gemma2 (#41811)\r\n    * Use indices as position_ids in modernebert (#41789)\r\n    * Move the Mi355 to regular docker (#41989)\r\n    * More data in benchmarking (#41848)\r\n    * Reduce the number of benchmark in the CI (#42008)\r\n    * New docker from AMD (#42208)\r\n    * Add prefix sharing to continuous batching (#42094)\r\n    * Update torchcodec to match torchaudio version (#42288)\r\n    * Gemma3 hybrid fix (#42287)\r\n    * Make benchmarking lighter: clean-up result files and remove non-needed arguments (#42357)\r\n    * Many small fixes for the CI (#42364)\r\n    * Benchmark simplification (#42408)\r\n* @lkhl\r\n    * [model] Add 
VideoLLaMA3 implementation (#40499)\r\n* @philiproeleveld\r\n    * Add `logits_to_keep` to many older CausalLM models (#41335)\r\n* @AlphaOrOmega\r\n    * Adding superglue fast image processing (#41394)\r\n* @echarlaix\r\n    * [v5] Remove deprecated tranformers.onnx (#41700)\r\n* @Aravind-11\r\n    * Add GLPNImageProcessorFast  (#41725)\r\n    * T5 migration to new masking interface (#41804)\r\n    * 🚨 Remove generic output_attentions warning (#42334)\r\n* @DeXtAr47-oss\r\n    * add fuyu fast image processors (#41817)\r\n* @lashahub\r\n    * [models] Add AudioFlamingo3 integration (#40290)\r\n* @lilin-1\r\n    * Docs/i18n updates (#42006)\r\n* @burtenshaw\r\n    * [MODEL] Nanochat implementation (#41634)\r\n* @itazap\r\n    * rm slow tokenizers (#40936)\r\n","publishedAt":"2025-12-01T18:14:54.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v5.0.0rc0","media":[]},{"id":"rel_tHFPfQ0gamZ1ZzZw66ts3","version":"v4.57.3","title":"Patch release v4.57.3","summary":"There was a hidden bug when loading models with `local_files_only=True` and a typo related to the recent patch. \r\n\r\nThe main fix is: https://github.co...","content":"There was a hidden bug when loading models with `local_files_only=True` and a typo related to the recent patch. \r\n\r\nThe main fix is: https://github.com/huggingface/transformers/commit/b6055550a15a8fab367cf983b743ff68cc58d81a.\r\n\r\nWe are really sorry that this slipped through, our CIs just did not catch it.\r\n\r\nAs it affects a lot of users we are gonna yank the previous release","publishedAt":"2025-11-25T15:51:36.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v4.57.3","media":[]},{"id":"rel_9vRkW13eR-sZUvTZrrFst","version":"v4.57.2","title":"Patch Release v4.57.2","summary":"This patch most notably fixes an issue on some Mistral tokenizers. 
It contains the following commits:\r\n\r\n- Add AutoTokenizer mapping for mistral3 and ...","content":"This patch most notably fixes an issue on some Mistral tokenizers. It contains the following commits:\r\n\r\n- Add AutoTokenizer mapping for mistral3 and ministral (#42198)\r\n- Auto convert tekken.json (#42299)\r\n- fix tekken pattern matching (#42363)\r\n- Check model inputs - hidden states (#40994)\r\n- Remove invalid `@staticmethod` from module-level get_device_and_memory_breakdown (#41747)","publishedAt":"2025-11-24T17:54:34.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v4.57.2","media":[]},{"id":"rel__6uXilh3K08jg5FdHBTEz","version":"v4.57.1","title":"Patch release v4.57.1","summary":"This patch most notably fixes an issue with an optional dependency (`optax`), which resulted in parsing errors with `poetry`. It contains the followin...","content":"This patch most notably fixes an issue with an optional dependency (`optax`), which resulted in parsing errors with `poetry`. It contains the following fixes:\r\n\r\n- [fix optax dep issue](https://github.com/huggingface/transformers/commit/0645c9ec3188e000aecf5060e2cdabcc156bb794)\r\n- [remove offload_state_dict from kwargs](https://github.com/huggingface/transformers/commit/a92b1e8a45e1863b95c5e2caa12f5597aee80279)\r\n- Fix bnb fsdp loading for pre-quantized checkpoint (#41415)\r\n- Fix tests fsdp (#41422)\r\n- Fix trainer for py3.9 (#41359)","publishedAt":"2025-10-14T15:39:34.000Z","url":"https://github.com/huggingface/transformers/releases/tag/v4.57.1","media":[]}],"pagination":{"page":1,"pageSize":20,"totalPages":6,"totalItems":104},"summaries":{"rolling":{"windowDays":90,"summary":"Transformers shipped v5.0 as a major overhaul after five years, overhauling tokenization APIs and introducing dynamic weight loading with quantization support, while simultaneously accelerating a wave of multimodal and specialized model integrations. 
Gemma 4 arrived with vision capabilities handling variable image sizes via spatial 2D RoPE, VidEoMT landed as a lightweight video segmentation encoder achieving 5-10x speedups, and the library absorbed a steady stream of domain-specific architectures—from speech (VibeVoice ASR, VoxtralRealtime) and document understanding (PP-DocLayoutV3, UVDoc) to mixture-of-experts variants (EXAONE-MoE, GLM-5) and multilingual models (EuroBERT). Concurrent v5 RCs prioritized MoE performance optimizations using batched expert implementations and resolved tokenizer class enforcement issues by preferring the TokenizersBackend, while the v4 line stabilized with targeted fixes for model loading and generation methods.","releaseCount":12,"generatedAt":"2026-04-07T17:27:18.483Z"},"monthly":[{"year":2026,"month":3,"summary":"March shipped new model support across vision, audio, and language domains. VidEoMT brought lightweight video segmentation running at 160 FPS through query propagation across frames, while EuroBERT added multilingual encoding with an 8192-token context window. The month also integrated PaddlePaddle models, Mistral 4, and Jina Embeddings v3 alongside specialized models for document layout, speech recognition, and time-series forecasting.","releaseCount":2,"generatedAt":"2026-04-07T17:27:20.129Z"}]}}