Home/Hugging Face

Hugging Face

$npx @buildinternet/releases get hugging-face

Overview Releases Sources

Show prereleases

May 9, 2026

v1.4.0 ↗

viaTRL

Features

Chunked cross-entropy loss for SFT (up to –50% VRAM)

A new loss_type="chunked_nll" option drastically reduces peak activation memory in SFT by avoiding the full [batch × seq × vocab] logits tensor. Ignored-label tokens are dropped before the lm_head matmul, and the cross-entropy is computed over the remaining tokens in checkpointed chunks (default chunk_size=256, the sweet spot consistent across model sizes and sequence lengths).

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B",
    args=SFTConfig(loss_type="chunked_nll"),
    train_dataset=dataset,
)
trainer.train()

Peak GPU memory, AdamW fp32:

Model	Hardware	Seq	`nll`	`chunked_nll`
Qwen3-1.7B + LoRA	1×H100 80GB	2048	47.9 GB	12.3 GB (3.9× less)
Qwen3-4B	1×H100 80GB	16384	OOM	63.8 GB
Qwen3-14B	8×H100 FSDP2	16384	58.9 GB	38.9 GB (1.5× less)
Qwen3-32B	8×H100 FSDP2	8192	OOM	71.2 GB

End-to-end, chunked NLL is consistently as fast or faster than nll — and it unlocks sequence lengths that don't fit at all under the standard path.

The chunked path also supports VLMs (https://github.com/huggingface/trl/pull/5684).

by @qgallouedec in https://github.com/huggingface/trl/pull/5575, https://github.com/huggingface/trl/pull/5676 and https://github.com/huggingface/trl/pull/5684

OpenReward Standard environment adapter (experimental)

A new trl.experimental.openreward adapter plugs any environment speaking the Open Reward Standard (ORS) protocol into any TRL trainer accepting an environment_factory (GRPOTrainer, AsyncGRPOTrainer). One identifier wires all three trainer slots — dataset, factory, reward_func:

from trl import GRPOConfig, GRPOTrainer
from trl.experimental.openreward import OpenRewardEnv

env = OpenRewardEnv("Eigent/SETA")  # or "http://localhost:8000"

trainer = GRPOTrainer(
    model="Qwen/Qwen3-4B",
    args=GRPOConfig(...),
    train_dataset=env.dataset,
    environment_factory=env.factory,
    reward_funcs=env.reward_func,
)

Tools are bound dynamically from JSON Schema at construction (no per-env wrapper code), and env.dataset autoderives task lists from the ORS task endpoints. The same code path works for envs hosted on the OpenReward platform, self-hosted on any container service, or running locally on localhost. A SETA training example is included.

by @adithya-s-k in https://github.com/huggingface/trl/pull/5696

Training-invariance test suite

Unit tests don't catch trainer-level numerical drift (gradient-accumulation normalization bugs, attention-impl divergence (eager ↔ FA2 / kernels)) they silently shift the loss trajectory and users only notice when their run no longer reproduces. (Cf. last year's transformers grad-accum bug, or the "We found two bugs in DeepSpeed" paper.)

A new opt-in pytest -m invariant suite asserts the loss / grad_norm trajectory of short end-to-end SFT/DPO runs against committed reference snapshots, with equivalence classes for configs that should produce identical trajectories (e.g. pdb=1, gas=8 ≡ default; eager ≡ FA2 ≡ kernels). Hardware-pinned to H100 80GB, real pretrained model, full_determinism, fixed seed. Initial coverage: 2 trainers × 2 invariance axes (grad-accum, attn-impl) × gradient-checkpointing equivalence.

by @qgallouedec in https://github.com/huggingface/trl/pull/5686, https://github.com/huggingface/trl/pull/5688 and https://github.com/huggingface/trl/pull/5689

MFU helpers

Three new pure helpers in trl.trainer.utils for measuring training efficiency:

compute_flops_per_token(config, seq_len) — handles dense and MoE (Mixtral, Qwen3-MoE, DeepSeek-V2)
compute_mfu(flops_per_token, tps, world_size, peak_flops) — Model FLOPs Utilization as a percentage
adjusted_mfu(mfu, config, seq_len) — non-causal → causal-corrected (Llama / DS Ulysses convention)

by @AmineDiro in https://github.com/huggingface/trl/pull/5698

GRPO Liger kernel update (Liger 0.8.0)

GRPO's Liger-kernel integration is updated for Liger 0.8.0: delta two-sided clipping, use_bias_correction_kl, and SAPO/VESPO parameters are now forwarded into LigerFusedLinearGRPOLoss. The previous delta + use_liger_kernel guard is removed — both can be combined.

by @kashif in https://github.com/huggingface/trl/pull/5690

Length-normalized DPO sigmoid loss

A new loss_type="sigmoid_norm" option for DPOConfig implements the per-token (length-normalized) DPO loss used by Tülu 3 / OLMo (paper §5.1.2 eq. 6) to mitigate length bias.

from trl import DPOConfig, DPOTrainer

trainer = DPOTrainer(
    model="Qwen/Qwen3-4B",
    args=DPOConfig(loss_type="sigmoid_norm"),
    train_dataset=dataset,
)

by @BrownianNotion in https://github.com/huggingface/trl/pull/5406

Even more training chat templates

Four more model families gain training-compatible chat templates with {% generation %} markers (assistant-only loss masking) and/or response schemas (tool-calling parsing):

Cohere training template by @dschulmeist in https://github.com/huggingface/trl/pull/5627
Cohere2 {% generation %} markers by @qgallouedec in https://github.com/huggingface/trl/pull/5675
Gemma 3 training template by @hwanython in https://github.com/huggingface/trl/pull/5685
Qwen3-2507 training template by @SwayamInSync in https://github.com/huggingface/trl/pull/5574
Qwen2.5 response schema by @aazizyan in https://github.com/huggingface/trl/pull/5728

get_training_chat_template now also accepts a processor (not just a tokenizer) — useful for VLMs (https://github.com/huggingface/trl/pull/5560).

KTO ↔ DPO alignment: closing in on graduation

Another batch of alignment PRs this cycle. KTO and DPO are now structurally aligned across PEFT handling, model initialization, training-arg grouping, ref-logp precomputation, and metric handling — promotion of KTO out of experimental is imminent.

PRs (all by @albertvillanova): #5659, #5660, #5661, #5679, #5701, #5702, #5703, #5704, #5705, #5714.

Other

Reject parallelism_config with cp_size>1 or sp_size>1 in GRPO/RLOO — fail fast at config init with a clear error instead of mid-training crash. By @kashif in https://github.com/huggingface/trl/pull/5699
Fail early for unsupported PEFT + Liger Kernel in DPO by @albertvillanova in https://github.com/huggingface/trl/pull/5709
Explicitly set model_accepts_loss_kwargs=False in DPO and Reward by @albertvillanova in https://github.com/huggingface/trl/pull/5710
Set _tokenizer attribute in experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5566
Simplify peft_config handling in core / experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5673 and https://github.com/huggingface/trl/pull/5674
Replace isinstance with is_peft_model / drop redundant is_peft_available by @albertvillanova in https://github.com/huggingface/trl/pull/5682 and https://github.com/huggingface/trl/pull/5683
Reduce inconsistency across trainer test files by @qgallouedec in https://github.com/huggingface/trl/pull/5678
Refactor tiny-model generation scripts by @qgallouedec in https://github.com/huggingface/trl/pull/5637
Revert VLM support in parse_response by @qgallouedec in https://github.com/huggingface/trl/pull/5561

Fixes

5 GB+ CUDA memory leak in activation offloading — OffloadActivations.__exit__ now syncs the compute/offload streams and clears the stash dictionaries, preventing orphaned offload tensors from leaking onto a dead stream (~0.2 GiB/step accumulation observed during QLoRA vision training before the fix). By @butterwecksolutions in https://github.com/huggingface/trl/pull/5694 and https://github.com/huggingface/trl/pull/5700
Fix reverse-KL server path NaN on variable completion length in DistillationTrainer by @k1064190 in https://github.com/huggingface/trl/pull/5594
GKDTrainer: fix return_outputs in the Liger kernel path by @roycho96 in https://github.com/huggingface/trl/pull/4688
GKDTrainer: fix seq-KD wasted teacher forward by @roycho96 in https://github.com/huggingface/trl/pull/5726
GKDTrainer: fix Liger fused JSD path computing wrong loss by @roycho96 in https://github.com/huggingface/trl/pull/5731
Fix missing PEFT validation when passing peft_config to core / experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5664 and https://github.com/huggingface/trl/pull/5665
Fix peft_config type hint in experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5666
Fix discarded assertion message in trainer parameter checks by @qgallouedec in https://github.com/huggingface/trl/pull/5677
Fix typo in model name in README by @qgallouedec in https://github.com/huggingface/trl/pull/5711

Documentation and Examples

Upload testing suite for DistillationTrainer by @cmpatino in https://github.com/huggingface/trl/pull/5615

CI

Fix OOM in CI by reducing batch size in VLM SFT tests by @albertvillanova in https://github.com/huggingface/trl/pull/5687
Fix OOM in CI by reducing image size of tiny Gemma 3 model by @albertvillanova in https://github.com/huggingface/trl/pull/5680
Fix OOM in CI test reruns due to GPU memory leak from traceback frame locals by @albertvillanova in https://github.com/huggingface/trl/pull/5681
Add tiny Qwen3-4B-Instruct-2507 by @qgallouedec in https://github.com/huggingface/trl/pull/5586
Align tiny Qwen3 MoE config with Qwen/Qwen3-30B-A3B by @qgallouedec in https://github.com/huggingface/trl/pull/5716
Fix GRPO VLM tests: multimodal training requires conversational prompts by @kaixuanliu in https://github.com/huggingface/trl/pull/5550
Regenerate invariance data + relax the tolerance by @qgallouedec in https://github.com/huggingface/trl/pull/5688

New Contributors

@dschulmeist made their first contribution in https://github.com/huggingface/trl/pull/5627
@k1064190 made their first contribution in https://github.com/huggingface/trl/pull/5594
@butterwecksolutions made their first contribution in https://github.com/huggingface/trl/pull/5694
@hwanython made their first contribution in https://github.com/huggingface/trl/pull/5685
@BrownianNotion made their first contribution in https://github.com/huggingface/trl/pull/5406
@adithya-s-k made their first contribution in https://github.com/huggingface/trl/pull/5696
@roycho96 made their first contribution in https://github.com/huggingface/trl/pull/4688
@aazizyan made their first contribution in https://github.com/huggingface/trl/pull/5728

What's Changed

⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/5648
Align KTO with DPO: Remove model_init parameter by @albertvillanova in https://github.com/huggingface/trl/pull/5659
Align KTO with DPO: Remove preprocess_logits_for_metrics parameter by @albertvillanova in https://github.com/huggingface/trl/pull/5660
Add tiny Qwen3-4B-Instruct-2507 by @qgallouedec in https://github.com/huggingface/trl/pull/5586
Chunked cross-entropy loss for SFT (up to –50% VRAM) by @qgallouedec in https://github.com/huggingface/trl/pull/5575
Fix missing PEFT validation when passing peft_config to core trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5664
Fix missing PEFT availability check when passing peft_config to experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5665
Align KTO with DPO: Align PEFT handling by @albertvillanova in https://github.com/huggingface/trl/pull/5661
Set _tokenizer attribute in experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5566
Fix peft_config type hint in experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5666
Add Cohere training chat template by @dschulmeist in https://github.com/huggingface/trl/pull/5627
Simplify peft_config handling in core trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5673
Simplify peft_config handling in experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5674
fix(distillation): reverse-KL server path NaN on variable completion length by @k1064190 in https://github.com/huggingface/trl/pull/5594
Fix discarded assertion message in trainer parameter checks by @qgallouedec in https://github.com/huggingface/trl/pull/5677
Align KTO with DPO: Replace direct type check with is_peft_model by @albertvillanova in https://github.com/huggingface/trl/pull/5679
Remove redundant is_peft_available from core trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5682
Replace isinstance with is_peft_model in experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5683
Upload testing suite for DistillationTrainer by @cmpatino in https://github.com/huggingface/trl/pull/5615
Fix OOM in CI by reducing batch size in VLM SFT tests by @albertvillanova in https://github.com/huggingface/trl/pull/5687
Fix OOM in CI by reducing image size of tiny Gemma3 model by @albertvillanova in https://github.com/huggingface/trl/pull/5680
Fix OOM in CI test reruns due to GPU memory leak from traceback frame locals by @albertvillanova in https://github.com/huggingface/trl/pull/5681
Add training-invariance tests by @qgallouedec in https://github.com/huggingface/trl/pull/5686
Regenerate invariance data + relax the tolerance by @qgallouedec in https://github.com/huggingface/trl/pull/5688
fix: prevent RuntimeError crash in activation offloading for non-contiguous tensors by @butterwecksolutions in https://github.com/huggingface/trl/pull/5694
[GRPO] update Liger-kernel grpo loss (delta, vespo, KL bias correction) by @kashif in https://github.com/huggingface/trl/pull/5690
Extend invariant suite with gradient-checkpointing equivalence by @qgallouedec in https://github.com/huggingface/trl/pull/5689
Add Gemma 3 training chat template by @hwanython in https://github.com/huggingface/trl/pull/5685
Add {% generation %} markers for Cohere2 chat template by @qgallouedec in https://github.com/huggingface/trl/pull/5675
Add length-normalized sigmoid loss type to DPO trainer by @BrownianNotion in https://github.com/huggingface/trl/pull/5406
Add training chat template for Qwen3-2507 by @SwayamInSync in https://github.com/huggingface/trl/pull/5574
Align KTO with DPO: Remove enforcement of causal language models by @albertvillanova in https://github.com/huggingface/trl/pull/5701
Align KTO with DPO: Remove duplicate import of PreTrainedModel by @albertvillanova in https://github.com/huggingface/trl/pull/5702
Align KTO with DPO: Simplify max_length init logic by @albertvillanova in https://github.com/huggingface/trl/pull/5703
Align KTO with DPO: Group training arguments by @albertvillanova in https://github.com/huggingface/trl/pull/5704
Align KTO with DPO: Use _metrics attribute by @albertvillanova in https://github.com/huggingface/trl/pull/5705
Reduce inconsistency across trainer test files by @qgallouedec in https://github.com/huggingface/trl/pull/5678
Refactor tiny-model generation scripts by @qgallouedec in https://github.com/huggingface/trl/pull/5637
Accept processor in get_training_chat_template by @qgallouedec in https://github.com/huggingface/trl/pull/5560
Enable chunked NLL loss with PEFT in SFT by @qgallouedec in https://github.com/huggingface/trl/pull/5676
Fix GRPO VLM tests: Multimodal training requires conversational prompts by @kaixuanliu in https://github.com/huggingface/trl/pull/5550
[experimental] Add OpenReward Standard environment adapter by @adithya-s-k in https://github.com/huggingface/trl/pull/5696
GKDTrainer: Fix return_outputs in Liger kernel path and update tests by @roycho96 in https://github.com/huggingface/trl/pull/4688
Reject parallelism_config with cp_size>1 or sp_size>1 in GRPO/RLOO by @kashif in https://github.com/huggingface/trl/pull/5699
Fix typo in model name in README by @qgallouedec in https://github.com/huggingface/trl/pull/5711
Explicitly set model_accepts_loss_kwargs=False in DPO and Reward by @albertvillanova in https://github.com/huggingface/trl/pull/5710
Fail early for unsupported PEFT + Liger Kernel in DPO by @albertvillanova in https://github.com/huggingface/trl/pull/5709
Revert VLM support in parse_response by @qgallouedec in https://github.com/huggingface/trl/pull/5561
Align KTO with DPO: Align _precompute_ref_logps by @albertvillanova in https://github.com/huggingface/trl/pull/5714
fix: prevent 5 GB+ CUDA memory leak in activation offloading by syncing streams and clear stashes in OffloadActivations.exit by @butterwecksolutions in https://github.com/huggingface/trl/pull/5700
Align tiny Qwen3 MoE config with Qwen/Qwen3-30B-A3B by @qgallouedec in https://github.com/huggingface/trl/pull/5716
Add MFU helpers by @AmineDiro in https://github.com/huggingface/trl/pull/5698
[GKD] Fix seq kd wasted teacher forward by @roycho96 in https://github.com/huggingface/trl/pull/5726
Add Qwen2.5 response schema by @aazizyan in https://github.com/huggingface/trl/pull/5728
Enable chunked NLL loss with VLM in SFT by @qgallouedec in https://github.com/huggingface/trl/pull/5684
[GKD] Fix Liger fused JSD path computing wrong loss by @roycho96 in https://github.com/huggingface/trl/pull/5731
Release: v1.4 by @qgallouedec in https://github.com/huggingface/trl/pull/5732

Full Changelog: https://github.com/huggingface/trl/compare/v1.3.0...v1.4.0

May 8, 2026

v1.0.27 ↗

Release v1.0.27

viatimm (pytorch-image-models)

April 23, 2026

Add Gemma4 ViT encoders w/ NaFlex pipeline support (variable aspect/size per image). Thanks Yonghye Kwon
Support DINOv3 weights in NaFlexVit. Thanks Yonghye Kwon
Some improvements to Muon fallback (AdamW/NadamW) lr behavior

What's Changed

🔒 Pin GitHub Actions to commit SHAs by @paulinebm in https://github.com/huggingface/pytorch-image-models/pull/2689
Improve fallback (adamw/nadamw) LR handling for Muon optimizer by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2688
Fix NaFlexVit DINOv3 support: propagate rope_type and rotate_half by @developer0hye in https://github.com/huggingface/pytorch-image-models/pull/2692
chore: bump doc-builder SHA for PR upload workflow by @rtrompier in https://github.com/huggingface/pytorch-image-models/pull/2694
Gemma4 by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2697
Add encoder_pool option to gemma4 classification model to toggle soft… by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2698
Fix some performance regressions with torch.compile + Tasks. Fix #2693 by @rwightman in https://github.com/huggingface/pytorch-image-models/pull/2699

New Contributors

@paulinebm made their first contribution in https://github.com/huggingface/pytorch-image-models/pull/2689

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.26...v1.0.27

May 6, 2026

v1.14.0 ↗

[v1.14.0] Handle Spaces secrets & variables from CLI and other improvements

viahuggingface_hub

🖥️ Manage Space secrets and variables from the CLI

You can now manage Space secrets and environment variables directly from the command line with two new hf spaces subgroups: secrets and variables. Use hf spaces secrets to add, list, and delete write-only secrets, and hf spaces variables to add, list, and delete readable environment variables. Both add commands support multiple -s/-e flags and --secrets-file/-env-file for loading from dotenv files. On the Python side, HfApi.get_space_secrets() returns secret metadata (key, description, updated timestamp) without ever revealing values.

# List secrets (values are write-only — only keys and timestamps are shown)
$ hf spaces secrets ls username/my-space

# Add secrets
$ hf spaces secrets add username/my-space -s OPENAI_API_KEY=sk-...
$ hf spaces secrets add username/my-space --secrets-file .env.secrets

# Delete a secret (confirmation prompt, use --yes to skip)
$ hf spaces secrets delete username/my-space OPENAI_API_KEY --yes

# List, add, and delete variables (values are readable)
$ hf spaces variables ls username/my-space
$ hf spaces variables add username/my-space -e MODEL_ID=gpt2 -e MAX_TOKENS=512
$ hf spaces variables delete username/my-space MAX_TOKENS --yes

[CLI] Add hf spaces secrets and variables subgroups by @davanstrien in #4170
[CLI] Add get_space_secrets + hf spaces secrets ls by @Wauplin in #4182

📚 Documentation: CLI guide · Manage your Space

🪣 Rsync-style trailing slash for bucket folder copies

hf buckets cp now supports rsync-style trailing slash semantics when copying folders. A trailing / on the source path copies only the folder's contents to the destination, while omitting it nests the folder itself — matching the behavior you'd expect from rsync. This makes it possible to flatten directory structures during copies, which was not possible before. Additionally, copy_files now raises an explicit EntryNotFoundError when the source path resolves to no files, instead of silently succeeding with zero operations.

# Without trailing slash: "logs" dir is nested => dst/logs/...
$ hf buckets cp hf://buckets/username/src-bucket/logs hf://buckets/username/dst/

# With trailing slash: only contents of "logs" are copied => dst/...
$ hf buckets cp hf://buckets/username/src-bucket/logs/ hf://buckets/username/dst/

[Buckets] Support rsync-style trailing slash in copy_files by @Wauplin in #4187
[CLI] Raise error when copy_files source doesn't exist by @Wauplin in #4186

📚 Documentation: Buckets guide · CLI guide

💔 Breaking Change

[CLI] Rename hf skills upgrade -> hf skills update by @hanouticelina in #4176 — hf skills upgrade no longer exists; use hf skills update instead.
[CLI] Add out.status() by @hanouticelina in #4171 — status updates (spinners/progress) on hf extensions install and hf spaces dev-mode are now suppressed when using --format json, --quiet, or --format agent.

🖥️ CLI

[CLI] Add hints and example to hf datasets leaderboard by @Wauplin in #4174
[CLI] Shortcut hf update when already on latest version by @julien-c in #4177
[CLI] Remove progress bars on skills update by @Wauplin in #4179
[CLI] Increase default --limit from 10 to 30 for list commands by @Wauplin in #4181
[CLI] Support hf -v to print version by @Wauplin in #4185
[CLI] migrate hf skills to bucket by @hanouticelina in #4175

🐛 Bug and typo fixes

Update typer dependency version in setup.py by @tomaarsen in #4193

🏗️ Internal

Post-release: bump version to 1.14.0.dev0 by @huggingface-hub-bot[bot] in #4172
[Release] Move social drafts to minor-release and archive release notes to bucket by @Wauplin in #4173
Update unit test warnings check to ignore unrelated deprecation warnings by @seanses in #4188
[internal] Untrack useless files by @Wauplin in #4191

May 5, 2026

v5.8.0 ↗

Release 5.8.0

viaTransformers

Release v5.8.0

New Model additions

DeepSeek-V4

DeepSeek-V4 is the next-generation MoE (Mixture of Experts) language model from DeepSeek that introduces several architectural innovations over DeepSeek-V3. The architecture replaces Multi-head Latent Attention (MLA) with a hybrid local + long-range attention design, swaps residual connections for Manifold-Constrained Hyper-Connections (mHC), and bootstraps the first few MoE layers with a static token-id → expert-id hash table. This implementation covers DeepSeek-V4-Flash, DeepSeek-V4-Pro, and their -Base pretrained variants, which share the same architecture but differ in width, depth, expert count and weights.

Links: Documentation | Paper

Add DeepSeek V4 (#45643) by @ArthurZucker in #45643

Gemma 4 Assistant

Gemma 4 Assistant is a small, text-only model that enables speculative decoding for Gemma 4 models using the Multi-Token Prediction (MTP) method and associated candidate generator. The model shares the same Gemma4TextModel backbone as other Gemma 4 models but uses KV sharing throughout the entire model, allowing it to reuse the KV cache populated by the target model and skip the pre-fill phase entirely. This architecture includes cross-attention to make the most of the target model's context, allowing the assistant to accurately predict more drafted tokens per drafting round.

Links: Documentation

First model (#45788) by @SindhuRaghuram97 in #45788

GraniteSpeechPlus

Granite Speech Plus is a variant of Granite Speech that enhances the projector by consuming the concatenation of the encoder's final hidden states with an arbitrary subset of its intermediate hidden states along the feature dimension. It is a multimodal speech-to-text model that can transcribe audio, provide speaker annotation and word level timestamps by responding to text prompts. The model inherits the same architecture components as Granite Speech including the speech encoder, query transformer projector, language model, and optional LoRA adapter.

Links: Documentation

Support for a new Granite-Speech-Plus model (#45695) by @zvik in #45695

Granite4Vision

Granite Vision 4.1 is a vision-language model from IBM Research designed for enterprise-grade document data extraction. It specializes in chart extraction (Chart2CSV, Chart2Summary, Chart2Code), table extraction (JSON, HTML, OTSL), and semantic key-value pair extraction. The model builds on LLaVA-NeXT with architectural innovations including SigLIP2 Vision Encoder, Window Q-Former Projectors, and DeepStack Feature Injection with 8 vision-to-LLM injection points.

Links: Documentation

Add Granite 4.1 Vision (granite4_vision) (#45597) by @artem-spector in #45597

EXAONE-4.5

EXAONE 4.5 is the first open-weight vision language model developed by LG AI Research, integrating a dedicated visual encoder into the existing EXAONE 4.0 framework to expand multimodal capabilities. The model features 33 billion parameters in total, including 1.2 billion parameters from the vision encoder, and achieves competitive performance in general benchmarks while outperforming similar-sized models in document understanding and Korean contextual reasoning. It builds on EXAONE 4.0 with key enhancements including an expanded vocabulary of 153,600 tokens, support for up to 256K token context windows, and a Multi-Token Prediction (MTP) mechanism.

Links: Documentation | Paper | Blog Post

Add EXAONE 4.5 implementations (#45471) by @nuxlear in #45471

PP-FormulaNet

PP-FormulaNet-L and PP-FormulaNet_plus-L are lightweight models designed for table structure recognition, focusing on accurately recognizing table structures in documents and natural scenes. The models are part of the SLANet series and can be used for image-to-text tasks, specifically for detecting and processing mathematical formulas and table structures from images.

Links: Documentation

[Model] Add PP-FormulaNet Model Support (#45626) by @zhang-prog in #45626

Breaking changes

Apex integration has been removed from the library (including RMSNorm usage in T5 and related models), so users relying on Apex for mixed precision or fused ops should migrate to PyTorch's native equivalents instead.

🚨 Get rid of most Apex references (#45723) by @Rocketknight1

Tokenization

Fixed tokenizer mapping issues for DeepSeek R1 distilled (Qwen2) and DeepSeek OCR models, and resolved a significant performance regression in PreTrainedTokenizer.convert_ids_to_tokens where skip_special_tokens=True was rebuilding the special token set on every iteration, resulting in a ~300x speedup for that code path.

deepseek r1 distilled tokenizer fix for qwen2 mapping (#45741) by @itazap in [#45741]
DeepSeek OCR specifies an incorrect tokenizer class on the Hub (#45739) by @hmellor in [#45739]
PythonBackend slow tokenizer convert_ids_to_tokens fix (#45728) by @i3hz in [#45728]

Bugfixes and improvements

fix: correct spelling in continuous_api docstring (#45749) by @Dhruv908615 in [#45749]
Fix link to modular transformers documentation (#45746) by @SangbumChoi in [#45746]
Gemma4: fix failed test cases (#45568) by @kaixuanliu in [#45568]
Fix CI: Allow more artifacts to be download in CI (#45785) by @ydshieh in [#45785]
Add concurrency to PR CI workflow file (pr-ci-caller.yml) (#45786) by @ydshieh in [#45786]
Reorder decorators for autodoc and dataclass (#45702) by @zucchini-nlp in [#45702]
Unwrap text_config in AutoModelFor*.from_config (#45770) by @jamesbraza in [#45770]
fix: Added Mps support in float fallback backends list (#45687) by @rigen1048 in [#45687]
Github Actions PR CI (caller) (#45476) by @ydshieh in [#45476]
make sure we call check_auto in CI (#45775) by @tarekziade in [#45775]
Fix auto mapping script (#45774) by @Cyrilvallez in [#45774]
[MINISTRAL3] Fix conversion script yarn's apply_scale support. (#45744) by @juliendenize in [#45744]
[nemotron_h] respect _no_reinit flag on dt_bias and out_proj.weight (#45591) by @vai-minzhou in [#45591]
fix(utils): Resolve backbone utils test regressions (#45594) by @harshaljanjani in [#45594]
[CB] Better overall script and decode bucketting (#45653) by @remi-or in [#45653]
[docs] model testing (#45152) by @stevhliu in [#45152]
update dev (#45726) by @vasqu in [#45726]
Doc translate to Persian(farsi) (#45664) by @zeoses in [#45664]
[OAI Privacy Filter] Add integration test (#45725) by @vasqu in [#45725]
Speedup Qwen2VLImageProcessor (#45719) by @lgeiger in [#45719]
Remove dead beam-search dummies from dummy_pt_objects.py (#45722) by @jw9603 in [#45722]
chore(typing): add ty type checking for 10 utility files (#45703) by @moonbogi in [#45703]
Llama3 video fix (#45040) by @sywangyi in [#45040]
Fix custom-module copies inheriting read-only permissions (#45686) by @nurpax in [#45686]
Python code in model docs (#45608) by @zucchini-nlp in [#45608]
fix failed test cases for blt model (#45596) by @kaixuanliu in [#45596]
chore(typing): add ty type checking for 3 pipeline files (#45667) by @moonbogi in [#45667]

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@artem-spector
- Add Granite 4.1 Vision (granite4_vision) (#45597)
@SindhuRaghuram97
- First model (#45788)
@nuxlear
- Add EXAONE 4.5 implementations (#45471)
@ArthurZucker
- Add DeepSeek V4 (#45643)
@remi-or
- [CB] Better overall script and decode bucketting (#45653)
@zhang-prog
- [Model] Add PP-FormulaNet Model Support (#45626)
@zvik
- Support for a new Granite-Speech-Plus model (#45695)

May 1, 2026

v0.38.0 ↗

Diffusers 0.38.0: New image and audio pipelines, Core library improvements, and more

viaDiffusers

New Pipelines

LLaDA2

LLaDA2 is a family of discrete diffusion language models that generate text through block-wise iterative refinement. Instead of autoregressive token-by-token generation, LLaDA2 starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement steps.

Nucleus-MoE

NucleusMoE-Image is a 2B active 17B parameter model trained with efficiency at its core. Our novel architecture highlights the scalability of a sparse MoE architecture for Image generation.

Thanks to @sippycoder for the contribution.

Ernie-Image

ERNIE-Image is a powerful and highly efficient image generation model with 8B parameters.

Thanks to @HsiaWinter for the contribution.

LongCat-AudioDiT

LongCat-AudioDiT is a text-to-audio diffusion model from Meituan LongCat.

Thanks to @RuixiangMa for the contribution.

Ace-Step 1.5

ACE-Step 1.5 generates variable-length stereo audio at 48 kHz (10 seconds to 10 minutes) from text prompts and optional lyrics. The full system pairs a Language Model planner with a Diffusion Transformer (DiT) synthesizer; this pipeline wraps the DiT half of that stack, and consists of three components: an AutoencoderOobleck VAE that compresses waveforms into 25 Hz stereo latents, a Qwen3-based text encoder for prompt and lyric conditioning, and an AceStepTransformer1DModel DiT that operates in the VAE latent space using flow matching.

Thanks to @ChuxiJ for the contribution.

Flux.2 Small Decoder

Make your Flux.2 decoding faster with this new small decoder model from the Black Forest Labs. You can check it out here. It was contributed by @huemin-art in this PR.

Modular Pipeline Support

We added modular support for LTX-2 and Hunyuan 1.5.

Core Library

All commits

[Discrete Diffusion] Add LLaDA2 pipeline by @kashif in #13226
[LLADA2] documentation fixes by @kashif in #13333
[ci] claude in ci. by @sayakpaul in #13297
[docs] kernels by @stevhliu in #13139
[tests] Tests for conditional pipeline blocks by @sayakpaul in #13247
avoid hardcode device in flux-control example by @kaixuanliu in #13336
fix claude workflow to include id-token with write. by @sayakpaul in #13338
Update LTX-2 Docs to Cover LTX-2.3 Models by @dg845 in #13337
remove str option for quantization config in torchao by @howardzhang-cv in #13291
[ci] include checkout step in claude review workflow by @sayakpaul in #13352
change minimum version guard for torchao to 0.15.0 by @howardzhang-cv in #13355
[ci] move to assert instead of self.Assert* by @sayakpaul in #13366
[docs] refactor model skill by @stevhliu in #13334
Fix Ulysses SP backward with SDPA by @zhtmike in #13328
Add train flux2 series lora config by @tcaimm in #13011
[docs] Add NeMo Automodel training guide by @pthombre in #13306
Fix: ensure consistent dtype and eval mode in pipeline save/load tests by @YangKai0616 in #13339
[ci] support claude reviewing on forks. by @sayakpaul in #13365
Fix MotionConv2d to cast blur_kernel to input dtype instead of reverse by @YangKai0616 in #13364
chore: update claude_review.yml by @hf-security-analysis[bot] in #13374
corrects single file path validation logic by @andrew-w-ross in #13363
[docs] deprecate pipelines by @stevhliu in #13157
🔒 Pin GitHub Actions to commit SHAs by @paulinebm in #13385
[docs] add auto docstring and parameter templates documentation for m… by @yiyixuxu in #13382
Fix typos and grammar errors in documentation by @GalacticAvenger in #13391
fix(ddim): validate eta is in [0, 1] in DDIMPipeline by @NIK-TIGER-BILL in #13367
Fix Dynamo lru_cache warnings during torch.compile by @jiqing-feng in #13384
[tests] refactor wan autoencoder tests by @sayakpaul in #13371
NucleusMoE-Image by @sippycoder in #13317
Add examples on how to profile a pipeline by @sayakpaul in #13356
Update README.md of the profiling guide by @sayakpaul in #13400
[CI] Refactor Cosmos Transformer Tests by @DN6 in #13335
[tests] refactor autoencoderdc tests by @sayakpaul in #13369
[CI] Hunyuan Transformer Tests Refactor by @DN6 in #13342
Fix VAE offload encode device mismatch in DreamBooth scripts by @azolotenkov in #13417
Remove references to torchao's AffineQuantizedTensor by @andrewor14 in #13405
[tests] fix autoencoderdc tests by @sayakpaul in #13424
[core] fix group offloading when using torchao by @sayakpaul in #13276
Fix IndexError in HunyuanVideo I2V pipeline by @kaixuanliu in #13244
improve Claude CI by @yiyixuxu in #13397
FLUX.2 small decoder by @huemin-art in #13428
[CI] Add PR/Issue Auto Labeler by @DN6 in #13380
[CI] Add GLM Image Transformer Model Tests by @DN6 in #13344
[CI] Use finegrained token for Issue Labeler by @DN6 in #13433
Handle prompt embedding concat in Qwen dreambooth example by @chenyangzhu1 in #13387
fix(qwen-image dreambooth): correct prompt embed repeats when using --with_prior_preservation by @chenyangzhu1 in #13396
Cache RoPE freqs on device to avoid repeated CPU-GPU copy in QwenImage by @akshan-main in #13406
[tests] tighten dependency testing. by @sayakpaul in #13332
Fix grammar in LoRA documentation by @Xyc2016 in #13423
Fix HunyuanVideo 1.5 I2V by preprocessing image at pixel resolution i… by @akshan-main in #13440
[modular] Add LTX Video modular pipeline by @akshan-main in #13378
Add ernie image by @HsiaWinter in #13432
[core] fix fa4 integration by @sayakpaul in #13443
FlashPack by @hlky in #12700
[ptxla] fix pytorch xla inference on TPUs. by @entrpn in #13463
fix some dtype issue for gguf / some gpu backends by @HsiaWinter in #13464
Fix Qwen Image DreamBooth prior-preservation batch ordering by @azolotenkov in #13441
[tests] fix deprecated attention processor testing. by @sayakpaul in #13469
[tests] xfail clip related issues. by @sayakpaul in #13454
[agent] add modular doc by @yiyixuxu in #13410
[tests] fix training tests by @sayakpaul in #13442
fix(profiling): preserve instance isolation when decorating methods by @Akash504-ai in #13471
[Feat] Adds LongCat-AudioDiT pipeline by @RuixiangMa in #13390
Fix Flux2 DreamBooth prior preservation prompt repeats by @azolotenkov in #13415
chore: bump doc-builder SHA for PR upload workflow by @rtrompier in #13476
Remove compile bottlenecks from ZImage pipeline by @hitchhiker3010 in #13461
[chore] Add diffusers-format example to LongCatAudioDiTPipeline by @RuixiangMa in #13483
[core] fix autoencoderkl qwenimage for xla by @sayakpaul in #13480
add PR fork workable by @paulinebm in #13438
Add modular pipeline for HunyuanVideo 1.5 by @akshan-main in #13389
[agents docs] add float64 gotcha by @yiyixuxu in #13472
fix(ernie-image): avoid locals() comprehension scope issue in callback kwargs by @songh11 in #13478
[Bugfix] Fix shape mismatch in LongCatAudioDiTTransformer conversion by @RuixiangMa in #13494
feat: bump safetensors to 0.8.0-rc.0 by @McPatate in #13470
fix(qwen): fix CFG failing when passing neg prompt embeds with none mask by @Sunhill666 in #13379
add an example of spmd for flux on v5e-8 by @sayakpaul in #13474
Add FLUX.2 Klein Inpaint Pipeline by @adi776borate in #13050
[docs] add a mention of torchao and other backends in speed memory docs. by @sayakpaul in #13499
Fix Flux2 non-diffusers guidance LoRA conversion by @yadferhad in #13486
add _native_npu_attention support mask shape like [B,1,1,S] by @chang-zhijie in #13490
fix(freeu): run FFT in float32 for float16 inputs to avoid ComplexHalf by @Ricardo-M-L in #13503
Fix non-deterministic T5 outputs in HiDream pipeline tests by @kaixuanliu in #13534
Fix AuraFlow attn processors applying norm_added_q to key projection by @Ricardo-M-L in #13533
add _repeated_blocks for ErnieImageTransformer2DModel by @kaixuanliu in #13496
[CI] Fix BnB tests by @DN6 in #13481
[tests] fix group offloading with disk tests by @sayakpaul in #13491
[ci] feat: have pr labeler label for closing issues. by @sayakpaul in #13548
Improve trust_remote_code by @hlky in #13448
chore: bump doc-builder SHA for main doc build workflow by @rtrompier in #13555
[ci] simplify release workflow. by @sayakpaul in #13329
[attention backends] fix ring CP for flash and flash 3 by @sayakpaul in #13182
[agents docs] add pipelines.md etc by @yiyixuxu in #13567
Add Ernie-Image modular pipeline by @akshan-main in #13498
[agents docs] update modular.md by @yiyixuxu in #13568
[docs] fix typo in AutoencoderOobleck docs by @ivnvalex in #13642)
Fix ErnieImagePipeline pre-computed prompt_embeds + num_images_per_prompt shape mismatch by @Ricardo-M-L in #13532
feat: support ring attention with arbitrary KV sequence lengths by @songh11 in #13545
[ci] use tokenizers stable installtion in CI. by @sayakpaul in #13562
NucleusMoE docs by @sayakpaul in #13661
Fix UniPC scheduler device mismatch when using offloading by @ParamChordiya in #13489
[Ernie-Image] Add lora support by @asomoza in #13575
Add ACE-Step pipeline for text-to-music generation by @ChuxiJ in #13095
Fix missing latents_bn_std dtype cast in VAE normalization by @adi776borate in #13299
Release: v0.38.0-release by @sayakpaul (direct commit on v0.38.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@kashif
- [Discrete Diffusion] Add LLaDA2 pipeline (#13226)
- [LLADA2] documentation fixes (#13333)
@howardzhang-cv
- remove str option for quantization config in torchao (#13291)
- change minimum version guard for torchao to 0.15.0 (#13355)
@sippycoder
- NucleusMoE-Image (#13317)
@DN6
- [CI] Refactor Cosmos Transformer Tests (#13335)
- [CI] Hunyuan Transformer Tests Refactor (#13342)
- [CI] Add PR/Issue Auto Labeler (#13380)
- [CI] Add GLM Image Transformer Model Tests (#13344)
- [CI] Use finegrained token for Issue Labeler (#13433)
- [CI] Fix BnB tests (#13481)
@akshan-main
- Cache RoPE freqs on device to avoid repeated CPU-GPU copy in QwenImage (#13406)
- Fix HunyuanVideo 1.5 I2V by preprocessing image at pixel resolution i… (#13440)
- [modular] Add LTX Video modular pipeline (#13378)
- Add modular pipeline for HunyuanVideo 1.5 (#13389)
- Add Ernie-Image modular pipeline (#13498)
@HsiaWinter
- Add ernie image (#13432)
- fix some dtype issue for gguf / some gpu backends (#13464)
@hlky
- FlashPack (#12700)
- Improve trust_remote_code (#13448)
@RuixiangMa
- [Feat] Adds LongCat-AudioDiT pipeline (#13390)
- [chore] Add diffusers-format example to LongCatAudioDiTPipeline (#13483)
- [Bugfix] Fix shape mismatch in LongCatAudioDiTTransformer conversion (#13494)
@adi776borate
- Add FLUX.2 Klein Inpaint Pipeline (#13050)
- Fix missing latents_bn_std dtype cast in VAE normalization (#13299)
@ChuxiJ
- Add ACE-Step pipeline for text-to-music generation (#13095)

Apr 30, 2026

v1.13.0 ↗

[v1.13.0] new CLI commands and formatting, and HF URI parsing

viahuggingface_hub

🖥️ New CLI commands: repo cards, file listings, and dataset leaderboards

This release adds three new CLI capabilities for exploring Hub content. hf models card, hf datasets card, and hf spaces card fetch the README of any repo and print it to stdout, with --metadata (YAML frontmatter as JSON) and --text (prose only) flags for splitting the card into its structured and unstructured parts. Calling hf models ls <repo_id>, hf datasets ls <repo_id>, or hf spaces ls <repo_id> now switches from listing repos to listing files inside that repo, with --tree, -R, -h, and --revision options mirroring the existing hf buckets ls behavior. And hf datasets leaderboard <dataset_id> surfaces model scores submitted to a benchmark dataset, making it easy to compare models by score from the terminal.

# Get model card metadata as JSON
hf models card google/gemma-4-31B-it --metadata --format json

# List files in a model repo (tree view with sizes)
hf models ls meta-llama/Llama-3.2-1B-Instruct --tree -h

# Show top 5 models on SWE-bench
hf datasets leaderboard SWE-bench/SWE-bench_Verified --limit 5

📚 Documentation: CLI guide

[CLI] Add hf models card and hf datasets card commands by @davanstrien in #4118
[CLI] Add file listing to models/datasets/spaces ls by @Wauplin in #4166
[CLI] add hf datasets leaderboard by @hanouticelina in #4154

:rocket: Manage Spaces from the CLI

Three new hf spaces subcommands bring full lifecycle control to the terminal. hf spaces pause and hf spaces restart stop or rebuild a Space (with --factory-reboot for a clean rebuild), and hf spaces settings lets you configure sleep time and hardware in one call. A companion hf spaces hardware command lists all available hardware flavors with pricing, so you can discover options before changing settings. Pause and restart include a confirmation prompt (-y to skip) since they tear down the running container.

# Pause a Space when not in use (not billed while paused)
hf spaces pause username/my-space

# Restart with a GPU
hf spaces settings username/my-space --hardware t4-medium --sleep-time 3600

# List available hardware options
hf spaces hardware

📚 Documentation: CLI guide — Spaces

[CLI] Add spaces lifecycle commands: pause, restart, sleep by @davanstrien in #4155
[CLI] Add hf spaces hardware command by @Wauplin in #4169
[CLI] Add --hardware flag to hf spaces settings by @davanstrien in #4163

:arrows_clockwise: `hf update` replaces the auto-update prompt

The blocking interactive Y/n auto-update prompt at CLI startup is gone. It was catching too many non-interactive contexts (CI runners, Homebrew post-install hooks, Jupyter notebooks) and hanging automation. In its place, a single yellow stderr warning suggests running hf update — a new command that detects how hf was installed (Homebrew, standalone installer, or pip) and runs the right upgrade command. Set HF_HUB_DISABLE_UPDATE_CHECK=1 to silence the startup check entirely, for example in offline CI.

hf update

📚 Documentation: CLI guide — Updating

[CLI] Add hf update + drop interactive update prompt by @Wauplin in #4131

:pencil2: Global output formatting for every command

The --format, --json, and -q / --quiet flags are now handled globally by the CLI framework instead of being declared individually on each command. This means every hf command automatically accepts them — no more per-command --format boilerplate, and the flags are properly documented in a dedicated "Formatting options" section in every --help page. --format auto (the default) picks human for interactive terminals and agent when invoked by an AI agent, making CLI output automatically suitable for both people and tools.

# JSON output for scripting
hf models ls --search bert --limit 2 --json | jq '.[].id'

# IDs only, one per line
hf collections ls --owner nvidia -q

📚 Documentation: CLI guide — Output formatting

[CLI] Make --format / --json / -q global by @Wauplin in #4162

🔗 Centralized `hf://` URI parsing

A new parse_hf_uri function and HfUri dataclass provide a single source of truth for parsing hf://... strings across the library. Whether you reference a model, dataset, space, bucket, or file inside a repo, the parser handles all valid URI shapes — type prefixes, revisions, and paths — and rejects invalid ones with clear error messages. A companion parse_hf_mount / HfMount handles volume mount specifications (hf://...:/mnt:ro). Both are pure string parsers (no network calls) and round-trippable via .to_uri().

from huggingface_hub import parse_hf_uri, parse_hf_mount

parse_hf_uri("hf://datasets/namespace/my-dataset@refs/pr/3/train.json")
# HfUri(type='dataset', id='namespace/my-dataset', revision='refs/pr/3', path_in_repo='train.json')

parse_hf_mount("hf://buckets/my-org/my-bucket/sub/dir:/mnt:ro")
# HfMount(source=HfUri(type='bucket', id='my-org/my-bucket', ...), mount_path='/mnt', read_only=True)

📚 Documentation: HF URIs reference

Centralize hf:// URI parsing by @Wauplin in #4158

🚀 Bucket transport for Jobs script upload

Local scripts uploaded by hf jobs uv run are now stored in a {namespace}/jobs-artifacts bucket and mounted into the job container at /data instead of being base64-encoded into an environment variable. The old bash -c + xargs + base64 -d pipeline was fragile and required manual shell quoting. Bucket transport is simpler, easier to debug, and supports write-back: jobs can persist output artifacts to /data/ since the mount is read-write. The base64 transport path has been fully removed with no fallback.

Add bucket+mount transport for Jobs script upload by @davanstrien in #4025

🖥️ CLI

[CLI] Print help when leaf command with required args is called without args by @Wauplin in #4135

🤖 Inference

[Inference Providers] Add DeepInfra support by @hanouticelina in #4114
Support list[str] inputs in feature_extraction by @SJeffZhang in #4115

📖 Documentation

[CLI] Add benchmark dataset filter examples by @hanouticelina in #4156

🐛 Bug and typo fixes

[BUG FIX]: hf_hub_download crashes when stderr lacks a real file descriptor by @tobocop2 in #4065
[CLI] Fix datasets list table rendering by @hanouticelina in #4157
[CLI] Fix installation method detection for curl-installed hf with Homebrew Python by @Wauplin in #4142
Avoid reuploading preuploaded LFS files in upload-large-folder by @Dev-Jahn in #4165

🏗️ Internal

[Release] Make release-notes job fail loudly on bad model/empty output by @Wauplin in #4138
[Release] Fix bucket URL in social posts Slack notification by @Wauplin in #4139
Post-release: bump version to 1.13.0.dev0 by @huggingface-hub-bot[bot] in #4140
[CI] Fix two flaky Windows tests (root causes, not skips) by @Wauplin in #4141
[Quality] Fix uvx ty check src errors by @Wauplin in #4159
[Release] Mark minor releases as "latest" on GitHub by @Wauplin in #4167

Apr 29, 2026

v1.12.2 ↗

[v1.12.2] Add DeepInfra support for Inference Providers

viahuggingface_hub

[Inference Providers] Add DeepInfra support in https://github.com/huggingface/huggingface_hub/pull/4114 by @hanouticelina

Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v1.12.1...v1.12.2

Apr 28, 2026

v5.7.0 ↗

Release v5.7.0

viaTransformers

New Model additions

Laguna

Laguna is Poolside's mixture-of-experts language model family that extends standard SwiGLU MoE transformers with two key innovations. It features per-layer head counts allowing different decoder layers to have different query-head counts while sharing the same KV cache shape, and implements a sigmoid MoE router with auxiliary-loss-free load balancing that uses element-wise sigmoid of gate logits plus learned per-expert bias for router scoring.

Links: Documentation

Laguna XS.2 implementation (#45673) by @joerowell in #45673

DEIMv2

DEIMv2 (DETR with Improved Matching v2) is a real-time object detection model that extends DEIM with DINOv3 features and spans eight model sizes from X to Atto for diverse deployment scenarios. It uses a Spatial Tuning Adapter (STA) for larger variants to convert DINOv3's single-scale output into multi-scale features, while ultra-lightweight models employ pruned HGNetv2 backbones. The unified design achieves superior performance-cost trade-offs, with DEIMv2-X reaching 57.8 AP with only 50.3M parameters and DEIMv2-S being the first sub-10M model to exceed 50 AP on COCO.

Links: Documentation | Paper

model: Add DEIMv2 to Transformers (#44339) by @harshaljanjani in #44339

Attention

Several attention-related bugs were fixed across multiple models, including a cross-attention cache type error in T5Gemma2 for long inputs, incorrect cached forward behavior in Qwen3.5's gated-delta-net linear attention, and a crash in GraniteMoeHybrid when no Mamba layers are present. Attention function dispatch was also updated to align with the latest model implementations.

Fix cross-attention cache layer type for T5Gemma2 long inputs (#45540) by @Beichen-Ma in [#45540]
[Qwen3.5] Fix GDN linear attention multi-token cached forward (#45513) by @kashif in [#45513]
Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models (#45514) by @tianhaocui in [#45514]
Align latest model attention function dispatch (#45598) by @Cyrilvallez in [#45598]

Tokenizers

There was a bug in AutoTokenizer that caused the wrong tokenizer class to be initialized. This caused regressions in models like DeepSeek R1.

change got reverted (#45680) by @itazap in [#45680]

Generation

Continuous batching generation received several fixes and improvements, including correcting KV deduplication and memory estimation for long sequences (16K+), and removing misleading warnings about num_return_sequences and other unsupported features that were incorrectly firing even when functionality worked correctly. Documentation for per-request sampling parameters was also added.

generate: drop stale num_return_sequences warning on continuous batching path (#45582) by @joaquinhuigomez in [#45582]
Remove unnecessary generate warnings (#45619) by @Cyrilvallez in [#45619]
[CB] Changes for long generation (#45530) by @remi-or in [#45530]
[docs] per-request sampling params (#45553) by @stevhliu in [#45553]

Kernels

Improved kernel support by fixing configuration reading and error handling for FP8 checkpoints (e.g., Qwen3.5-35B-A3B-FP8), enabling custom expert kernels registered from the HF Hub to be properly loaded, and resolving an incompatibility that prevented Gemma3n and Gemma4 from using the rotary kernel.

Fix configuration reading and error handling for kernels (#45610) by @hmellor in [#45610]
Allow for registered experts from kernels hub (#45577) by @winglian in [#45577]
Gemma3n and Gemma4 cannot use rotary kernel (#45564) by @Cyrilvallez in [#45564]

Bugfixes and improvements

fixing more typos (#45689) by @vasqu in [#45689]
[docs] cb memory management (#45587) by @stevhliu in [#45587]
[docs] cpu offloading (#45660) by @stevhliu in [#45660]
docs(README_zh-hans): clarify conditions for not using Transformers (#45688) by @GuaiZai233 in [#45688]
fix padding side issue for fast_vlm tests (#45592) by @kaixuanliu in [#45592]
Fix x_clip: 8 failed test cases (#45394) by @kaixuanliu in [#45394]
zero_shot_object_detection ValueError fix for python 3.13 (#45669) by @AnkitAhlawat7742 in [#45669]
Fix pageable H2D copies in Gated DeltaNet PyTorch fallback (#45665) by @ruixiang63 in [#45665]
Fix UnboundLocalError in shard_and_distribute_module for replicated parameters (#45675) by @Abdennacer-Badaoui in [#45675]
[MistralCommonBackend] Soften validation mode and apply_chat_template arguments check (#45628) by @juliendenize in [#45628]
Fix NameError: PeftConfigLike triggered by PreTrainedModel.__init_subclass__ (#45658) by @qgallouedec in [#45658]
chore(typing): added modeling_utils to ty (#45425) by @tarekziade in [#45425]
[gemma4] infer from config instead of hardcoding (#45606) by @eustlb in [#45606]
Update quants tests (#45480) by @SunMarc in [#45480]
🔴🔴🔴 fix: skip clean_up_tokenization for BPE tokenizers in PreTrainedTokenizerFast (#44915) by @maxsloef-goodfire in [#44915]
Fix colmodernvbert tests (#45652) by @Cyrilvallez in [#45652]
[CB] [Major] Add CPU request offloading (#45184) by @remi-or in [#45184]
Fix peft constructors (#45622) by @Cyrilvallez in [#45622]
chore: speedup modular converter (~30%) (#45046) by @tarekziade in [#45046]
Fix whisper return language (#42227) by @FredHaa in [#42227]
Add supports_gradient_checkpointing to NemotronHPreTrainedModel (#45625) by @sergiopaniego in [#45625]
Raise clear error for problem_type="single_label_classification" with num_labels=1 (#45611) by @gaurav0107 in [#45611]
CircleCI with torch 2.11 (#45633) by @ydshieh in [#45633]
chore: bump doc-builder SHA for main doc build workflow (#45631) by @rtrompier in [#45631]
Allow more artifacts to be download in CI (#45629) by @ydshieh in [#45629]
chore(qa): split pipeline and add type checking (#45432) by @tarekziade in [#45432]
Skip failing offloading tests (#45624) by @Cyrilvallez in [#45624]
fix: compute auxiliary losses when denoising is disabled in D-FINE (#45601) by @Abineshabee in [#45601]
qa: bumped mlinter and allow local override (#45585) by @tarekziade in [#45585]
Processing Utils: continue when content is a string (#45605) by @RyanMullins in [#45605]
SonicMoe (#45433) by @IlyasMoutawwakil in [#45433]
fix transformers + torchao nvfp4 serialization (#45573) by @vkuzo in [#45573]
[AMD CI] Fix expectations for Gemma3n (#45602) by @Abdennacer-Badaoui in [#45602]
[docs] multi-turn tool calling (#45554) by @stevhliu in [#45554]
Fix AttributeError on s_aux=None in flash_attention_forward (#45589) by @jamesbraza in [#45589]
do not index past decoded chars with special tokens (#45435) by @itazap in [#45435]
Update dev version (#45583) by @vasqu in [#45583]
Update torchao usage for XPU and CPU (#45560) by @jiqing-feng in [#45560]

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@vasqu
- fixing more typos (#45689)
- Update dev version (#45583)
@joerowell
- Laguna XS.2 implementation (#45673)
@tarekziade
- chore(typing): added modeling_utils to ty (#45425)
- chore: speedup modular converter (~30%) (#45046)
- chore(qa): split pipeline and add type checking (#45432)
- qa: bumped mlinter and allow local override (#45585)
@harshaljanjani
- model: Add DEIMv2 to Transformers (#44339)
@remi-or
- [CB] [Major] Add CPU request offloading (#45184)
- [CB] Changes for long generation (#45530)

Apr 27, 2026

4.8.5 ↗

viaDatasets

Main bug fixes

fix: decode Json() values before calling DataFrame.to_json() (#8116) by @Brianzhengca in https://github.com/huggingface/datasets/pull/8122
Fix: decode JSON type before to_list or to_dict is called by @ItsTania in https://github.com/huggingface/datasets/pull/8137
Fix batching for table-formatted datasets by @bluehyena in https://github.com/huggingface/datasets/pull/8126
Fix iterable map resume state by @Brianzhengca in https://github.com/huggingface/datasets/pull/8147
don't embed remote files in download_and_prepare to parquet by @lhoestq in https://github.com/huggingface/datasets/pull/8150

Other improvements and bug fixes

Parse agent traces by @lhoestq in https://github.com/huggingface/datasets/pull/8113
🔒 Pin GitHub Actions to commit SHAs by @paulinebm in https://github.com/huggingface/datasets/pull/8114
chore: bump doc-builder SHA for PR upload workflow by @rtrompier in https://github.com/huggingface/datasets/pull/8134
Remove print statement in JSON processing by @lhoestq in https://github.com/huggingface/datasets/pull/8136
Don't include files list DatasetInfo (and remove old stuff) by @lhoestq in https://github.com/huggingface/datasets/pull/8128
update ci uer by @lhoestq in https://github.com/huggingface/datasets/pull/8139
fix warning in ci by @lhoestq in https://github.com/huggingface/datasets/pull/8140
fix mask in embed_storage for remote files by @lhoestq in https://github.com/huggingface/datasets/pull/8151
fix original_files missing in ci json test by @lhoestq in https://github.com/huggingface/datasets/pull/8152
Fix null in embed storage by @lhoestq in https://github.com/huggingface/datasets/pull/8154
Fix base_path in integration tests by @lhoestq in https://github.com/huggingface/datasets/pull/8155

New Contributors

@paulinebm made their first contribution in https://github.com/huggingface/datasets/pull/8114
@Brianzhengca made their first contribution in https://github.com/huggingface/datasets/pull/8122
@bluehyena made their first contribution in https://github.com/huggingface/datasets/pull/8126
@rtrompier made their first contribution in https://github.com/huggingface/datasets/pull/8134
@ItsTania made their first contribution in https://github.com/huggingface/datasets/pull/8137

Full Changelog: https://github.com/huggingface/datasets/compare/4.8.4...4.8.5

v0.23.1 ↗

Release v0.23.1

viaTokenizers

TL;DR

tokenizers 0.23.1 is the first proper stable release in the 0.23 line — 0.23.0 only ever shipped as rc0 because the release pipeline itself was broken (Node side hadn't shipped multi-platform binaries since 2023, Python side was on pyo3 0.27 without free-threaded support). 0.23.1 is the version where everything actually goes out the door together: full Node multi-platform wheels for the first time in years, Python 3.14 (regular and free-threaded 3.14t), full type hints for every Python class, and a stack of measurable perf wins on the BPE / added-vocab hot paths.

There is no functional 0.23.0 published — we tag 0.23.1 directly so users don't accidentally pull a never-shipped version.

🚨 Breaking changes

Drop Python 3.9 (#1952) — requires-python = ">=3.10"; 3.9 users stay on 0.22.x.
add_tokens normalizes content at insertion (#1995) — re-saved tokenizer.json may differ in the added_tokens block. Existing files load unchanged.
Type stubs are precise (#1928, #1997) — methods that returned Any now return real types; mypy --strict may surface previously-hidden errors. Stub layout also moved from tokenizers/<sub>/__init__.pyi to tokenizers/<sub>.pyi. This breaks the surface of some of the processors like RobertaProcessign's __init__ .
3.14t-only: setters/getters return PyResult<T> because of Arc<RwLock<Tokenizer>>; a poisoned lock surfaces as PyException instead of a panic.

⚡ Performance — measured locally on this Mac, not lifted from PRs

Run with cargo bench --bench <name> -- --save-baseline v0_22_2 on v0.22.2, then --baseline v0_22_2 on v0.23.1. Numbers are point-in-time wall clock on a single laptop; relative deltas are what matters, absolute numbers will differ on CI hardware.

Added-vocabulary deserialize — the headline win (#1995, #1999)

bench: improve added_vocab_deserialize to reflect real-world workloads (#2000) is now representative of how transformers actually loads tokenizer.json files. The combined effect of daachorse for the matching automaton plus the normalize-on-insert refactor is enormous on this workload:

benchmark	v0.22.2	v0.23.1	change
100k tokens, special, no norm	~410 ms	248 ms	−40%
100k tokens, non-special, no norm	~7.1 s	273 ms	−96%
100k tokens, special, NFKC	~395 ms	235 ms	−40%
100k tokens, non-special, NFKC	~7.4 s	290 ms	−96%
400k tokens, special, no norm	~15 s	980 ms	−94%

Real-world impact: loading a Llama-3-style tokenizer with a large set of added tokens dropped from "noticeable pause" to "instant".

BPE encode

benchmark	v0.22.2	v0.23.1	change
`BPE GPT2 encode batch, no cache`	530 ms	446 ms	−16%
`BPE GPT2 encode batch` (cached)	690 ms	685 ms	noise
`BPE GPT2 encode` (single)	1.95 s	1.94 s	noise
`BPE Train (small)`	32.6 ms	31.5 ms	−3%
`BPE Train (big)`	1.01 s	988 ms	−2%

The BPE per-thread cache PR (#2028) shows much larger wins on highly-parallel workloads (+47–62% at 88+ threads on a server box, per the PR's own measurements on Vera). Single-thread batch numbers above are flat or slightly improved because cache-hit overhead was already low without contention.

Llama-3 encode

benchmark	v0.22.2	v0.23.1	change
`llama3-encode` (single)	2.10 s	2.02 s	−4%
`llama3-batch`	438 ms	408 ms	−7%
`llama3-offsets`	410 ms	395 ms	−4%

Truncation early exit (#1990)

Right-direction truncation no longer pre-tokenizes past max_length. The new truncation_benchmark doesn't exist on v0.22.2 so there's no apples-to-apples here, but the PR's own measurements on the same machine showed −20–28% across a range of max_length values for right-truncation; left-truncation unchanged.

Other perf improvements (no direct comparable bench)

BPE::Builder::build no longer formats strings in a hot loop (#2010) — ~45% faster Tokenizer::from_file on Llama-3 in the PR's profile.
BPE per-thread cache (#2028) — see Vera numbers in PR description for parallel scale-out.

🔄 Serialization / deserialization

The tokenizer.json format is forward-compatible: existing files load on 0.23 unchanged. Two things to know if you re-save:

added_tokens entries created via add_tokens(..., normalized=True) will have their content normalized at save time — see breaking-change note above.
tokenizer.train(...) no longer keeps a redundant added_tokens/special_tokens Vec separate from the added_tokens_map_r. Public API surface unchanged; only the internal struct shape moved.

bench: improve added_vocab_deserialize to reflect real-world workloads (#2000) lands a more realistic micro-benchmark for this surface; if you're tracking deserialize perf in your own CI, the new bench is the one to compare against.

🐍 Python: free-threaded 3.14t support

Dedicated wheels for python3.14t (the free-threaded build introduced in PEP 703). The wheel:

Declares Py_MOD_GIL_NOT_USED, so importing tokenizers does not force the GIL back on.
Builds without the abi3 cargo feature (free-threaded Python doesn't expose the limited API).
Goes through Arc<RwLock<Tokenizer>> for the inner state so concurrent setters and encoders don't race PyO3's per-pyclass borrow check.

A new stress-test module tests/test_freethreaded.py exercises N-encoder × M-setter races on a single Tokenizer and asserts no RuntimeError: Already borrowed, no RwLock poisoning, and that sys._is_gil_enabled() is False post-import.

For the regular CPython wheel everything is unchanged.

📦 Node.js bindings: first proper multi-platform release since 2023

The npm package now ships 13 platforms (macOS x64/arm64/universal, Windows x64/i686/arm64, Linux x64/arm64/armv7 in both glibc and musl, Android arm64/armv7) — previous workflows only built 3 of those, leaving Apple Silicon / Linux ARM / Alpine users with package-not-found errors since 2023 (#1365, #1703, #1922). Fixed via #1970 + #2034, which also bumps @napi-rs/cli to v3 and switches cross-builds to cargo-zigbuild.

🧷 Type hints & typing for all classes (#1928, #1997)

Every class in the python bindings now ships proper .pyi stubs — Tokenizer, AddedToken, Encoding, every decoder / model / normalizer / pre-tokenizer / processor / trainer. Editors and type checkers (mypy, pyright, ty) see real signatures with types and docstrings instead of falling back to Any.

The stubs are generated automatically from the compiled extension via tools/stub-gen (Rust binary using pyo3-introspection). Re-running make style regenerates them; CI guards against regenerated-vs-checked-in drift. If the generator ever returns 0 docstrings (e.g. because the [patch.crates-io] pin in .cargo/config.toml falls out of sync with the pyo3 dep version), it now hard-aborts with a precise diagnostic instead of silently emitting bare-bones stubs.

>>> from tokenizers import Tokenizer
>>> # IDEs now resolve every method, every kwarg, every return type
>>> Tokenizer.from_pretrained("bert-base-cased")

⚠️ As called out in breaking changes: stricter type info means previously-hidden type errors in user code may now surface under mypy --strict.

✨ Other features

Unigram sampling: models.Unigram now exposes alpha and nbest_size for subword regularization (parity with Google's implementation, #1994). Closes long-standing requests #730 and #849.
Weakref support on Tokenizer (#1958) — useful for long-lived caches that don't want to keep tokenizers alive.
CI benchmark regression detection on PRs (#2013) — every PR runs ci_benchmark against the stored baseline and posts a comparison chart to the PR.
Longer-context Llama-3 benchmarks (#1971) for tracking head-room on multi-thousand-token inputs.

🛠 Other fixes

EncodingVisualizer: unclosed annotation span fixed (#1911), HTML escape applied to output (#1937).
DecodeStream: __copy__ / __deepcopy__ (#1930).
Pre-tokenize: removed an unnecessary to_vec() from slice (#1964).
Replace wget / norvig URL with HF Hub downloads in test data fetch (#2018).
uv support in the Python Makefile (#1977).
Several security-pin bumps on workflow SHAs (#2004, #2005, #2006, #2016, #2017).

👥 Contributors

Thanks to everyone who shipped commits between v0.22.2 and v0.23.1:

@ArthurZucker, @finnagin, @gordonmessmer, @jberg5, @kennethsible, @llukito, @MayCXC, @McPatate, @michaelfeil, @mrkm4ntr, @musicinmybrain, @ngoldbaum, @OhashiReon, @paulinebm, @podarok, @rtrompier, @sebpop, @Shivam-Bhardwaj, @threexc, @wheynelau, @xanderlent — plus @dependabot and @hf-security-analysis for keeping pins fresh.

Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.22.2...v0.23.1

Apr 26, 2026

v1.3.0 ↗

viaTRL

Features

Qwen 3.6 integration

TRL v1.3 ships training support for the new Qwen 3.6 family (Qwen/Qwen3.6-27B, Qwen/Qwen3.6-35B-A3B). Qwen 3.6 reuses the Qwen3_5Moe* architecture but ships a slightly different chat template (adds a preserve_thinking flag, tweaks tool-arg stringification), so exact-string template matching needed updates across the stack.

What landed:

Chat templates: qwen3_6.jinja (verbatim from upstream) and qwen3_6_training.jinja (prefix-preserving + {% generation %} markers for assistant_only_loss=True)
Response schema: routes to the existing qwen3_5_schema for tool-call parsing — output format unchanged
Tiny test models for VLM training: tiny-Qwen3_5MoeForConditionalGeneration-3.6 (with MoE-specific shrinking)
Test matrix updated across SFT/DPO/GRPO/RLOO test_(train|training)_vlm cases

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    args=SFTConfig(assistant_only_loss=True),  # works out of the box
    train_dataset=dataset,
)
trainer.train()

Tool-calling agent training also works end-to-end via the existing Qwen 3.5 response schema:

from trl import GRPOConfig, GRPOTrainer

def multiply(a: int, b: int) -> int:
    """
    Multiplies two integers.

    Args:
        a: The first integer.
        b: The second integer.

    Returns:
        The product of the two integers.
    """
    return a * b

trainer = GRPOTrainer(
    model="Qwen/Qwen3.6-27B",
    reward_funcs=my_reward_fn,
    args=GRPOConfig(...),
    train_dataset=dataset,
    tools=[multiply],
)
trainer.train()

by @qgallouedec in https://github.com/huggingface/trl/pull/5642

New experimental TPO trainer

A new experimental TPOTrainer implements Triple Preference Optimization, which augments DPO with a reference (gold) completion alongside chosen/rejected. The paper reports +7-19 points over DPO/SimPO on Arena-Hard, MixEval-Hard, MMLU-Pro and GSM8K, with less data.

from trl.experimental.tpo import TPOConfig, TPOTrainer

trainer = TPOTrainer(
    model="Qwen/Qwen3-0.6B",
    args=TPOConfig(output_dir="Qwen3-0.6B-TPO"),
    train_dataset=load_dataset("tpo-alignment/triple-preference-ultrafeedback-40K", split="train"),
)
trainer.train()

by @kashif in https://github.com/huggingface/trl/pull/5506

Speculative decoding in `trl vllm-serve`

A new --speculative_config JSON flag exposes vLLM's speculative decoding directly through trl vllm-serve — works with native MTP heads (Qwen3 Next), Eagle3 drafts, etc. — without forking the serve script.

# Qwen3 native MTP (no extra draft model)
trl vllm-serve --model Qwen/Qwen3-Next-80B-A3B-Instruct \
    --speculative_config '{"method": "qwen3_next_mtp", "num_speculative_tokens": 5}'

# Eagle3 draft model
trl vllm-serve --model Qwen/Qwen3-32B \
    --speculative_config '{"model": "RedHatAI/Qwen3-32B-speculator.eagle3", "method": "eagle3", "num_speculative_tokens": 3}'

by @Ofir408 in https://github.com/huggingface/trl/pull/5605

KTO ↔ DPO alignment: nearing the finish line

Twelve more alignment PRs this cycle, bringing KTOTrainer and DPOTrainer essentially into structural parity. Notable shifts include moving completion assembly out of _prepare_dataset into a new DataCollatorForKTO, inlining the two-pass tokenization into a single pass, removing BOS/EOS handling, and supporting IterableDataset and dict eval_dataset. The goal — promoting KTO out of experimental and into stable — is now within reach for an upcoming release.

PRs (all by @albertvillanova): #5582, #5578, #5579, #5583, #5587, #5599, #5601, #5600, #5606, #5612, #5632, #5635

More `{% generation %}` training chat templates

Three more model families gain training-compatible chat templates with {% generation %} markers, so assistant_only_loss=True works out of the box:

Gemma / Gemma 2 by @ps-abhi in https://github.com/huggingface/trl/pull/5523
Phi-3 by @RudrenduPaul in https://github.com/huggingface/trl/pull/5526
GLM-4-MoE by @casinca in https://github.com/huggingface/trl/pull/5519

Other

Support processor in maybe_apply_chat_template by @albertvillanova in https://github.com/huggingface/trl/pull/5567
Support VLM processors in is_chat_template_prefix_preserving by @qgallouedec in https://github.com/huggingface/trl/pull/5558
Check prefix preservation at the token level (not string level) by @qgallouedec in https://github.com/huggingface/trl/pull/5559
Drop vLLM 0.11 support by @qgallouedec in https://github.com/huggingface/trl/pull/5549
Remove forward_masked_logits by @qgallouedec in https://github.com/huggingface/trl/pull/5626
Remove dead token attributes from experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5565
Set _tokenizer as trainer attribute by @albertvillanova in https://github.com/huggingface/trl/pull/5489
Use PreTrainedTokenizerBase for tokenizer type hints by @qgallouedec in https://github.com/huggingface/trl/pull/5629
Renaming of internal variables: async_reward_X to async_X by @qgallouedec in https://github.com/huggingface/trl/pull/5616

Fixes

Fix entropy calculation in SFT — three bugs at once: misaligned by one position (next-token shift), averaged over the wrong tokens (used attention_mask instead of label != -100), and wrong cross-rank aggregation (unweighted mean instead of sum/count). The reported entropy under completion_only_loss=True and sequence parallelism is now correct. Same fix applied to DPO entropy logging. By @qgallouedec in https://github.com/huggingface/trl/pull/5620
Pass AsyncGRPOTrainer's processing_class to AsyncRolloutWorker by @xuanduy04 in https://github.com/huggingface/trl/pull/5538
Fix generate_tiny_models for gpt-oss by @albertvillanova in https://github.com/huggingface/trl/pull/5622
Fix docstring style in vllm-serve script by @albertvillanova in https://github.com/huggingface/trl/pull/5628
Replace wrong comment about chat template with EOS by @albertvillanova in https://github.com/huggingface/trl/pull/5607

Documentation and Examples

Add chat templates page to web docs by @sergiopaniego in https://github.com/huggingface/trl/pull/5581
Update AsyncGRPO example with GSM8K and tested hyperparameters by @sergiopaniego in https://github.com/huggingface/trl/pull/5580
Update RapidFire AI integration with FSDP and multi-backend tracking by @kamran-rapidfireAI in https://github.com/huggingface/trl/pull/5618

CI

Add doc-builder style check to pre-commit and CI by @albertvillanova in https://github.com/huggingface/trl/pull/5630
Align and update doc-builder commit hash in CI GitHub Actions by @albertvillanova in https://github.com/huggingface/trl/pull/5631
Hotfix CI: Add ruff dependency to doc-builder style check by @albertvillanova in https://github.com/huggingface/trl/pull/5634
Fix CI with dev dependencies for Llava models by @albertvillanova in https://github.com/huggingface/trl/pull/5499
Add additional model parameters to TestSupportsToolCalling for improved coverage by @qgallouedec in https://github.com/huggingface/trl/pull/5537
Differentiate Phi-3 and Phi-3.5 in tests by @qgallouedec in https://github.com/huggingface/trl/pull/5546

New Contributors

@Ofir408 made their first contribution in https://github.com/huggingface/trl/pull/5605
@ps-abhi made their first contribution in https://github.com/huggingface/trl/pull/5523

What's Changed

⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/5577
Support processor in maybe_apply_chat_template by @albertvillanova in https://github.com/huggingface/trl/pull/5567
Remove dead token attributes from experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5565
Support VLM processors in is_chat_template_prefix_preserving by @qgallouedec in https://github.com/huggingface/trl/pull/5558
Align KTO with DPO: Align add_model_tags by @albertvillanova in https://github.com/huggingface/trl/pull/5582
Align KTO with DPO: Align processing_class initialization by @albertvillanova in https://github.com/huggingface/trl/pull/5578
Align KTO with DPO: Align _prepare_dataset by @albertvillanova in https://github.com/huggingface/trl/pull/5579
Align KTO with DPO: Align ref_model preparation for distributed training by @albertvillanova in https://github.com/huggingface/trl/pull/5583
Align KTO with DPO: Make conditional prompt extraction and unpairing in _prepare_dataset by @albertvillanova in https://github.com/huggingface/trl/pull/5587
Update AsyncGRPO example with GSM8K and tested hyperparameters by @sergiopaniego in https://github.com/huggingface/trl/pull/5580
[docs] Add chat templates page to web docs by @sergiopaniego in https://github.com/huggingface/trl/pull/5581
Add additional model parameters to TestSupportsToolCalling for improved coverage by @qgallouedec in https://github.com/huggingface/trl/pull/5537
Fix CI with dev dependencies for Llava models by @albertvillanova in https://github.com/huggingface/trl/pull/5499
Differentiate Phi-3 and Phi-3.5 in tests by @qgallouedec in https://github.com/huggingface/trl/pull/5546
Set _tokenizer as trainer attribute by @albertvillanova in https://github.com/huggingface/trl/pull/5489
Align KTO with DPO: Support dict eval_dataset by @albertvillanova in https://github.com/huggingface/trl/pull/5599
Align KTO with DPO: Align tokenization by @albertvillanova in https://github.com/huggingface/trl/pull/5601
Check prefix preservation at the token level by @qgallouedec in https://github.com/huggingface/trl/pull/5559
Replace wrong comment about chat template with EOS by @albertvillanova in https://github.com/huggingface/trl/pull/5607
Align KTO with DPO: Support IterableDataset by @albertvillanova in https://github.com/huggingface/trl/pull/5600
Drop vLLM 0.11 support by @qgallouedec in https://github.com/huggingface/trl/pull/5549
Align KTO with DPO: Remove maybe_apply_chat_template by @albertvillanova in https://github.com/huggingface/trl/pull/5606
[TPO] experimental TPO trainer by @kashif in https://github.com/huggingface/trl/pull/5506
fix: Pass AsyncGRPOTrainer's processing_class to AsyncRolloutWorker by @xuanduy04 in https://github.com/huggingface/trl/pull/5538
docs: update RapidFire AI integration with FSDP and multi-backend tracking by @kamran-rapidfireAI in https://github.com/huggingface/trl/pull/5618
Fix generate_tiny_models for gpt-oss by @albertvillanova in https://github.com/huggingface/trl/pull/5622
Added speculative_config to vllm-serve by @Ofir408 in https://github.com/huggingface/trl/pull/5605
feat(glm-4-moe): Add {% generation %} markers for training chat template by @casinca in https://github.com/huggingface/trl/pull/5519
Fix docstring style in vllm-serve script by @albertvillanova in https://github.com/huggingface/trl/pull/5628
feat: add Gemma/Gemma2 training chat templates with generation markers by @ps-abhi in https://github.com/huggingface/trl/pull/5523
Align KTO with DPO: Inline tokenization, new output format, DataCollatorForKTO by @albertvillanova in https://github.com/huggingface/trl/pull/5612
feat: add Phi-3 training chat template with generation markers by @RudrenduPaul in https://github.com/huggingface/trl/pull/5526
Remove forward_masked_logits by @qgallouedec in https://github.com/huggingface/trl/pull/5626
Use PreTrainedTokenizerBase for tokenizer type hints by @qgallouedec in https://github.com/huggingface/trl/pull/5629
Add doc-builder style check to pre-commit and CI by @albertvillanova in https://github.com/huggingface/trl/pull/5630
Align and update doc-builder commit hash in CI GitHub Actions by @albertvillanova in https://github.com/huggingface/trl/pull/5631
Align KTO with DPO: Move completion assembly from _prepare_dataset to data collator by @albertvillanova in https://github.com/huggingface/trl/pull/5632
Hotfix CI: Add ruff dependency to doc-builder style check by @albertvillanova in https://github.com/huggingface/trl/pull/5634
Fix entropy calculation in SFT by @qgallouedec in https://github.com/huggingface/trl/pull/5620
Renaming of internal variables: async_reward_X to async_X by @qgallouedec in https://github.com/huggingface/trl/pull/5616
Align KTO with DPO: Remove BOS/EOS handling by @albertvillanova in https://github.com/huggingface/trl/pull/5635
Qwen3.6 integration by @qgallouedec in https://github.com/huggingface/trl/pull/5642
Release: v1.3 by @qgallouedec in https://github.com/huggingface/trl/pull/5647

New Contributors

@Ofir408 made their first contribution in https://github.com/huggingface/trl/pull/5605
@ps-abhi made their first contribution in https://github.com/huggingface/trl/pull/5523

Full Changelog: https://github.com/huggingface/trl/compare/v1.2.0...v1.3.0

Apr 24, 2026

v1.12.0 ↗

[v1.12.0] Unified CLI output, bucket search, and more

viahuggingface_hub

🖥️ Unified output format for `hf buckets` commands

All hf buckets commands now use the unified --format [auto|human|agent|json|quiet] flag and the out singleton for consistent, scriptable output. The previous --quiet and --format table|json flags have been replaced by a single --format option that works across create, list, info, delete, rm, move, and cp. Success messages use out.result(), detail views use out.dict(), and listings use out.table() with proper empty-results handling — making the buckets CLI consistent with the rest of the hf command suite.

# Quiet mode: print only bucket IDs
hf buckets list --format quiet

# JSON output for scripting
hf buckets create my-bucket --format json

# Agent-friendly structured output
hf buckets info username/my-bucket --format agent

[CLI] Migrate buckets commands to out singleton by @hanouticelina in #4111

📚 Documentation: Buckets guide · CLI guide

🪣 Search buckets by name

You can now filter buckets by name when listing them, both from the Python API and the CLI. Pass search="checkpoint" to list_buckets() or --search "checkpoint" to hf buckets list to find buckets matching a name pattern, without having to list and filter client-side.

# Filter buckets by name
hf buckets list --search "checkpoint"

# Filter buckets by name in Python
for bucket in list_buckets(search="checkpoint"):
    print(bucket.id)

[Buckets] Add search param to list_buckets by @alexpouliquen in #4130

📚 Documentation: Buckets guide · CLI guide

🖥️ CLI

[CLI] spaces hot-reload: misc improvements by @cbensimon in #4049
[CLI] Detect pi agent by @hanouticelina in #4125

🐛 Bug and typo fixes

Apply fsspec config in HfFileSystem metaclass by @joaquinhuigomez in #4062

🔧 Other QoL Improvements

[Buckets] Skip local walk for download sync without delete by @abidlabs in #4123
[HfApi] Add mainSize to ExpandDatasetProperty_T by @Wauplin in #4136

🏗️ Internal

[Internal] Fix slack-message draft release permissions + update model by @hanouticelina in #4119
Post-release: bump version to 1.12.0.dev0 by @huggingface-hub-bot[bot] in #4120
[Internal] Make RELEASE_NOTES_MODEL configurable via repo variable by @Wauplin in #4126
[Release] Add social media draft generation to release workflow by @Wauplin in #4132
chore: bump doc-builder SHA for main doc build workflow by @rtrompier in #4137
[Release] Make release-notes job fail loudly on bad model/empty output by @Wauplin in #4138

Apr 23, 2026

v5.6.2 ↗

Patch release v5.6.2

viaTransformers

Qwen 3.5 and 3.6 MoE (text-only) were broken when using with FP8. It should now work again with this :saluting_face:

Fix configuration reading and error handling for kernels (https://github.com/huggingface/transformers/pull/45610) by @hmellor

Full Changelog: https://github.com/huggingface/transformers/compare/v5.6.1...v5.6.2

4.0.1 ↗

viaTransformers.js

What's new?

Add support for Gemma 4 in #1627

Full Changelog: 4.0.0...4.0.1

4.2.0 ↗

viaTransformers.js

🚀 Transformers.js v4.2 — Tool calling, simpler internals, and privacy filtering

Added tools to TextGenerationPipeline in #1655
Use inputMetadata API for simplified internals in #1657
Add support for OpenAI privacy filter model in #1658

Full Changelog: 4.1.0...4.2.0

4.1.0 ↗

viaTransformers.js

🚀 Transformers.js v4.1 — Gemma 4, KV cache improvements, and new quantization dtypes

Add support for Gemma 4 in #1627
Cached generation improvements (+ past_key_values via pipeline function) in #1638
Improve tokenizer types based on input function parameters in #1641
Add support for q1, q1f16, q2, and q2f16 data types in #1647
Re-enable SmolVLM in #1648
Update default generation parameters in #1649
Pin GitHub Actions to commit SHAs in #1626

Full Changelog: 4.0.0...4.1.0

v5.6.1 ↗

Patch release v5.6.1

viaTransformers

Flash attention path was broken! Sorry everyone for this one 🤗

Fix AttributeError on s_aux=None in flash_attention_forward (https://github.com/huggingface/transformers/pull/45589) by @jamesbraza

Apr 22, 2026

v5.6.0 ↗

Release v5.6.0

viaTransformers

New Model additions

OpenAI Privacy Filter

OpenAI Privacy Filter is a bidirectional token-classification model for personally identifiable information (PII) detection and masking in text. It is intended for high-throughput data sanitization workflows where teams need a model that they can run on-premises that is fast, context-aware, and tunable. The model labels an input sequence in a single forward pass, then decodes coherent spans with a constrained Viterbi procedure, predicting probability distributions over 8 privacy-related output categories for each input token.

Links: Documentation

[Privacy Filter] Add model (#45580) by @vasqu in #45580

QianfanOCR

Qianfan-OCR is a 4B-parameter end-to-end document intelligence model developed by Baidu that performs direct image-to-text conversion without traditional multi-stage OCR pipelines. It supports a broad range of prompt-driven tasks including structured document parsing, table extraction, chart understanding, document question answering, and key information extraction all within one unified model. The model features a unique "Layout-as-Thought" capability that generates structured layout representations before producing final outputs, making it particularly effective for complex documents with mixed element types.

Links: Documentation | Paper

add Qianfan-OCR model definition (#45280) by @marvinzh in #45280

SAM3-LiteText

SAM3-LiteText is a lightweight variant of SAM3 that replaces the heavy SAM3 text encoder (353M parameters) with a compact MobileCLIP-based text encoder optimized through knowledge distillation, while keeping the SAM3 ViT-H image encoder intact. This reduces text encoder parameters by up to 88% while maintaining segmentation performance comparable to the original model. The model enables efficient vision-language segmentation by addressing the redundancy found in text prompting for segmentation tasks.

Links: Documentation | Paper

Add SAM3-LiteText (#44320) by @NielsRogge in #44320

SLANet

SLANet and SLANet_plus are lightweight models designed for table structure recognition, focusing on accurately recognizing table structures in documents and natural scenes. The model improves accuracy and inference speed by adopting a CPU-friendly lightweight backbone network PP-LCNet, a high-low-level feature fusion module CSP-PAN, and a feature decoding module SLA Head that aligns structural and positional information. SLANet was developed by Baidu PaddlePaddle Vision Team as part of their table structure recognition solutions.

Links: Documentation

[Model] Add SLANet Model Support (#45532) by @zhang-prog in #45532

Breaking changes

The internal rotary_fn is no longer registered as a hidden kernel function, so any code referencing self.rotary_fn(...) within an Attention module will break and must be updated to call the function directly instead.

:rotating_light: [Kernels] Fix kernel function registration (#45420) by @vasqu

Serve

The transformers serve command received several enhancements, including a new /v1/completions endpoint for legacy text completion, multimodal support for audio and video inputs, improved tool-calling via parse_response, proper forwarding of tool_calls/tool_call_id fields, a 400 error on model mismatch when the server is pinned to a specific model, and fixes for the response API. Documentation was also updated to cover new serving options such as --compile and --model-timeout.

Add /v1/completions endpoint (OpenAI legacy completions API) to transformers serve (#44558) by @rain-1 in [#44558]
Updated the image cache for Paddle models according to the latest API (#45562) by @zhang-prog in [#45562]
Raise 400 on model mismatch when transformers serve is pinned (#45443) by @qgallouedec in [#45443]
[serve] Update tool call to switch to parse_response (#45485) by @SunMarc in [#45485]
Fix response api support (#45463) by @SunMarc in [#45463]
[serve] Forward tool_calls/tool_call_id in processor inputs (#45418) by @qgallouedec in [#45418]
refactor(qa): extend extras so ty can run on server modules (#45456) by @tarekziade in [#45456]
Multimodal serve support (#45220) by @SunMarc in [#45220]
[docs] transformers serve (#45174) by @stevhliu in [#45174]

Vision

Several vision-related bug fixes were applied in this release, including correcting Qwen2.5-VL temporal RoPE scaling for still images, fixing missing/mismatched image processor backends for Emu3 and BLIP, resolving modular image processor class duplication, and preventing accelerate from incorrectly splitting vision encoders in PeVideo/PeAudioVideo models. Image loading performance was also improved by leveraging torchvision's native decode_image in the torchvision backend, yielding up to ~17% speedup over PIL-based loading.

Revert "Fix: modular image processors (#45492)" (#45531) by @tarekziade in [#45531]
Fix: modular image processors (#45492) by @zucchini-nlp in [#45492]
fix: prevent accelerate from splitting vision encoder by setting no… (#43047) by @<NOT FOUND> in [#43047]
Fix Qwen2.5-VL temporal RoPE scaling applied to still images (#45330) by @Kash6 in [#45330]
Use torchvision decode_image to load images in the torchvision backend (#45195) by @yonigozlan in [#45195]
Fix missing image processors backends (#45165) by @zucchini-nlp in [#45165]

Parallelization

Fixed several bugs affecting distributed training, including silently wrong results or NaN loss with Expert Parallelism, NaN weights on non-rank-0 FSDP processes, and a resize failure in PP-DocLayoutV3; additionally added support for loading adapters with Tensor Parallelism, added MoE to the Gemma4 TP plan, and published documentation for TP training.

Fix EP: RouterParallel shape, tp_plan property, grouped_mm sentinels (#45473) by @AmineDiro in [#45473]
Fix NaN weights on non-rank-0 FSDP processes (#45050) by @albertvillanova in [#45050]
Load adapter with TP (#45155) by @michaelbenayoun in [#45155]
[docs] tp training (#44613) by @stevhliu in [#44613]
Fix resize failure caused by zero-sized masks in PP-DocLayoutV3 (#45281) by @zhang-prog in [#45281]
Add MoE to Gemma4 TP plan (#45219) by @sywangyi in [#45219]

Tokenization

Fixed a docstring typo in streamer classes, resolved a Kimi-K2.5 tokenizer regression and _patch_mistral_regex AttributeError, and patched a streaming generation crash for Qwen3VLProcessor caused by incorrect _tokenizer attribute access. Additional housekeeping included moving the GPT-SW3 instruct tokenizer to an internal testing repo and fixing a global state leak in the tokenizer registry during tests.

[Doc] Fix 'tokenized' -> 'tokenizer' typo in streamer docstrings (#45508) by @avasis-ai in [#45508]
Fix Kimi-K2.5 tokenizer regression and _patch_mistral_regex AttributeError (#45359) by @ArthurZucker in [#45359]
fix(serving): resolve rust tokenizer from ProcessorMixin in streaming generation (#45368) by @sharziki in [#45368]
[Tokenizers] Move gpt sw3 tokenizer out (#45404) by @vasqu in [#45404]
fix: leak in tokenizer registry for test_processors (#45318) by @tarekziade in [#45318]

Cache

Cache handling was improved for Gemma4 and Gemma3n models by dissociating KV state sharing from the Cache class, ensuring KV states are always shared regardless of whether a Cache is used. Additionally, the image cache for Paddle models was updated to align with the latest API.

Align gemma3n cache sharing to gemma4 (#45489) by @Cyrilvallez in [#45489]
remove cache file from tree (#45392) by @tarekziade in [#45392]
[gemma4] Dissociate kv states sharing from the Cache (#45312) by @Cyrilvallez in [#45312]

Audio

Audio models gained vLLM compatibility through targeted fixes across several model implementations, while reliability improvements were also made including exponential back-off retries for audio file downloads, a crash fix in the text-to-speech pipeline when generation configs contain None values, and corrected test failures for Kyutai Speech-To-Text.

feat[vLLM × v5]: Add vLLM compatibility for audio models (#45326) by @harshaljanjani in [#45326]
http retries on audio file downloads (#45126) by @tarekziade in [#45126]
fix(testing): Fix Kyutai Speech-To-Text and LongCatFlash test failures on main CI (#44695) by @harshaljanjani in [#44695]
Fix text-to-speech pipeline crash when generation config contains None values (#45107) by @jiqing-feng in [#45107]

Bugfixes and improvements

[Privacy Filter] Add model (#45580) by @vasqu in [#45580]
Add ForSequenceClassification heads for the OLMo family (#45551) by @earino in [#45551]
Add IndexCache support for GLM5 DSA (#45424) by @louzongzhi in [#45424]
Fix redundant logic in video processing SmolVLM (#45272) by @yonigozlan in [#45272]
Fix typos (#45574) by @vasqu in [#45574]
[Model] Add SLANet Model Support (#45532) by @zhang-prog in [#45532]
refactor(Dots1): drop Dots1MoE override to pass (inherits from DSV3 MoE) (#45572) by @casinca in [#45572]
perf: avoid recomputing rotary_emb for each layer in some Google and ModernBERT models (#45555) by @casinca in [#45555]
Gemma4 training with text-only samples (#45454) by @zucchini-nlp in [#45454]
[nemotron_h] Add support for MLP mixers (#44763) by @xenova in [#44763]
add expert parallelism for gemma-4-26B-A4B-it (#45279) by @sywangyi in [#45279]
Add full GGUF loading support for GPT‑OSS (fixes #43366, supersedes #43757) latest (#45506) by @sirzechs66 in [#45506]
Update Gemma4 weight conversion script (#45328) by @RyanMullins in [#45328]
Move some conversion mappings to PrefixChange (#45567) by @Cyrilvallez in [#45567]
fix table update versions (#45544) by @tarekziade in [#45544]
Add disable_mmap kwarg to from_pretrained with hf-mount auto-detection (#45547) by @rtrompier in [#45547]
fix(DSV3): parity between native DeepseekV3MoE and remote official implementation (#45441) by @casinca in [#45441]
[modular] Fix modular logic broken in #45045 (#45539) by @Cyrilvallez in [#45539]
Fix: propagate quantization_config to text sub-config for composite models in AutoModelForCausalLM (#45494) by @lvliang-intel in [#45494]
T5Gemma2: fix prepare_decoder_input_ids_from_labels (#45516) by @Tokarak in [#45516]
[Trainer] Add ddp_static_graph option (#45519) by @KeitaW in [#45519]
Add dtype config options for Four Over Six (#45367) by @jackcook in [#45367]
[Sam3LiteText] Remove unnecessary modules/configs (#45535) by @yonigozlan in [#45535]
Fix conditional check for float formatting (#44425) by @qgallouedec in [#44425]
Fix AMD CI: rebuild torchvision with libjpeg + refresh expectations (#45533) by @Abdennacer-Badaoui in [#45533]
Reapply modular to examples (#45527) by @Cyrilvallez in [#45527]
qa: re-run modular converter when the script itself is modified (#45528) by @tarekziade in [#45528]
[GGUF] Reduce peak RAM usage by casting dequantized tensors early during load (#45386) by @UsamaKenway in [#45386]
Fix CSM TextToAudioPipeline missing <bos> token (#45525) by @jiqing-feng in [#45525]
[Conversion Mapping] Small fixups (#45483) by @vasqu in [#45483]
fix: return empty tuple from import_protobuf_decode_error when protobuf is unavailable (#45486) by @jw9603 in [#45486]
throw error when conversion required (#45078) by @itazap in [#45078]
chore: bump doc-builder SHA for PR upload workflow (#45450) by @rtrompier in [#45450]
xpu output align with cuda in test case (#45526) by @sywangyi in [#45526]
chore(qa): split out mlinter (#45475) by @tarekziade in [#45475]
[loading] Clean way to add/remove full parts in checkpoint names (#45448) by @Cyrilvallez in [#45448]
Fix Zamba2MambaMixer ignoring use_mamba_kernels=False (#44853) by @sergiopaniego in [#44853]
revert sha commit pointing to main for transformers_amd_ci_ workflows (#45495) by @paulinebm in [#45495]
Fix ZeRO-3 from_pretrained: load registered buffers in _load_state_dict_into_zero3_model (#45402) by @saslifat-gif in [#45402]
Remove redundant condition checks in get_image_size method (#45461) by @JiauZhang in [#45461]
Add check-auto in repo-consistency and fix sorting (#45481) by @zucchini-nlp in [#45481]
Fix typos in src/transformers/utils/output_capturing.py (#45269) by @ryota-komatsu in [#45269]
typing: rule 15 - checks for tie_word_embeddings presence (#44988) by @tarekziade in [#44988]
[CB] Fix capture of max_seqlen (#45323) by @remi-or in [#45323]
Minor update (#45484) by @ydshieh in [#45484]
Add Neuron to auto-compile hardware list (#44757) by @dacorvo in [#44757]
Allow loading Qwen Thinker 'base' models without generative head (#45457) by @tomaarsen in [#45457]
[fix] Always early return for non-Mistral models in _patch_mistral_regex (#45444) by @tomaarsen in [#45444]
Fix spurious position_ids warnings for at least 40 architectures (#45437) by @tomaarsen in [#45437]
[fix] Make Qwen2_5OmniProcessor warning a lot less noisy via warning_once (#45455) by @tomaarsen in [#45455]
Dynamic auto mapping (#45018) by @zucchini-nlp in [#45018]
[docs] vlm addition (#45271) by @stevhliu in [#45271]
fix: dont download artifacts from the test hub (#45319) by @tarekziade in [#45319]
fix(clipseg): fix 2 failing tests (#45403) by @kaixuanliu in [#45403]
[docs] @auto_docstring decorator (#45130) by @stevhliu in [#45130]
Fix Sam3Processor missing input_boxes_labels for padded None entries (#45171) by @Kash6 in [#45171]
better grad acc tests (#45434) by @SunMarc in [#45434]
Add example for iterative chatting with MLLMs (#45398) by @zucchini-nlp in [#45398]
Gemma4 resizing per layer inputs (#45324) by @zucchini-nlp in [#45324]
Add step3_vl to MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS (#45449) by @hmellor in [#45449]
Update workflow references to new commit hash (#45442) by @paulinebm in [#45442]
[Gemma4] Add docstrings for Per-Layer Embeddings (PLE) pipeline (#45207) by @w4nderlust in [#45207]
[Doc] Correct checkpoint path in Dinov2 model_docs (#45430) by @ambroiseodt in [#45430]
Fix ty for transformers cli (#45190) by @SunMarc in [#45190]
fix(models): Resolve regressions in Wav2Vec2PhonemeCTCTokenizer (wav2vec2-lv-60-espeak-cv-ft) (#45199) by @harshaljanjani in [#45199]
Fix Qwen2.5VL temporal grid positions (#45400) by @zucchini-nlp in [#45400]
[fix] PEFT integration fixes preventing save/load & integration (#45428) by @tomaarsen in [#45428]
Fix the response schema for the gemma4 converter (#45411) by @Rocketknight1 in [#45411]
[Doc] MoE routing capture and replay recipe (#44925) by @kashif in [#44925]
Fix apply_chat_template crash on tool_call messages without content (#45348) by @qgallouedec in [#45348]
[AMD CI] Fix torch.compile/export failures on AMD CI due to untraceable set.contains (#45282) by @Abdennacer-Badaoui in [#45282]
[inference_fusion] convert conv3d patch embed to linear (#45041) by @JJJYmmm in [#45041]
Fix #45305 + add regression test GAS (#45349) by @florian6973 in [#45349]
Update trackio integration to use Buckets and "freeze" Space after training (#45329) by @abidlabs in [#45329]
fix(qwen3_moe): correct return type annotation on Qwen3MoeSparseMoeBlock.forward (#45352) by @RudrenduPaul in [#45352]
Fix: NotebookProgressCallback crash when evaluating with the Trainer (#44949) by @Charly21r in [#44949]
docs: fix 5 docstring errors in Gemma3nTextConfig (typos, grammar, formatting) (#45370) by @RudrenduPaul in [#45370]
Less unnecessary RoPE warnings (#45289) by @zucchini-nlp in [#45289]
Fix unintended Hub metadata calls from _patch_mistral_regex (#43603) by @vaibhav-research in [#43603]
Fix MoE routers returning probabilities instead of logits (#45131) by @yacinemebarki in [#45131]
[docs] training on specific hardware (#44799) by @stevhliu in [#44799]
[docs] zero + sequence parallelism (#44605) by @stevhliu in [#44605]
Fix vlm weight mappings (#45358) by @Cyrilvallez in [#45358]
Copy the template resolution logic from the base apply_chat_template to Voxtral (#45117) by @Rocketknight1 in [#45117]
add kwargs to all methods in the CallbackHandler class (#45353) by @wilnn in [#45353]
Close file handler (#45187) by @ydshieh in [#45187]
fix: restore mypy type checking for PreTrainedConfig subclasses (#45071) (#45240) by @shhKnight30 in [#45240]
cohere_asr: fix device issue for test_model_parallel_beam_search (#45214) by @kaixuanliu in [#45214]
Fix AttributeError in Gemma3ForConditionalGeneration and Gemma3ForSequenceClassification when config.return_dict=False (#45277) by @kamalrajkannan78 in [#45277]
fix bug for videomt model device mismatch (#45204) by @kaixuanliu in [#45204]
fix gemma4 gradient accumulation loss and last token incorrect labels (#45354) by @winglian in [#45354]
Logger has [transformers] prefix in non-verbose mode (#45316) by @zucchini-nlp in [#45316]
Fix AttributeError in AssistantToTargetTranslator.unmap_input_ids with cross-vocab models (#45320) by @Regata3010 in [#45320]
musicflamingo: add test support for Intel XPU device (#45212) by @kaixuanliu in [#45212]
nomic_bert: make the test suitable for general device. (#45209) by @kaixuanliu in [#45209]
Skip invalid flash-attn tests for pi0 model (#45011) by @kaixuanliu in [#45011]
Add cuda compatibility check for using grouped_mm (#45001) by @Sai-Suraj-27 in [#45001]
[docs] optimizers, hyperparam search, training features (#44290) by @stevhliu in [#44290]
Remove unused parameters and improve add_tensor_parallel_hooks_t… (#44768) by @michaelbenayoun in [#44768]
[gemma4] Fix device map auto (#45347) by @Cyrilvallez in [#45347]
Refactor CLIP-like models (#44431) by @zucchini-nlp in [#44431]
refactor: display test duration (#45344) by @tarekziade in [#45344]
Fix Wav2Vec2Config.vocab_size type to allow None (#45108) by @jiqing-feng in [#45108]
Add THD support in ESM (#44145) by @balvisio in [#44145]
[gemma4] Remove all shared weights, and silently skip them during loading (#45336) by @Cyrilvallez in [#45336]
Fix conversion mappings for vlms (#45340) by @Cyrilvallez in [#45340]
chore: added circleci python script to ruff and ty checkers (#45339) by @tarekziade in [#45339]
tweak checkers output on errors (#45163) by @tarekziade in [#45163]
chore: remove test_hub for now (#45337) by @tarekziade in [#45337]
[docs] pipeline cleanup (#44954) by @stevhliu in [#44954]
Fix export for gemma4 and add Integration tests (#45285) by @Cyrilvallez in [#45285]
Fix vllm cis (#45139) by @ArthurZucker in [#45139]
[docs] static model rules (#45232) by @stevhliu in [#45232]
fix(security): prevent untrusted users from triggering TRL CI dispatch (#45302) by @jagwar in [#45302]
[AMD CI] Fix Qwen2 expectations (#45284) by @Abdennacer-Badaoui in [#45284]
Add hasattr(torch.backends.cudnn, "conv") to conftest.py (#45263) by @ydshieh in [#45263]
Fix SmolVLM video processor resize using wrong interpolation after backend refactor (#45258) by @ydshieh in [#45258]
Fix Qwen2IntegrationTest (#45268) by @ydshieh in [#45268]
doc: fix TokenizersBackend.convert_to_native_format docstring (#45262) by @lowzhao in [#45262]
empty (#45261) by @ydshieh in [#45261]
Fix unexpected TF32 being enabled in testing (#45252) by @ydshieh in [#45252]
Fix tf32 issue: set torch.backends.cudnn.conv.fp32_precision explicitly. (#45248) by @ydshieh in [#45248]
Nvidia CI with torch 2.11 (#45243) by @ydshieh in [#45243]
Update tiny model creation script (#45241) by @ydshieh in [#45241]
Update get_test_info.py (related to tiny model creation) (#45238) by @ydshieh in [#45238]
More fix for tiny model creation (#45228) by @ydshieh in [#45228]
remove unnecessary entries in some auto model mappings (#45224) by @ydshieh in [#45224]
fix: hf-doc-builder insallation was failing (#45225) by @tarekziade in [#45225]
[CB] Add per-request logits processors (#45026) by @remi-or in [#45026]
[docs] formatting (#45196) by @stevhliu in [#45196]
fix test_register_result_handler (#45188) by @SunMarc in [#45188]
[CB] Tweaks to update and minor fixes (#45179) by @remi-or in [#45179]
Fix pypi release (#45210) by @ArthurZucker in [#45210]
fix(docs): correct gemma4 docs and examples (#45197) by @douglas-reid in [#45197]
Add Turkish (tr) translation for Get Started section (#45158) by @onwp in [#45158]

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@vasqu
- [Privacy Filter] Add model (#45580)
- Fix typos (#45574)
- [Conversion Mapping] Small fixups (#45483)
- :rotating_light: [Kernels] Fix kernel function registration (#45420)
- [Tokenizers] Move gpt sw3 tokenizer out (#45404)
@rain-1
- Add /v1/completions endpoint (OpenAI legacy completions API) to transformers serve (#44558)
@zhang-prog
- Updated the image cache for Paddle models according to the latest API (#45562)
- [Model] Add SLANet Model Support (#45532)
- Fix resize failure caused by zero-sized masks in PP-DocLayoutV3 (#45281)
@tarekziade
- fix table update versions (#45544)
- qa: re-run modular converter when the script itself is modified (#45528)
- Revert "Fix: modular image processors (#45492)" (#45531)
- chore(qa): split out mlinter (#45475)
- typing: rule 15 - checks for tie_word_embeddings presence (#44988)
- fix: dont download artifacts from the test hub (#45319)
- refactor(qa): extend extras so ty can run on server modules (#45456)
- remove cache file from tree (#45392)
- refactor: display test duration (#45344)
- http retries on audio file downloads (#45126)
- chore: added circleci python script to ruff and ty checkers (#45339)
- tweak checkers output on errors (#45163)
- fix: leak in tokenizer registry for test_processors (#45318)
- chore: remove test_hub for now (#45337)
- fix: hf-doc-builder insallation was failing (#45225)
@marvinzh
- add Qianfan-OCR model definition (#45280)
@remi-or
- [CB] Fix capture of max_seqlen (#45323)
- [CB] Add per-request logits processors (#45026)
- [CB] Tweaks to update and minor fixes (#45179)
@ydshieh
- Minor update (#45484)
- Close file handler (#45187)
- Add hasattr(torch.backends.cudnn, "conv") to conftest.py (#45263)
- Fix SmolVLM video processor resize using wrong interpolation after backend refactor (#45258)
- Fix Qwen2IntegrationTest (#45268)
- empty (#45261)
- Fix unexpected TF32 being enabled in testing (#45252)
- Fix tf32 issue: set torch.backends.cudnn.conv.fp32_precision explicitly. (#45248)
- Nvidia CI with torch 2.11 (#45243)
- Update tiny model creation script (#45241)
- Update get_test_info.py (related to tiny model creation) (#45238)
- More fix for tiny model creation (#45228)
- remove unnecessary entries in some auto model mappings (#45224)
@NielsRogge
- Add SAM3-LiteText (#44320)
@ArthurZucker
- Fix IndexError with DeepSpeed ZeRO-3 when kernels rotary is active (#45414)
- Fix Kimi-K2.5 tokenizer regression and _patch_mistral_regex AttributeError (#45359)
- Fix vllm cis (#45139)
- Fix pypi release (#45210)
- update to dev version 5.6.0-dev0
@JJJYmmm
- [inference_fusion] convert conv3d patch embed to linear (#45041)
@balvisio
- Add THD support in ESM (#44145)
@onwp
- Add Turkish (tr) translation for Get Started section (#45158)

Apr 17, 2026

v1.2.0 ↗

viaTRL

Features

New `SSDTrainer` — Simple Self-Distillation

A new experimental SSDTrainer implements the method described in Embarrassingly Simple Self-Distillation Improves Code Generation. SSD samples completions from the model itself at a training-time temperature/truncation setting, then fine-tunes on those raw, unverified samples with standard cross-entropy loss. No reward model, verifier, teacher model, or RL: just prompts and the model.

from datasets import Dataset
from trl.experimental.ssd import SSDConfig, SSDTrainer

dataset = Dataset.from_dict({
    "prompt": [
        [{"role": "user", "content": "Write a function to add two numbers."}],
        [{"role": "user", "content": "Write a function to check if a number is prime."}],
    ],
})

trainer = SSDTrainer(
    model="Qwen/Qwen3-4B-Instruct",
    args=SSDConfig(
        output_dir="ssd-model",
        temperature=0.6,      # T_train from the paper
        top_k=20,
        top_p=0.95,
        learning_rate=5e-6,
    ),
    train_dataset=dataset,
)
trainer.train()

by @kashif in https://github.com/huggingface/trl/pull/5505

Drop, don't truncate, overlong tool results in `GRPOTrainer`

When tool calls produce more tokens than max_completion_length allows, GRPOTrainer now rolls back the tool messages/images added in the current iteration instead of trying to truncate them. This removes ~80 lines of fragile, image-boundary-aware bookkeeping in favor of a ~15-line snapshot-and-rollback. Since overlong samples almost always get rewarded as failures anyway, the learning signal is effectively unchanged — but the code is dramatically simpler and no longer needs per-VLM-family vision-token lookup tables.

by @qgallouedec in https://github.com/huggingface/trl/pull/5521

Expanded tool-calling model support: LLaMA 3.1 / 3.2 & DeepSeek-V3

Continuing the effort from v1.1:

LLaMA 3.1 and 3.2 tool-calling response schemas, with dedicated templates for identity matching. Note that these templates only support a single tool call and no content alongside the tool call — limitations inherited from the models' native templates. By @qgallouedec in https://github.com/huggingface/trl/pull/5518
DeepSeek-V3 training chat template with {% generation %} markers, enabling assistant-only loss masking for DeepSeek-V3 models. By @RudrenduPaul in https://github.com/huggingface/trl/pull/5527

As a result of a tightened detection (see fixes below), the list of templates reported as tool-calling capable is now correct — notably, the basic Llama 3 template is no longer falsely classified as tool-calling capable.

KTO/DPO alignment push

A major cleanup sweep keeps KTOTrainer and DPOTrainer in lockstep, same initialization patterns, same config surface, same precompute behavior:

Add precompute_ref_batch_size to KTO (https://github.com/huggingface/trl/pull/5530)
Align ref_model initialization (https://github.com/huggingface/trl/pull/5534)
Align model initialization (https://github.com/huggingface/trl/pull/5533)
Support None args (https://github.com/huggingface/trl/pull/5531)
Remove generate_during_eval (https://github.com/huggingface/trl/pull/5551)
Remove model and ref adapter names (https://github.com/huggingface/trl/pull/5552)
Don't load ref_model when precompute_ref_log_probs is set in DPO/KTO (https://github.com/huggingface/trl/pull/5542)

All by @albertvillanova.

Other

Support messages with images in prepare_multimodal_messages by @albertvillanova in https://github.com/huggingface/trl/pull/5474
Simplify role handling in prepare_multimodal_messages by @albertvillanova in https://github.com/huggingface/trl/pull/5508
Update vLLM version support to 0.18.0 by @qgallouedec in https://github.com/huggingface/trl/pull/5547

Fixes

Fix supports_tool_calling falsely accepting templates that drop assistant tool_calls by @qgallouedec in https://github.com/huggingface/trl/pull/5517
Fix add_response_schema for VLM processors — the schema was being set on the outer processor instead of the inner tokenizer, so it had no effect. This also collapses a handful of __init__/decode-gate workarounds. By @qgallouedec in https://github.com/huggingface/trl/pull/5520
Remove xfail condition for Gemma 4 response_schema regex bug by @qgallouedec in https://github.com/huggingface/trl/pull/5510
Remove unused dependencies for judges from dev requirements by @qgallouedec in https://github.com/huggingface/trl/pull/5515

Deprecations

Deprecate use_transformers_paged in GRPOConfig and RLOOConfig (and remove entirely from experimental OnlineDPOConfig, GOLDConfig, SelfDistillationConfig). Will be removed from the remaining configs in v2.0.0. In a small A/B benchmark (Qwen3-0.6B GRPO), the paged path is ~20% slower and uses ~6x more peak VRAM than the default; it's also superseded by transformers continuous batching. By @qgallouedec in https://github.com/huggingface/trl/pull/5544

Documentation and Examples

Add example script section to experimental trainer docs by @sergiopaniego in https://github.com/huggingface/trl/pull/5543
[Docs] Fix formatting in SSD training example script by @kashif in https://github.com/huggingface/trl/pull/5548
Nits in SSD docs by @sergiopaniego in https://github.com/huggingface/trl/pull/5554
[docs] Add LLaMA 3 / Qwen 2.5 entries to chat_templates/README by @qgallouedec in https://github.com/huggingface/trl/pull/5545
Update CARLA VLM example scripts by @sergiopaniego in https://github.com/huggingface/trl/pull/5557

CI

Fix CI dependency installs to use a single resolve by @qgallouedec in https://github.com/huggingface/trl/pull/5513
Set upper transformers version to skip distributed test_rloo after fixed by @albertvillanova in https://github.com/huggingface/trl/pull/5535
Update tests with zero3 for RLOO and GRPO once fixed in transformers 5.5.4 by @albertvillanova in https://github.com/huggingface/trl/pull/5541
Bump doc-builder SHA for PR upload workflow by @rtrompier in https://github.com/huggingface/trl/pull/5553

What's Changed

⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/5525
Simplify role handling in prepare_multimodal_messages by @albertvillanova in https://github.com/huggingface/trl/pull/5508
Fix CI dependency installs to use a single resolve by @qgallouedec in https://github.com/huggingface/trl/pull/5513
Fix supports_tool_calling falsely accepting templates that drop assistant tool_calls by @qgallouedec in https://github.com/huggingface/trl/pull/5517
feat: add DeepSeek-V3 training chat template with generation markers by @RudrenduPaul in https://github.com/huggingface/trl/pull/5527
Drop, don't truncate, overlong tool results in GRPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5521
Set upper transformers version to skip distributed test_rloo after fixed by @albertvillanova in https://github.com/huggingface/trl/pull/5535
Align KTO with DPO: Add precompute_ref_batch_size by @albertvillanova in https://github.com/huggingface/trl/pull/5530
Update tests with zero3 for RLOO and GRPO once fixed in transformers 5.5.4 by @albertvillanova in https://github.com/huggingface/trl/pull/5541
Align KTO with DPO: Align ref_model initialization by @albertvillanova in https://github.com/huggingface/trl/pull/5534
Align KTO with DPO: Align model initialization by @albertvillanova in https://github.com/huggingface/trl/pull/5533
Remove unused dependencies for judges from dev requirements by @qgallouedec in https://github.com/huggingface/trl/pull/5515
Remove xfail condition for Gemma4 response_schema regex bug by @qgallouedec in https://github.com/huggingface/trl/pull/5510
Align KTO with DPO: Support None args by @albertvillanova in https://github.com/huggingface/trl/pull/5531
Add example script section to experimental trainer docs by @sergiopaniego in https://github.com/huggingface/trl/pull/5543
[SSD] Added SSD trainer in experimental by @kashif in https://github.com/huggingface/trl/pull/5505
[Docs] Fix formatting in SSD training example script by @kashif in https://github.com/huggingface/trl/pull/5548
Don't load ref_model when precompute_ref_log_probs in DPO/KTO by @albertvillanova in https://github.com/huggingface/trl/pull/5542
chore: bump doc-builder SHA for PR upload workflow by @rtrompier in https://github.com/huggingface/trl/pull/5553
Nits is SSD docs by @sergiopaniego in https://github.com/huggingface/trl/pull/5554
Deprecate use_transformers_paged by @qgallouedec in https://github.com/huggingface/trl/pull/5544
Update vLLM version support to 0.18.0 by @qgallouedec in https://github.com/huggingface/trl/pull/5547
Align KTO with DPO: Remove generate_during_eval by @albertvillanova in https://github.com/huggingface/trl/pull/5551
Align KTO with DPO: Remove model and ref adapter names by @albertvillanova in https://github.com/huggingface/trl/pull/5552
Support messages with images in prepare_multimodal_messages by @albertvillanova in https://github.com/huggingface/trl/pull/5474
Update CARLA VLM example scripts by @sergiopaniego in https://github.com/huggingface/trl/pull/5557
Fix add_response_schema for VLM processors by @qgallouedec in https://github.com/huggingface/trl/pull/5520
[docs] Add LLaMA 3 / Qwen 2.5 entries to chat_templates/README by @qgallouedec in https://github.com/huggingface/trl/pull/5545
Add LLaMA 3.1 and 3.2 tool calling support by @qgallouedec in https://github.com/huggingface/trl/pull/5518
Release: v1.2 by @qgallouedec in https://github.com/huggingface/trl/pull/5576

Full Changelog: https://github.com/huggingface/trl/compare/v1.1.0...v1.2.0

Spaces agents.md for your coding agents ↗

viaHugging Face Changelog

Every Gradio Space now auto-serves an /agents.md endpoint, a machine-readable API description that AI agents can read and call directly. Point your coding agents (like Claude Code, Codex, or Pi) at it and they figure out how to use the Space without any setup.

Last Checked

18m ago

Domain

huggingface.co

Hugging Face

Features

Chunked cross-entropy loss for SFT (up to –50% VRAM)

OpenReward Standard environment adapter (experimental)

Training-invariance test suite

MFU helpers

GRPO Liger kernel update (Liger 0.8.0)

Length-normalized DPO sigmoid loss

Even more training chat templates

KTO ↔ DPO alignment: closing in on graduation

Other

Fixes

Documentation and Examples

CI

New Contributors

What's Changed

April 23, 2026

What's Changed

New Contributors

🖥️ Manage Space secrets and variables from the CLI

🪣 Rsync-style trailing slash for bucket folder copies

💔 Breaking Change

🖥️ CLI

🐛 Bug and typo fixes

🏗️ Internal

Release v5.8.0

New Model additions

DeepSeek-V4

Gemma 4 Assistant

GraniteSpeechPlus

Granite4Vision

EXAONE-4.5

PP-FormulaNet

Breaking changes

Tokenization

Bugfixes and improvements

Significant community contributions

New Pipelines

LLaDA2

Nucleus-MoE

Ernie-Image

LongCat-AudioDiT

Ace-Step 1.5

Flux.2 Small Decoder

Modular Pipeline Support

Core Library

All commits

Significant community contributions

🖥️ New CLI commands: repo cards, file listings, and dataset leaderboards

:rocket: Manage Spaces from the CLI

:arrows_clockwise: hf update replaces the auto-update prompt

:pencil2: Global output formatting for every command

🔗 Centralized hf:// URI parsing

🚀 Bucket transport for Jobs script upload

🖥️ CLI

🤖 Inference

📖 Documentation

🐛 Bug and typo fixes

🏗️ Internal

New Model additions

Laguna

DEIMv2

Attention

Tokenizers

Generation

Kernels

Bugfixes and improvements

Significant community contributions

Main bug fixes

Other improvements and bug fixes

New Contributors

TL;DR

🚨 Breaking changes

⚡ Performance — measured locally on this Mac, not lifted from PRs

Added-vocabulary deserialize — the headline win (#1995, #1999)

BPE encode

Llama-3 encode

Truncation early exit (#1990)

Other perf improvements (no direct comparable bench)

🔄 Serialization / deserialization

:arrows_clockwise: `hf update` replaces the auto-update prompt

🔗 Centralized `hf://` URI parsing

Speculative decoding in `trl vllm-serve`

More `{% generation %}` training chat templates

🖥️ Unified output format for `hf buckets` commands

New `SSDTrainer` — Simple Self-Distillation

Drop, don't truncate, overlong tool results in `GRPOTrainer`