v1.5.0
Features
Even more training chat templates
Three more model families gain training-compatible templates with {% generation %} markers (so assistant_only_loss=True just works):
- Phi-3.5 by @DagaBhai in https://github.com/huggingface/trl/pull/5746
- Qwen3-VL by @aazizyan in https://github.com/huggingface/trl/pull/5764
- Qwen3.5 Think / NoThink by @aazizyan in https://github.com/huggingface/trl/pull/5824
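As a rough illustration of what the generation markers enable: a training template wraps each assistant turn in {% generation %} ... {% endgeneration %}, so the tokenizer can report which tokens came from the assistant, and assistant_only_loss=True restricts the loss to those tokens. The sketch below is a toy, pure-Python picture of that masking idea only. The helper names (build_assistant_mask, masked_mean_loss) and the per-token loss values are hypothetical and are not TRL's implementation.

```python
from typing import List, Tuple


def build_assistant_mask(
    segments: List[Tuple[str, List[str]]],
) -> Tuple[List[str], List[int]]:
    """Flatten (role, tokens) segments and mark assistant tokens with 1."""
    tokens: List[str] = []
    mask: List[int] = []
    for role, segment_tokens in segments:
        tokens.extend(segment_tokens)
        # Only tokens inside an assistant turn are marked as trainable.
        mask.extend([1 if role == "assistant" else 0] * len(segment_tokens))
    return tokens, mask


def masked_mean_loss(per_token_loss: List[float], mask: List[int]) -> float:
    """Average the loss over assistant tokens only, ignoring everything else."""
    kept = [loss for loss, m in zip(per_token_loss, mask) if m == 1]
    if not kept:
        raise ValueError("no assistant tokens to train on")
    return sum(kept) / len(kept)


# Hypothetical conversation, already split into tokens per role.
segments = [
    ("user", ["What", "is", "2+2", "?"]),
    ("assistant", ["4", "<eos>"]),
]
tokens, mask = build_assistant_mask(segments)

# Made-up per-token losses, one per token, purely for illustration.
per_token_loss = [0.9, 0.8, 0.7, 0.6, 0.2, 0.4]
loss = masked_mean_loss(per_token_loss, mask)
```

Here only the last two tokens contribute to the loss, so prompt tokens never pull on the gradient.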
Final logits softcapping for async GRPO
The chunked LM-head path used by AsyncGRPOTrainer now supports models that use final_logit_softcapping (notably Gemma 2). _ChunkedLogProbFunction applies logit_scale, optional tanh-based softcapping, and temperature consistently in both forward and backward — softcapped models are no longer rejected.
by @mlarnouhet in https://github.com/huggingface/trl/pull/5691
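For readers unfamiliar with final-logit softcapping, here is a minimal sketch of the arithmetic in plain Python. It assumes the usual tanh form, softcap * tanh(logit / softcap), and applies the three steps in the order the note lists them (scale, softcap, temperature). That ordering, the function names, and the example values are assumptions for illustration; this is not the _ChunkedLogProbFunction code, which works on chunked tensors and also implements the backward pass.

```python
import math
from typing import List, Optional


def adjust_logits(
    logits: List[float],
    logit_scale: float = 1.0,
    softcap: Optional[float] = None,
    temperature: float = 1.0,
) -> List[float]:
    """Scale, optionally softcap with tanh, then divide by temperature."""
    adjusted = []
    for z in logits:
        z = z * logit_scale
        if softcap is not None:
            # tanh squashes the logit smoothly into (-softcap, softcap).
            z = softcap * math.tanh(z / softcap)
        adjusted.append(z / temperature)
    return adjusted


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a single vocabulary row."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


raw = [100.0, 0.0, -100.0]
capped = adjust_logits(raw, softcap=30.0)  # 30.0 is the value Gemma 2 uses
log_probs = log_softmax(capped)
```

Because tanh is smooth, the cap is differentiable everywhere, which is what allows the same transformation to be applied consistently in forward and backward.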
KTO ↔ DPO alignment continues
Two more cycles closer to KTO graduation:
- Align compute_loss flow by @albertvillanova in https://github.com/huggingface/trl/pull/5810
- Align _compute_loss_liger flow by @albertvillanova in https://github.com/huggingface/trl/pull/5816
Trainer telemetry (opt-out)
_BaseTrainer.__init__ now emits a single anonymous huggingface_hub.send_telemetry ping per trainer instantiation, so we can finally see which trainers / model families / distributed backends are actually being used in practice and prioritize accordingly.
The payload is intentionally minimal: TRL version, trainer class name, model architecture, PEFT yes/no, distributed backend (deepspeed/fsdp/ddp/none), bucketed world size, device type, and GPU model when available. It contains no user data, dataset names, model paths, or hyperparameter values, and nothing is sent in CI, in offline mode, or when HF_HUB_DISABLE_TELEMETRY is set.
See usage_stats.md for what's collected and how to opt out.
by @qgallouedec in https://github.com/huggingface/trl/pull/5758
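One way to opt out from Python is to set the HF_HUB_DISABLE_TELEMETRY environment variable named above before any Hugging Face library is imported. The sketch below assumes the flag is read when huggingface_hub is first imported, which is why it is set at the very top of the script. usage_stats.md remains the authoritative reference for the supported opt-out mechanisms.

```python
import os

# Set the flag before importing huggingface_hub, transformers, or trl.
# Assumption: the value is read at import time, so setting it later in the
# process may have no effect.
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"

# Import TRL and build trainers only after the flag is in place, e.g.:
# import trl
```

Exporting the same variable in the shell or job launcher before starting the process achieves the same result without touching the training script.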
Other
- OpenRewardSpec: fix omitting task-scoped tools during rollout binding (fixes #5727) by @rycerzes in https://github.com/huggingface/trl/pull/5729
- Add OpenReward example to the list of examples by @sergiopaniego in https://github.com/huggingface/trl/pull/5752
- Add DDP-2 members to invariant test suite by @qgallouedec in https://github.com/huggingface/trl/pull/5736
- Align and simplify the stable training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/5812
- Replace uv installation script with setup action by @qgallouedec in https://github.com/huggingface/trl/pull/5735
Fixes
- Fix exponential backtracking in qwen3 / qwen3_5 / glm4moe response parsing: GRPOTrainer was hanging indefinitely on truncated <tool_call> blocks (a degenerate case that happens naturally when generation hits max_completion_length mid-tool-call). Rewrote the regex to be non-backtracking, so the worst case goes from O(2ⁿ) to O(n). By @xodn348 in https://github.com/huggingface/trl/pull/5798
- CUDA memory leak: release BNB dequantization buffers & stale state in OffloadActivations (follow-up to v1.4's activation-offloading leak fix). By @butterwecksolutions in https://github.com/huggingface/trl/pull/5730
- Invalidate ZeRO-3 param coordinator trace in add_hooks by @roycho96 in https://github.com/huggingface/trl/pull/4693
- Fix nested vocab_size for DistillationTrainer and GOLDTrainer by @Beichen-Ma in https://github.com/huggingface/trl/pull/5592
- Fix MPS support in experimental empty_cache() by @jamie-peterson-ml in https://github.com/huggingface/trl/pull/5799
- Fix metric_for_best_model for trainer-specific eval metrics by @qgallouedec in https://github.com/huggingface/trl/pull/5811
- Fix generate_batch: inference tensors blocking inplace ops in background thread by @albertvillanova in https://github.com/huggingface/trl/pull/5818
- Replace deprecated torch_dtype with dtype across examples, docs, notebooks, tests, and experimental distillation / gold trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5717
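To make the backtracking fix in the first item more concrete, the sketch below shows one linear-time way to pull tool-call bodies out of a completion while tolerating a truncated final block. It uses plain str.find scanning rather than a regex, so it is not the rewritten pattern from the PR. The closing tag </tool_call> and the function name extract_tool_calls are assumptions made for illustration.

```python
from typing import List

OPEN_TAG = "<tool_call>"
CLOSE_TAG = "</tool_call>"  # assumed closing tag, not stated in the notes


def extract_tool_calls(text: str) -> List[str]:
    """Return the bodies of all complete tool-call blocks, scanning left to right."""
    calls: List[str] = []
    pos = 0
    while True:
        start = text.find(OPEN_TAG, pos)
        if start == -1:
            break
        body_start = start + len(OPEN_TAG)
        end = text.find(CLOSE_TAG, body_start)
        if end == -1:
            # Truncated block (for example, generation stopped mid-call):
            # give up on it instead of retrying alternative matches.
            break
        calls.append(text[body_start:end].strip())
        # The cursor only moves forward, so total work stays linear in len(text).
        pos = end + len(CLOSE_TAG)
    return calls


complete = 'ok <tool_call>{"name": "f"}</tool_call> done'
truncated = '<tool_call>{"name": "f", "arguments": {"x": ' + "a" * 100_000
```

Calling extract_tool_calls(truncated) returns an empty list after a single pass over the string, whereas a pattern with nested or overlapping quantifiers can explore exponentially many ways to fail on the same input.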
Documentation and Examples
- docs(grpo): align model to Qwen2.5 and add GRPO OOM tab in quickstart by @xodn348 in https://github.com/huggingface/trl/pull/5740
CI
- Migrate tests to Qwen3.5 Think/NoThink fixtures + tiny-model generation scripts by @aazizyan in https://github.com/huggingface/trl/pull/5819 and https://github.com/huggingface/trl/pull/5821
- Align tiny Glm4MoeForCausalLM / Cohere / Cohere2 / Qwen2.5-VL configs with their reference models by @qgallouedec in https://github.com/huggingface/trl/pull/5638, https://github.com/huggingface/trl/pull/5706, https://github.com/huggingface/trl/pull/5707 and https://github.com/huggingface/trl/pull/5739
- Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip by @qgallouedec in https://github.com/huggingface/trl/pull/5779
- Fix tiny Qwen2.5-VL fullatt_block_indexes out of range for depth=2 by @albertvillanova in https://github.com/huggingface/trl/pull/5805
- Remove non-existent params from tiny Qwen2-VL model by @albertvillanova in https://github.com/huggingface/trl/pull/5795
- Fix vision config num_heads key in Qwen VL tiny model scripts by @matdou in https://github.com/huggingface/trl/pull/5792
- Drop unjustified model.visual. skip in GRPO/RLOO Qwen2.5-VL tests by @qgallouedec in https://github.com/huggingface/trl/pull/5780
- Make the LLaVA / LLaVA-Next test guard explicit by @qgallouedec in https://github.com/huggingface/trl/pull/5778
- Remove obsolete Gemma 3 vision-head guard from VLM training tests by @qgallouedec in https://github.com/huggingface/trl/pull/5772
- Fix OOM in CI: reduce batch size in VLM SFT / GRPO/RLOO VLM / toolcall tests by @albertvillanova in https://github.com/huggingface/trl/pull/5687, https://github.com/huggingface/trl/pull/5767, https://github.com/huggingface/trl/pull/5801
- Fix OOM in CI by clearing chained exception tracebacks by @albertvillanova in https://github.com/huggingface/trl/pull/5776
- Fix OOM in CI by reducing intermediate_size and image token budget for tiny Gemma 4 by @albertvillanova in https://github.com/huggingface/trl/pull/5760
- Fix CI errors in response parsing for gpt-oss/llama with transformers v5 by @albertvillanova in https://github.com/huggingface/trl/pull/5755
- Fix CI AttributeError: 'GptOssConfig' object has no attribute 'num_experts' by @albertvillanova in https://github.com/huggingface/trl/pull/5756
- Fix CI apply_model_revisions by removing _commit_hash kwarg by @albertvillanova in https://github.com/huggingface/trl/pull/5762
- Fix CI test to avoid skipping model.visual params by @albertvillanova in https://github.com/huggingface/trl/pull/5806
- Fix transformers min version for tiny gemma 4 as 5.5.0 by @albertvillanova in https://github.com/huggingface/trl/pull/5763
- Hotfix CI: pin torch < 2.12.0 (later reverted) by @albertvillanova in https://github.com/huggingface/trl/pull/5769
- Fix catch-all empty string in Makefile pytest --only-rerun by @albertvillanova in https://github.com/huggingface/trl/pull/5784
- chore: update tests_latest.yml by @hf-security-analysis[bot] in https://github.com/huggingface/trl/pull/5733
New Contributors
- @hf-security-analysis[bot] made their first contribution in https://github.com/huggingface/trl/pull/5733
- @Beichen-Ma made their first contribution in https://github.com/huggingface/trl/pull/5592
- @DagaBhai made their first contribution in https://github.com/huggingface/trl/pull/5746
- @xodn348 made their first contribution in https://github.com/huggingface/trl/pull/5740
- @mlarnouhet made their first contribution in https://github.com/huggingface/trl/pull/5691
- @matdou made their first contribution in https://github.com/huggingface/trl/pull/5792
- @jamie-peterson-ml made their first contribution in https://github.com/huggingface/trl/pull/5799
- @rycerzes made their first contribution in https://github.com/huggingface/trl/pull/5729
What's Changed
- ⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/5734
- chore: update tests_latest.yml by @hf-security-analysis[bot] in https://github.com/huggingface/trl/pull/5733
- fix: CUDA memory leak / release BNB dequantization buffers & stale state in OffloadActivations by @butterwecksolutions in https://github.com/huggingface/trl/pull/5730
- fix: invalidate ZeRO-3 param coordinator trace in add_hooks by @roycho96 in https://github.com/huggingface/trl/pull/4693
- Fix nested vocab_size for DistillationTrainer and GOLDTrainer by @Beichen-Ma in https://github.com/huggingface/trl/pull/5592
- feat: add Phi-3.5 training chat templates with generation markers by @DagaBhai in https://github.com/huggingface/trl/pull/5746
- docs(grpo): align model to Qwen2.5 and add GRPO OOM tab in quickstart by @xodn348 in https://github.com/huggingface/trl/pull/5740
- torch_dtype -> dtype by @qgallouedec in https://github.com/huggingface/trl/pull/5717
- Add OpenReward example to the list of examples by @sergiopaniego in https://github.com/huggingface/trl/pull/5752
- Fix CI errors in response parsing for gptoss/llama with transformers v5 by @albertvillanova in https://github.com/huggingface/trl/pull/5755
- Add DDP-2 members to invariant test suite by @qgallouedec in https://github.com/huggingface/trl/pull/5736
- Hotfix CI param not updated AssertionError: Pin torch < 2.12.0 by @albertvillanova in https://github.com/huggingface/trl/pull/5769
- Align tiny-Glm4MoeForCausalLM with GLM-4.5 reference config by @qgallouedec in https://github.com/huggingface/trl/pull/5638
- Align tiny Cohere config with aya-expanse-8b by @qgallouedec in https://github.com/huggingface/trl/pull/5706
- Align tiny Cohere2 config with tiny-aya-earth by @qgallouedec in https://github.com/huggingface/trl/pull/5707
- Fix OOM in CI by reducing intermediate_size and image token budget for tiny Gemma4 by @albertvillanova in https://github.com/huggingface/trl/pull/5760
- Fix CI AttributeError: 'GptOssConfig' object has no attribute 'num_experts' by @albertvillanova in https://github.com/huggingface/trl/pull/5756
- Fix CI apply_model_revisions by removing _commit_hash kwarg by @albertvillanova in https://github.com/huggingface/trl/pull/5762
- Remove obsolete Gemma3 vision-head guard from VLM training tests by @qgallouedec in https://github.com/huggingface/trl/pull/5772
- Replace uv installation script with setup action by @qgallouedec in https://github.com/huggingface/trl/pull/5735
- Fix OOM in CI by clearing chained exception tracebacks by @albertvillanova in https://github.com/huggingface/trl/pull/5776
- Fix transformers min version for tiny gemma4 as 5.5.0 by @albertvillanova in https://github.com/huggingface/trl/pull/5763
- Final logits softcapping support for async GRPO Trainer by @mlarnouhet in https://github.com/huggingface/trl/pull/5691
- Fix vision config num_heads key in Qwen VL tiny model scripts and revert torch pin by @matdou in https://github.com/huggingface/trl/pull/5792
- Drop unjustified model.visual. skip in GRPO / RLOO Qwen2.5-VL tests by @qgallouedec in https://github.com/huggingface/trl/pull/5780
- Fix OOM in CI by reducing batch size and sequence length for toolcall tests by @albertvillanova in https://github.com/huggingface/trl/pull/5801
- Fix exponential backtracking in qwen3 / qwen3_5 / glm4moe response parsing by @xodn348 in https://github.com/huggingface/trl/pull/5798
- Add telemetry to trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5758
- Add Qwen3-VL training chat template with generation markers by @aazizyan in https://github.com/huggingface/trl/pull/5764
- Align tiny Qwen2.5-VL with Qwen/Qwen2.5-VL-3B-Instruct by @qgallouedec in https://github.com/huggingface/trl/pull/5739
- Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip by @qgallouedec in https://github.com/huggingface/trl/pull/5779
- Fix OOM in CI by reducing batch size in GRPO/RLOO VLM tests by @albertvillanova in https://github.com/huggingface/trl/pull/5767
- Fix catch-all empty string in Makefile pytest --only-rerun by @albertvillanova in https://github.com/huggingface/trl/pull/5784
- Remove non-existent params from tiny Qwen2-VL model by @albertvillanova in https://github.com/huggingface/trl/pull/5795
- Fix tiny Qwen2.5-VL fullatt_block_indexes out of range for depth=2 by @albertvillanova in https://github.com/huggingface/trl/pull/5805
- Make the LLaVA / LLaVA-Next test guard explicit by @qgallouedec in https://github.com/huggingface/trl/pull/5778
- Fix MPS support in experimental empty_cache() by @jamie-peterson-ml in https://github.com/huggingface/trl/pull/5799
- Fix CI test to avoid skipping model.visual params by @albertvillanova in https://github.com/huggingface/trl/pull/5806
- Align KTO with DPO: Align compute_loss flow by @albertvillanova in https://github.com/huggingface/trl/pull/5810
- Fix generate_batch: inference tensors block inplace ops in background thread by @albertvillanova in https://github.com/huggingface/trl/pull/5818
- Fix metric_for_best_model for trainer-specific eval metrics by @qgallouedec in https://github.com/huggingface/trl/pull/5811
- Align and simplify the stable training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/5812
- Align KTO with DPO: Align _compute_loss_liger flow by @albertvillanova in https://github.com/huggingface/trl/pull/5816
- Add tiny Qwen3.5 Think/NoThink fixture generation scripts by @aazizyan in https://github.com/huggingface/trl/pull/5819
- Migrate tests to Qwen3.5 Think/NoThink fixtures by @aazizyan in https://github.com/huggingface/trl/pull/5821
- Fix OpenRewardSpec omitting task-scoped tools during rollout binding (fixes #5727) by @rycerzes in https://github.com/huggingface/trl/pull/5729
- Add Qwen3.5 Think/NoThink training chat templates with generation markers by @aazizyan in https://github.com/huggingface/trl/pull/5824
- Release: v1.5 by @qgallouedec in https://github.com/huggingface/trl/pull/5835
Full Changelog: https://github.com/huggingface/trl/compare/v1.4.0...v1.5.0

