Response parsing hang fixed; CUDA memory leak patched…

Features

Even more training chat templates

Three more model families gain training-compatible templates with {% generation %} markers (so assistant_only_loss=True just works):

Phi-3.5 by @DagaBhai in https://github.com/huggingface/trl/pull/5746
Qwen3-VL by @aazizyan in https://github.com/huggingface/trl/pull/5764
Qwen3.5 Think / NoThink by @aazizyan in https://github.com/huggingface/trl/pull/5824
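
As a rough illustration of what the generation markers enable: tokens rendered inside a {% generation %} block can be flagged as assistant tokens, and an assistant-only loss ignores everything else. The sketch below is not TRL's implementation. The helper name mask_non_assistant_labels and the toy token IDs are hypothetical, and -100 is assumed as the ignore index because it is the default for PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # assumed: default ignore_index of PyTorch's cross-entropy loss


def mask_non_assistant_labels(input_ids, assistant_mask):
    """Keep labels only where the mask marks an assistant token (hypothetical helper)."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        token if is_assistant else IGNORE_INDEX
        for token, is_assistant in zip(input_ids, assistant_mask)
    ]


# Toy sequence: three prompt tokens followed by three assistant tokens.
input_ids = [11, 12, 13, 21, 22, 23]
assistant_mask = [0, 0, 0, 1, 1, 1]
labels = mask_non_assistant_labels(input_ids, assistant_mask)
```

Only the last three positions keep their token IDs as labels, so the prompt tokens contribute nothing to the loss.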

Final logits softcapping for async GRPO

The chunked LM-head path used by AsyncGRPOTrainer now supports models that use final_logit_softcapping (notably Gemma 2). _ChunkedLogProbFunction applies logit_scale, optional tanh-based softcapping, and temperature consistently in both the forward and backward passes, so softcapped models are no longer rejected.

by @mlarnouhet in https://github.com/huggingface/trl/pull/5691
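
For readers unfamiliar with tanh-based softcapping, here is a minimal pure-Python sketch of the forward computation on a single logit vector. It is not the code of _ChunkedLogProbFunction: the function name softcapped_log_probs is hypothetical, the order of operations (scale, then softcap, then temperature) is an assumption based on the order listed above, and the cap value 30.0 is only an example.

```python
import math


def softcapped_log_probs(logits, logit_scale=1.0, softcap=None, temperature=1.0):
    """Log-softmax over logits after scaling, optional tanh softcapping, and temperature.

    Hypothetical helper; the order of the three steps is an assumption.
    """
    values = [x * logit_scale for x in logits]
    if softcap is not None:
        # tanh squashes each logit smoothly into the open interval (-softcap, softcap).
        values = [softcap * math.tanh(x / softcap) for x in values]
    values = [x / temperature for x in values]
    # Numerically stable log-sum-exp.
    peak = max(values)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in values))
    return [x - log_z for x in values]


capped = softcapped_log_probs([100.0, 0.0, -100.0], softcap=30.0)
uncapped = softcapped_log_probs([100.0, 0.0, -100.0])
```

With the cap, the gap between the first two log-probabilities stays below 30, whereas without it the gap equals the raw logit difference of 100. Because tanh is differentiable, the backward pass has to include its derivative, which is why the cap must be handled in both directions.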

KTO ↔ DPO alignment continues

Two more cycles closer to KTO graduation:

Align compute_loss flow by @albertvillanova in https://github.com/huggingface/trl/pull/5810
Align _compute_loss_liger flow by @albertvillanova in https://github.com/huggingface/trl/pull/5816

Trainer telemetry (opt-out)

_BaseTrainer.__init__ now emits a single anonymous huggingface_hub.send_telemetry ping per trainer instantiation, so we can finally see which trainers / model families / distributed backends are actually being used in practice and prioritize accordingly.

The payload is intentionally minimal: TRL version, trainer class name, model architecture, PEFT yes/no, distributed backend (deepspeed/fsdp/ddp/none), bucketed world size, device type, and GPU model when available. It contains no user data, dataset names, model paths, or hyperparameter values, and it is never sent in CI, in offline mode, or when HF_HUB_DISABLE_TELEMETRY is set.

See usage_stats.md for what's collected and how to opt out.

by @qgallouedec in https://github.com/huggingface/trl/pull/5758
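
One way to opt out from a script is to set the environment variable mentioned above before anything from the Hugging Face stack is imported. This is a sketch under the assumption that the variable is read at import time, as huggingface_hub does for its other settings; usage_stats.md remains the authoritative description.

```python
import os

# Set this before importing trl, transformers, or huggingface_hub,
# since the value is assumed to be read once at import time.
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"

# import trl  # import the library only after the variable is set
```

Exporting the same variable in the shell or job launcher should have the same effect and avoids depending on import order.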

Other

Fix OpenRewardSpec omitting task-scoped tools during rollout binding (fixes #5727) by @rycerzes in https://github.com/huggingface/trl/pull/5729
Add OpenReward example to the list of examples by @sergiopaniego in https://github.com/huggingface/trl/pull/5752
Add DDP-2 members to invariant test suite by @qgallouedec in https://github.com/huggingface/trl/pull/5736
Align and simplify the stable training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/5812
Replace uv installation script with setup action by @qgallouedec in https://github.com/huggingface/trl/pull/5735

Fixes

Fix exponential backtracking in qwen3 / qwen3_5 / glm4moe response parsing: GRPOTrainer hung indefinitely on truncated <tool_call> blocks, a degenerate case that occurs naturally when generation hits max_completion_length mid-tool-call. The regex was rewritten to be non-backtracking, which takes the worst case from O(2ⁿ) to O(n). By @xodn348 in https://github.com/huggingface/trl/pull/5798
Fix CUDA memory leak: release BNB dequantization buffers and stale state in OffloadActivations, a follow-up to v1.4's activation-offloading leak fix. By @butterwecksolutions in https://github.com/huggingface/trl/pull/5730
Invalidate ZeRO-3 param coordinator trace in add_hooks by @roycho96 in https://github.com/huggingface/trl/pull/4693
Fix nested vocab_size for DistillationTrainer and GOLDTrainer by @Beichen-Ma in https://github.com/huggingface/trl/pull/5592
Fix MPS support in experimental empty_cache() by @jamie-peterson-ml in https://github.com/huggingface/trl/pull/5799
Fix metric_for_best_model for trainer-specific eval metrics by @qgallouedec in https://github.com/huggingface/trl/pull/5811
Fix generate_batch: inference tensors blocking inplace ops in background thread by @albertvillanova in https://github.com/huggingface/trl/pull/5818
Replace deprecated torch_dtype with dtype across examples, docs, notebooks, tests, and experimental distillation / gold trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5717
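
To make the response-parsing fix in the list above more concrete, the sketch below shows one way to pull <tool_call> blocks out of a completion in a single left-to-right pass, so that a truncated final block cannot trigger backtracking. It is not the regex from the PR: the function name extract_tool_calls and the choice of plain string search over a regular expression are assumptions made for illustration.

```python
OPEN_TAG = "<tool_call>"
CLOSE_TAG = "</tool_call>"


def extract_tool_calls(text):
    """Return (complete_blocks, truncated_tail) using only forward searches.

    Hypothetical helper: the search position never moves backwards, so the
    work grows linearly with the length of the text.
    """
    blocks = []
    position = 0
    while True:
        start = text.find(OPEN_TAG, position)
        if start == -1:
            return blocks, None  # no further tool calls
        body_start = start + len(OPEN_TAG)
        end = text.find(CLOSE_TAG, body_start)
        if end == -1:
            # Generation stopped mid-tool-call: report the unfinished body.
            return blocks, text[body_start:]
        blocks.append(text[body_start:end].strip())
        position = end + len(CLOSE_TAG)


completion = 'ok <tool_call>{"name": "add"}</tool_call> then <tool_call>{"name": "mu'
complete, truncated = extract_tool_calls(completion)
```

Here the first block is returned as complete and the cut-off second block is reported separately, so the caller can decide whether to discard or penalize it.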

Documentation and Examples

docs(grpo): align model to Qwen2.5 and add GRPO OOM tab in quickstart by @xodn348 in https://github.com/huggingface/trl/pull/5740

CI

Migrate tests to Qwen3.5 Think/NoThink fixtures + tiny-model generation scripts by @aazizyan in https://github.com/huggingface/trl/pull/5819 and https://github.com/huggingface/trl/pull/5821
Align tiny Glm4MoeForCausalLM / Cohere / Cohere2 / Qwen2.5-VL configs with their reference models by @qgallouedec in https://github.com/huggingface/trl/pull/5638, https://github.com/huggingface/trl/pull/5706, https://github.com/huggingface/trl/pull/5707 and https://github.com/huggingface/trl/pull/5739
Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip by @qgallouedec in https://github.com/huggingface/trl/pull/5779
Fix tiny Qwen2.5-VL fullatt_block_indexes out of range for depth=2 by @albertvillanova in https://github.com/huggingface/trl/pull/5805
Remove non-existent params from tiny Qwen2-VL model by @albertvillanova in https://github.com/huggingface/trl/pull/5795
Fix vision config num_heads key in Qwen VL tiny model scripts by @matdou in https://github.com/huggingface/trl/pull/5792
Drop unjustified model.visual. skip in GRPO/RLOO Qwen2.5-VL tests by @qgallouedec in https://github.com/huggingface/trl/pull/5780
Make the LLaVA / LLaVA-Next test guard explicit by @qgallouedec in https://github.com/huggingface/trl/pull/5778
Remove obsolete Gemma 3 vision-head guard from VLM training tests by @qgallouedec in https://github.com/huggingface/trl/pull/5772
Fix OOM in CI: reduce batch size in VLM SFT / GRPO/RLOO VLM / toolcall tests by @albertvillanova in https://github.com/huggingface/trl/pull/5687, https://github.com/huggingface/trl/pull/5767 and https://github.com/huggingface/trl/pull/5801
Fix OOM in CI by clearing chained exception tracebacks by @albertvillanova in https://github.com/huggingface/trl/pull/5776
Fix OOM in CI by reducing intermediate_size and image token budget for tiny Gemma 4 by @albertvillanova in https://github.com/huggingface/trl/pull/5760
Fix CI errors in response parsing for gpt-oss/llama with transformers v5 by @albertvillanova in https://github.com/huggingface/trl/pull/5755
Fix CI AttributeError: 'GptOssConfig' object has no attribute 'num_experts' by @albertvillanova in https://github.com/huggingface/trl/pull/5756
Fix CI apply_model_revisions by removing _commit_hash kwarg by @albertvillanova in https://github.com/huggingface/trl/pull/5762
Fix CI test to avoid skipping model.visual params by @albertvillanova in https://github.com/huggingface/trl/pull/5806
Fix transformers min version for tiny gemma 4 as 5.5.0 by @albertvillanova in https://github.com/huggingface/trl/pull/5763
Hotfix CI: pin torch < 2.12.0 (later reverted) by @albertvillanova in https://github.com/huggingface/trl/pull/5769
Fix catch-all empty string in Makefile pytest --only-rerun by @albertvillanova in https://github.com/huggingface/trl/pull/5784
chore: update tests_latest.yml by @hf-security-analysis[bot] in https://github.com/huggingface/trl/pull/5733

New Contributors

@hf-security-analysis[bot] made their first contribution in https://github.com/huggingface/trl/pull/5733
@Beichen-Ma made their first contribution in https://github.com/huggingface/trl/pull/5592
@DagaBhai made their first contribution in https://github.com/huggingface/trl/pull/5746
@xodn348 made their first contribution in https://github.com/huggingface/trl/pull/5740
@mlarnouhet made their first contribution in https://github.com/huggingface/trl/pull/5691
@matdou made their first contribution in https://github.com/huggingface/trl/pull/5792
@jamie-peterson-ml made their first contribution in https://github.com/huggingface/trl/pull/5799
@rycerzes made their first contribution in https://github.com/huggingface/trl/pull/5729

What's Changed

⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/5734
chore: update tests_latest.yml by @hf-security-analysis[bot] in https://github.com/huggingface/trl/pull/5733
fix: CUDA memory leak / release BNB dequantization buffers & stale state in OffloadActivations by @butterwecksolutions in https://github.com/huggingface/trl/pull/5730
fix: invalidate ZeRO-3 param coordinator trace in add_hooks by @roycho96 in https://github.com/huggingface/trl/pull/4693
Fix nested vocab_size for DistillationTrainer and GOLDTrainer by @Beichen-Ma in https://github.com/huggingface/trl/pull/5592
feat: add Phi-3.5 training chat templates with generation markers by @DagaBhai in https://github.com/huggingface/trl/pull/5746
docs(grpo): align model to Qwen2.5 and add GRPO OOM tab in quickstart by @xodn348 in https://github.com/huggingface/trl/pull/5740
torch_dtype -> dtype by @qgallouedec in https://github.com/huggingface/trl/pull/5717
Add OpenReward example to the list of examples by @sergiopaniego in https://github.com/huggingface/trl/pull/5752
Fix CI errors in response parsing for gptoss/llama with transformers v5 by @albertvillanova in https://github.com/huggingface/trl/pull/5755
Add DDP-2 members to invariant test suite by @qgallouedec in https://github.com/huggingface/trl/pull/5736
Hotfix CI param not updated AssertionError: Pin torch < 2.12.0 by @albertvillanova in https://github.com/huggingface/trl/pull/5769
Align tiny-Glm4MoeForCausalLM with GLM-4.5 reference config by @qgallouedec in https://github.com/huggingface/trl/pull/5638
Align tiny Cohere config with aya-expanse-8b by @qgallouedec in https://github.com/huggingface/trl/pull/5706
Align tiny Cohere2 config with tiny-aya-earth by @qgallouedec in https://github.com/huggingface/trl/pull/5707
Fix OOM in CI by reducing intermediate_size and image token budget for tiny Gemma4 by @albertvillanova in https://github.com/huggingface/trl/pull/5760
Fix CI AttributeError: 'GptOssConfig' object has no attribute 'num_experts' by @albertvillanova in https://github.com/huggingface/trl/pull/5756
Fix CI apply_model_revisions by removing _commit_hash kwarg by @albertvillanova in https://github.com/huggingface/trl/pull/5762
Remove obsolete Gemma3 vision-head guard from VLM training tests by @qgallouedec in https://github.com/huggingface/trl/pull/5772
Replace uv installation script with setup action by @qgallouedec in https://github.com/huggingface/trl/pull/5735
Fix OOM in CI by clearing chained exception tracebacks by @albertvillanova in https://github.com/huggingface/trl/pull/5776
Fix transformers min version for tiny gemma4 as 5.5.0 by @albertvillanova in https://github.com/huggingface/trl/pull/5763
Final logits softcapping support for async GRPO Trainer by @mlarnouhet in https://github.com/huggingface/trl/pull/5691
Fix vision config num_heads key in Qwen VL tiny model scripts and revert torch pin by @matdou in https://github.com/huggingface/trl/pull/5792
Drop unjustified model.visual. skip in GRPO / RLOO Qwen2.5-VL tests by @qgallouedec in https://github.com/huggingface/trl/pull/5780
Fix OOM in CI by reducing batch size and sequence length for toolcall tests by @albertvillanova in https://github.com/huggingface/trl/pull/5801
Fix exponential backtracking in qwen3 / qwen3_5 / glm4moe response parsing by @xodn348 in https://github.com/huggingface/trl/pull/5798
Add telemetry to trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5758
Add Qwen3-VL training chat template with generation markers by @aazizyan in https://github.com/huggingface/trl/pull/5764
Align tiny Qwen2.5-VL with Qwen/Qwen2.5-VL-3B-Instruct by @qgallouedec in https://github.com/huggingface/trl/pull/5739
Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip by @qgallouedec in https://github.com/huggingface/trl/pull/5779
Fix OOM in CI by reducing batch size in GRPO/RLOO VLM tests by @albertvillanova in https://github.com/huggingface/trl/pull/5767
Fix catch-all empty string in Makefile pytest --only-rerun by @albertvillanova in https://github.com/huggingface/trl/pull/5784
Remove non-existent params from tiny Qwen2-VL model by @albertvillanova in https://github.com/huggingface/trl/pull/5795
Fix tiny Qwen2.5-VL fullatt_block_indexes out of range for depth=2 by @albertvillanova in https://github.com/huggingface/trl/pull/5805
Make the LLaVA / LLaVA-Next test guard explicit by @qgallouedec in https://github.com/huggingface/trl/pull/5778
Fix MPS support in experimental empty_cache() by @jamie-peterson-ml in https://github.com/huggingface/trl/pull/5799
Fix CI test to avoid skipping model.visual params by @albertvillanova in https://github.com/huggingface/trl/pull/5806
Align KTO with DPO: Align compute_loss flow by @albertvillanova in https://github.com/huggingface/trl/pull/5810
Fix generate_batch: inference tensors block inplace ops in background thread by @albertvillanova in https://github.com/huggingface/trl/pull/5818
Fix metric_for_best_model for trainer-specific eval metrics by @qgallouedec in https://github.com/huggingface/trl/pull/5811
Align and simplify the stable training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/5812
Align KTO with DPO: Align _compute_loss_liger flow by @albertvillanova in https://github.com/huggingface/trl/pull/5816
Add tiny Qwen3.5 Think/NoThink fixture generation scripts by @aazizyan in https://github.com/huggingface/trl/pull/5819
Migrate tests to Qwen3.5 Think/NoThink fixtures by @aazizyan in https://github.com/huggingface/trl/pull/5821
Fix OpenRewardSpec omitting task-scoped tools during rollout binding (fixes #5727) by @rycerzes in https://github.com/huggingface/trl/pull/5729
Add Qwen3.5 Think/NoThink training chat templates with generation markers by @aazizyan in https://github.com/huggingface/trl/pull/5824
Release: v1.5 by @qgallouedec in https://github.com/huggingface/trl/pull/5835

Full Changelog: https://github.com/huggingface/trl/compare/v1.4.0...v1.5.0
