
v1.5.0

May 25, 2026 · TRL · 8 fixes · 4 features · 5 enhancements

Features

Even more training chat templates

Three more model families gain training-compatible templates with {% generation %} markers (so assistant_only_loss=True just works):

Final logits softcapping for async GRPO

The chunked LM-head path used by AsyncGRPOTrainer now supports models that use final_logit_softcapping (notably Gemma 2). _ChunkedLogProbFunction applies logit_scale, optional tanh-based softcapping, and temperature consistently in both the forward and backward passes, so softcapped models are no longer rejected.

by @mlarnouhet in https://github.com/huggingface/trl/pull/5691
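
For readers unfamiliar with final-logit softcapping, here is a minimal pure-Python sketch of the transformation described above. It is not TRL's implementation: the function names (`softcap_logits`, `token_log_prob`) are hypothetical, and the order of operations (scale, then softcap, then temperature) is an assumption based on the order in which the note lists them.

```python
import math


def softcap_logits(logits, cap):
    """Squash each logit smoothly into (-cap, cap) using tanh."""
    return [cap * math.tanh(x / cap) for x in logits]


def token_log_prob(logits, target, logit_scale=1.0, softcap=None, temperature=1.0):
    """Log-probability of `target` after scale, optional softcap, and temperature.

    Assumed order (not confirmed by the release notes): scale, softcap, temperature.
    `temperature` must be strictly positive.
    """
    z = [x * logit_scale for x in logits]
    if softcap is not None:
        z = softcap_logits(z, softcap)
    z = [x / temperature for x in z]
    # Numerically stable log-sum-exp for the softmax normaliser.
    m = max(z)
    log_norm = m + math.log(sum(math.exp(x - m) for x in z))
    return z[target] - log_norm


# An extreme logit of 100 is capped just below 30, so the other token
# keeps a much larger log-probability than it would without the cap.
capped = token_log_prob([100.0, 0.0], target=1, softcap=30.0)
uncapped = token_log_prob([100.0, 0.0], target=1)
```

Because tanh is differentiable everywhere, the same transformation can be applied in the backward pass through the chain rule, which is presumably why the note stresses consistency between the two passes.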

KTO ↔ DPO alignment continues

Two more cycles closer to KTO graduation:

Trainer telemetry (opt-out)

_BaseTrainer.__init__ now emits a single anonymous huggingface_hub.send_telemetry ping per trainer instantiation, so we can finally see which trainers / model families / distributed backends are actually being used in practice and prioritize accordingly.

The payload is intentionally minimal: TRL version, trainer class name, model architecture, PEFT yes/no, distributed backend (deepspeed/fsdp/ddp/none), bucketed world size, device type, and GPU model when available. It contains no user data, dataset names, model paths, or hyperparameter values, and it is never sent in CI, offline, or HF_HUB_DISABLE_TELEMETRY mode.

See usage_stats.md for what's collected and how to opt out.

by @qgallouedec in https://github.com/huggingface/trl/pull/5758
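
As a rough illustration of the HF_HUB_DISABLE_TELEMETRY opt-out mentioned above, the sketch below sets that environment variable from Python. The helper name `disable_hub_telemetry` is hypothetical, and the advice to set the variable before importing huggingface_hub or trl is an assumption (such settings are typically read at import time); usage_stats.md remains the authoritative reference.

```python
import os


def disable_hub_telemetry() -> None:
    """Set the opt-out flag named in the release notes.

    Assumption: call this before importing huggingface_hub or trl,
    since such settings are typically read once at import time.
    """
    os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"


disable_hub_telemetry()
# Import trl / huggingface_hub only after this point.
```

Exporting the same variable in the shell or job launcher, before the Python process starts, would avoid any dependence on import order.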

Other

Fixes

Documentation and Examples

CI

New Contributors

What's Changed

Full Changelog: https://github.com/huggingface/trl/compare/v1.4.0...v1.5.0
