releases.sh (preview)
Hugging Face

$ npx @buildinternet/releases get hugging-face
TRL

$ npx @buildinternet/releases get trl
Releases: 10 · Avg: 3/mo · Versions: v0.27.2 → v1.4.0
Recent Highlights · 8 releases · last 90 days

TRL moved toward production-grade reinforcement learning with v1.0.0, marking a transition from prototype frameworks to deployable training systems. Asynchronous GRPO decoupled generation from gradient updates by offloading rollouts to external vLLM servers, eliminating idle GPU time during training. VESPO (Variational Sequence-Level Soft Policy Optimization) replaced heuristic token-level clipping with a principled variational framework that derives smooth importance weighting, addressing training instability from policy staleness and asynchronous updates. Earlier releases hardened the foundation with async reward functions parallelized across GRPO and RLOO, vLLM 0.12.0 compatibility, tool-calling support for agent training, and memory optimizations like forward-masked logits that cut VRAM usage by up to 50 percent during forward passes.
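The parallelized async reward functions mentioned above can be pictured with a minimal sketch. This is not TRL's API: `length_reward`, `keyword_reward`, `score_batch`, and `score` are hypothetical names, and the `asyncio.sleep` calls stand in for slow work such as a remote judge. It only illustrates the general idea of awaiting several reward computations concurrently instead of one after another.

```python
import asyncio


async def length_reward(completion: str) -> float:
    # Hypothetical reward: word count. The sleep simulates a slow call.
    await asyncio.sleep(0.01)
    return float(len(completion.split()))


async def keyword_reward(completion: str) -> float:
    # Hypothetical reward: 1.0 if the completion gives a reason.
    await asyncio.sleep(0.01)
    return 1.0 if "because" in completion else 0.0


async def score_batch(completions, reward_funcs):
    # Start every (reward function, completion) pair at once, then wait
    # for all of them; total latency is close to one call, not the sum.
    pending = [func(c) for func in reward_funcs for c in completions]
    flat = await asyncio.gather(*pending)
    n = len(completions)
    # Regroup the flat results into one row per reward function.
    return [list(flat[i * n:(i + 1) * n]) for i in range(len(reward_funcs))]


def score(completions):
    return asyncio.run(score_batch(completions, [length_reward, keyword_reward]))
```

Calling `score(["a b c", "yes because"])` returns one list of scores per reward function, in the order the functions were passed.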

Generated Apr 7, 2026, 5:28 PM UTC
Mar 2026 · 3 releases

TRL hit v1.0 by shipping asynchronous GRPO, which offloads generation to external vLLM servers to parallelize rollouts and training while eliminating GPU idle time. The release also introduced VESPO, a variational framework that replaces heuristic token-level clipping with a principled Gamma weighting function to stabilize off-policy training. Earlier in the month, v0.29.1 fixed multimodal token handling across SFT/GRPO/RLOO and decoupled rollout dispatch from the vLLM backend to improve compatibility across versions.
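The decoupling of generation from gradient updates described above is, at its core, a producer/consumer arrangement. The following is a rough sketch under stated assumptions, not TRL's implementation: `rollout_worker` stands in for an external generation server, `trainer` for the optimization loop, and all names and sleep durations are hypothetical. A bounded queue is one simple way to keep the trainer from consuming rollouts that are too stale.

```python
import asyncio


async def rollout_worker(queue: asyncio.Queue, num_batches: int) -> None:
    # Stand-in for an external generation server producing rollouts.
    for step in range(num_batches):
        await asyncio.sleep(0.01)  # simulated generation latency
        await queue.put({"step": step, "completions": [f"sample-{step}"]})
    await queue.put(None)  # sentinel: no more rollouts


async def trainer(queue: asyncio.Queue) -> list:
    # Consumes rollouts as they arrive; generation of the next batch
    # continues in the background while this "update" runs.
    consumed = []
    while True:
        batch = await queue.get()
        if batch is None:
            break
        await asyncio.sleep(0.01)  # simulated gradient update
        consumed.append(batch["step"])
    return consumed


async def run(num_batches: int = 4, max_queued: int = 2) -> list:
    # maxsize bounds how far generation may run ahead of training.
    queue = asyncio.Queue(maxsize=max_queued)
    producer = asyncio.create_task(rollout_worker(queue, num_batches))
    steps = await trainer(queue)
    await producer
    return steps


def train_steps(num_batches: int = 4) -> list:
    return asyncio.run(run(num_batches))
```

With `max_queued=2`, the worker can be at most two batches ahead of the trainer, which limits how off-policy the consumed rollouts can become in this toy setup.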

Last Checked: 18m ago
Latest: v1.4.0
Tracking since Jan 25, 2023
Last Checked: 18m ago
Domain: huggingface.co
Accounts: huggingface
Tracking since May 31, 2019