TRL moved toward production-grade reinforcement learning with v1.0.0, marking the transition from a prototyping framework to a deployable training system. The headline feature, asynchronous GRPO, decouples generation from gradient updates by offloading rollouts to external vLLM servers, so generation and training run in parallel and GPUs no longer sit idle between steps. The release also introduced VESPO (Variational Sequence-Level Soft Policy Optimization), which replaces heuristic token-level clipping with a principled variational framework: a smooth, sequence-level Gamma weighting function that stabilizes off-policy training against policy staleness and asynchronous updates. Sketches of both pieces follow.
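A rough sketch of the server-mode setup, assuming the `trl vllm-serve` CLI and the `use_vllm`/`vllm_mode` fields on `GRPOConfig` from recent TRL releases; the exact knobs for the new async rollout scheduling may differ:

```python
# Terminal 1: launch a standalone vLLM server to handle rollout generation.
#   trl vllm-serve --model Qwen/Qwen2.5-1.5B-Instruct

# Terminal 2: the trainer connects to that server instead of generating
# locally, so gradient updates can proceed while rollouts are produced.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 50 characters.
    return [-abs(50 - len(c)) for c in completions]

config = GRPOConfig(
    output_dir="qwen-grpo-async",
    use_vllm=True,       # offload generation to vLLM
    vllm_mode="server",  # talk to the external server rather than colocating
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```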
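And a toy contrast between token-level clipping and sequence-level soft weighting. The `weight` below is a placeholder for illustration only; VESPO derives its actual Gamma weighting from a variational objective, not this ad hoc decay:

```python
import torch

# Per-token log-probs under the current policy and a stale behavior policy.
logp_new = torch.randn(4, 16)                  # [batch, seq_len]
logp_old = logp_new + 0.1 * torch.randn(4, 16)
advantages = torch.randn(4)                    # one advantage per sequence

# Token-level PPO-style clipping: hard-clips each token's importance ratio,
# zeroing gradients wherever the ratio leaves the trust region.
ratio = (logp_new - logp_old).exp()
loss_clip = -torch.min(
    ratio * advantages[:, None],
    ratio.clamp(0.8, 1.2) * advantages[:, None],
).mean()

# Sequence-level soft weighting: a single smooth, bounded weight per
# sequence that decays as the sample drifts off-policy, instead of a
# hard per-token clip. (Placeholder weighting, not the VESPO formula.)
log_ratio_seq = (logp_new - logp_old).sum(dim=-1)  # [batch]
weight = torch.exp(-log_ratio_seq.abs()).detach()
loss_soft = -(weight * advantages * logp_new.sum(dim=-1)).mean()
```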
Earlier releases hardened the foundation: async reward functions parallelized across GRPO and RLOO, vLLM 0.12.0 compatibility, tool-calling support for agent training, and memory optimizations such as forward-masked logits, which cut VRAM usage by up to 50 percent during forward passes. v0.29.1 also fixed multimodal token handling across SFT, GRPO, and RLOO, and decoupled rollout dispatch from the vLLM backend to improve compatibility across vLLM versions.
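For the async reward path, the point is that slow rewards (for example, a remote judge) can be scored concurrently instead of one completion at a time. A minimal sketch, assuming reward functions can be coroutines that the trainer awaits; the endpoint and helper names here are hypothetical:

```python
import asyncio
import aiohttp  # any awaitable scorer works; an HTTP judge is just an example

# Assumption: with async reward support, a coroutine reward function is
# awaited by the trainer, so network round-trips overlap across completions.
async def remote_judge_reward(completions, **kwargs):
    async with aiohttp.ClientSession() as session:
        async def score(text):
            # Hypothetical judge service; replace with a real scorer.
            async with session.post(
                "http://localhost:8500/score", json={"text": text}
            ) as resp:
                return (await resp.json())["score"]
        # Fire all requests at once and await the whole batch.
        return list(await asyncio.gather(*(score(c) for c in completions)))
```

Passed as `reward_funcs=remote_judge_reward`, a group of G completions then costs roughly one round-trip of latency rather than G sequential ones.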