
v0.23.0

September 10, 2025

Major

🥓 Context Parallelism

SFT now supports Context Parallelism (CP) for training large language models on very long sequences. You can now train with arbitrarily long sequence lengths.

<img width="844" height="336" alt="Screenshot 2025-09-09 at 10 39 30 PM" src="https://github.com/user-attachments/assets/f1dfc349-440a-4e05-aac9-439a3c286f08" />

by @kashif in https://github.com/huggingface/trl/pull/3994
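The core idea can be illustrated with a minimal sketch in plain Python (this is illustrative only, not the TRL API): each rank owns one contiguous shard of the sequence, so per-device memory stays flat as the total sequence length grows.

```python
# Illustrative sketch of context parallelism, not TRL's implementation:
# each rank holds only a contiguous shard of a long sequence, so attention
# and loss are computed over sequences far longer than one device could hold.

def shard_sequence(tokens, world_size, rank):
    """Return the contiguous shard of `tokens` owned by `rank`."""
    shard_len = (len(tokens) + world_size - 1) // world_size  # ceil division
    start = rank * shard_len
    return tokens[start:start + shard_len]

# A 10-token "sequence" split across 4 ranks:
tokens = list(range(10))
shards = [shard_sequence(tokens, 4, r) for r in range(4)]
print(shards)  # → [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

In the real feature, the sharded ranks also exchange key/value activations during attention; the sketch only shows the data layout.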

🧨 Dynamic Fine-Tuning

Dynamic Fine-Tuning (DFT) is now supported in TRL.

from trl import SFTConfig

training_args = SFTConfig(
    loss_type="dft",
    ...
)
<img width="692" height="472" alt="Screenshot 2025-09-09 at 10 37 36 PM" src="https://github.com/user-attachments/assets/4ee2b4ab-7cc6-4578-bfac-c38124891510" />

by @qgallouedec in https://github.com/huggingface/trl/pull/4042
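As a rough numeric sketch of the idea (based on the standard DFT formulation, not TRL internals): DFT rescales the usual token-level negative log-likelihood by the model's own stop-gradient probability of the target token.

```python
import math

def dft_token_loss(p_target):
    """Sketch of the DFT per-token loss: the standard NLL, -log p, scaled
    by the model's own (stop-gradient) probability p of the target token.
    Relative to plain cross-entropy, this down-weights tokens the model
    currently assigns low probability to."""
    return -p_target * math.log(p_target)

# Plain cross-entropy vs. DFT on a confident and an unconfident token:
for p in (0.9, 0.1):
    print(f"p={p}: CE={-math.log(p):.3f}  DFT={dft_token_loss(p):.3f}")
```

With `loss_type="dft"` in `SFTConfig`, TRL applies this reweighting for you; the sketch only shows the shape of the loss.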

🪵 Truncated Importance Sampling (TIS) to address rollout-training mismatch

Different implementations are used for rollout generation (vLLM) and model training. This implementation gap implicitly turns on-policy RL into off-policy RL. Truncated Importance Sampling (TIS) is a simple yet effective importance-sampling technique for handling this discrepancy, and it is now implemented in GRPO.

from trl import GRPOConfig

training_args = GRPOConfig(
    ...
    use_vllm=True,
    vllm_importance_sampling_correction=True, # default True
    vllm_importance_sampling_cap=2.0, # hyper-parameter C
)

by @LeonEricsson in https://github.com/huggingface/trl/pull/3867
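The correction itself is small; a minimal sketch (illustrative, not TRL's implementation) shows the role of the two options above: the per-token loss is rescaled by an importance ratio between the training and rollout policies, truncated at the cap C.

```python
def tis_weight(p_train, p_rollout, cap=2.0):
    """Truncated importance weight min(pi_train / pi_rollout, C): corrects
    for the rollout policy (vLLM) differing from the training policy, with
    the cap C bounding the variance of the correction."""
    return min(p_train / p_rollout, cap)

def tis_corrected_loss(token_loss, p_train, p_rollout, cap=2.0):
    """The per-token loss is simply rescaled by the truncated weight."""
    return tis_weight(p_train, p_rollout, cap) * token_loss

print(tis_weight(0.6, 0.2))  # ratio 3.0 is truncated to the cap → 2.0
```

`vllm_importance_sampling_cap` plays the role of C here; setting `vllm_importance_sampling_correction=False` would correspond to a constant weight of 1.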

🥣 [SFTTrainer]: Add Aux Loss for MoE models

Mixture of Experts (MoE) models require an auxiliary loss to ensure that the different experts are used evenly. This auxiliary loss is now supported in SFTTrainer.

from trl import SFTConfig

training_args = SFTConfig(
    model_init_kwargs={"output_router_logits": True},
    ...
)

by @pramodith in https://github.com/huggingface/trl/pull/4012
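For intuition, here is a sketch of a Switch-Transformer-style load-balancing loss (illustrative only; in TRL the model computes this internally once `output_router_logits=True` is set):

```python
def load_balancing_loss(router_probs, expert_assignments, num_experts):
    """Switch-Transformer-style auxiliary loss: N * sum_i f_i * P_i, where
    f_i is the fraction of tokens routed to expert i and P_i is the mean
    router probability for expert i. It is minimised (value 1.0) when the
    routing load is perfectly uniform across experts."""
    n = len(expert_assignments)
    f = [expert_assignments.count(e) / n for e in range(num_experts)]
    P = [sum(probs[e] for probs in router_probs) / n for e in range(num_experts)]
    return num_experts * sum(fi * Pi for fi, Pi in zip(f, P))

# Balanced routing over two experts gives the minimum value 1.0:
print(load_balancing_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1], 2))  # → 1.0
```

Skewed routing (e.g. every token sent to one expert with high router probability) drives the value above 1.0, which is what the auxiliary term penalises.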

💤 [GRPO/RLOO] Adds an option to sleep vLLM when running in colocated mode

When running GRPO (or RLOO) with vLLM in colocated mode, the vLLM engine consumes VRAM during optimization while not being used. There is now an option to put the vLLM engine to sleep during optimization, freeing up that VRAM.

from trl import GRPOConfig

training_args = GRPOConfig(..., vllm_sleep_enabled=True)

by @edbeeching in https://github.com/huggingface/trl/pull/3968

โš–๏ธ Add vLLM server mode and VLM support to OnlineDPOTrainer

You can now use vLLM server mode with OnlineDPOTrainer. Additionally, vision-language models (VLMs) are now supported.

by @vaelev in https://github.com/huggingface/trl/pull/3783
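As a hedged sketch of how this might be configured (the parameter names below are assumed to mirror GRPO's vLLM options and may differ in the actual release):

```python
from trl import OnlineDPOConfig

# Assumed parameter names, mirroring GRPOConfig's vLLM options:
training_args = OnlineDPOConfig(
    ...,
    use_vllm=True,
    vllm_mode="server",  # assumption: connect to a standalone vLLM server instead of colocating
)
```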

Comprehensive Paper Index Enhancement with 9 New Algorithm Implementations

The paper index has been significantly enhanced with the addition of 9+ new algorithm implementations, providing a more comprehensive resource for users.

by @behroozazarkhalili in https://github.com/huggingface/trl/pull/3990

Other Notable Changes

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.22.0...v0.23.0
