The TRL v0.17 release introduces three major changes that, together, enable significantly faster generation performance in GRPO—up to 10x faster in some configurations.
These three changes are:

- Data parallelism support in the TRL vLLM server
- A single generation request per effective batch in GRPO
- Support for vLLM's faster V1 engine
Below, we provide a summary of these changes and how to use them.
The TRL vLLM server now supports data parallelism (DP), enabling significantly faster generation speeds—especially for smaller models. This new feature can be used by adding the --data_parallel_size N argument when launching the vLLM server.
trl vllm-serve --model Qwen/Qwen2.5-14B-Instruct --tensor_parallel_size 2 --data_parallel_size 2
by @qgallouedec in https://github.com/huggingface/trl/pull/3310
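On the training side, the trainer can then be pointed at this server. A minimal sketch, assuming the server is reachable at its default host and port (the host/port arguments are left at their defaults here):

```python
from trl import GRPOConfig

# Sketch: point the GRPO trainer at the standalone vLLM server launched above.
# use_vllm enables server-based generation; host/port settings are assumed to
# be left at their defaults.
training_args = GRPOConfig(..., use_vllm=True)
```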
Previously, GRPO made one generation request per global batch, where the global batch is the aggregation of all per-device batches across processes, not accounting for gradient accumulation. In other words, with a gradient accumulation of 8, GRPO made 8 generation requests per training step.
Now, GRPO groups these global batches into a single "effective batch" and makes only one generation request per effective batch. Since vLLM applies optimizations that are especially effective for large batches, this new approach leads to significantly faster training overall.
No changes are required in the training script, as this is handled internally by the GRPO trainer.
by @qgallouedec in https://github.com/huggingface/trl/pull/3283
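Concretely, the number of generation requests per optimization step drops from one per gradient-accumulation micro-step to a single, larger request per effective batch. A minimal sketch of the arithmetic (variable names here are illustrative, not TRL internals):

```python
# Illustrative arithmetic for the "effective batch" grouping
# (names are hypothetical, not TRL internals).
per_device_batch_size = 4      # prompts per GPU per micro-step
num_processes = 2              # data-parallel training processes
gradient_accumulation_steps = 8

# Before: one generation request per global batch, i.e. one per micro-step.
requests_before = gradient_accumulation_steps

# Now: all micro-steps are grouped into one effective batch, so a single,
# larger request is sent to vLLM per optimization step, which vLLM can
# batch-optimize much more effectively.
effective_batch_size = (
    per_device_batch_size * num_processes * gradient_accumulation_steps
)
requests_after = 1

print(requests_before, requests_after, effective_batch_size)  # 8 1 64
```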
vLLM provides two versions of its engine (V0 and V1), and V1 is significantly faster. This version is now supported by TRL and requires vLLM version 0.8.3 or higher.
by @I-l-l-I in https://github.com/huggingface/trl/pull/3276
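If you need to pin the engine version explicitly, vLLM reads the VLLM_USE_V1 environment variable. A sketch (the model name is just an example):

```shell
# Force the V1 engine when launching the TRL vLLM server
# (requires vLLM >= 0.8.3; VLLM_USE_V1 is vLLM's engine-selection variable).
VLLM_USE_V1=1 trl vllm-serve --model Qwen/Qwen2.5-14B-Instruct
```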
Disabling dropout has been shown to stabilize training. You can now disable dropout in GRPO by setting the disable_dropout argument to True in the GRPO config.
from trl import GRPOConfig
training_args = GRPOConfig(..., disable_dropout=True)
by @edbeeching in https://github.com/huggingface/trl/pull/3234
GRPO now supports several loss variants proposed in the recent literature, including the Dr. GRPO loss. The loss type can be set in the GRPO config:
from trl import GRPOConfig
training_args = GRPOConfig(..., loss_type="dr_grpo")
by @qgallouedec in https://github.com/huggingface/trl/pull/3256
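As a rough illustration of what changes with the Dr. GRPO loss: standard GRPO averages the per-token loss over each completion's own length, whereas Dr. GRPO normalizes by a constant (the maximum completion length), removing the length bias. A schematic sketch, not TRL's implementation:

```python
# Schematic comparison of loss aggregation (not TRL's implementation).
# per_token_loss: one list per completion; completions differ in length.
per_token_loss = [
    [0.5, 0.3],                # short completion (2 tokens)
    [0.4, 0.4, 0.4, 0.4],      # long completion (4 tokens)
]
max_completion_length = 4
num_completions = len(per_token_loss)

# GRPO-style: mean over each completion's own tokens, then mean over completions.
grpo_loss = sum(sum(t) / len(t) for t in per_token_loss) / num_completions

# Dr. GRPO-style: global sum normalized by a constant (batch size times the
# maximum length), so longer completions are not implicitly down-weighted.
dr_grpo_loss = sum(sum(t) for t in per_token_loss) / (
    num_completions * max_completion_length
)

print(round(grpo_loss, 3), round(dr_grpo_loss, 3))  # 0.4 0.3
```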
The GRPO trainer now has an option to disable shuffling of the training dataset. This is useful for curriculum learning, where the order of the training data is important.
from trl import GRPOConfig
training_args = GRPOConfig(..., shuffle_dataset=False)
by @LeonEricsson in https://github.com/huggingface/trl/pull/3334
Overlong filtering has been shown to significantly stabilize learning and improve performance. You can now use it in TRL! It consists of masking out truncated completions when computing the loss:
from trl import GRPOConfig
training_args = GRPOConfig(..., mask_truncated_completions=True)
by @shirinyamani in https://github.com/huggingface/trl/pull/3248
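The idea can be sketched as follows: completions that hit the length limit without producing an end-of-sequence token are excluded from the loss via their mask (a toy sketch, not TRL's code):

```python
# Toy sketch of overlong filtering (not TRL's code): zero out the loss mask
# of completions that were truncated, i.e. never emitted EOS.
EOS = 2
completions = [
    [5, 7, EOS],      # finished naturally -> kept
    [5, 7, 9],        # hit the limit without EOS -> truncated, masked out
]

def completion_mask(tokens, eos=EOS):
    # Keep the completion in the loss only if it terminated with EOS.
    keep = 1 if eos in tokens else 0
    return [keep] * len(tokens)

masks = [completion_mask(c) for c in completions]
print(masks)  # [[1, 1, 1], [0, 0, 0]]
```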
Liger significantly reduces the peak memory usage of the loss computation. You can now enable it in TRL with the use_liger_loss argument in the GRPO config:
from trl import GRPOConfig
training_args = GRPOConfig(..., use_liger_loss=True)
by @shivam15s in https://github.com/huggingface/trl/pull/3184
- clip_ratio logging and better document logged values by @qgallouedec in https://github.com/huggingface/trl/pull/3145
- worker_cls as string by @qgallouedec in https://github.com/huggingface/trl/pull/3159
- ConstantLengthDataset by @qgallouedec in https://github.com/huggingface/trl/pull/3242
- formatting_func by @YeFD in https://github.com/huggingface/trl/pull/3147
- is_liger_kernel_available with min version by @qgallouedec in https://github.com/huggingface/trl/pull/3266
- test_raise_error_not_causallm by @qgallouedec in https://github.com/huggingface/trl/pull/3265
- _generate_and_score_completions by @syt-nju in https://github.com/huggingface/trl/pull/3336
- max_prompt_length < max_length by @LeonEricsson in https://github.com/huggingface/trl/pull/3341

Full Changelog: https://github.com/huggingface/trl/compare/v0.16.0...v0.17.0