The TRL v0.17 release introduces three major changes that, together, enable significantly faster generation performance in GRPO—up to 10x faster in some configurations.
These three changes are:

- Data parallelism support in the TRL vLLM server
- A single generation request per effective batch in GRPO
- Support for vLLM's faster V1 engine
Below, we provide a summary of these changes and how to use them.
The TRL vLLM server now supports data parallelism (DP), enabling significantly faster generation speeds—especially for smaller models. This new feature can be used by adding the --data_parallel_size N argument when launching the vLLM server.
trl vllm-serve --model Qwen/Qwen2.5-14B-Instruct --tensor_parallel_size 2 --data_parallel_size 2
by @qgallouedec in https://github.com/huggingface/trl/pull/3310
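On the training side, the trainer can then be pointed at this server. A minimal sketch, assuming the server is reachable at its default host and port (the host/port arguments are left at their defaults here):

```python
from trl import GRPOConfig

# Sketch: point the GRPO trainer at the standalone vLLM server launched above.
# use_vllm enables server-based generation; host/port settings are assumed to
# be left at their defaults.
training_args = GRPOConfig(..., use_vllm=True)
```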
Previously, GRPO made one generation request per global batch, where the global batch is the aggregation of all per-device batches across processes, not accounting for gradient accumulation. In other words, with a gradient accumulation of 8, GRPO made 8 generation requests per training step.
Now, GRPO groups these global batches into a single "effective batch" and makes only one generation request per effective batch. Since vLLM applies optimizations that are especially effective for large batches, this new approach leads to significantly faster training overall.
No changes are required in the training script, as this is handled internally by the GRPO trainer.
by @qgallouedec in https://github.com/huggingface/trl/pull/3283
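Concretely, the number of generation requests per optimization step drops from one per gradient-accumulation micro-step to a single, larger request per effective batch. A minimal sketch of the arithmetic (variable names here are illustrative, not TRL internals):

```python
# Illustrative arithmetic for the "effective batch" grouping
# (names are hypothetical, not TRL internals).
per_device_batch_size = 4      # prompts per GPU per micro-step
num_processes = 2              # data-parallel training processes
gradient_accumulation_steps = 8

# Before: one generation request per global batch, i.e. one per micro-step.
requests_before = gradient_accumulation_steps

# Now: all micro-steps are grouped into one effective batch, so a single,
# larger request is sent to vLLM per optimization step, which vLLM can
# batch-optimize much more effectively.
effective_batch_size = (
    per_device_batch_size * num_processes * gradient_accumulation_steps
)
requests_after = 1

print(requests_before, requests_after, effective_batch_size)  # 8 1 64
```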
vLLM provides two versions of its engine (V0 and V1), and V1 is significantly faster. This version is now supported by TRL and requires vLLM version 0.8.3 or higher.
by @I-l-l-I in https://github.com/huggingface/trl/pull/3276
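If you need to pin the engine version explicitly, vLLM reads the VLLM_USE_V1 environment variable. A sketch (the model name is just an example):

```shell
# Force the V1 engine when launching the TRL vLLM server
# (requires vLLM >= 0.8.3; VLLM_USE_V1 is vLLM's engine-selection variable).
VLLM_USE_V1=1 trl vllm-serve --model Qwen/Qwen2.5-14B-Instruct
```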
Disabling dropout has been shown to stabilize training. You can now disable dropout in GRPO by setting the disable_dropout argument to True in the GRPO config.
from trl import GRPOConfig
training_args = GRPOConfig(..., disable_dropout=True)
by @edbeeching in https://github.com/huggingface/trl/pull/3234
GRPO now supports several loss variants proposed in the recent literature, including the Dr. GRPO loss. The loss type can be set in the GRPO config:
from trl import GRPOConfig
training_args = GRPOConfig(..., loss_type="dr_grpo")
by @qgallouedec in https://github.com/huggingface/trl/pull/3256
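As a rough illustration of what changes with the Dr. GRPO loss: standard GRPO averages the per-token loss over each completion's own length, whereas Dr. GRPO normalizes by a constant (the maximum completion length), removing the length bias. A schematic sketch, not TRL's implementation:

```python
# Schematic comparison of loss aggregation (not TRL's implementation).
# per_token_loss: one list per completion; completions differ in length.
per_token_loss = [
    [0.5, 0.3],                # short completion (2 tokens)
    [0.4, 0.4, 0.4, 0.4],      # long completion (4 tokens)
]
max_completion_length = 4
num_completions = len(per_token_loss)

# GRPO-style: mean over each completion's own tokens, then mean over completions.
grpo_loss = sum(sum(t) / len(t) for t in per_token_loss) / num_completions

# Dr. GRPO-style: global sum normalized by a constant (batch size times the
# maximum length), so longer completions are not implicitly down-weighted.
dr_grpo_loss = sum(sum(t) for t in per_token_loss) / (
    num_completions * max_completion_length
)

print(round(grpo_loss, 3), round(dr_grpo_loss, 3))  # 0.4 0.3
```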
The GRPO trainer now has an option to disable shuffling of the training dataset. This is useful for curriculum learning, where the order of the training data is important.
from trl import GRPOConfig
training_args = GRPOConfig(..., shuffle_dataset=False)
by @LeonEricsson in https://github.com/huggingface/trl/pull/3334
Overlong filtering has been shown to significantly stabilize learning and improve performance. You can now use it in TRL! It consists of masking out truncated completions when computing the loss:
from trl import GRPOConfig
training_args = GRPOConfig(..., mask_truncated_completions=True)
by @shirinyamani in https://github.com/huggingface/trl/pull/3248
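The idea can be sketched as follows: completions that hit the length limit without producing an end-of-sequence token are excluded from the loss via their mask (a toy sketch, not TRL's code):

```python
# Toy sketch of overlong filtering (not TRL's code): zero out the loss mask
# of completions that were truncated, i.e. never emitted EOS.
EOS = 2
completions = [
    [5, 7, EOS],      # finished naturally -> kept
    [5, 7, 9],        # hit the limit without EOS -> truncated, masked out
]

def completion_mask(tokens, eos=EOS):
    # Keep the completion in the loss only if it terminated with EOS.
    keep = 1 if eos in tokens else 0
    return [keep] * len(tokens)

masks = [completion_mask(c) for c in completions]
print(masks)  # [[1, 1, 1], [0, 0, 0]]
```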
Liger significantly reduces the peak memory usage of the loss computation. You can now enable it in TRL with the use_liger_loss argument in the GRPO config:
from trl import GRPOConfig
training_args = GRPOConfig(..., use_liger_loss=True)
by @shivam15s in https://github.com/huggingface/trl/pull/3184
- clip_ratio logging and better document logged values by @qgallouedec in https://github.com/huggingface/trl/pull/3145
- worker_cls as string by @qgallouedec in https://github.com/huggingface/trl/pull/3159
- ConstantLengthDataset by @qgallouedec in https://github.com/huggingface/trl/pull/3242
- formatting_func by @YeFD in https://github.com/huggingface/trl/pull/3147
- is_liger_kernel_available with min version by @qgallouedec in https://github.com/huggingface/trl/pull/3266
- test_raise_error_not_causallm by @qgallouedec in https://github.com/huggingface/trl/pull/3265
- _generate_and_score_completions by @syt-nju in https://github.com/huggingface/trl/pull/3336
- max_prompt_length < max_length by @LeonEricsson in https://github.com/huggingface/trl/pull/3341

Full Changelog: https://github.com/huggingface/trl/compare/v0.16.0...v0.17.0