v1.0.0
Read our blog post for an overview of TRL v1.
Features
Asynchronous GRPO
Asynchronous GRPO decouples generation from the gradient update loop by offloading rollouts to an external vLLM server. Generation runs in parallel with training, which reduces idle GPU time and improves hardware utilization.
```python
from datasets import load_dataset
from trl.experimental.async_grpo import AsyncGRPOTrainer
from trl.rewards import accuracy_reward

dataset = load_dataset("trl-lib/DeepMath-103K", split="train")

trainer = AsyncGRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    train_dataset=dataset,
)
trainer.train()
```
by @qgallouedec in https://github.com/huggingface/trl/pull/5293
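Conceptually, this overlap works like a bounded producer/consumer loop. The sketch below uses only Python's threading and queue modules. generate_rollout, train_step and run_async are hypothetical stand-ins for a request to the vLLM server and a gradient update. They are not AsyncGRPOTrainer internals.

```python
import queue
import threading
import time


def generate_rollout(step):
    # Hypothetical stand-in for a generation request to an external vLLM server.
    time.sleep(0.01)
    return {"step": step, "completion": f"completion-{step}"}


def train_step(rollout):
    # Hypothetical stand-in for one gradient update on a rollout.
    time.sleep(0.01)
    return rollout["step"]


def run_async(num_steps, max_buffered=4):
    # Bounded queue: generation can run at most `max_buffered` rollouts ahead of training.
    rollouts = queue.Queue(maxsize=max_buffered)

    def producer():
        for step in range(num_steps):
            rollouts.put(generate_rollout(step))  # blocks while the queue is full

    worker = threading.Thread(target=producer)
    worker.start()
    trained = []
    for _ in range(num_steps):
        # The trainer consumes rollouts while the producer keeps generating new ones.
        trained.append(train_step(rollouts.get()))
    worker.join()
    return trained


if __name__ == "__main__":
    print(run_async(8))
```

The queue's maxsize limits how far generation can run ahead of training. That in turn limits how stale, or off-policy, the consumed rollouts can be.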
Variational Sequence-Level Soft Policy Optimization (VESPO)
<img width="465" height="279" alt="Screenshot 2026-03-20 at 5 49 50 PM" src="https://github.com/user-attachments/assets/b60c9697-6eb7-498e-95b3-df78c367f5fa" />
VESPO addresses training instability in off-policy RL caused by policy staleness, asynchronous updates, and train-inference mismatches. Rather than relying on heuristic token-level clipping (GRPO) or sequence-length normalization (GSPO), VESPO derives a principled reshaping kernel from a variational framework. In practice, this yields a smooth, asymmetric Gamma weighting function that gracefully suppresses extreme sequence-level importance weights without introducing length bias. It can be enabled via the loss_type parameter of GRPOConfig:
```python
from trl import GRPOConfig, GRPOTrainer

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    args=GRPOConfig(loss_type="vespo"),
    # ... remaining trainer arguments
)
```
by @casinca in https://github.com/huggingface/trl/pull/5199
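The sketch below makes "suppressing extreme sequence-level importance weights" concrete. It first computes a sequence-level importance ratio from per-token log-probabilities, with no length normalization. It then applies a Gamma-shaped weight f(w) = w^c · exp(c(1 - w)). This weight equals 1 at w = 1 and decays smoothly, and asymmetrically, for very large or very small ratios. The kernel form and the parameter c are assumptions chosen for illustration. They are not VESPO's exact kernel or TRL's implementation.

```python
import math


def sequence_importance_ratio(logp_new, logp_old):
    # Sequence-level ratio pi_new(y|x) / pi_old(y|x): the product of per-token ratios,
    # computed in log space and not divided by the sequence length.
    return math.exp(sum(n - o for n, o in zip(logp_new, logp_old)))


def gamma_weight(w, c=2.0):
    # Illustrative Gamma-shaped kernel (an assumption, not VESPO's exact form).
    # It peaks at w = 1 with value 1, grows like w**c near 0 and decays like
    # exp(-c * w) for large w, so extreme ratios receive a weight close to 0.
    if w <= 0:
        return 0.0
    return w ** c * math.exp(c * (1.0 - w))


if __name__ == "__main__":
    for w in (0.1, 0.5, 1.0, 2.0, 10.0):
        print(w, round(gamma_weight(w), 4))
```

Unlike a hard clip, this weight has no flat region and no discontinuous cutoff. Large weights shrink gradually rather than being truncated at a threshold.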
Divergence Proximal Policy Optimization (DPPO)
<img width="3180" height="1187" alt="z_TXYw37xZqsQ21YiDkYL" src="https://github.com/user-attachments/assets/40f1d538-82b3-4097-91c6-119ea9f7797b" />
<img width="1189" height="490" alt="SfgWotuuuRKPkg-0bxWv1" src="https://github.com/user-attachments/assets/2b090df3-0bfb-42e4-9f94-15943736e689" />
DPPO is a new experimental trainer that replaces the standard PPO clipping mechanism with divergence constraints, providing more principled trust-region updates.
by @LeonEricsson in https://github.com/huggingface/trl/pull/5117
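The contrast between ratio clipping and a divergence-based trust region can be sketched per token. In PPO, the clipped objective removes a token's gradient once its probability ratio leaves [1 - ε, 1 + ε] in the direction the advantage favors. This ignores how much probability mass has actually moved. A divergence-based rule instead measures how far the whole next-token distribution has shifted from the rollout distribution. The sketch below uses total variation distance and a hypothetical threshold delta purely for illustration. DPPO's actual divergence, estimator and masking rule may differ, so see the PR for the exact formulation.

```python
def ppo_clip_active(ratio, advantage, eps=0.2):
    # Does PPO's clipped objective still pass a gradient for this token?
    # With A >= 0 the gradient vanishes once ratio > 1 + eps; with A < 0, once ratio < 1 - eps.
    if advantage >= 0:
        return ratio <= 1 + eps
    return ratio >= 1 - eps


def tv_distance(p, q):
    # Total variation distance between two categorical next-token distributions.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def divergence_active(p_new, p_old, delta=0.1):
    # Hypothetical divergence-based trust region (TV distance and delta are assumptions):
    # keep the update while the whole distribution stays within delta of the rollout one.
    return tv_distance(p_new, p_old) <= delta


if __name__ == "__main__":
    # Rare token: its probability triples (0.001 -> 0.003) but almost no mass moves.
    rare_old, rare_new = [0.001, 0.999], [0.003, 0.997]
    # Likely token: 0.8 -> 0.95 is a ratio inside the clip range but a large shift in mass.
    common_old, common_new = [0.8, 0.2], [0.95, 0.05]
    for name, old, new in [("rare", rare_old, rare_new), ("common", common_old, common_new)]:
        ratio = new[0] / old[0]
        print(name, "ppo:", ppo_clip_active(ratio, advantage=1.0),
              "divergence:", divergence_active(new, old))
```

In this toy case the two rules disagree in opposite directions. Clipping blocks the harmless update to the rare token. The divergence check blocks the large shift on the likely token.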
Self-Distillation Policy Optimization (SDPO)
SDPO is a new experimental trainer that augments on-policy RL with self-distillation from the model's own high-reward trajectories. Instead of using an external teacher, SDPO treats the current model conditioned on feedback as a self-teacher, distilling its feedback-informed predictions back into the policy.
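```python
from datasets import load_dataset
from trl.experimental import SDPOTrainer, SDPOConfig
from trl.rewards import accuracy_reward

# Assumption: reusing the math dataset from the Async GRPO example
dataset = load_dataset("trl-lib/DeepMath-103K", split="train")

config = SDPOConfig(
    output_dir="./results",
    num_generations=8,
    success_reward_threshold=1.0,
    use_successful_as_teacher=True,
)
trainer = SDPOTrainer(
    model="Qwen/Qwen2.5-Math-1.5B-Instruct",
    reward_funcs=[accuracy_reward],
    args=config,
    train_dataset=dataset,
)
trainer.train()
```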
by @MengAiDev in https://github.com/huggingface/trl/pull/4935
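The self-distillation signal can be sketched in two steps:

1. Keep the completions whose reward reaches success_reward_threshold as teacher context.
2. Penalize the divergence between two next-token distributions from the model: one with that feedback in its context (teacher) and one without it (student).

The sketch below works on toy probability lists. The >= comparison, the use of forward KL and the helper names are assumptions for illustration, not SDPOTrainer internals.

```python
import math


def select_teacher_demos(completions, rewards, threshold=1.0):
    # Keep completions whose reward reaches the success threshold (assumed to be >=).
    return [c for c, r in zip(completions, rewards) if r >= threshold]


def kl_divergence(p_teacher, p_student, eps=1e-12):
    # Forward KL(teacher || student) for a single next-token distribution.
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(p_teacher, p_student) if p > 0)


def self_distillation_loss(teacher_dists, student_dists):
    # Mean per-token KL between feedback-conditioned (teacher) and plain (student) predictions.
    per_token = [kl_divergence(t, s) for t, s in zip(teacher_dists, student_dists)]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    demos = select_teacher_demos(["a", "b", "c"], [1.0, 0.0, 1.0])
    loss = self_distillation_loss([[0.9, 0.1], [0.6, 0.4]], [[0.5, 0.5], [0.6, 0.4]])
    print(demos, round(loss, 4))
```

As described above, teacher and student are the same network and only the context differs, so no separate teacher model is needed.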
Reward functions can now log extra columns and scalar metrics
Reward functions can now take optional log_extra and log_metric arguments, which log per-sample columns and scalar metrics alongside the reward. This makes it easier to track intermediate signals without writing custom trainer callbacks.
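```python
def extract_answer(completion):
    # Hypothetical helper (not part of TRL): take the last line of the completion text.
    text = completion[0]["content"] if isinstance(completion, list) else completion
    lines = text.strip().splitlines()
    return lines[-1].strip() if lines else ""


def my_reward_fn(completions, answer, log_extra=None, log_metric=None, **kwargs):
    extracted = [extract_answer(c) for c in completions]
    rewards = [1.0 if e == a else 0.0 for e, a in zip(extracted, answer)]
    # Per-sample columns, logged next to each completion
    if log_extra:
        log_extra("golden_answer", list(answer))
        log_extra("extracted_answer", extracted)
    # Batch-level scalar metric
    if log_metric:
        log_metric("accuracy", sum(rewards) / len(rewards))
    return rewards
```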
<img width="1400" height="407" alt="image" src="https://github.com/user-attachments/assets/d345b0ac-0d3c-446f-9321-a26e73ee16b4" />
<img width="1353" height="673" alt="image" src="https://github.com/user-attachments/assets/b4c0302b-f69a-4715-9aad-278b4ad13299" />
by @manueldeprada in https://github.com/huggingface/trl/pull/5233
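On the caller's side, the two callbacks only need to record what the reward function passes them. The RewardLogCollector class below is a hypothetical stand-in, not part of TRL. It can be used to unit-test a reward function with this signature without launching a trainer. The trainer's own callbacks may behave differently, for example when aggregating across processes.

```python
from collections import defaultdict


class RewardLogCollector:
    # Hypothetical collector mimicking the log_extra / log_metric callbacks.
    def __init__(self):
        self.columns = {}
        self.metrics = defaultdict(list)

    def log_extra(self, name, values):
        # Per-sample column: one value per completion.
        self.columns[name] = list(values)

    def log_metric(self, name, value):
        # Scalar metric: appended so repeated calls can be averaged later.
        self.metrics[name].append(float(value))


def exact_match_reward(completions, answer, log_extra=None, log_metric=None, **kwargs):
    # Minimal reward function using the same optional callback arguments.
    rewards = [1.0 if c.strip() == a else 0.0 for c, a in zip(completions, answer)]
    if log_extra:
        log_extra("stripped_completion", [c.strip() for c in completions])
    if log_metric:
        log_metric("accuracy", sum(rewards) / len(rewards))
    return rewards


if __name__ == "__main__":
    collector = RewardLogCollector()
    rewards = exact_match_reward(
        [" 4", "5 "], answer=["4", "6"],
        log_extra=collector.log_extra, log_metric=collector.log_metric,
    )
    print(rewards, collector.columns, dict(collector.metrics))
```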
Tool calling support in VLLMClient.chat()
VLLMClient.chat() now supports tool calling, enabling agentic workflows directly through the vLLM client interface.
by @kansalaman in https://github.com/huggingface/trl/pull/4889
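In an agentic loop, the caller executes the tool calls the model returns and feeds the results back. The sketch below covers only that caller-side step, using the standard library. It assumes a tool call in the OpenAI-style {"function": {"name": ..., "arguments": ...}} shape. That shape, the get_weather tool, the TOOLS registry and execute_tool_call are all assumptions for illustration. Check the VLLMClient.chat() documentation for the exact request and response format.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical tool used for illustration.
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def execute_tool_call(tool_call):
    # Run one tool call and wrap the result in a "tool" role message.
    fn = tool_call["function"]
    args = fn["arguments"]
    if isinstance(args, str):  # arguments are often returned as a JSON-encoded string
        args = json.loads(args)
    result = TOOLS[fn["name"]](**args)
    return {"role": "tool", "name": fn["name"], "content": result}


if __name__ == "__main__":
    call = {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
    print(execute_tool_call(call))
```

The resulting message would typically be appended to the conversation before the next chat() call.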
35% faster packing
Best-fit decreasing (BFD) packing is now 35% faster. The "bfd-requeue" packing strategy has also been renamed to "bfd_split". See MIGRATION.md for details.
by @mariosasko in https://github.com/huggingface/trl/pull/5189
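For readers unfamiliar with BFD, best-fit decreasing works as follows:

1. Sort sequences from longest to shortest.
2. Place each sequence into the open bin with the least free space that can still hold it.
3. Open a new bin only when no existing bin fits.

The simple quadratic sketch below illustrates the strategy only. It is not TRL's optimized implementation. It rejects sequences longer than the capacity rather than modeling how bfd_split handles them.

```python
def pack_bfd(lengths, capacity):
    """Best-fit decreasing packing; returns bins as lists of sequence indices."""
    bins = []       # sequence indices in each bin
    remaining = []  # free space left in each bin
    # Longest sequences first (sorted() is stable, so ties keep their original order)
    for idx in sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True):
        length = lengths[idx]
        if length > capacity:
            raise ValueError(f"sequence {idx} of length {length} exceeds capacity {capacity}")
        # Best fit: the bin with the least free space that still fits this sequence
        best = None
        for b, free in enumerate(remaining):
            if free >= length and (best is None or free < remaining[best]):
                best = b
        if best is None:
            bins.append([idx])
            remaining.append(capacity - length)
        else:
            bins[best].append(idx)
            remaining[best] -= length
    return bins


if __name__ == "__main__":
    lengths = [5, 4, 3, 3, 2, 1]
    print(pack_bfd(lengths, capacity=8))
```

With these lengths, the six sequences (18 tokens in total) fit into three bins of 8. Three is also the lower bound, since 18 / 8 rounds up to 3.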
[GKD] Buffer implementation and vLLM inference for distillation trainer
The GKD/GOLD trainer now supports buffered rollout generation, decoupling generation from gradient updates for more efficient distillation. vLLM inference support has also been added to the base self-distillation trainer.
by @cmpatino in https://github.com/huggingface/trl/pull/5137 and https://github.com/huggingface/trl/pull/5388
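Buffered generation can be pictured as producing rollouts in larger batches and then serving several gradient steps from that buffer. This way the expensive generation call runs less often. The RolloutBuffer class and its generate_fn, refill_size and batch_size parameters are hypothetical. This is a sketch of the general pattern, not the GKD/GOLD trainer's code.

```python
from collections import deque


class RolloutBuffer:
    """Hypothetical buffer: refill in bulk, then serve fixed-size training batches."""

    def __init__(self, generate_fn, refill_size, batch_size):
        if refill_size % batch_size != 0:
            raise ValueError("refill_size should be a multiple of batch_size")
        self.generate_fn = generate_fn  # callable(n) -> list of n rollouts
        self.refill_size = refill_size
        self.batch_size = batch_size
        self.items = deque()
        self.num_generate_calls = 0

    def next_batch(self):
        if len(self.items) < self.batch_size:
            # One bulk generation call (e.g. to a vLLM engine) covers several training steps
            self.items.extend(self.generate_fn(self.refill_size))
            self.num_generate_calls += 1
        return [self.items.popleft() for _ in range(self.batch_size)]


if __name__ == "__main__":
    counter = iter(range(1000))
    buffer = RolloutBuffer(lambda n: [next(counter) for _ in range(n)], refill_size=8, batch_size=2)
    for _ in range(6):
        print(buffer.next_batch())
    print("generation calls:", buffer.num_generate_calls)
```

The trade-off is staleness: rollouts near the end of a buffer were generated by an older policy than the one being updated.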
v0 → v1 migration guide
A MIGRATION.md guide has been added covering all breaking changes when upgrading from TRL v0 to v1. If you're already on v0.29, the changes are minimal.
by @qgallouedec in https://github.com/huggingface/trl/pull/5255
Other
- Change default vllm_mode to "colocate" by @qgallouedec in https://github.com/huggingface/trl/pull/5255
- Support truncation_mode in SFT by @albertvillanova in https://github.com/huggingface/trl/pull/5306
- Support max_length in DPO VLM training by @albertvillanova in https://github.com/huggingface/trl/pull/5284
- Add pad_to_multiple_of to GRPOTrainer and RLOOTrainer by @czkkkkkk in https://github.com/huggingface/trl/pull/5180
- Support sequence sampling in Liger Kernel by @michaelroyzen in https://github.com/huggingface/trl/pull/5190
- Add tool calling support to VLLMClient.chat() by @kansalaman in https://github.com/huggingface/trl/pull/4889
- Add support for raw token IDs in vLLM client prompts by @qgallouedec in https://github.com/huggingface/trl/pull/5225
- Add VLM support when passing raw token IDs to vLLM client by @qgallouedec in https://github.com/huggingface/trl/pull/5227
- Enhance print_prompt_completions_sample to include reasoning content by @qgallouedec in https://github.com/huggingface/trl/pull/5327
- Add support for pixel_position_ids vision key by @qgallouedec in https://github.com/huggingface/trl/pull/5374
- Add second version of Qwen 3.5 chat template by @apardyl in https://github.com/huggingface/trl/pull/5405
- Pass tools as None to apply_chat_template when it is an empty list by @rabinadk1 in https://github.com/huggingface/trl/pull/5380
Fixes
- Fix DPOTrainer collators to truncate sequences before padding by @albertvillanova in https://github.com/huggingface/trl/pull/5305
- Prevent corruption of DPO VLM training when truncation_mode is "keep_end" by @albertvillanova in https://github.com/huggingface/trl/pull/5286
- Fix mm_token_type_ids silently dropped in DPO VLM training by @albertvillanova in https://github.com/huggingface/trl/pull/5279
- Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model by @albertvillanova in https://github.com/huggingface/trl/pull/5295
- Fix accuracy_reward crash when called from non-main thread by @qgallouedec in https://github.com/huggingface/trl/pull/5281
- Fix GRPOTrainer attribute access for vLLM model config by @falcondai in https://github.com/huggingface/trl/pull/5302
- [GRPO] Fix re-tokenization bug in tool-calling loop by @qgallouedec in https://github.com/huggingface/trl/pull/5242
- [CPO/ORPO] Fix handling of different length chosen/rejected prompts by @davmels in https://github.com/huggingface/trl/pull/4639
- Fix RewardFunc type alias to reflect actual calling convention by @s-zx in https://github.com/huggingface/trl/pull/5246
- fix(ppo): add gradient_checkpointing_enable/disable to PolicyAndValueWrapper by @s-zx in https://github.com/huggingface/trl/pull/5245
- Fix prepare_multimodal_messages to support tool_calls and tool role by @alvarobartt in https://github.com/huggingface/trl/pull/5212
- Fix support for model_init_kwargs when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5230
- Fix support for model_init_kwargs in MiniLLM when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5274
- Fix support for model_init_kwargs in GKD/GOLD when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5266
- Sync entire prompt/completion token tensors before indexing by @shawnghu in https://github.com/huggingface/trl/pull/5218
- Clean up model update group on worker exit by @AmineDiro in https://github.com/huggingface/trl/pull/5325
- Fix prefix EOS slicing for tool suffix (with Qwen3/3.5 chat templates) by @casinca in https://github.com/huggingface/trl/pull/5330
- Fix: apply reward_weights to logged reward/reward_std in GRPOTrainer by @lailanelkoussy in https://github.com/huggingface/trl/pull/5353
- Fix IDs shape mismatch in SFT for VLMs with text-only by @albertvillanova in https://github.com/huggingface/trl/pull/5354
Documentation and Examples
- Add minimal CARLA example script by @sergiopaniego in https://github.com/huggingface/trl/pull/5161
- Nemotron 3 examples added by @sergiopaniego in https://github.com/huggingface/trl/pull/5272
- Align docs about tool calling in trainers with dataset format by @albertvillanova in https://github.com/huggingface/trl/pull/5311
- Add repository-specific guidance for agents (AGENTS.md) by @qgallouedec in https://github.com/huggingface/trl/pull/5236
- Align documentation with the intended public API by @qgallouedec in https://github.com/huggingface/trl/pull/5162
- Update openenv examples to use environment_factory by @sergiopaniego in https://github.com/huggingface/trl/pull/5235
- Add "It Takes Two: Your GRPO Is Secretly DPO" paper to GRPOTrainer by @DhruvvArora in https://github.com/huggingface/trl/pull/5347
- Centralize AI agent templates in .ai by @qgallouedec in https://github.com/huggingface/trl/pull/5268
What's Changed
- ⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/5182
- Handle mm_token_type_ids in SFT/GRPO/RLOO to fix IndexError by @albertvillanova in https://github.com/huggingface/trl/pull/5178
- Document parameters with differing default values in core configs by @albertvillanova in https://github.com/huggingface/trl/pull/5168
- Make _BaseConfig and _BaseTrainer explicitly private by @albertvillanova in https://github.com/huggingface/trl/pull/5169
- Refactor CLI [4/N]: Replace top-level TrlParser with ArgumentParser by @albertvillanova in https://github.com/huggingface/trl/pull/5170
- Add minimal CARLA example script by @sergiopaniego in https://github.com/huggingface/trl/pull/5161
- Align documentation with the intended public API by @qgallouedec in https://github.com/huggingface/trl/pull/5162
- Fix deprecation warning of create_reference_model by @albertvillanova in https://github.com/huggingface/trl/pull/5184
- Fix deprecation warning of fork in multi-threaded process by @albertvillanova in https://github.com/huggingface/trl/pull/5185
- Refactor CLI [5/N]: Refactor TrainingCommand with delayed imports by @albertvillanova in https://github.com/huggingface/trl/pull/5186
- Refactor CLI [6/N]: Refactor env/vllm-serve commands with delayed imports by @albertvillanova in https://github.com/huggingface/trl/pull/5187
- Fix CI tests patching BaseTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/5192
- Add pad_to_multiple_of to GRPOTrainer and RLOOTrainer by @czkkkkkk in https://github.com/huggingface/trl/pull/5180
- Re-add liger-kernel to dev deps by @qgallouedec in https://github.com/huggingface/trl/pull/5164
- Set CI PYTORCH_ALLOC_CONF env variable to avoid OOM by @albertvillanova in https://github.com/huggingface/trl/pull/5197
- Support sequence sampling in Liger Kernel and pass importance_samplin… by @michaelroyzen in https://github.com/huggingface/trl/pull/5190
- Mark CI test_training_vlm_and_liger as xfail by @albertvillanova in https://github.com/huggingface/trl/pull/5202
- Decouple rollout dispatch from vLLM backend in GRPO _generate_single_turn by @albertvillanova in https://github.com/huggingface/trl/pull/5122
- CI: Add Qwen 3.5 tiny model to tests by @qgallouedec in https://github.com/huggingface/trl/pull/5204
- Add support for Qwen3.5 for agent training by @qgallouedec in https://github.com/huggingface/trl/pull/5205
- Update vLLM version support to include 0.13.0 by @qgallouedec in https://github.com/huggingface/trl/pull/5206
- feat: Add tool calling support to VLLMClient.chat() by @kansalaman in https://github.com/huggingface/trl/pull/4889
- Refactor CLI [7/N]: Move patching to compat and import transformers conditionally by @albertvillanova in https://github.com/huggingface/trl/pull/5208
- Update vLLM version support to include 0.14.0 and 0.14.1 by @qgallouedec in https://github.com/huggingface/trl/pull/5214
- Refactor CLI [8/N]: Refactor scripts/utils with delayed imports by @albertvillanova in https://github.com/huggingface/trl/pull/5209
- Simplify logic for structured outputs across vLLM versions by @albertvillanova in https://github.com/huggingface/trl/pull/5215
- Refactor CLI [9/N]: Replace HfArgumentParser from transformers with local by @albertvillanova in https://github.com/huggingface/trl/pull/5210
- Refactor CLI [10/N]: Refactor scripts with delayed imports by @albertvillanova in https://github.com/huggingface/trl/pull/5219
- Refactor CLI [11/N]: Refactor scripts/vllm_serve with delayed imports by @albertvillanova in https://github.com/huggingface/trl/pull/5220
- Refactor CLI [12/N]: Fix command name in scripts help usage by @albertvillanova in https://github.com/huggingface/trl/pull/5221
- Refactor CLI [13/N]: Pass clean training args to scripts by @albertvillanova in https://github.com/huggingface/trl/pull/5223
- Fix prepare_multimodal_messages to support tool_calls and tool role by @alvarobartt in https://github.com/huggingface/trl/pull/5212
- Fix link to Hugging Face Hub in OpenEnv documentation by @thesteve0 in https://github.com/huggingface/trl/pull/5229
- Fix type for model_init_kwargs when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5230
- Add repository-specific guidance for agents (AGENTS.md) by @qgallouedec in https://github.com/huggingface/trl/pull/5236
- Add support for raw ids in prompts in vLLM client and server by @qgallouedec in https://github.com/huggingface/trl/pull/5225
- Deprecate truncate_prompt_tokens for vLLM 0.17.0 by @winglian in https://github.com/huggingface/trl/pull/5248
- Add VLM support when passing raw token IDs to vLLM client by @qgallouedec in https://github.com/huggingface/trl/pull/5227
- Move rollout_func from _generate_single_turn to _generate by @qgallouedec in https://github.com/huggingface/trl/pull/5232
- Fix RewardFunc type alias to reflect actual calling convention by @s-zx in https://github.com/huggingface/trl/pull/5246
- [GRPO] In-place temperature scaling operation by @winglian in https://github.com/huggingface/trl/pull/5254
- Update vLLM version support to 0.15.0 by @qgallouedec in https://github.com/huggingface/trl/pull/5251
- Sync entire prompt/completion token tensors before indexing by @shawnghu in https://github.com/huggingface/trl/pull/5218
- Update vLLM version support to 0.16.0 by @qgallouedec in https://github.com/huggingface/trl/pull/5252
- Update vLLM version support to 0.17.0 by @qgallouedec in https://github.com/huggingface/trl/pull/5253
- [GRPO/RLOO] Tokenize before vLLM generation call by @qgallouedec in https://github.com/huggingface/trl/pull/5238
- Refactor CLI [14/N] : Remove TrainingArguments import from core trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5257
- Support JSON string parsing of teacher_model_init_kwargs in MiniLLMConfig by @albertvillanova in https://github.com/huggingface/trl/pull/5259
- Fix typo in docstring for teacher_model_init_kwargs by @albertvillanova in https://github.com/huggingface/trl/pull/5260
- Remove extra_fields dead code [1/N]: Remove extra_fields handling from VLLMGeneration.generate by @albertvillanova in https://github.com/huggingface/trl/pull/5262
- [GRPO/RLOO] Unify tokenization across all generation backends in _generate_single_turn by @qgallouedec in https://github.com/huggingface/trl/pull/5239
- Remove extra_fields dead code [2/N]: Remove extra_fields from VLLMGeneration.generate return value by @albertvillanova in https://github.com/huggingface/trl/pull/5263
- Remove extra_fields dead code [3/N]: Remove extra_fields from GRPOTrainer._generate_single_turn return value by @albertvillanova in https://github.com/huggingface/trl/pull/5264
- fix(ppo): add gradient_checkpointing_enable/disable to PolicyAndValueWrapper by @s-zx in https://github.com/huggingface/trl/pull/5245
- [GRPO/RLOO] Extract tokenize prompts from _generate_single_turn by @qgallouedec in https://github.com/huggingface/trl/pull/5240
- [CPO/ORPO] Fix handling of different length chosen/rejected prompts. by @davmels in https://github.com/huggingface/trl/pull/4639
- Fix type for teacher_model_init_kwargs when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5258
- Align GOLDConfig docstrings for optional params with None default by @albertvillanova in https://github.com/huggingface/trl/pull/5261
- Fix support for model_init_kwargs in GKD/GOLD when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5266
- Update TRL banner to support light/dark mode by @qgallouedec in https://github.com/huggingface/trl/pull/5270
- Fix error message in OnlineDPO by @qgallouedec in https://github.com/huggingface/trl/pull/5237
- Fix title consistency from "Transformer Reinforcement Learning" to "Transformers Reinforcement Learning" by @qgallouedec in https://github.com/huggingface/trl/pull/5183
- Nemotron 3 examples added by @sergiopaniego in https://github.com/huggingface/trl/pull/5272
- Fix mm_token_type_ids silently dropped in DPO VLM training by @albertvillanova in https://github.com/huggingface/trl/pull/5279
- Simplify get_train_dataloader in GRPO and RLOO by @albertvillanova in https://github.com/huggingface/trl/pull/5276
- Raise ValueError for None train_dataset in experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5275
- 35% faster packing + rename bfd-requeue to bfd_split by @mariosasko in https://github.com/huggingface/trl/pull/5189
- Change default vllm_mode to "colocate" and add v0→v1 migration guide by @qgallouedec in https://github.com/huggingface/trl/pull/5255
- Allow nullable logprobs in vLLM serve responses by @LeonEricsson in https://github.com/huggingface/trl/pull/5203
- feat(grpo_trainer.py): Variational Sequence-Level Soft Policy Optimization (VESPO) by @casinca in https://github.com/huggingface/trl/pull/5199
- Simplify structured outputs logic across vLLM versions in scripts/vllm_serve by @albertvillanova in https://github.com/huggingface/trl/pull/5273
- Fix support for model_init_kwargs in MiniLLM when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5274
- Fix accuracy_reward crash when called from non-main thread by @qgallouedec in https://github.com/huggingface/trl/pull/5281
- Remove TrainingArguments import from experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5290
- Remove custom get_train/eval_dataloader from OnlineDPO by @albertvillanova in https://github.com/huggingface/trl/pull/5291
- [GKD] Buffer Implementation for Distillation Trainer by @cmpatino in https://github.com/huggingface/trl/pull/5137
- Support max_length in DPO VLM training by @albertvillanova in https://github.com/huggingface/trl/pull/5284
- Prevent corruption of DPO VLM training if "keep_end" truncation_mode by @albertvillanova in https://github.com/huggingface/trl/pull/5286
- Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model by @albertvillanova in https://github.com/huggingface/trl/pull/5295
- Apply docstyle by @qgallouedec in https://github.com/huggingface/trl/pull/5296
- Add guidance to avoid hasattr and getattr with defaults in AGENTS.md by @qgallouedec in https://github.com/huggingface/trl/pull/5294
- Fix DPOTrainer collators to truncate sequences before padding by @albertvillanova in https://github.com/huggingface/trl/pull/5305
- Update RewardFunc type annotation to allow None values in reward list by @qgallouedec in https://github.com/huggingface/trl/pull/5297
- Suggest the Json() type for tool calling dataset format by @lhoestq in https://github.com/huggingface/trl/pull/5307
- Allow reward functions to log extra columns and scalar metrics by @manueldeprada in https://github.com/huggingface/trl/pull/5233
- Fix GRPOTrainer attribute access for vLLM model config by @falcondai in https://github.com/huggingface/trl/pull/5302
- Support truncation_mode in SFT by @albertvillanova in https://github.com/huggingface/trl/pull/5306
- 🔌 Asynchronous GRPO by @qgallouedec in https://github.com/huggingface/trl/pull/5293
- Fix datasets version supporting Json dtype in docs about tool calling dataset format by @albertvillanova in https://github.com/huggingface/trl/pull/5310
- Align docs about tool calling in trainers with dataset format by @albertvillanova in https://github.com/huggingface/trl/pull/5311
- [GRPO] Fix re-tokenization bug in tool-calling loop by concatenating token IDs by @qgallouedec in https://github.com/huggingface/trl/pull/5242
- feat(experimental): Divergence Proximal Policy Optimization by @LeonEricsson in https://github.com/huggingface/trl/pull/5117
- Clean up model update group on worker exit by @AmineDiro in https://github.com/huggingface/trl/pull/5325
- Fix style in DPPO docstrings by @albertvillanova in https://github.com/huggingface/trl/pull/5326
- GRPOTrainer/async: fix prefix EOS slicing for tool suffix (with Qwen3/3.5 type of chat templates) by @casinca in https://github.com/huggingface/trl/pull/5330
- refactor(async_rollout_worker): renamed tool variables to mirror grpo_trainer.py by @casinca in https://github.com/huggingface/trl/pull/5332
- Add truncation to SFT DataCollatorForLanguageModeling by @albertvillanova in https://github.com/huggingface/trl/pull/5315
- Add SDPO (Self-Distillation Policy Optimization) trainer by @MengAiDev in https://github.com/huggingface/trl/pull/4935
- Update openenv examples to use environment_factory by @sergiopaniego in https://github.com/huggingface/trl/pull/5235
- Enhance print_prompt_completions_sample to include reasoning content by @qgallouedec in https://github.com/huggingface/trl/pull/5327
- Add Cursor Bugbot rules from AGENTS.md by @qgallouedec in https://github.com/huggingface/trl/pull/5280
- Change model dtype from bfloat16 to float32 in AsyncGRPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5333
- docs: Add "It Takes Two: Your GRPO Is Secretly DPO" paper to GRPOTrainer by @DhruvvArora in https://github.com/huggingface/trl/pull/5347
- fix: apply reward_weights to logged reward/reward_std in GRPOTrainer by @lailanelkoussy in https://github.com/huggingface/trl/pull/5353
- Remove post-collation truncation from DPO by @albertvillanova in https://github.com/huggingface/trl/pull/5350
- Remove unused flush_right by @albertvillanova in https://github.com/huggingface/trl/pull/5358
- Fix IDs shape mismatch in SFT for VLMs with text-only by @albertvillanova in https://github.com/huggingface/trl/pull/5354
- Remove post-collation truncation from SFT by @albertvillanova in https://github.com/huggingface/trl/pull/5359
- Simplify DPO DataCollatorForPreference by @albertvillanova in https://github.com/huggingface/trl/pull/5362
- Simplify SFT tokenization by @albertvillanova in https://github.com/huggingface/trl/pull/5363
- Simplify SFT DataCollatorForLanguageModeling by @albertvillanova in https://github.com/huggingface/trl/pull/5360
- Use BaseConfig post_init in experimental KTO and MiniLLM configs by @albertvillanova in https://github.com/huggingface/trl/pull/5371
- Move truncate_dataset to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5370
- Simplify DPO tokenization by @albertvillanova in https://github.com/huggingface/trl/pull/5369
- Kd vllm generation by @cmpatino in https://github.com/huggingface/trl/pull/5351
- Adds support for the pixel_position_ids vision key by @qgallouedec in https://github.com/huggingface/trl/pull/5374
- Minor diff reduction between RLOO and GRPO by @qgallouedec in https://github.com/huggingface/trl/pull/5368
- Remove requirements.txt by @albertvillanova in https://github.com/huggingface/trl/pull/5377
- Remove dead truncation_mode from experimental BCO, CPO and ORPO by @albertvillanova in https://github.com/huggingface/trl/pull/5378
- Centralize AI agent templates in .ai by @qgallouedec in https://github.com/huggingface/trl/pull/5268
- Pass tools as None to apply_chat_template when it is an empty list by @rabinadk1 in https://github.com/huggingface/trl/pull/5380
- Require datasets>=4.7.0 for Json dtype to prevent insertion of None values by @albertvillanova in https://github.com/huggingface/trl/pull/5376
- Remove deprecated TRACKIO_SPACE_ID env var from all scripts by @sergiopaniego in https://github.com/huggingface/trl/pull/5365
- Mark test_rloo[fsdp2] as xfail for transformers 5.4.0 by @albertvillanova in https://github.com/huggingface/trl/pull/5387
- Enforce PR template for first-time contributors and document AI usage policy by @qgallouedec in https://github.com/huggingface/trl/pull/5356
- Enhance PR template check to exclude reopened PRs from first-time contributor validation by @qgallouedec in https://github.com/huggingface/trl/pull/5392
- chore: update pr_template_check.yml by @qgallouedec in https://github.com/huggingface/trl/pull/5393
- Move disable_config=True from generate to GenerationConfig by @qgallouedec in https://github.com/huggingface/trl/pull/5384
- Add vLLM inference to the Base Self-Distillation Trainer by @cmpatino in https://github.com/huggingface/trl/pull/5388
- Add HF_TOKEN environment variable to workflow files by @qgallouedec in https://github.com/huggingface/trl/pull/5397
- Add second version of Qwen 3.5 chat template to chat_template_utils by @apardyl in https://github.com/huggingface/trl/pull/5405
- Release: v1.0 by @qgallouedec in https://github.com/huggingface/trl/pull/5409
New Contributors
- @czkkkkkk made their first contribution in https://github.com/huggingface/trl/pull/5180
- @michaelroyzen made their first contribution in https://github.com/huggingface/trl/pull/5190
- @thesteve0 made their first contribution in https://github.com/huggingface/trl/pull/5229
- @s-zx made their first contribution in https://github.com/huggingface/trl/pull/5246
- @shawnghu made their first contribution in https://github.com/huggingface/trl/pull/5218
- @davmels made their first contribution in https://github.com/huggingface/trl/pull/4639
- @manueldeprada made their first contribution in https://github.com/huggingface/trl/pull/5233
- @falcondai made their first contribution in https://github.com/huggingface/trl/pull/5302
- @AmineDiro made their first contribution in https://github.com/huggingface/trl/pull/5325
- @DhruvvArora made their first contribution in https://github.com/huggingface/trl/pull/5347
- @lailanelkoussy made their first contribution in https://github.com/huggingface/trl/pull/5353
- @rabinadk1 made their first contribution in https://github.com/huggingface/trl/pull/5380
- @apardyl made their first contribution in https://github.com/huggingface/trl/pull/5405
Full Changelog: https://github.com/huggingface/trl/compare/v0.29.0...v1.0.0
Fetched April 7, 2026
