v0.27.0

Features

Add vllm_group_port argument to GRPO, RLOO and OnlineDPO configuration by @pointerhacker in https://github.com/huggingface/trl/pull/4545
Preserve truncated tokens in BFD packing by @qgallouedec in https://github.com/huggingface/trl/pull/4632
Support async reward functions and parallelize call to reward functions. by @pramodith in https://github.com/huggingface/trl/pull/4567
RLOO supports async rewards. by @pramodith in https://github.com/huggingface/trl/pull/4718
Support vLLM 0.12.0 by @jiqing-feng in https://github.com/huggingface/trl/pull/4117
feat: DeepSeek V3.2 Off-policy sequence masking by @casinca in https://github.com/huggingface/trl/pull/4689
🎭 Up to 50% less VRAM during forward with forward_masked_logits function by @qgallouedec in https://github.com/huggingface/trl/pull/4729
[GRPO] Add a config to limit the number of tool calling iterations by @pramodith in https://github.com/huggingface/trl/pull/4761
Switch gradient checkpointing default to use_reentrant=False (PyTorch recommended) by @qgallouedec in https://github.com/huggingface/trl/pull/4811
Add support for GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization by @nbasyl in https://github.com/huggingface/trl/pull/4785
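
As a rough illustration of the async reward support listed above (#4567, #4718), the sketch below mixes one synchronous and one `async` reward function and evaluates them concurrently with `asyncio`. It does not import TRL: the reward function names and the `compute_rewards` helper are hypothetical, and the `(completions, **kwargs) -> list[float]` signature is an assumption about the reward-function convention, not a copy of the trainers' internals.

```python
import asyncio
import inspect


def length_reward(completions, **kwargs):
    # Synchronous reward (hypothetical): shorter completions score higher.
    return [-float(len(c)) for c in completions]


async def keyword_reward(completions, **kwargs):
    # Async reward (hypothetical): stands in for an awaitable call,
    # for example a request to a remote judge.
    await asyncio.sleep(0)
    return [1.0 if "answer" in c else 0.0 for c in completions]


async def compute_rewards(reward_funcs, completions):
    # Hypothetical helper: evaluate every reward function concurrently.
    async def call(func):
        if inspect.iscoroutinefunction(func):
            return await func(completions)
        # Run blocking functions in a worker thread so they do not stall the loop.
        return await asyncio.to_thread(func, completions)

    # gather preserves the order of reward_funcs in its result list.
    return await asyncio.gather(*(call(f) for f in reward_funcs))


completions = ["the answer is 4", "no idea"]
rewards = asyncio.run(compute_rewards([length_reward, keyword_reward], completions))
```

Here `rewards` holds one list of scores per reward function, in the order the functions were passed.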

Experimental

Move AutoModelForCausalLMWithValueHead and AutoModelForSeq2SeqLMWithValueHead to experimental by @qgallouedec in https://github.com/huggingface/trl/pull/4654
Move DPODataCollatorWithPadding to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4667
Move DataCollatorForChatML to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4668
Move add_bos_token_if_needed and add_eos_token_if_needed to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4674
Move truncate_right and SIMPLE_CHAT_TEMPLATE to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4677
Move prepare_model_for_kbit_training, enable_gradient_checkpointing, prepare_peft_model to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4704
Move get_reward function to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4683
Remove experimental imports from testing_utils by @albertvillanova in https://github.com/huggingface/trl/pull/4727
ORPO: Avoid catastrophic cancellation in loss function by @hartmans in https://github.com/huggingface/trl/pull/4763
Refactor KTO [1/N]: Modernize model initialization by @albertvillanova in https://github.com/huggingface/trl/pull/4783
[GOLD] add probability merging fix to implement chain rule by @kashif in https://github.com/huggingface/trl/pull/4765
Refactor KTO coordinated with DPO [a/N]: Remove encoder-decoder support by @albertvillanova in https://github.com/huggingface/trl/pull/4792
Refactor KTO coordinated with DPO [b/N]: Simplify truncation logic by @albertvillanova in https://github.com/huggingface/trl/pull/4808
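
For context on the ORPO entry above (#4763), the snippet below is a generic sketch of catastrophic cancellation when computing log(1 - exp(x)) for a log-probability x close to 0, a term that appears in log-odds expressions. It is not the code from the PR; `log1mexp_naive` and `log1mexp_stable` are illustrative names, and the branch at -log(2) is a commonly used choice rather than something stated in the release notes.

```python
import math


def log1mexp_naive(x):
    # Direct evaluation: 1 - exp(x) loses all precision when x is near 0.
    return math.log(1.0 - math.exp(x))


def log1mexp_stable(x):
    # Stable evaluation for x < 0: pick expm1 or log1p depending on the range.
    if x > -math.log(2.0):
        return math.log(-math.expm1(x))
    return math.log1p(-math.exp(x))


x = -1e-20  # log-probability of an almost certain token

try:
    naive = log1mexp_naive(x)
except ValueError:
    # exp(x) rounds to exactly 1.0, so the naive form evaluates log(0).
    naive = None

stable = log1mexp_stable(x)
```

With this input the naive form fails outright, while the stable form returns a finite value close to `math.log(1e-20)`.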

Fixes

Accounting for case num_generations_eval=1 in the calculation of the advantage by @qgallouedec in https://github.com/huggingface/trl/pull/4662
Fix vLLM error for tools usage not supported when running GRPO training by @apalmas-saifh in https://github.com/huggingface/trl/pull/4663
Fix GRPO config validation in case num_generations_eval is specified and different than num_generations by @apalmas-saifh in https://github.com/huggingface/trl/pull/4682
Fix top_k default value to 0 for disabling top-k filtering by @albertvillanova in https://github.com/huggingface/trl/pull/4695
Include generation_config for tiny model uploads by @qgallouedec in https://github.com/huggingface/trl/pull/4643
Fix KeyError with transformers 5.0.0+ where push_to_hub_token is removed by @Manodeepray in https://github.com/huggingface/trl/pull/4691
Overwrite model default generation config used by model.generate by @albertvillanova in https://github.com/huggingface/trl/pull/4647
Fix: handle multiple tool calls in qwen3_schema by @mattbui in https://github.com/huggingface/trl/pull/4709
Fix bugs when using multi-gpu: dataset streaming for offline trainers + dtype initialization by @kaixuanliu in https://github.com/huggingface/trl/pull/3950
Ensure llm-blender is importable with transformers >= v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4781
Monkey patch for HybridCache in Liger-Kernel with transformers v5 by @qgallouedec in https://github.com/huggingface/trl/pull/4798
[fix] GRPOTrainer: proper access args by @carlyou in https://github.com/huggingface/trl/pull/4801
Fix vllm compat patches to be applied only to affected versions by @albertvillanova in https://github.com/huggingface/trl/pull/4815
Fix bug when SFT calculates outputs.token_accuracy by @kaixuanliu in https://github.com/huggingface/trl/pull/4814
Fix XPU vLLM client server by @jiqing-feng in https://github.com/huggingface/trl/pull/4780
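
To make the `top_k` entry above (#4695) concrete, here is a minimal pure-Python sketch of top-k filtering in which `top_k=0` means "no filtering". The `top_k_filter` function is hypothetical and only illustrates the convention named in the PR title; it is not taken from TRL or vLLM.

```python
def top_k_filter(logits, top_k=0):
    # Assumed convention: top_k <= 0 disables filtering, so logits pass through.
    if top_k <= 0 or top_k >= len(logits):
        return list(logits)
    # Keep the k largest logits and mask the rest with -inf.
    threshold = sorted(logits, reverse=True)[top_k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]


logits = [1.0, 3.0, 2.0]
unfiltered = top_k_filter(logits, top_k=0)
filtered = top_k_filter(logits, top_k=2)
```

With `top_k=0` the logits come back unchanged, whereas `top_k=2` masks the smallest of the three values.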

Miscellaneous

Move compute_accuracy to PRM Trainer file by @qgallouedec in https://github.com/huggingface/trl/pull/4656
Move clone_chat_template to chat_template_utils by @qgallouedec in https://github.com/huggingface/trl/pull/4653
Move GeometricMixtureWrapper to nash_md_trainer.py by @qgallouedec in https://github.com/huggingface/trl/pull/4670
Move exact_div, print_rich_table, truncate_response, forward to ppo_trainer by @qgallouedec in https://github.com/huggingface/trl/pull/4676
Merge OnPolicyConfig and PPOConfig and move OnlineTrainerState by @qgallouedec in https://github.com/huggingface/trl/pull/4671
Move PEFT tests for AutoModelForCausalLMWithValueHead to test_ppo_trainer by @qgallouedec in https://github.com/huggingface/trl/pull/4678
Move generate and batch_generation to ppo_trainer.py by @qgallouedec in https://github.com/huggingface/trl/pull/4675
Import TrainerCallback from top-level transformers by @qgallouedec in https://github.com/huggingface/trl/pull/4694
Fix typos by @qgallouedec in https://github.com/huggingface/trl/pull/4690
Align import utils with transformers by @qgallouedec in https://github.com/huggingface/trl/pull/4684
Align stable trainers by @qgallouedec in https://github.com/huggingface/trl/pull/4687
Align GRPO and RLOO initialization by @qgallouedec in https://github.com/huggingface/trl/pull/4685
Align use of vllm_max_model_length in RLOOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4702
Align RLOO with GRPO by @qgallouedec in https://github.com/huggingface/trl/pull/4706
Fix test assertion for top_k parameter in OnlineDPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4714
Disallow PeftModel + peft_config in trainers by @qgallouedec in https://github.com/huggingface/trl/pull/4713
Fix deprecation version for RLOO max_prompt_length by @albertvillanova in https://github.com/huggingface/trl/pull/4726
Refactor vLLM generation [3/N]: Decouple profiling from trainer by @albertvillanova in https://github.com/huggingface/trl/pull/4717
Avoid docstyle formatting for TestParseResponse by @qgallouedec in https://github.com/huggingface/trl/pull/4736
🥂 Happy New Year by @qgallouedec in https://github.com/huggingface/trl/pull/4775
Update import structure by @qgallouedec in https://github.com/huggingface/trl/pull/4665
Improve PEFT integration by @qgallouedec in https://github.com/huggingface/trl/pull/4723
Replace GuidedDecodingParams with StructuredOutputsParams in sampling parameter configuration by @qgallouedec in https://github.com/huggingface/trl/pull/4797
Move compatibility shims to dedicated module _compat by @albertvillanova in https://github.com/huggingface/trl/pull/4807
Refactor _compat module by @albertvillanova in https://github.com/huggingface/trl/pull/4809
Revised comments explaining the higher learning rate choice given tiny gradients by @qgallouedec in https://github.com/huggingface/trl/pull/4810
Simplify version checks in compat patches by @albertvillanova in https://github.com/huggingface/trl/pull/4817
Set packaging as explicit dependency and standardize version comparison by @albertvillanova in https://github.com/huggingface/trl/pull/4819
Fix _patch_transformers_hybrid_cache also for peft by @albertvillanova in https://github.com/huggingface/trl/pull/4820
Fix _patch_vllm_cached_tokenizer to only apply if transformers >= v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4827
Fix code quality in SFTTrainer file by @albertvillanova in https://github.com/huggingface/trl/pull/4832

New Contributors

@pointerhacker made their first contribution in https://github.com/huggingface/trl/pull/4545
@apalmas-saifh made their first contribution in https://github.com/huggingface/trl/pull/4663
@Manodeepray made their first contribution in https://github.com/huggingface/trl/pull/4691
@salmanmkc made their first contribution in https://github.com/huggingface/trl/pull/4734
@mattbui made their first contribution in https://github.com/huggingface/trl/pull/4709
@murilo-cunha made their first contribution in https://github.com/huggingface/trl/pull/4753
@hartmans made their first contribution in https://github.com/huggingface/trl/pull/4763
@s23deepak made their first contribution in https://github.com/huggingface/trl/pull/4758
@Tianyi-Billy-Ma made their first contribution in https://github.com/huggingface/trl/pull/4804
@carlyou made their first contribution in https://github.com/huggingface/trl/pull/4801
@BurnyCoder made their first contribution in https://github.com/huggingface/trl/pull/4803

Full Changelog: https://github.com/huggingface/trl/compare/v0.26.0...v0.27.0
