v0.28.0
⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/4835
Support triggering CI via push to ci-* branches by @albertvillanova in https://github.com/huggingface/trl/pull/4840
Revert CI hotfix pinning transformers 4.57.4 after tiny model regeneration by @albertvillanova in https://github.com/huggingface/trl/pull/4833
Use pytest-datadir in CI tests by @albertvillanova in https://github.com/huggingface/trl/pull/4836
Refactor KTO coordinated with DPO [c/N]: Remove ref_model_init_kwargs by @albertvillanova in https://github.com/huggingface/trl/pull/4837
Fix _patch_transformers_hybrid_cache for peft by @albertvillanova in https://github.com/huggingface/trl/pull/4844
Refactor KTO [4/N]: Remove unused padding_value by @albertvillanova in https://github.com/huggingface/trl/pull/4839
Remove unused padding_value from BCO by @albertvillanova in https://github.com/huggingface/trl/pull/4846
Fix CI with dev dependencies: Mark Qwen3-VL tests as xfail by @albertvillanova in https://github.com/huggingface/trl/pull/4851
Fix: undefined current_gradient_accumulation_steps by @qgallouedec in https://github.com/huggingface/trl/pull/4852
Remove deprecated parameters by @qgallouedec in https://github.com/huggingface/trl/pull/4847
Add Nash Learning from Human Feedback paper to paper index by @kansalaman in https://github.com/huggingface/trl/pull/4860
Use pytest-datadir for accelerate config files by @albertvillanova in https://github.com/huggingface/trl/pull/4861
Update OpenEnv dependency to new version for hf jobs scripts by @sergiopaniego in https://github.com/huggingface/trl/pull/4843
Update CITATION.cff by @qgallouedec in https://github.com/huggingface/trl/pull/4856
[GRPOTrainer]: Agent Training Supports Async Tool Calls by @pramodith in https://github.com/huggingface/trl/pull/4742
Enhance GRPO documentation with scaling notes by @javadtaghia in https://github.com/huggingface/trl/pull/4849
Add retry strategy to vLLM Client for increased robustness by @apalmas-saifh in https://github.com/huggingface/trl/pull/4845
Update generate_tiny_models.py: CohereForAI -> CohereLabs by @Michellehbn in https://github.com/huggingface/trl/pull/4877
Refactor KTO coordinated with DPO [e/N]: Remove label_pad_token_id by @albertvillanova in https://github.com/huggingface/trl/pull/4875
Refactor KTO coordinated with DPO [d/N]: Remove base_model_attribute_name by @albertvillanova in https://github.com/huggingface/trl/pull/4862
Fix type hint in openenv/utils.py: fallback for no vLLM installed case by @Datta0 in https://github.com/huggingface/trl/pull/4868
Update transformer version checks and documentation for lr_scheduler_kwargs workaround by @qgallouedec in https://github.com/huggingface/trl/pull/4876
fix(DeepSeek OPSM): passing correct (vLLM) logprobs by @casinca in https://github.com/huggingface/trl/pull/4857
Remove label_pad_token_id from experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4878
Fix SFT training for prompt-completion type and transformers v5 by @qgallouedec in https://github.com/huggingface/trl/pull/4880
Bugfix: Logprob drift in vLLM serving mode (compared to colocate mode) by @kdubovikov in https://github.com/huggingface/trl/pull/4873
Enable vLLM sleep mode for generation in Online DPO by @winglian in https://github.com/huggingface/trl/pull/4882
Test distributed training for RewardTrainer, RLOOTrainer and GRPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4823
Mark ZeRO 2 as xfail in distributed tests due to current failure by @qgallouedec in https://github.com/huggingface/trl/pull/4885
Fix import path for get_open_port based on vLLM version by @qgallouedec in https://github.com/huggingface/trl/pull/4883
Fix RewardTrainer's results not reproducible by @liyc-ai in https://github.com/huggingface/trl/pull/4887
GOLD training speed up by @141forever in https://github.com/huggingface/trl/pull/4888
Transformers v5 release: extend xfail condition for TestGRPOTrainer.test_training_vlm_and_liger and update version checks by @qgallouedec in https://github.com/huggingface/trl/pull/4898
Fix CI NotImplementedError for bfloat16 by @albertvillanova in https://github.com/huggingface/trl/pull/4902
Fix CI AssertionError: Parameter has not changed by @albertvillanova in https://github.com/huggingface/trl/pull/4904
Refactor vLLM generation [1/N]: Extract vLLM generation by @albertvillanova in https://github.com/huggingface/trl/pull/4700
Created new PTT integration docs as requested by @adityachallapally in https://github.com/huggingface/trl/pull/4907
Fix CI TypeError in llm-blender tests by @albertvillanova in https://github.com/huggingface/trl/pull/4919
Rearrange variable assignments in DataCollatorForVisionLanguageModeling by @qgallouedec in https://github.com/huggingface/trl/pull/4911
Fix help text formatting for max_length in RewardConfig and SFTConfig by @qgallouedec in https://github.com/huggingface/trl/pull/4910
device_map init consistency in GRPO/RLOO/KTO by @qgallouedec in https://github.com/huggingface/trl/pull/4909
Comment about overriding prediction_step in GRPOTrainer and RLOOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4913
Remove gradient checkpointing option from various training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/4905
docs: add DoRA (2402.09353) to Paper Index by @billycrapediem in https://github.com/huggingface/trl/pull/4892
Fix CI AssertionError: assert not True by @albertvillanova in https://github.com/huggingface/trl/pull/4921
Fix CI ValueError for 0 temperature by @albertvillanova in https://github.com/huggingface/trl/pull/4916
Fix extra EOS appended in DPO preprocessing for conversational data by @qgallouedec in https://github.com/huggingface/trl/pull/4908
Remove chat template setup in dpo_vlm.py by @qgallouedec in https://github.com/huggingface/trl/pull/4906
Update learning rate comments and add assertions for reference model parameters in GRPO and RLOO tests by @qgallouedec in https://github.com/huggingface/trl/pull/4914
Add validation for sync_ref_model in GRPOTrainer and RLOOTrainer when using PEFT models by @qgallouedec in https://github.com/huggingface/trl/pull/4912
Support tool call data in is_conversational by @qgallouedec in https://github.com/huggingface/trl/pull/4923
Set model dtype to float32 in tests of trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4924
Require transformers<5 with PairRMJudge by @albertvillanova in https://github.com/huggingface/trl/pull/4926
Move VLLMClient to generation module by @albertvillanova in https://github.com/huggingface/trl/pull/4928
Set model dtype to float32 in experimental tests of trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4925
Fix profiling of VLLMGeneration.sync_weights by @albertvillanova in https://github.com/huggingface/trl/pull/4931
Fix import statement for import_utils in vllm_client.py by @qgallouedec in https://github.com/huggingface/trl/pull/4932
Set default top_k to 0 in VLLMClient by @albertvillanova in https://github.com/huggingface/trl/pull/4927
[GRPO] Add parquet logging for completions with individual rewards by @qgallouedec in https://github.com/huggingface/trl/pull/4818
Fix SFTTrainer init logic: remove TrainingArguments.push_to_hub_token only for transformers < v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4942
Remove ref_model_init_kwargs from experimental BCO by @albertvillanova in https://github.com/huggingface/trl/pull/4946
Update wordle.py example with masking of env tokens by @sergiopaniego in https://github.com/huggingface/trl/pull/4895
Fix PPO run_name parameter not taking effect by @mel3c in https://github.com/huggingface/trl/pull/4945
Minor fix docs style by @albertvillanova in https://github.com/huggingface/trl/pull/4953
Add test for training with compute_metrics in SFTTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4950
Remove access to warnings_issued by @qgallouedec in https://github.com/huggingface/trl/pull/4960
NeMo-Gym Integration by @cmunley1 in https://github.com/huggingface/trl/pull/4848
Add test for tool call data in RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4959
Add test for training with compute_metrics in RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4958
Remove max_prompt_length from experimental PRM by @albertvillanova in https://github.com/huggingface/trl/pull/4963
Remove max_prompt_length from experimental BCO by @albertvillanova in https://github.com/huggingface/trl/pull/4964
Remove max_prompt_length from experimental CPO by @albertvillanova in https://github.com/huggingface/trl/pull/4965
Remove max_prompt_length from experimental ORPO by @albertvillanova in https://github.com/huggingface/trl/pull/4966
Revert change in GRPO from NeMo-Gym Integration by @qgallouedec in https://github.com/huggingface/trl/pull/4970
Fix test_train_with_chat_template_kwargs by @qgallouedec in https://github.com/huggingface/trl/pull/4971
Remove padding_value from experimental CPO and use pad_token_id by @albertvillanova in https://github.com/huggingface/trl/pull/4962
Remove truncation from tokenizer calls if no max_length by @albertvillanova in https://github.com/huggingface/trl/pull/4972
Set specific OpenEnv version when installed by @sergiopaniego in https://github.com/huggingface/trl/pull/4978
Fix add_column in test_train_with_chat_template_kwargs by @albertvillanova in https://github.com/huggingface/trl/pull/4979
Support truncated completions in GRPO multi-turn training by @albertvillanova in https://github.com/huggingface/trl/pull/4976
Replace torch.allclose with torch.testing.assert_close by @qgallouedec in https://github.com/huggingface/trl/pull/4977
Simplify instructions of installation of OpenEnv by @sergiopaniego in https://github.com/huggingface/trl/pull/4980
Deprecate parameters in DPOConfig by @qgallouedec in https://github.com/huggingface/trl/pull/4969
[CI] Disallow installation of transformers 5.1.0 due to compatibility issues with DeepSpeed by @qgallouedec in https://github.com/huggingface/trl/pull/4982
Replace warmup_ratio with warmup_steps by @qgallouedec in https://github.com/huggingface/trl/pull/4983
Pin transformers!=5.1.0 in deepspeed extra due to incompatibility by @albertvillanova in https://github.com/huggingface/trl/pull/4985
Fix passing tokenizer in test_train_with_chat_template_kwargs by @albertvillanova in https://github.com/huggingface/trl/pull/4987
Update dataset configuration name in toolcall dataset loading by @qgallouedec in https://github.com/huggingface/trl/pull/4984
Use local variable instead of attribute in collator tests by @qgallouedec in https://github.com/huggingface/trl/pull/4957
Fix import of AutoModelForCausalLMWithValueHead from experimental by @albertvillanova in https://github.com/huggingface/trl/pull/4990
Assert chat_template is applied in test_train_with_chat_template_kwargs by @albertvillanova in https://github.com/huggingface/trl/pull/4991
Fix deprecation of DPOConfig.max_completion_length by @albertvillanova in https://github.com/huggingface/trl/pull/4992
Fix post_init warning stacklevel to 3 by @albertvillanova in https://github.com/huggingface/trl/pull/4993
Fix ZeRO-3 + PEFT + gradient checkpointing by @qgallouedec in https://github.com/huggingface/trl/pull/4951
Add GitHub Actions workflow for testing against Transformers branch by @qgallouedec in https://github.com/huggingface/trl/pull/4995
Add distributed smoke tests workflow for Transformers branch by @qgallouedec in https://github.com/huggingface/trl/pull/4996
Update NeMo-Gym to use env_mask by @cmunley1 in https://github.com/huggingface/trl/pull/4986
Update sampling mode to token level for safety by @sergiopaniego in https://github.com/huggingface/trl/pull/4989
perf: Qwen SAPO loss optimization by @casinca in https://github.com/huggingface/trl/pull/4956
Fix GRPO tool calling for corrupted tool calls by @akshayballal95 in https://github.com/huggingface/trl/pull/4890
Add sanitize_logprob function for NaN handling in vLLM log probabilities by @qgallouedec in https://github.com/huggingface/trl/pull/5001
[tests] Remove xfail for transformers version >= 5.0.0 due to upstream bug resolution by @qgallouedec in https://github.com/huggingface/trl/pull/5000
docs: add CGPO/Mixture of Judges (2409.20370) to Paper Index + link ref to AllTrueJudge by @nabin2004 in https://github.com/huggingface/trl/pull/5002
Filter CI SWIG deprecation warnings by @albertvillanova in https://github.com/huggingface/trl/pull/5004
Fix CI TRLExperimentalWarning in regular tests by @albertvillanova in https://github.com/huggingface/trl/pull/5007
Add support for nested_gather in OnlineDPOTrainer for transformers v5.2.0 and above by @qgallouedec in https://github.com/huggingface/trl/pull/4981
Fix CI FutureWarning: ref_model_init_kwargs is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5009
Fix typo in DPO max_prompt_length deprecation warning message by @albertvillanova in https://github.com/huggingface/trl/pull/5020
Fix vision model prompt truncation bug in DPOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/5023
Pin transformers < 5 in judges extra due to incompatibility by @albertvillanova in https://github.com/huggingface/trl/pull/5024
Fix CI FutureWarning: generate_during_eval is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5017
Fix typo in xfail test reason by @albertvillanova in https://github.com/huggingface/trl/pull/5028
Fix CI FutureWarning: rpo_alpha is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5011
Fix CI FutureWarning: use_logits_to_keep is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5013
Mark Qwen3VL tests as xfail for transformers 5.0.x by @albertvillanova in https://github.com/huggingface/trl/pull/5029
[CI] Silence PyTorch JIT and DataLoader deprecation warnings by @qgallouedec in https://github.com/huggingface/trl/pull/4999
Add length-unbiased GRPO loss (LUSPO) by @Haseebasif7 in https://github.com/huggingface/trl/pull/4988
Fix CI FutureWarning: tools is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5015
Filter max_prompt_length UserWarning in all test cases by @albertvillanova in https://github.com/huggingface/trl/pull/5035
Fix CI FutureWarning: max_prompt_length is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5019
Allow testing with transformers 5.1.0 via xfail marks by @albertvillanova in https://github.com/huggingface/trl/pull/5034
Rename AOT loss type 'aot_pair' to 'aot_unpaired' in DPO by @qgallouedec in https://github.com/huggingface/trl/pull/5038
Deprecate string usage for ref_model in DPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5040
Deprecate FDivergenceType in DPOConfig; update f_divergence_type to use string values by @qgallouedec in https://github.com/huggingface/trl/pull/5039
Fix multiprocessing start method to 'spawn' for test compatibility with Python 3.12+ by @qgallouedec in https://github.com/huggingface/trl/pull/5036
Add Online Direct Preference Optimization section to paper index by @qgallouedec in https://github.com/huggingface/trl/pull/5037
Release: 0.28 by @albertvillanova in https://github.com/huggingface/trl/pull/5043
Full Changelog: https://github.com/huggingface/trl/compare/v0.27.0...v0.28.0
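
Migration sketch for "Replace warmup_ratio with warmup_steps" (#4983): if your own scripts still express warmup as a fraction of training, an integer step count can be derived before building the config. The helper below is hypothetical and not part of TRL or transformers; it assumes you already know the total number of optimizer steps and that rounding up is acceptable.

```python
import math


def warmup_steps_from_ratio(warmup_ratio: float, total_training_steps: int) -> int:
    """Hypothetical helper: convert a warmup fraction into a whole number of steps.

    Rounds up so that any non-zero ratio yields at least one warmup step.
    """
    if not 0.0 <= warmup_ratio <= 1.0:
        raise ValueError("warmup_ratio must be in [0, 1]")
    if total_training_steps < 0:
        raise ValueError("total_training_steps must be non-negative")
    return math.ceil(warmup_ratio * total_training_steps)


# Example: 10% warmup over 1000 optimizer steps.
warmup_steps = warmup_steps_from_ratio(0.1, 1000)
print(warmup_steps)
```

The resulting integer can then be passed as `warmup_steps` to the trainer config. Check the linked PR for what actually changed in TRL's scripts and docs.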
Fetched April 7, 2026