v0.28.0
Features
- [GRPOTrainer]: Agent Training Supports Async Tool Calls by @pramodith in https://github.com/huggingface/trl/pull/4742
- Add retry strategy to vLLM Client for increased robustness by @apalmas-saifh in https://github.com/huggingface/trl/pull/4845
- Enable vLLM sleep mode for generation in Online DPO by @winglian in https://github.com/huggingface/trl/pull/4882
- Support tool call data in is_conversational by @qgallouedec in https://github.com/huggingface/trl/pull/4923
- [GRPO] Add parquet logging for completions with individual rewards by @qgallouedec in https://github.com/huggingface/trl/pull/4818
- Update wordle.py example with masking of env tokens by @sergiopaniego in https://github.com/huggingface/trl/pull/4895
- NeMo-Gym Integration by @cmunley1 in https://github.com/huggingface/trl/pull/4848
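As a rough illustration of what a retry strategy for a client can look like (see the vLLM Client entry above), here is a minimal sketch of retrying with exponential backoff. It is not TRL's implementation: the helper name `call_with_retry`, its parameters, and the choice to retry only on `ConnectionError` are all assumptions made for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on ConnectionError with exponential backoff.

    Hypothetical helper for illustration only; not part of TRL or vLLM.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            # Re-raise once the retry budget is exhausted.
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.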
Experimental
- Refactor KTO coordinated with DPO [c/N]: Remove ref_model_init_kwargs by @albertvillanova in https://github.com/huggingface/trl/pull/4837
- Refactor KTO coordinated with DPO [e/N]: Remove label_pad_token_id by @albertvillanova in https://github.com/huggingface/trl/pull/4875
- Refactor KTO coordinated with DPO [d/N]: Remove base_model_attribute_name by @albertvillanova in https://github.com/huggingface/trl/pull/4862
- Fix type hint in openenv/utils.py: fallback for no vLLM installed case by @Datta0 in https://github.com/huggingface/trl/pull/4868
- Remove label_pad_token_id from experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4878
- GOLD training speed up by @141forever in https://github.com/huggingface/trl/pull/4888
- Remove ref_model_init_kwargs from experimental BCO by @albertvillanova in https://github.com/huggingface/trl/pull/4946
- Remove max_prompt_length from experimental PRM by @albertvillanova in https://github.com/huggingface/trl/pull/4963
- Remove max_prompt_length from experimental BCO by @albertvillanova in https://github.com/huggingface/trl/pull/4964
- Remove max_prompt_length from experimental CPO by @albertvillanova in https://github.com/huggingface/trl/pull/4965
- Remove max_prompt_length from experimental ORPO by @albertvillanova in https://github.com/huggingface/trl/pull/4966
- Remove padding_value from experimental CPO and use pad_token_id by @albertvillanova in https://github.com/huggingface/trl/pull/4962
Fixes
- Fix _patch_transformers_hybrid_cache for peft by @albertvillanova in https://github.com/huggingface/trl/pull/4844
- Refactor KTO [4/N]: Remove unused padding_value by @albertvillanova in https://github.com/huggingface/trl/pull/4839
- Fix: undefined current_gradient_accumulation_steps by @qgallouedec in https://github.com/huggingface/trl/pull/4852
- fix(DeepSeek OPSM): passing correct (vLLM) logprobs by @casinca in https://github.com/huggingface/trl/pull/4857
- Fix SFT training for prompt-completion type and transformers v5 by @qgallouedec in https://github.com/huggingface/trl/pull/4880
- Bugfix: Logprob drift in vLLM serving mode (compared to colocate mode) by @kdubovikov in https://github.com/huggingface/trl/pull/4873
- Fix import path for get_open_port based on vLLM version by @qgallouedec in https://github.com/huggingface/trl/pull/4883
- Fix RewardTrainer's results not reproducible by @liyc-ai in https://github.com/huggingface/trl/pull/4887
- device_map init consistency in GRPO/RLOO/KTO by @qgallouedec in https://github.com/huggingface/trl/pull/4909
- Fix extra EOS appended in DPO preprocessing for conversational data by @qgallouedec in https://github.com/huggingface/trl/pull/4908
- Fix SFTTrainer init logic: remove TrainingArguments.push_to_hub_token only for transformers < v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4942
- Fix PPO run_name parameter not taking effect by @mel3c in https://github.com/huggingface/trl/pull/4945
- Remove access to warnings_issued by @qgallouedec in https://github.com/huggingface/trl/pull/4960
- Revert change in GRPO from NeMo-Gym Integration by @qgallouedec in https://github.com/huggingface/trl/pull/4970
Documentation and Examples
- Add Nash Learning from Human Feedback paper to paper index by @kansalaman in https://github.com/huggingface/trl/pull/4860
- Update OpenEnv dependency to new version for hf jobs scripts by @sergiopaniego in https://github.com/huggingface/trl/pull/4843
- Enhance GRPO documentation with scaling notes by @javadtaghia in https://github.com/huggingface/trl/pull/4849
- Created new PTT integration docs as requested by @adityachallapally in https://github.com/huggingface/trl/pull/4907
- docs: add DoRA (2402.09353) to Paper Index by @billycrapediem in https://github.com/huggingface/trl/pull/4892
Deprecations
- Remove unused padding_value from BCO by @albertvillanova in https://github.com/huggingface/trl/pull/4846
- Remove deprecated parameters by @qgallouedec in https://github.com/huggingface/trl/pull/4847
- Deprecate parameters in DPOConfig by @qgallouedec in https://github.com/huggingface/trl/pull/4969
- Replace warmup_ratio with warmup_steps by @qgallouedec in https://github.com/huggingface/trl/pull/4983
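For anyone moving their own scripts from a warmup ratio to a warmup step count, the conversion is simple arithmetic once the total number of optimizer steps is known. The sketch below is illustrative only: the helper name `ratio_to_warmup_steps` and the choice to round up are assumptions for this example, not part of TRL or transformers.

```python
import math


def ratio_to_warmup_steps(warmup_ratio: float, total_training_steps: int) -> int:
    """Convert a warmup ratio into an absolute number of warmup steps.

    Hypothetical helper for illustration; rounds up so that any non-zero
    ratio yields at least one warmup step when there are steps to run.
    """
    if not 0.0 <= warmup_ratio <= 1.0:
        raise ValueError("warmup_ratio must be between 0 and 1")
    if total_training_steps < 0:
        raise ValueError("total_training_steps must be non-negative")
    return math.ceil(warmup_ratio * total_training_steps)


# Example: a quarter of a 400-step run spent warming up.
warmup_steps = ratio_to_warmup_steps(0.25, 400)
```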
CI Improvements
- Support triggering CI via push to ci-* branches by @albertvillanova in https://github.com/huggingface/trl/pull/4840
- Revert CI hotfix pinning transformers 4.57.4 after tiny model regeneration by @albertvillanova in https://github.com/huggingface/trl/pull/4833
- Use pytest-datadir in CI tests by @albertvillanova in https://github.com/huggingface/trl/pull/4836
- Fix CI with dev dependencies: Mark Qwen3-VL tests as xfail by @albertvillanova in https://github.com/huggingface/trl/pull/4851
- Use pytest-datadir for accelerate config files by @albertvillanova in https://github.com/huggingface/trl/pull/4861
- Update transformer version checks and documentation for lr_scheduler_kwargs workaround by @qgallouedec in https://github.com/huggingface/trl/pull/4876
- Test distributed training for RewardTrainer, RLOOTrainer and GRPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4823
- Mark ZeRO 2 as xfail in distributed tests due to current failure by @qgallouedec in https://github.com/huggingface/trl/pull/4885
- Transformers v5 release: extend xfail condition for TestGRPOTrainer.test_training_vlm_and_liger and update version checks by @qgallouedec in https://github.com/huggingface/trl/pull/4898
- Fix CI NotImplementedError for bfloat16 by @albertvillanova in https://github.com/huggingface/trl/pull/4902
- Fix CI AssertionError: Parameter has not changed by @albertvillanova in https://github.com/huggingface/trl/pull/4904
- Fix CI TypeError in llm-blender tests by @albertvillanova in https://github.com/huggingface/trl/pull/4919
- Fix CI AssertionError: assert not True by @albertvillanova in https://github.com/huggingface/trl/pull/4921
- Fix CI ValueError for 0 temperature by @albertvillanova in https://github.com/huggingface/trl/pull/4916
- Set model dtype to float32 in tests of trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4924
- Set model dtype to float32 in experimental tests of trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4925
- Add test for training with compute_metrics in SFTTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4950
- Add test for tool call data in RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4959
- Add test for training with compute_metrics in RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4958
- Fix test_train_with_chat_template_kwargs by @qgallouedec in https://github.com/huggingface/trl/pull/4971
Miscellaneous
- Update CITATION.cff by @qgallouedec in https://github.com/huggingface/trl/pull/4856
- Update generate_tiny_models.py: CohereForAI -> CohereLabs by @Michellehbn in https://github.com/huggingface/trl/pull/4877
- Refactor vLLM generation [1/N]: Extract vLLM generation by @albertvillanova in https://github.com/huggingface/trl/pull/4700
- Rearrange variable assignments in DataCollatorForVisionLanguageModeling by @qgallouedec in https://github.com/huggingface/trl/pull/4911
- Fix help text formatting for max_length in RewardConfig and SFTConfig by @qgallouedec in https://github.com/huggingface/trl/pull/4910
- Comment about overriding prediction_step in GRPOTrainer and RLOOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4913
- Remove gradient checkpointing option from various training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/4905
- Remove chat template setup in dpo_vlm.py by @qgallouedec in https://github.com/huggingface/trl/pull/4906
- Update learning rate comments and add assertions for reference model parameters in GRPO and RLOO tests by @qgallouedec in https://github.com/huggingface/trl/pull/4914
- Add validation for sync_ref_model in GRPOTrainer and RLOOTrainer when using PEFT models by @qgallouedec in https://github.com/huggingface/trl/pull/4912
- Require transformers<5 with PairRMJudge by @albertvillanova in https://github.com/huggingface/trl/pull/4926
- Move VLLMClient to generation module by @albertvillanova in https://github.com/huggingface/trl/pull/4928
- Fix profiling of VLLMGeneration.sync_weights by @albertvillanova in https://github.com/huggingface/trl/pull/4931
- Fix import statement for import_utils in vllm_client.py by @qgallouedec in https://github.com/huggingface/trl/pull/4932
- Set default top_k to 0 in VLLMClient by @albertvillanova in https://github.com/huggingface/trl/pull/4927
- Minor fix docs style by @albertvillanova in https://github.com/huggingface/trl/pull/4953
What's Changed
- ⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/4835
- Support triggering CI via push to ci-* branches by @albertvillanova in https://github.com/huggingface/trl/pull/4840
- Revert CI hotfix pinning transformers 4.57.4 after tiny model regeneration by @albertvillanova in https://github.com/huggingface/trl/pull/4833
- Use pytest-datadir in CI tests by @albertvillanova in https://github.com/huggingface/trl/pull/4836
- Refactor KTO coordinated with DPO [c/N]: Remove ref_model_init_kwargs by @albertvillanova in https://github.com/huggingface/trl/pull/4837
- Fix _patch_transformers_hybrid_cache for peft by @albertvillanova in https://github.com/huggingface/trl/pull/4844
- Refactor KTO [4/N]: Remove unused padding_value by @albertvillanova in https://github.com/huggingface/trl/pull/4839
- Remove unused padding_value from BCO by @albertvillanova in https://github.com/huggingface/trl/pull/4846
- Fix CI with dev dependencies: Mark Qwen3-VL tests as xfail by @albertvillanova in https://github.com/huggingface/trl/pull/4851
- Fix: undefined current_gradient_accumulation_steps by @qgallouedec in https://github.com/huggingface/trl/pull/4852
- Remove deprecated parameters by @qgallouedec in https://github.com/huggingface/trl/pull/4847
- Add Nash Learning from Human Feedback paper to paper index by @kansalaman in https://github.com/huggingface/trl/pull/4860
- Use pytest-datadir for accelerate config files by @albertvillanova in https://github.com/huggingface/trl/pull/4861
- Update OpenEnv dependency to new version for hf jobs scripts by @sergiopaniego in https://github.com/huggingface/trl/pull/4843
- Update CITATION.cff by @qgallouedec in https://github.com/huggingface/trl/pull/4856
- [GRPOTrainer]: Agent Training Supports Async Tool Calls by @pramodith in https://github.com/huggingface/trl/pull/4742
- Enhance GRPO documentation with scaling notes by @javadtaghia in https://github.com/huggingface/trl/pull/4849
- Add retry strategy to vLLM Client for increased robustness by @apalmas-saifh in https://github.com/huggingface/trl/pull/4845
- Update generate_tiny_models.py: CohereForAI -> CohereLabs by @Michellehbn in https://github.com/huggingface/trl/pull/4877
- Refactor KTO coordinated with DPO [e/N]: Remove label_pad_token_id by @albertvillanova in https://github.com/huggingface/trl/pull/4875
- Refactor KTO coordinated with DPO [d/N]: Remove base_model_attribute_name by @albertvillanova in https://github.com/huggingface/trl/pull/4862
- Fix type hint in openenv/utils.py: fallback for no vLLM installed case by @Datta0 in https://github.com/huggingface/trl/pull/4868
- Update transformer version checks and documentation for lr_scheduler_kwargs workaround by @qgallouedec in https://github.com/huggingface/trl/pull/4876
- fix(DeepSeek OPSM): passing correct (vLLM) logprobs by @casinca in https://github.com/huggingface/trl/pull/4857
- Remove label_pad_token_id from experimental trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4878
- Fix SFT training for prompt-completion type and transformers v5 by @qgallouedec in https://github.com/huggingface/trl/pull/4880
- Bugfix: Logprob drift in vLLM serving mode (compared to colocate mode) by @kdubovikov in https://github.com/huggingface/trl/pull/4873
- Enable vLLM sleep mode for generation in Online DPO by @winglian in https://github.com/huggingface/trl/pull/4882
- Test distributed training for RewardTrainer, RLOOTrainer and GRPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4823
- Mark ZeRO 2 as xfail in distributed tests due to current failure by @qgallouedec in https://github.com/huggingface/trl/pull/4885
- Fix import path for get_open_port based on vLLM version by @qgallouedec in https://github.com/huggingface/trl/pull/4883
- Fix RewardTrainer's results not reproducible by @liyc-ai in https://github.com/huggingface/trl/pull/4887
- GOLD training speed up by @141forever in https://github.com/huggingface/trl/pull/4888
- Transformers v5 release: extend xfail condition for TestGRPOTrainer.test_training_vlm_and_liger and update version checks by @qgallouedec in https://github.com/huggingface/trl/pull/4898
- Fix CI NotImplementedError for bfloat16 by @albertvillanova in https://github.com/huggingface/trl/pull/4902
- Fix CI AssertionError: Parameter has not changed by @albertvillanova in https://github.com/huggingface/trl/pull/4904
- Refactor vLLM generation [1/N]: Extract vLLM generation by @albertvillanova in https://github.com/huggingface/trl/pull/4700
- Created new PTT integration docs as requested by @adityachallapally in https://github.com/huggingface/trl/pull/4907
- Fix CI TypeError in llm-blender tests by @albertvillanova in https://github.com/huggingface/trl/pull/4919
- Rearrange variable assignments in DataCollatorForVisionLanguageModeling by @qgallouedec in https://github.com/huggingface/trl/pull/4911
- Fix help text formatting for max_length in RewardConfig and SFTConfig by @qgallouedec in https://github.com/huggingface/trl/pull/4910
- device_map init consistency in GRPO/RLOO/KTO by @qgallouedec in https://github.com/huggingface/trl/pull/4909
- Comment about overriding prediction_step in GRPOTrainer and RLOOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4913
- Remove gradient checkpointing option from various training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/4905
- docs: add DoRA (2402.09353) to Paper Index by @billycrapediem in https://github.com/huggingface/trl/pull/4892
- Fix CI AssertionError: assert not True by @albertvillanova in https://github.com/huggingface/trl/pull/4921
- Fix CI ValueError for 0 temperature by @albertvillanova in https://github.com/huggingface/trl/pull/4916
- Fix extra EOS appended in DPO preprocessing for conversational data by @qgallouedec in https://github.com/huggingface/trl/pull/4908
- Remove chat template setup in dpo_vlm.py by @qgallouedec in https://github.com/huggingface/trl/pull/4906
- Update learning rate comments and add assertions for reference model parameters in GRPO and RLOO tests by @qgallouedec in https://github.com/huggingface/trl/pull/4914
- Add validation for sync_ref_model in GRPOTrainer and RLOOTrainer when using PEFT models by @qgallouedec in https://github.com/huggingface/trl/pull/4912
- Support tool call data in is_conversational by @qgallouedec in https://github.com/huggingface/trl/pull/4923
- Set model dtype to float32 in tests of trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4924
- Require transformers<5 with PairRMJudge by @albertvillanova in https://github.com/huggingface/trl/pull/4926
- Move VLLMClient to generation module by @albertvillanova in https://github.com/huggingface/trl/pull/4928
- Set model dtype to float32 in experimental tests of trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4925
- Fix profiling of VLLMGeneration.sync_weights by @albertvillanova in https://github.com/huggingface/trl/pull/4931
- Fix import statement for import_utils in vllm_client.py by @qgallouedec in https://github.com/huggingface/trl/pull/4932
- Set default top_k to 0 in VLLMClient by @albertvillanova in https://github.com/huggingface/trl/pull/4927
- [GRPO] Add parquet logging for completions with individual rewards by @qgallouedec in https://github.com/huggingface/trl/pull/4818
- Fix SFTTrainer init logic: remove TrainingArguments.push_to_hub_token only for transformers < v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4942
- Remove ref_model_init_kwargs from experimental BCO by @albertvillanova in https://github.com/huggingface/trl/pull/4946
- Update wordle.py example with masking of env tokens by @sergiopaniego in https://github.com/huggingface/trl/pull/4895
- Fix PPO run_name parameter not taking effect by @mel3c in https://github.com/huggingface/trl/pull/4945
- Minor fix docs style by @albertvillanova in https://github.com/huggingface/trl/pull/4953
- Add test for training with compute_metrics in SFTTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4950
- Remove access to warnings_issued by @qgallouedec in https://github.com/huggingface/trl/pull/4960
- NeMo-Gym Integration by @cmunley1 in https://github.com/huggingface/trl/pull/4848
- Add test for tool call data in RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4959
- Add test for training with compute_metrics in RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4958
- Remove max_prompt_length from experimental PRM by @albertvillanova in https://github.com/huggingface/trl/pull/4963
- Remove max_prompt_length from experimental BCO by @albertvillanova in https://github.com/huggingface/trl/pull/4964
- Remove max_prompt_length from experimental CPO by @albertvillanova in https://github.com/huggingface/trl/pull/4965
- Remove max_prompt_length from experimental ORPO by @albertvillanova in https://github.com/huggingface/trl/pull/4966
- Revert change in GRPO from NeMo-Gym Integration by @qgallouedec in https://github.com/huggingface/trl/pull/4970
- Fix test_train_with_chat_template_kwargs by @qgallouedec in https://github.com/huggingface/trl/pull/4971
- Remove padding_value from experimental CPO and use pad_token_id by @albertvillanova in https://github.com/huggingface/trl/pull/4962
- Remove truncation from tokenizer calls if no max_length by @albertvillanova in https://github.com/huggingface/trl/pull/4972
- Set specific OpenEnv version when installed by @sergiopaniego in https://github.com/huggingface/trl/pull/4978
- Fix add_column in test_train_with_chat_template_kwargs by @albertvillanova in https://github.com/huggingface/trl/pull/4979
- Support truncated completions in GRPO multi-turn training by @albertvillanova in https://github.com/huggingface/trl/pull/4976
- Replace torch.allclose with torch.testing.assert_close by @qgallouedec in https://github.com/huggingface/trl/pull/4977
- Simplify instructions of installation of OpenEnv by @sergiopaniego in https://github.com/huggingface/trl/pull/4980
- Deprecate parameters in DPOConfig by @qgallouedec in https://github.com/huggingface/trl/pull/4969
- [CI] Disallow installation of transformers 5.1.0 due to compatibility issues with DeepSpeed by @qgallouedec in https://github.com/huggingface/trl/pull/4982
- Replace warmup_ratio with warmup_steps by @qgallouedec in https://github.com/huggingface/trl/pull/4983
- Pin transformers!=5.1.0 in deepspeed extra due to incompatibility by @albertvillanova in https://github.com/huggingface/trl/pull/4985
- Fix passing tokenizer in test_train_with_chat_template_kwargs by @albertvillanova in https://github.com/huggingface/trl/pull/4987
- Update dataset configuration name in toolcall dataset loading by @qgallouedec in https://github.com/huggingface/trl/pull/4984
- Use local variable instead of attribute in collator tests by @qgallouedec in https://github.com/huggingface/trl/pull/4957
- Fix import of AutoModelForCausalLMWithValueHead from experimental by @albertvillanova in https://github.com/huggingface/trl/pull/4990
- Assert chat_template is applied in test_train_with_chat_template_kwargs by @albertvillanova in https://github.com/huggingface/trl/pull/4991
- Fix deprecation of DPOConfig.max_completion_length by @albertvillanova in https://github.com/huggingface/trl/pull/4992
- Fix post_init warning stacklevel to 3 by @albertvillanova in https://github.com/huggingface/trl/pull/4993
- Fix ZeRO-3 + PEFT + gradient checkpointing by @qgallouedec in https://github.com/huggingface/trl/pull/4951
- Add GitHub Actions workflow for testing against Transformers branch by @qgallouedec in https://github.com/huggingface/trl/pull/4995
- Add distributed smoke tests workflow for Transformers branch by @qgallouedec in https://github.com/huggingface/trl/pull/4996
- Update NeMo-Gym to use env_mask by @cmunley1 in https://github.com/huggingface/trl/pull/4986
- Update sampling mode to token level for safety by @sergiopaniego in https://github.com/huggingface/trl/pull/4989
- perf: Qwen SAPO loss optimization by @casinca in https://github.com/huggingface/trl/pull/4956
- Fix GRPO tool calling for corrupted tool calls by @akshayballal95 in https://github.com/huggingface/trl/pull/4890
- Add sanitize_logprob function for NaN handling in vLLM log probabilities by @qgallouedec in https://github.com/huggingface/trl/pull/5001
- [tests] Remove xfail for transformers version >= 5.0.0 due to upstream bug resolution by @qgallouedec in https://github.com/huggingface/trl/pull/5000
- docs: add CGPO/Mixture of Judges (2409.20370) to Paper Index + link ref to AllTrueJudge by @nabin2004 in https://github.com/huggingface/trl/pull/5002
- Filter CI SWIG deprecation warnings by @albertvillanova in https://github.com/huggingface/trl/pull/5004
- Fix CI TRLExperimentalWarning in regular tests by @albertvillanova in https://github.com/huggingface/trl/pull/5007
- Add support for nested_gather in OnlineDPOTrainer for transformers v5.2.0 and above by @qgallouedec in https://github.com/huggingface/trl/pull/4981
- Fix CI FutureWarning: ref_model_init_kwargs is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5009
- Fix typo in DPO max_prompt_length deprecation warning message by @albertvillanova in https://github.com/huggingface/trl/pull/5020
- Fix vision model prompt truncation bug in DPOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/5023
- Pin transformers < 5 in judges extra due to incompatibility by @albertvillanova in https://github.com/huggingface/trl/pull/5024
- Fix CI FutureWarning: generate_during_eval is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5017
- Fix typo in xfail test reason by @albertvillanova in https://github.com/huggingface/trl/pull/5028
- Fix CI FutureWarning: rpo_alpha is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5011
- Fix CI FutureWarning: use_logits_to_keep is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5013
- Mark Qwen3VL tests as xfail for transformers 5.0.x by @albertvillanova in https://github.com/huggingface/trl/pull/5029
- [CI] Silence PyTorch JIT and DataLoader deprecation warnings by @qgallouedec in https://github.com/huggingface/trl/pull/4999
- Add length-unbiased GRPO loss (LUSPO) by @Haseebasif7 in https://github.com/huggingface/trl/pull/4988
- Fix CI FutureWarning: tools is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5015
- Filter max_prompt_length UserWarning in all test cases by @albertvillanova in https://github.com/huggingface/trl/pull/5035
- Fix CI FutureWarning: max_prompt_length is deprecated by @albertvillanova in https://github.com/huggingface/trl/pull/5019
- Allow testing with transformers 5.1.0 via xfail marks by @albertvillanova in https://github.com/huggingface/trl/pull/5034
- Rename AOT loss type 'aot_pair' to 'aot_unpaired' in DPO by @qgallouedec in https://github.com/huggingface/trl/pull/5038
- Deprecate string usage for ref_model in DPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5040
- Deprecate FDivergenceType in DPOConfig; update f_divergence_type to use string values by @qgallouedec in https://github.com/huggingface/trl/pull/5039
- Fix multiprocessing start method to 'spawn' for test compatibility with Python 3.12+ by @qgallouedec in https://github.com/huggingface/trl/pull/5036
- Add Online Direct Preference Optimization section to paper index by @qgallouedec in https://github.com/huggingface/trl/pull/5037
- Release: 0.28 by @albertvillanova in https://github.com/huggingface/trl/pull/5043
New Contributors
- @kansalaman made their first contribution in https://github.com/huggingface/trl/pull/4860
- @javadtaghia made their first contribution in https://github.com/huggingface/trl/pull/4849
- @Michellehbn made their first contribution in https://github.com/huggingface/trl/pull/4877
- @Datta0 made their first contribution in https://github.com/huggingface/trl/pull/4868
- @kdubovikov made their first contribution in https://github.com/huggingface/trl/pull/4873
- @liyc-ai made their first contribution in https://github.com/huggingface/trl/pull/4887
- @141forever made their first contribution in https://github.com/huggingface/trl/pull/4888
- @adityachallapally made their first contribution in https://github.com/huggingface/trl/pull/4907
- @billycrapediem made their first contribution in https://github.com/huggingface/trl/pull/4892
- @mel3c made their first contribution in https://github.com/huggingface/trl/pull/4945
- @cmunley1 made their first contribution in https://github.com/huggingface/trl/pull/4848
- @akshayballal95 made their first contribution in https://github.com/huggingface/trl/pull/4890
- @nabin2004 made their first contribution in https://github.com/huggingface/trl/pull/5002
- @Haseebasif7 made their first contribution in https://github.com/huggingface/trl/pull/4988
Full Changelog: https://github.com/huggingface/trl/compare/v0.27.0...v0.28.0
Fetched April 7, 2026
