v0.24.0
Features
- Add accuracy reward by @pramodith in https://github.com/huggingface/trl/pull/4270
- Add support for `token_type_ids` in `DPOTrainer` by @aweers in https://github.com/huggingface/trl/pull/4285
- 💰 `RichProgressCallback` enhancement by @qgallouedec in https://github.com/huggingface/trl/pull/4245
- Include `chat_template_kwargs` in `apply_chat_template` by @cmpatino in https://github.com/huggingface/trl/pull/4233
- 🏷️ Account for `token_type_ids` in `DataCollatorForVisionLanguageModeling` by @qgallouedec in https://github.com/huggingface/trl/pull/4190
- 🎨 Support mixing image+text and text-only examples by @qgallouedec in https://github.com/huggingface/trl/pull/4203
- 🎁 `RewardTrainer` refactor by @qgallouedec in https://github.com/huggingface/trl/pull/4093
- 🎞️ Support sequence classification models in `clone_chat_template` by @qgallouedec in https://github.com/huggingface/trl/pull/4097
- ✨ Add logging for training completion and model saving in training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/4048
- 🖨️ Print rich table for messages by @qgallouedec in https://github.com/huggingface/trl/pull/4160
- 😴 Add `vllm_enable_sleep_mode` to RLOO Trainer by @sergiopaniego in https://github.com/huggingface/trl/pull/4107
- 📽 Multi image support for GRPO/RLOO by @qgallouedec in https://github.com/huggingface/trl/pull/4113
- 👁️ Add VLM support to RLOO trainer by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4067
- ℹ️ Enable XPU for vLLM client by @jiqing-feng in https://github.com/huggingface/trl/pull/4031
- 🧶 feat: Add WeaveCallback for W&B Weave integration by @parambharat in https://github.com/huggingface/trl/pull/4089
Fixes
- [Online-DPO] fix the completion_len == max_new_tokens crash by @kashif in https://github.com/huggingface/trl/pull/4193
- Fix entropy and accuracy calculation for prompt_tuning techniques. by @pramodith in https://github.com/huggingface/trl/pull/4196
- Fix prompt-completion labeling with add_generation_prompt and warning by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4201
- 🌡️ Have vLLM return processed (temperature scaled) log probs by @YonatanGideoni in https://github.com/huggingface/trl/pull/4163
- Fix handling of f_divergence_type in DPO by @albertvillanova in https://github.com/huggingface/trl/pull/4171
- ⚡ Fix Flash Attention x Padding-Free loss by @qgallouedec in https://github.com/huggingface/trl/pull/4170
- Pass required token_type_ids by @albertvillanova in https://github.com/huggingface/trl/pull/4148
- 👩‍🦯 Fix usage of VLM using text only by @SamuelBarryCS in https://github.com/huggingface/trl/pull/4080
- ⚓ [vllm] ensure MASTER_ADDR/MASTER_PORT are set safely by @kashif in https://github.com/huggingface/trl/pull/4057
- 📤 Fix a dataset loading bug in scripts by @singing-cat in https://github.com/huggingface/trl/pull/4124
- 🐯 fix: use_liger_kernel with IterableDataset by @jue-jue-zi in https://github.com/huggingface/trl/pull/4087
- [GKD] Fix `batchmean` reduce op in GKDTrainer's loss by @cmpatino in https://github.com/huggingface/trl/pull/4105
- Fix get_peft_model() so that prepare_model_for_kbit_training does not reapply to an instance of PeftModel, thus freezing all the layers by @Hoesu in https://github.com/huggingface/trl/pull/4081
- Aux loss is already included in the loss returned by Transformers by @pramodith in https://github.com/huggingface/trl/pull/4078
- ♨️ [GRPO] Fix potential hang in `get_high_entropy_mask` by @akakakakakaa in https://github.com/huggingface/trl/pull/4041
Documentation
- Remove logging.md: trainer-specific metrics documentation by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4269
- Remove using_llama_models.md: outdated Llama2-specific documentation by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4268
- Remove how_to_train.md: outdated training FAQ by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4267
- Add Qwen3-VL notebooks (SFT, GRPO) by @sergiopaniego in https://github.com/huggingface/trl/pull/4275
- Remove obsolete research_projects directory by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4243
- Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials by @sergiopaniego in https://github.com/huggingface/trl/pull/4219
- Add trainers taxonomy to docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4195
- Updated vLLM integration guide by @sergiopaniego in https://github.com/huggingface/trl/pull/4162
- [DOCS] Lora without regret by @burtenshaw in https://github.com/huggingface/trl/pull/4181
- Add docstring for OnlineTrainerState by @albertvillanova in https://github.com/huggingface/trl/pull/4166
- ⚖️ Align SFT and DPO for model creation and deprecate `DPOConfig.padding_value` in favour of `pad_token_id` by @qgallouedec in https://github.com/huggingface/trl/pull/4006
- 🏞️ Context Parallelism benchmark guide by @sergiopaniego in https://github.com/huggingface/trl/pull/4075
- ▶️ Add video to community tutorials by @qgallouedec in https://github.com/huggingface/trl/pull/4090
- Reviewed HF jobs updated docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4088
Deprecations
- Deprecate `BestOfNSampler` by @qgallouedec in https://github.com/huggingface/trl/pull/4291
- Raise deprecation warning for Python 3.9 by @albertvillanova in https://github.com/huggingface/trl/pull/4226
- Deprecate unused dataset_formatting module by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4242
- Warnings pointing to RFC by @qgallouedec in https://github.com/huggingface/trl/pull/4224
- 🅰️ Remove apex by @qgallouedec in https://github.com/huggingface/trl/pull/4139
- 🗑️ Remove deprecated `AlignPropTrainer`, `DDPOTrainer` and `IterativeSFTTrainer` by @qgallouedec in https://github.com/huggingface/trl/pull/4068
Experimental
- 🧪 Add `trl.experimental` Submodule by @August-murr in https://github.com/huggingface/trl/pull/4073
- [GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. by @pramodith in https://github.com/huggingface/trl/pull/4060
- 🪙 [Experimental] Support GSPO-token by @hjh0119 in https://github.com/huggingface/trl/pull/3820
- 🌪️ [GFPO]: implement GFPO in GRPOTrainer by @Peter-Chou in https://github.com/huggingface/trl/pull/3989
- 🌾 [Experimental] BEMA for ref model by @qgallouedec in https://github.com/huggingface/trl/pull/3898
What's Changed
- ⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/4054
- Remove redundant 'None' from docstrings by @albertvillanova in https://github.com/huggingface/trl/pull/4058
- Hotfix: Add ParallelismConfig fallback for transformers with old accelerate by @albertvillanova in https://github.com/huggingface/trl/pull/4063
- Fix CI failure in slow GRPO test due to missing pillow dependency by @albertvillanova in https://github.com/huggingface/trl/pull/4064
- 💡 Fix type hint to `make_parser` function in multiple scripts by @qgallouedec in https://github.com/huggingface/trl/pull/4050
- Improve docstring of AlignPropTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4059
- ♨️ [GRPO] Fix potential hang in `get_high_entropy_mask` by @akakakakakaa in https://github.com/huggingface/trl/pull/4041
- Set Ruff src for first-party imports by @albertvillanova in https://github.com/huggingface/trl/pull/4074
- 🧪 Add `trl.experimental` Submodule by @August-murr in https://github.com/huggingface/trl/pull/4073
- 🌾 [Experimental] BEMA for ref model by @qgallouedec in https://github.com/huggingface/trl/pull/3898
- ✂️ [GRPO VLM] Update split sizes to generalize by @zucchini-nlp in https://github.com/huggingface/trl/pull/4032
- 🛠️ Fix CI by @qgallouedec in https://github.com/huggingface/trl/pull/4076
- 🐳 Docker update + Simplify Jobs doc by @qgallouedec in https://github.com/huggingface/trl/pull/3931
- Aux loss is already included in the loss returned by Transformers by @pramodith in https://github.com/huggingface/trl/pull/4078
- Reviewed HF jobs updated docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4088
- 🗑️ Remove deprecated `AlignPropTrainer`, `DDPOTrainer` and `IterativeSFTTrainer` by @qgallouedec in https://github.com/huggingface/trl/pull/4068
- ▶️ Add video to community tutorials by @qgallouedec in https://github.com/huggingface/trl/pull/4090
- Align slow tests with regular tests by @albertvillanova in https://github.com/huggingface/trl/pull/4085
- Add support for testing experimental features by @albertvillanova in https://github.com/huggingface/trl/pull/4082
- Community Tutorials design adaptation for videos by @sergiopaniego in https://github.com/huggingface/trl/pull/4095
- 🏞️ Context Parallelism benchmark guide by @sergiopaniego in https://github.com/huggingface/trl/pull/4075
- ⌨️ Pin num2words by @lewtun in https://github.com/huggingface/trl/pull/4094
- Add deprecation warnings to docstrings by @albertvillanova in https://github.com/huggingface/trl/pull/4083
- 📜 Convert `set` to `list` of tags by @qgallouedec in https://github.com/huggingface/trl/pull/4092
- 🧶 feat: Add WeaveCallback for W&B Weave integration by @parambharat in https://github.com/huggingface/trl/pull/4089
- ⚖️ Align SFT and DPO for model creation and deprecate `DPOConfig.padding_value` in favour of `pad_token_id` by @qgallouedec in https://github.com/huggingface/trl/pull/4006
- 🌪️ [GFPO]: implement GFPO in GRPOTrainer by @Peter-Chou in https://github.com/huggingface/trl/pull/3989
- ℹ️ feat: Add NPU and XPU support for activation offloading by @zilongzheng in https://github.com/huggingface/trl/pull/4056
- ℹ️ Enable XPU for vLLM client by @jiqing-feng in https://github.com/huggingface/trl/pull/4031
- Fix get_peft_model() so that prepare_model_for_kbit_training does not reapply to an instance of PeftModel, thus freezing all the layers by @Hoesu in https://github.com/huggingface/trl/pull/4081
- [GKD] Fix `batchmean` reduce op in GKDTrainer's loss by @cmpatino in https://github.com/huggingface/trl/pull/4105
- 👁️ Add VLM support to RLOO trainer by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4067
- Some nits GRPO and RLOO trainer docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4108
- Fix typos by @cyyever in https://github.com/huggingface/trl/pull/4106
- Fix typos by @qgallouedec in https://github.com/huggingface/trl/pull/4109
- Fix VLM configs in generate_tiny_models by @albertvillanova in https://github.com/huggingface/trl/pull/4101
- docs: correct option name to enable vllm sleep mode by @muupan in https://github.com/huggingface/trl/pull/4102
- CI hotfix: xfail test_training_with_transformers_paged for transformers<4.57.0 by @albertvillanova in https://github.com/huggingface/trl/pull/4120
- Fix code style with make precommit by @albertvillanova in https://github.com/huggingface/trl/pull/4119
- 🟩 Drop `image_split_sizes` in favour of `image_grid_thw` by @qgallouedec in https://github.com/huggingface/trl/pull/4111
- 🔭 Align param passing to VLM configs in generate_tiny_models by @albertvillanova in https://github.com/huggingface/trl/pull/4118
- 📽 Multi image support for GRPO/RLOO by @qgallouedec in https://github.com/huggingface/trl/pull/4113
- 😴 Add `vllm_enable_sleep_mode` to RLOO Trainer by @sergiopaniego in https://github.com/huggingface/trl/pull/4107
- 🐯 fix: use_liger_kernel with IterableDataset by @jue-jue-zi in https://github.com/huggingface/trl/pull/4087
- 📤 Fix a dataset loading bug in scripts by @singing-cat in https://github.com/huggingface/trl/pull/4124
- ⚓ [vllm] ensure MASTER_ADDR/MASTER_PORT are set safely by @kashif in https://github.com/huggingface/trl/pull/4057
- 📌 Pin vLLM version by @qgallouedec in https://github.com/huggingface/trl/pull/4122
- 👋 Remove `backend` parameter from `GuidedDecodingParams` by @qgallouedec in https://github.com/huggingface/trl/pull/4123
- 🧹 Remove `max_batch_tokens`, `num_blocks` and `block_size` from generation kwargs by @qgallouedec in https://github.com/huggingface/trl/pull/4065
- Remove Python version < 3.13 constraint from vllm extra dependencies by @albertvillanova in https://github.com/huggingface/trl/pull/4125
- 👩‍🦯 Fix usage of VLM using text only by @SamuelBarryCS in https://github.com/huggingface/trl/pull/4080
- [SFTrainer]: Fix DFT Loss by @pramodith in https://github.com/huggingface/trl/pull/4112
- Improve typing of SFT trainer by @cyyever in https://github.com/huggingface/trl/pull/4007
- 🌺 Fix GPT-OSS test by @qgallouedec in https://github.com/huggingface/trl/pull/4134
- 🪙 [Experimental] Support GSPO-token by @hjh0119 in https://github.com/huggingface/trl/pull/3820
- Fix CI: torch.AcceleratorError: CUDA error: device-side assert triggered by @albertvillanova in https://github.com/huggingface/trl/pull/4138
- 🤸‍♀️ Fix DFT test by @qgallouedec in https://github.com/huggingface/trl/pull/4135
- 🌵 Mark GKD trainer test as expected failure due to OOM issue by @qgallouedec in https://github.com/huggingface/trl/pull/4126
- [GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. by @pramodith in https://github.com/huggingface/trl/pull/4060
- Fix import statement and GRPO test case by @qgallouedec in https://github.com/huggingface/trl/pull/4141
- Refactor trainers classes to use BaseTrainer with shared functionality by @albertvillanova in https://github.com/huggingface/trl/pull/4128
- Fixed some <Tip> rendering issues by @sergiopaniego in https://github.com/huggingface/trl/pull/4143
- 😷 Refactor GRPO/RLOO to isolate `_generate` by @qgallouedec in https://github.com/huggingface/trl/pull/4114
- 🟩 Drop `image_split_sizes` in favour of `image_grid_thw` by @qgallouedec in https://github.com/huggingface/trl/pull/4156
- 📽 Multi image support for GRPO replay buffer by @qgallouedec in https://github.com/huggingface/trl/pull/4157
- 😷 Refactor GRPO/RLOO to isolate `_generate` for GRPO with replay buffer by @qgallouedec in https://github.com/huggingface/trl/pull/4158
- Add docstring for OnlineTrainerState by @albertvillanova in https://github.com/huggingface/trl/pull/4166
- Pass required token_type_ids by @albertvillanova in https://github.com/huggingface/trl/pull/4148
- 💡 Replace `<Tip>` with new markdown syntax by @qgallouedec in https://github.com/huggingface/trl/pull/4161
- Remove unnecessary list comprehensions by @albertvillanova in https://github.com/huggingface/trl/pull/4164
- Add missing FDivergenceType docstring by @albertvillanova in https://github.com/huggingface/trl/pull/4165
- Fix docstrings with 'deprecated' Sphinx directive by @albertvillanova in https://github.com/huggingface/trl/pull/4174
- Fix docstring interlink to parent class for NashMDTrainer and XPOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4179
- Fix link in docstring of RLOOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4180
- 🖨️ Print rich table for messages by @qgallouedec in https://github.com/huggingface/trl/pull/4160
- 🅰️ Remove apex by @qgallouedec in https://github.com/huggingface/trl/pull/4139
- Fix CI ValueError: Unknown loss type: dapo by @albertvillanova in https://github.com/huggingface/trl/pull/4173
- Fix PEFT interlinks in docstrings by @albertvillanova in https://github.com/huggingface/trl/pull/4178
- ✨ Add logging for training completion and model saving in training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/4048
- 👾 Use our own `require_bitsandbytes` by @qgallouedec in https://github.com/huggingface/trl/pull/4137
- 🎞️ Support sequence classification models in `clone_chat_template` by @qgallouedec in https://github.com/huggingface/trl/pull/4097
- ⚡ Fix Flash Attention x Padding-Free loss by @qgallouedec in https://github.com/huggingface/trl/pull/4170
- 🎁 `RewardTrainer` refactor by @qgallouedec in https://github.com/huggingface/trl/pull/4093
- 🧺 [1/N] Refactor `_generate` in GRPO/RLOO: list of ints instead of tensors by @qgallouedec in https://github.com/huggingface/trl/pull/4146
- Fix handling of f_divergence_type in DPO by @albertvillanova in https://github.com/huggingface/trl/pull/4171
- 🔣 Fix test: replace `trainer.tokenizer` by `trainer.processing_class` by @qgallouedec in https://github.com/huggingface/trl/pull/4185
- Fix CI ImportError: FlashAttention2 and decorator order for all parameterized tests by @albertvillanova in https://github.com/huggingface/trl/pull/4176
- Hotfix wrong formatting of docstrings with blockquote tips by @albertvillanova in https://github.com/huggingface/trl/pull/4187
- 🌡️ Have vLLM return processed (temperature scaled) log probs by @YonatanGideoni in https://github.com/huggingface/trl/pull/4163
- Replace remaining trainer.tokenizer with trainer.processing_class in GRPO test by @albertvillanova in https://github.com/huggingface/trl/pull/4192
- [DOCS] Lora without regret by @burtenshaw in https://github.com/huggingface/trl/pull/4181
- [DOCS/FIX] lora without regrets - fix lr by @burtenshaw in https://github.com/huggingface/trl/pull/4207
- Remove custome_container for building the docs by @albertvillanova in https://github.com/huggingface/trl/pull/4198
- Remove tokenizer creation from `sft` example script by @sergiopaniego in https://github.com/huggingface/trl/pull/4197
- Hotfix: Exclude transformers 4.57.0 for Python 3.9 by @albertvillanova in https://github.com/huggingface/trl/pull/4209
- Replace unittest with pytest by @albertvillanova in https://github.com/huggingface/trl/pull/4188
- Updated vLLM integration guide by @sergiopaniego in https://github.com/huggingface/trl/pull/4162
- Remove `Optional` from `processing_class` in `PPOTrainer` by @sergiopaniego in https://github.com/huggingface/trl/pull/4212
- Replace setup with pyproject and fix packaging unintended modules by @albertvillanova in https://github.com/huggingface/trl/pull/4194
- Removed tokenizer/processor creation from example scripts by @sergiopaniego in https://github.com/huggingface/trl/pull/4211
- Apply style and revert change in `sft_video_llm` example by @qgallouedec in https://github.com/huggingface/trl/pull/4214
- Fix `trl-internal-testing/tiny-DbrxForCausalLM` by @qgallouedec in https://github.com/huggingface/trl/pull/4213
- Fix prompt-completion labeling with add_generation_prompt and warning by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4201
- Fix LoRA params in Python in LoRA without regret by @sergiopaniego in https://github.com/huggingface/trl/pull/4215
- [DOCS] fix prose in lora guide by @burtenshaw in https://github.com/huggingface/trl/pull/4217
- Add trainers taxonomy to docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4195
- 🎨 Support mixing image+text and text-only examples by @qgallouedec in https://github.com/huggingface/trl/pull/4203
- 🧺 [2/N] Refactor `_generate` in GRPO/RLOO: Use `prompt_ids` from generation by @qgallouedec in https://github.com/huggingface/trl/pull/4152
- Fix entropy and accuracy calculation for prompt_tuning techniques. by @pramodith in https://github.com/huggingface/trl/pull/4196
- Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials by @sergiopaniego in https://github.com/huggingface/trl/pull/4219
- 🏷️ Account for `token_type_ids` in `DataCollatorForVisionLanguageModeling` by @qgallouedec in https://github.com/huggingface/trl/pull/4190
- Exclude vllm dependencies from dev extra by @albertvillanova in https://github.com/huggingface/trl/pull/4229
- Fix CI unittest asserts by @albertvillanova in https://github.com/huggingface/trl/pull/4234
- Fix callable annotations by @albertvillanova in https://github.com/huggingface/trl/pull/4216
- Remove unused Path import in `__init__.py` by @albertvillanova in https://github.com/huggingface/trl/pull/4227
- Update CI Docker image to pytorch/pytorch:2.8.0 by @albertvillanova in https://github.com/huggingface/trl/pull/4232
- Replace setup with pyproject in CI tests paths by @albertvillanova in https://github.com/huggingface/trl/pull/4230
- Fix CI IndentationError for Python 3.13.8 by @albertvillanova in https://github.com/huggingface/trl/pull/4240
- Remove unused log_example_reports.py script by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4241
- 🧘 Enhance markdown style by @qgallouedec in https://github.com/huggingface/trl/pull/4235
- Warnings pointing to RFC by @qgallouedec in https://github.com/huggingface/trl/pull/4224
- Fix CI slow test ValueError: Backward pass should have cleared tracker of all tensors by @sywangyi in https://github.com/huggingface/trl/pull/4236
- Fix CI CUDA out of memory errors by improving GPU memory management by @albertvillanova in https://github.com/huggingface/trl/pull/4238
- Install peft from main for CI tests with dev dependencies by @albertvillanova in https://github.com/huggingface/trl/pull/4250
- Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' by @albertvillanova in https://github.com/huggingface/trl/pull/4253
- Fix CI slow test ValueError: Unknown loss type: dapo by @albertvillanova in https://github.com/huggingface/trl/pull/4254
- 🧺 [3/N] Refactor `_generate` in GRPO/RLOO: Rely on generator for prompt truncation by @qgallouedec in https://github.com/huggingface/trl/pull/4153
- Remove obsolete research_projects directory by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4243
- Deprecate unused dataset_formatting module by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4242
- Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' by @albertvillanova in https://github.com/huggingface/trl/pull/4255
- [Online-DPO] fix the completion_len == max_new_tokens crash by @kashif in https://github.com/huggingface/trl/pull/4193
- Include `chat_template_kwargs` in `apply_chat_template` by @cmpatino in https://github.com/huggingface/trl/pull/4233
- Fix Python version check for skipping tests on Python 3.13.8 by @albertvillanova in https://github.com/huggingface/trl/pull/4246
- Raise deprecation warning for Python 3.9 by @albertvillanova in https://github.com/huggingface/trl/pull/4226
- Fix docstring interlinks by @albertvillanova in https://github.com/huggingface/trl/pull/4221
- Use FutureWarning instead of DeprecationWarning by @albertvillanova in https://github.com/huggingface/trl/pull/4266
- Fix style with make precommit by @albertvillanova in https://github.com/huggingface/trl/pull/4265
- Add Qwen3-VL notebooks (SFT, GRPO) by @sergiopaniego in https://github.com/huggingface/trl/pull/4275
- Fix typo in Colab link by @sergiopaniego in https://github.com/huggingface/trl/pull/4276
- Fix docstrings with Sphinx 'deprecated' directive by @albertvillanova in https://github.com/huggingface/trl/pull/4279
- Fix CI slow test OSError: You are trying to access a gated repo by @albertvillanova in https://github.com/huggingface/trl/pull/4283
- 💰 `RichProgressCallback` enhancement by @qgallouedec in https://github.com/huggingface/trl/pull/4245
- Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' by @albertvillanova in https://github.com/huggingface/trl/pull/4262
- Replace unittest skipTest with pytest.skip by @albertvillanova in https://github.com/huggingface/trl/pull/4263
- Fix CI slow tests: ImportError: vLLM is not installed by @albertvillanova in https://github.com/huggingface/trl/pull/4287
- Remove logging.md: trainer-specific metrics documentation by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4269
- Remove using_llama_models.md: outdated Llama2-specific documentation by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4268
- Add support for `token_type_ids` in `DPOTrainer` by @aweers in https://github.com/huggingface/trl/pull/4285
- Remove how_to_train.md: outdated training FAQ by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4267
- Add accuracy reward by @pramodith in https://github.com/huggingface/trl/pull/4270
- Remove unused commands directory by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4258
- Deprecate `BestOfNSampler` by @qgallouedec in https://github.com/huggingface/trl/pull/4291
- Release: v0.24 by @qgallouedec in https://github.com/huggingface/trl/pull/4292
New Contributors
- @zucchini-nlp made their first contribution in https://github.com/huggingface/trl/pull/4032
- @parambharat made their first contribution in https://github.com/huggingface/trl/pull/4089
- @zilongzheng made their first contribution in https://github.com/huggingface/trl/pull/4056
- @jiqing-feng made their first contribution in https://github.com/huggingface/trl/pull/4031
- @Hoesu made their first contribution in https://github.com/huggingface/trl/pull/4081
- @cmpatino made their first contribution in https://github.com/huggingface/trl/pull/4105
- @singing-cat made their first contribution in https://github.com/huggingface/trl/pull/4124
- @SamuelBarryCS made their first contribution in https://github.com/huggingface/trl/pull/4080
- @YonatanGideoni made their first contribution in https://github.com/huggingface/trl/pull/4163
- @aweers made their first contribution in https://github.com/huggingface/trl/pull/4285
Full Changelog: https://github.com/huggingface/trl/compare/v0.23.0...v0.24.0
