v0.24.0

Features

Add accuracy reward by @pramodith in https://github.com/huggingface/trl/pull/4270
Add support for token_type_ids in DPOTrainer by @aweers in https://github.com/huggingface/trl/pull/4285
💰 RichProgressCallback enhancement by @qgallouedec in https://github.com/huggingface/trl/pull/4245
Include chat_template_kwargs in apply_chat_template by @cmpatino in https://github.com/huggingface/trl/pull/4233
🏷️ Account for token_type_ids in DataCollatorForVisionLanguageModeling by @qgallouedec in https://github.com/huggingface/trl/pull/4190
🎨 Support mixing image+text and text-only examples by @qgallouedec in https://github.com/huggingface/trl/pull/4203
🎁 RewardTrainer refactor by @qgallouedec in https://github.com/huggingface/trl/pull/4093
🎞️ Support sequence classification models in clone_chat_template by @qgallouedec in https://github.com/huggingface/trl/pull/4097
✨ Add logging for training completion and model saving in training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/4048
🖨️ Print rich table for messages by @qgallouedec in https://github.com/huggingface/trl/pull/4160
😴 Add vllm_enable_sleep_mode to RLOO Trainer by @sergiopaniego in https://github.com/huggingface/trl/pull/4107
📽 Multi image support for GRPO/RLOO by @qgallouedec in https://github.com/huggingface/trl/pull/4113
👁️ Add VLM support to RLOO trainer by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4067
ℹ️ Enable XPU for vLLM client by @jiqing-feng in https://github.com/huggingface/trl/pull/4031
🧶 feat: Add WeaveCallback for W&B Weave integration by @parambharat in https://github.com/huggingface/trl/pull/4089

Fixes

[Online-DPO] fix the completion_len == max_new_tokens crash by @kashif in https://github.com/huggingface/trl/pull/4193
Fix entropy and accuracy calculation for prompt_tuning techniques. by @pramodith in https://github.com/huggingface/trl/pull/4196
Fix prompt-completion labeling with add_generation_prompt and warning by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4201
🌡️ Have vLLM return processed (temperature scaled) log probs by @YonatanGideoni in https://github.com/huggingface/trl/pull/4163
Fix handling of f_divergence_type in DPO by @albertvillanova in https://github.com/huggingface/trl/pull/4171
⚡ Fix Flash Attention x Padding-Free loss by @qgallouedec in https://github.com/huggingface/trl/pull/4170
Pass required token_type_ids by @albertvillanova in https://github.com/huggingface/trl/pull/4148
👩‍🦯 Fix usage of VLM using text only by @SamuelBarryCS in https://github.com/huggingface/trl/pull/4080
⚓ [vllm] ensure MASTER_ADDR/MASTER_PORT are set safely by @kashif in https://github.com/huggingface/trl/pull/4057
📤 Fix a dataset loading bug in scripts by @singing-cat in https://github.com/huggingface/trl/pull/4124
🐯 fix: use_liger_kernel with IterableDataset by @jue-jue-zi in https://github.com/huggingface/trl/pull/4087
[GKD] Fix batchmean reduce op in GKDTrainer's loss by @cmpatino in https://github.com/huggingface/trl/pull/4105
Fix get_peft_model() so that prepare_model_for_kbit_training does not reapply to an instance of PeftModel, thus freezing all the layers by @Hoesu in https://github.com/huggingface/trl/pull/4081
Aux loss is already included in the loss returned by Transformers by @pramodith in https://github.com/huggingface/trl/pull/4078
♨️ [GRPO] Fix potential hang in get_high_entropy_mask by @akakakakakaa in https://github.com/huggingface/trl/pull/4041

Documentation

Remove logging.md: trainer-specific metrics documentation by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4269
Remove using_llama_models.md: outdated Llama2-specific documentation by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4268
Remove how_to_train.md: outdated training FAQ by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4267
Add Qwen3-VL notebooks (SFT, GRPO) by @sergiopaniego in https://github.com/huggingface/trl/pull/4275
Remove obsolete research_projects directory by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4243
Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials by @sergiopaniego in https://github.com/huggingface/trl/pull/4219
Add trainers taxonomy to docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4195
Updated vLLM integration guide by @sergiopaniego in https://github.com/huggingface/trl/pull/4162
[DOCS] Lora without regret by @burtenshaw in https://github.com/huggingface/trl/pull/4181
Add docstring for OnlineTrainerState by @albertvillanova in https://github.com/huggingface/trl/pull/4166
⚖️ Align SFT and DPO for model creation and deprecate DPOConfig.padding_value in favour or pad_token_id by @qgallouedec in https://github.com/huggingface/trl/pull/4006
🏞️ Context Parallelism benchmark guide by @sergiopaniego in https://github.com/huggingface/trl/pull/4075
▶️ Add video to community tutorials by @qgallouedec in https://github.com/huggingface/trl/pull/4090
Reviewed HF jobs updated docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4088

Deprecations

Deprecate BestOfNSampler by @qgallouedec in https://github.com/huggingface/trl/pull/4291
Raise deprecation warning for Python 3.9 by @albertvillanova in https://github.com/huggingface/trl/pull/4226
Deprecate unused dataset_formatting module by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4242
Warnings pointing to RFC by @qgallouedec in https://github.com/huggingface/trl/pull/4224
🅰️ Remove apex by @qgallouedec in https://github.com/huggingface/trl/pull/4139
🗑️ Remove deprecated AlignPropTrainer, DDPOTrainer and IterativeSFTTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4068

Experimental

🧪 Add trl.experimental Submodule by @August-murr in https://github.com/huggingface/trl/pull/4073
[GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. by @pramodith in https://github.com/huggingface/trl/pull/4060
🪙 [Experimental] Support GSPO-token by @hjh0119 in https://github.com/huggingface/trl/pull/3820
🌪️ [GFPO]: implement GFPO in GRPOTrainer by @Peter-Chou in https://github.com/huggingface/trl/pull/3989
🌾 [Experimental] BEMA for ref model by @qgallouedec in https://github.com/huggingface/trl/pull/3898

What's Changed

⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/4054
Remove redundant 'None' from docstrings by @albertvillanova in https://github.com/huggingface/trl/pull/4058
Hotfix: Add ParallelismConfig fallback for transformers with old accelerate by @albertvillanova in https://github.com/huggingface/trl/pull/4063
Fix CI failure in slow GRPO test due to missing pillow dependency by @albertvillanova in https://github.com/huggingface/trl/pull/4064
💡 Fix type hint to make_parser function in multiple scripts by @qgallouedec in https://github.com/huggingface/trl/pull/4050
Improve docstring of AlignPropTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4059
♨️ [GRPO] Fix potential hang in get_high_entropy_mask by @akakakakakaa in https://github.com/huggingface/trl/pull/4041
Set Ruff src for first-party imports by @albertvillanova in https://github.com/huggingface/trl/pull/4074
🧪 Add trl.experimental Submodule by @August-murr in https://github.com/huggingface/trl/pull/4073
🌾 [Experimental] BEMA for ref model by @qgallouedec in https://github.com/huggingface/trl/pull/3898
✂️ [GRPO VLM] Update split sizes to generalize by @zucchini-nlp in https://github.com/huggingface/trl/pull/4032
🛠️ Fix CI by @qgallouedec in https://github.com/huggingface/trl/pull/4076
🐳 Docker update + Simplify Jobs doc by @qgallouedec in https://github.com/huggingface/trl/pull/3931
Aux loss is already included in the loss returned by Transformers by @pramodith in https://github.com/huggingface/trl/pull/4078
Reviewed HF jobs updated docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4088
🗑️ Remove deprecated AlignPropTrainer, DDPOTrainer and IterativeSFTTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4068
▶️ Add video to community tutorials by @qgallouedec in https://github.com/huggingface/trl/pull/4090
Align slow tests with regular tests by @albertvillanova in https://github.com/huggingface/trl/pull/4085
Add support for testing experimental features by @albertvillanova in https://github.com/huggingface/trl/pull/4082
Community Tutorials design adaptation for videos by @sergiopaniego in https://github.com/huggingface/trl/pull/4095
🏞️ Context Parallelism benchmark guide by @sergiopaniego in https://github.com/huggingface/trl/pull/4075
⌨️ Pin num2words by @lewtun in https://github.com/huggingface/trl/pull/4094
Add deprecation warnings to docstrings by @albertvillanova in https://github.com/huggingface/trl/pull/4083
📜 Convert set to list of tags by @qgallouedec in https://github.com/huggingface/trl/pull/4092
🧶 feat: Add WeaveCallback for W&B Weave integration by @parambharat in https://github.com/huggingface/trl/pull/4089
⚖️ Align SFT and DPO for model creation and deprecate DPOConfig.padding_value in favour or pad_token_id by @qgallouedec in https://github.com/huggingface/trl/pull/4006
🌪️ [GFPO]: implement GFPO in GRPOTrainer by @Peter-Chou in https://github.com/huggingface/trl/pull/3989
ℹ️ feat: Add NPU and XPU support for activation offloading by @zilongzheng in https://github.com/huggingface/trl/pull/4056
ℹ️ Enable XPU for vLLM client by @jiqing-feng in https://github.com/huggingface/trl/pull/4031
Fix get_peft_model() so that prepare_model_for_kbit_training does not reapply to an instance of PeftModel, thus freezing all the layers by @Hoesu in https://github.com/huggingface/trl/pull/4081
[GKD] Fix batchmean reduce op in GKDTrainer's loss by @cmpatino in https://github.com/huggingface/trl/pull/4105
👁️ Add VLM support to RLOO trainer by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4067
Some nits GRPO and RLOO trainer docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4108
Fix typos by @cyyever in https://github.com/huggingface/trl/pull/4106
Fix typos by @qgallouedec in https://github.com/huggingface/trl/pull/4109
Fix VLM configs in generate_tiny_models by @albertvillanova in https://github.com/huggingface/trl/pull/4101
docs: correct option name to enable vllm sleep mode by @muupan in https://github.com/huggingface/trl/pull/4102
CI hotfix: xfail test_training_with_transformers_paged for transformers<4.57.0 by @albertvillanova in https://github.com/huggingface/trl/pull/4120
Fix code style with make precommit by @albertvillanova in https://github.com/huggingface/trl/pull/4119
🟩 Drop image_split_sizes in favour of image_grid_thw by @qgallouedec in https://github.com/huggingface/trl/pull/4111
🔭 Align param passing to VLM configs in generate_tiny_models by @albertvillanova in https://github.com/huggingface/trl/pull/4118
📽 Multi image support for GRPO/RLOO by @qgallouedec in https://github.com/huggingface/trl/pull/4113
😴 Add vllm_enable_sleep_mode to RLOO Trainer by @sergiopaniego in https://github.com/huggingface/trl/pull/4107
🐯 fix: use_liger_kernel with IterableDataset by @jue-jue-zi in https://github.com/huggingface/trl/pull/4087
📤 Fix a dataset loading bug in scripts by @singing-cat in https://github.com/huggingface/trl/pull/4124
⚓ [vllm] ensure MASTER_ADDR/MASTER_PORT are set safely by @kashif in https://github.com/huggingface/trl/pull/4057
📌 Pin vLLM version by @qgallouedec in https://github.com/huggingface/trl/pull/4122
👋 Remove backend parameter from GuidedDecodingParams by @qgallouedec in https://github.com/huggingface/trl/pull/4123
🧹 Remove max_batch_tokens, num_blocks and block_size from generation kwargs by @qgallouedec in https://github.com/huggingface/trl/pull/4065
Remove Python version < 3.13 constraint from vllm extra dependencies by @albertvillanova in https://github.com/huggingface/trl/pull/4125
👩‍🦯 Fix usage of VLM using text only by @SamuelBarryCS in https://github.com/huggingface/trl/pull/4080
[SFTrainer]: Fix DFT Loss by @pramodith in https://github.com/huggingface/trl/pull/4112
Improve typing of SFT trainer by @cyyever in https://github.com/huggingface/trl/pull/4007
🌺 Fix GPT-OSS test by @qgallouedec in https://github.com/huggingface/trl/pull/4134
🪙 [Experimental] Support GSPO-token by @hjh0119 in https://github.com/huggingface/trl/pull/3820
Fix CI: torch.AcceleratorError: CUDA error: device-side assert triggered by @albertvillanova in https://github.com/huggingface/trl/pull/4138
🤸‍♀️ Fix DFT test by @qgallouedec in https://github.com/huggingface/trl/pull/4135
🌵 Mark GKD trainer test as expected failure due to OOM issue by @qgallouedec in https://github.com/huggingface/trl/pull/4126
[GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. by @pramodith in https://github.com/huggingface/trl/pull/4060
Fix import statement and GRPO test case by @qgallouedec in https://github.com/huggingface/trl/pull/4141
Refactor trainers classes to use BaseTrainer with shared functionality by @albertvillanova in https://github.com/huggingface/trl/pull/4128
Fixed some <Tip> rendering issues by @sergiopaniego in https://github.com/huggingface/trl/pull/4143
😷 Refactor GRPO/RLOO to isolate _generate by @qgallouedec in https://github.com/huggingface/trl/pull/4114
🟩 Drop image_split_sizes in favour of image_grid_thw by @qgallouedec in https://github.com/huggingface/trl/pull/4156
📽 Multi image support for GRPO replay buffer by @qgallouedec in https://github.com/huggingface/trl/pull/4157
😷 Refactor GRPO/RLOO to isolate _generate for GRPO with replay buffer by @qgallouedec in https://github.com/huggingface/trl/pull/4158
Add docstring for OnlineTrainerState by @albertvillanova in https://github.com/huggingface/trl/pull/4166
Pass required token_type_ids by @albertvillanova in https://github.com/huggingface/trl/pull/4148
💡 Replace <Tip> with new markdown syntax by @qgallouedec in https://github.com/huggingface/trl/pull/4161
Remove unnecessary list comprehensions by @albertvillanova in https://github.com/huggingface/trl/pull/4164
Add missing FDivergenceType docstring by @albertvillanova in https://github.com/huggingface/trl/pull/4165
Fix docstrings with 'deprecated' Sphinx directive by @albertvillanova in https://github.com/huggingface/trl/pull/4174
Fix docstring interlink to parent class for NashMDTrainer and XPOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4179
Fix link in docstring of RLOOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4180
🖨️ Print rich table for messages by @qgallouedec in https://github.com/huggingface/trl/pull/4160
🅰️ Remove apex by @qgallouedec in https://github.com/huggingface/trl/pull/4139
Fix CI ValueError: Unknown loss type: dapo by @albertvillanova in https://github.com/huggingface/trl/pull/4173
Fix PEFT interlinks in docstrings by @albertvillanova in https://github.com/huggingface/trl/pull/4178
✨ Add logging for training completion and model saving in training scripts by @qgallouedec in https://github.com/huggingface/trl/pull/4048
👾 Use our own require_bitsandbytes by @qgallouedec in https://github.com/huggingface/trl/pull/4137
🎞️ Support sequence classification models in clone_chat_template by @qgallouedec in https://github.com/huggingface/trl/pull/4097
⚡ Fix Flash Attention x Padding-Free loss by @qgallouedec in https://github.com/huggingface/trl/pull/4170
🎁 RewardTrainer refactor by @qgallouedec in https://github.com/huggingface/trl/pull/4093
🧺 [1/N] Refactor _generate in GRPO/RLOO: list of ints instead of tensors by @qgallouedec in https://github.com/huggingface/trl/pull/4146
Fix handling of f_divergence_type in DPO by @albertvillanova in https://github.com/huggingface/trl/pull/4171
🔣 Fix test: replace trainer.tokenizer by trainer.processing_class by @qgallouedec in https://github.com/huggingface/trl/pull/4185
Fix CI ImportError: FlashAttention2 and decorator order for all parameterized tests by @albertvillanova in https://github.com/huggingface/trl/pull/4176
Hotfix wrong formatting of docstrings with blockquote tips by @albertvillanova in https://github.com/huggingface/trl/pull/4187
🌡️ Have vLLM return processed (temperature scaled) log probs by @YonatanGideoni in https://github.com/huggingface/trl/pull/4163
Replace remaining trainer.tokenizer with trainer.processing_class in GRPO test by @albertvillanova in https://github.com/huggingface/trl/pull/4192
[DOCS] Lora without regret by @burtenshaw in https://github.com/huggingface/trl/pull/4181
[DOCS/FIX] lora without regrets - fix lr by @burtenshaw in https://github.com/huggingface/trl/pull/4207
Remove custome_container for building the docs by @albertvillanova in https://github.com/huggingface/trl/pull/4198
Remove tokenizer creation from sft example script by @sergiopaniego in https://github.com/huggingface/trl/pull/4197
Hotfix: Exclude transformers 4.57.0 for Python 3.9 by @albertvillanova in https://github.com/huggingface/trl/pull/4209
Replace unittest with pytest by @albertvillanova in https://github.com/huggingface/trl/pull/4188
Updated vLLM integration guide by @sergiopaniego in https://github.com/huggingface/trl/pull/4162
Remove Optional from processing_class in PPOTrainer by @sergiopaniego in https://github.com/huggingface/trl/pull/4212
Replace setup with pyproject and fix packaging unintended modules by @albertvillanova in https://github.com/huggingface/trl/pull/4194
Removed tokenizer/processor creation from example scripts by @sergiopaniego in https://github.com/huggingface/trl/pull/4211
Apply style and revert change in sft_video_llm example by @qgallouedec in https://github.com/huggingface/trl/pull/4214
Fix trl-internal-testing/tiny-DbrxForCausalLM by @qgallouedec in https://github.com/huggingface/trl/pull/4213
Fix prompt-completion labeling with add_generation_prompt and warning by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4201
Fix LoRA params in Python in LoRA without regret by @sergiopaniego in https://github.com/huggingface/trl/pull/4215
[DOCS] fix prose in lora guide by @burtenshaw in https://github.com/huggingface/trl/pull/4217
Add trainers taxonomy to docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4195
🎨 Support mixing image+text and text-only examples by @qgallouedec in https://github.com/huggingface/trl/pull/4203
🧺 [2/N] Refactor _generate in GRPO/RLOO: Use prompt_ids from generation by @qgallouedec in https://github.com/huggingface/trl/pull/4152
Fix entropy and accuracy calculation for prompt_tuning techniques. by @pramodith in https://github.com/huggingface/trl/pull/4196
Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials by @sergiopaniego in https://github.com/huggingface/trl/pull/4219
🏷️ Account for token_type_ids in DataCollatorForVisionLanguageModeling by @qgallouedec in https://github.com/huggingface/trl/pull/4190
Exclude vllm dependencies from dev extra by @albertvillanova in https://github.com/huggingface/trl/pull/4229
Fix CI unittest asserts by @albertvillanova in https://github.com/huggingface/trl/pull/4234
Fix callable annotations by @albertvillanova in https://github.com/huggingface/trl/pull/4216
Remove unused Path import in init.py by @albertvillanova in https://github.com/huggingface/trl/pull/4227
Update CI Docker image to pytorch/pytorch:2.8.0 by @albertvillanova in https://github.com/huggingface/trl/pull/4232
Replace setup with pyproject in CI tests paths by @albertvillanova in https://github.com/huggingface/trl/pull/4230
Fix CI IndentationError for Python 3.13.8 by @albertvillanova in https://github.com/huggingface/trl/pull/4240
Remove unused log_example_reports.py script by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4241
🧘 Enhance markdown style by @qgallouedec in https://github.com/huggingface/trl/pull/4235
Warnings pointing to RFC by @qgallouedec in https://github.com/huggingface/trl/pull/4224
Fix CI slow test ValueError: Backward pass should have cleared tracker of all tensors by @sywangyi in https://github.com/huggingface/trl/pull/4236
Fix CI CUDA out of memory errors by improving GPU memory management by @albertvillanova in https://github.com/huggingface/trl/pull/4238
Install peft from main for CI tests with dev dependencies by @albertvillanova in https://github.com/huggingface/trl/pull/4250
Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' by @albertvillanova in https://github.com/huggingface/trl/pull/4253
Fix CI slow test ValueError: Unknown loss type: dapo by @albertvillanova in https://github.com/huggingface/trl/pull/4254
🧺 [3/N] Refactor _generate in GRPO/RLOO: Rely on generator for prompt truncation by @qgallouedec in https://github.com/huggingface/trl/pull/4153
Remove obsolete research_projects directory by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4243
Deprecate unused dataset_formatting module by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4242
Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' by @albertvillanova in https://github.com/huggingface/trl/pull/4255
[Online-DPO] fix the completion_len == max_new_tokens crash by @kashif in https://github.com/huggingface/trl/pull/4193
Include chat_template_kwargs in apply_chat_template by @cmpatino in https://github.com/huggingface/trl/pull/4233
Fix Python version check for skipping tests on Python 3.13.8 by @albertvillanova in https://github.com/huggingface/trl/pull/4246
Raise deprecation warning for Python 3.9 by @albertvillanova in https://github.com/huggingface/trl/pull/4226
Fix docstring interlinks by @albertvillanova in https://github.com/huggingface/trl/pull/4221
Use FutureWarning instead of DeprecationWarning by @albertvillanova in https://github.com/huggingface/trl/pull/4266
Fix style with make precommit by @albertvillanova in https://github.com/huggingface/trl/pull/4265
Add Qwen3-VL notebooks (SFT, GRPO) by @sergiopaniego in https://github.com/huggingface/trl/pull/4275
Fix typo in Colab link by @sergiopaniego in https://github.com/huggingface/trl/pull/4276
Fix docstrings with Sphinx 'deprecated' directive by @albertvillanova in https://github.com/huggingface/trl/pull/4279
Fix CI slow test OSError: You are trying to access a gated repo by @albertvillanova in https://github.com/huggingface/trl/pull/4283
💰 RichProgressCallback enhancement by @qgallouedec in https://github.com/huggingface/trl/pull/4245
Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' by @albertvillanova in https://github.com/huggingface/trl/pull/4262
Replace unittest skipTest with pytest.skip by @albertvillanova in https://github.com/huggingface/trl/pull/4263
Fix CI slow tests: ImportError: vLLM is not installed by @albertvillanova in https://github.com/huggingface/trl/pull/4287
Remove logging.md: trainer-specific metrics documentation by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4269
Remove using_llama_models.md: outdated Llama2-specific documentation by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4268
Add support for token_type_ids in DPOTrainer by @aweers in https://github.com/huggingface/trl/pull/4285
Remove how_to_train.md: outdated training FAQ by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4267
Add accuracy reward by @pramodith in https://github.com/huggingface/trl/pull/4270
Remove unused commands directory by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4258
Deprecate BestOfNSampler by @qgallouedec in https://github.com/huggingface/trl/pull/4291
Release: v0.24 by @qgallouedec in https://github.com/huggingface/trl/pull/4292

New Contributors

@zucchini-nlp made their first contribution in https://github.com/huggingface/trl/pull/4032
@parambharat made their first contribution in https://github.com/huggingface/trl/pull/4089
@zilongzheng made their first contribution in https://github.com/huggingface/trl/pull/4056
@jiqing-feng made their first contribution in https://github.com/huggingface/trl/pull/4031
@Hoesu made their first contribution in https://github.com/huggingface/trl/pull/4081
@cmpatino made their first contribution in https://github.com/huggingface/trl/pull/4105
@singing-cat made their first contribution in https://github.com/huggingface/trl/pull/4124
@SamuelBarryCS made their first contribution in https://github.com/huggingface/trl/pull/4080
@YonatanGideoni made their first contribution in https://github.com/huggingface/trl/pull/4163
@aweers made their first contribution in https://github.com/huggingface/trl/pull/4285

Full Changelog: https://github.com/huggingface/trl/compare/v0.23.0...v0.24.0

Features

Fixes

Documentation

Deprecations

Experimental

What's Changed

New Contributors

More from Hugging Face

More from Hugging Face