v0.25.0

Features

💤 Switch to sleep level=2 and split wake-ups in GRPO and RLOO trainers by @xxrjun in https://github.com/huggingface/trl/pull/4296
Added custom prepare_model_for_kbit_training to save VRAM by @sergiopaniego in https://github.com/huggingface/trl/pull/4335
Add add_generation_prompt to processor_kwargs in GRPO and RLOO trainer by @qgallouedec in https://github.com/huggingface/trl/pull/4361
Add support for Trackio completions logging in GRPOTrainer by @taha-yassine in https://github.com/huggingface/trl/pull/4359
Support chat_template_kwargs by @pramodith in https://github.com/huggingface/trl/pull/4350
GRPO: ScaleRL -> Support casting LM Head to FP32 by @pramodith in https://github.com/huggingface/trl/pull/4303
Support casting to fp32 when word embeddings are tied to lm_head by @pramodith in https://github.com/huggingface/trl/pull/4446
💬 Add chat to vLLM client and server, update trainer calls by @qgallouedec in https://github.com/huggingface/trl/pull/4450

Experimental

🚚 Move BCO to trl.experimental by @qgallouedec in https://github.com/huggingface/trl/pull/4312
👑 [experimental] GOLD Trainer by @kashif in https://github.com/huggingface/trl/pull/4349
Add PAPOTrainer for preference-based optimization by @SolarWindRider in https://github.com/huggingface/trl/pull/4334
[GFPO] fix the GFPO loss calculation error caused by unmodified old_per_token_logps by @Peter-Chou in https://github.com/huggingface/trl/pull/4454
🕹️ Add rollout function for OpenEnv integration by @lewtun in https://github.com/huggingface/trl/pull/4310

Fixes

[Activation-checkpointing] add tensor dedup and param offloading by @kashif in https://github.com/huggingface/trl/pull/4247
Fix attn_implementation name in OnlineDPO for transformers v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4322
Hotfix: Fall back to config.text_config._name_or_path if missing config._name_or_path by @albertvillanova in https://github.com/huggingface/trl/pull/4324
Fix GRPO and RLOO trainers for continuous batching by @albertvillanova in https://github.com/huggingface/trl/pull/4348
Fix: add_generation_prompt=True for conversational only by @qgallouedec in https://github.com/huggingface/trl/pull/4362
Remove ignored max_length parameter from PRMTrainer data collator by @albertvillanova in https://github.com/huggingface/trl/pull/4355
Fix add_generation_prompt arg for paged transformers in GRPO and RLOO trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4370
Fix GKD Liger memory spike by @qgallouedec in https://github.com/huggingface/trl/pull/4140
Fix GRPO with replay buffer by inserting images in the prompt by @albertvillanova in https://github.com/huggingface/trl/pull/4391
fix: Remove chat template setting from non-SFT trainer scripts by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4437
🖼️ Fix reporting images with vLLM by @qgallouedec in https://github.com/huggingface/trl/pull/4476

Documentation and Examples

Added SFT LoRA notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4244
Update notebooks README with latest additions by @sergiopaniego in https://github.com/huggingface/trl/pull/4316
Add notebooks to Examples docs and restructure by @sergiopaniego in https://github.com/huggingface/trl/pull/4317
Highlight OpenEnv in landing docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4327
Update OpenEnv docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4328
Add OpenEnv blog to landing by @sergiopaniego in https://github.com/huggingface/trl/pull/4333
🗞️ Update "What's New" by @qgallouedec in https://github.com/huggingface/trl/pull/4338
Update Reducing Memory Consumption guide with more details by @sergiopaniego in https://github.com/huggingface/trl/pull/4332
Fixed links inside Tips in docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4360
🔥 docs: Add RapidFire AI integration guide by @kamran-rapidfireAI in https://github.com/huggingface/trl/pull/4340
Fix paper link for "Towards Efficient and Exact Optimization of Language Model Alignment" by @qgallouedec in https://github.com/huggingface/trl/pull/4409
Migrate experimental trl feature docs by @ethanknights in https://github.com/huggingface/trl/pull/4411
Update SFT QLoRA notebook with 14B model on free Colab by @sergiopaniego in https://github.com/huggingface/trl/pull/4336
Create "Talks" subsection by @sergiopaniego in https://github.com/huggingface/trl/pull/4414
Openenv wordle example by @burtenshaw in https://github.com/huggingface/trl/pull/4357
docs: Remove outdated conversational dataset conversion guidance by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4422
docs: List all trainers that support Liger Kernel by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4432
Add On-Policy Distillation from thinking labs to paper index. by @pramodith in https://github.com/huggingface/trl/pull/4410
Upload notebook with T4 selected by @sergiopaniego in https://github.com/huggingface/trl/pull/4449
Removed outdated warning about batch contamination by @Harras3 in https://github.com/huggingface/trl/pull/4423
Removed Sentiment Tuning Examples by @Harras3 in https://github.com/huggingface/trl/pull/4424
docs: Remove outdated notebooks by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4435
docs: Move Multi-Adapter RL section to PEFT integration by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4436
Update max_length explanation for VLM in online trainers by @sergiopaniego in https://github.com/huggingface/trl/pull/4220
Updated OpenEnv docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4418
add llasa-tutorial by @Deep-unlearning in https://github.com/huggingface/trl/pull/4456

Deprecations

Replace deprecated AutoModelForVision2Seq with AutoModelForImageTextToText by @albertvillanova in https://github.com/huggingface/trl/pull/4353
Replace deprecated list with tuple indexing in PPOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4356
Remove liger loss in favor of liger kernel by @sergiopaniego in https://github.com/huggingface/trl/pull/4364
🐍 Drop Python 3.9 by @qgallouedec in https://github.com/huggingface/trl/pull/4183

What's Changed

⬆️ Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/4293
Update links to docs in README to latest packaged version by @sergiopaniego in https://github.com/huggingface/trl/pull/4084
🧺 [4/N] Refactor _generate in GRPO/RLOO: Move forward_kwargs outside generation method by @qgallouedec in https://github.com/huggingface/trl/pull/4154
Fix missing CI slow tests: ImportError: vLLM is not installed by @albertvillanova in https://github.com/huggingface/trl/pull/4304
Added SFT LoRA notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4244
⚰️ Remove deprecated by @qgallouedec in https://github.com/huggingface/trl/pull/4301
Silence TRL experimental warnings in CI by @albertvillanova in https://github.com/huggingface/trl/pull/4307
Filter expected setup_chat_format deprecation warning in CI by @albertvillanova in https://github.com/huggingface/trl/pull/4306
[Activation-checkpointing] add tensor dedup and param offloading by @kashif in https://github.com/huggingface/trl/pull/4247
Remove parameterized as test extra dependency by @albertvillanova in https://github.com/huggingface/trl/pull/4315
Update notebooks README with latest additions by @sergiopaniego in https://github.com/huggingface/trl/pull/4316
🚚 Move BCO to trl.experimental by @qgallouedec in https://github.com/huggingface/trl/pull/4312
🧺 [5/N] Refactor _generate in GRPO/RLOO: Insert images in the prompt by @qgallouedec in https://github.com/huggingface/trl/pull/4155
💤 Switch to sleep level=2 and split wake-ups in GRPO and RLOO trainers by @xxrjun in https://github.com/huggingface/trl/pull/4296
Replace unittest skipTest from transformers with pytest.skip by @albertvillanova in https://github.com/huggingface/trl/pull/4297
Add notebooks to Examples docs and restructure by @sergiopaniego in https://github.com/huggingface/trl/pull/4317
Fix attn_implementation name in OnlineDPO for transformers v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4322
🕹️ Add rollout function for OpenEnv integration by @lewtun in https://github.com/huggingface/trl/pull/4310
Highlight OpenEnv in landing docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4327
Update OpenEnv docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4328
Move BCO tests to tests/experimental by @albertvillanova in https://github.com/huggingface/trl/pull/4326
Hotfix: Fall back to config.text_config._name_or_path if missing config._name_or_path by @albertvillanova in https://github.com/huggingface/trl/pull/4324
Add OpenEnv blog to landing by @sergiopaniego in https://github.com/huggingface/trl/pull/4333
🗞️ Update "What's New" by @qgallouedec in https://github.com/huggingface/trl/pull/4338
Update Reducing Memory Consumption guide with more details by @sergiopaniego in https://github.com/huggingface/trl/pull/4332
Added custom prepare_model_for_kbit_training to save VRAM by @sergiopaniego in https://github.com/huggingface/trl/pull/4335
[vllm] update comment about communication group host ip by @kashif in https://github.com/huggingface/trl/pull/4337
Fix GRPO and RLOO trainers for continuous batching by @albertvillanova in https://github.com/huggingface/trl/pull/4348
Fixed links inside Tips in docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4360
Fix CI issue for vlm_gemma_3n model by @kaixuanliu in https://github.com/huggingface/trl/pull/4278
Add add_generation_prompt to processor_kwargs in GRPO and RLOO trainer by @qgallouedec in https://github.com/huggingface/trl/pull/4361
Fix: add_generation_prompt=True for conversational only by @qgallouedec in https://github.com/huggingface/trl/pull/4362
Use explicit tiny-Qwen2_5_VL model_id parameter in CI tests by @albertvillanova in https://github.com/huggingface/trl/pull/4325
Move tests of experimental GRPO with replay buffer to tests/experimental by @albertvillanova in https://github.com/huggingface/trl/pull/4329
Implement CI test workflow for experimental module by @albertvillanova in https://github.com/huggingface/trl/pull/4330
Replace deprecated AutoModelForVision2Seq with AutoModelForImageTextToText by @albertvillanova in https://github.com/huggingface/trl/pull/4353
Move tests of BCO trainer args to tests/experimental by @albertvillanova in https://github.com/huggingface/trl/pull/4354
Remove ignored max_length parameter from PRMTrainer data collator by @albertvillanova in https://github.com/huggingface/trl/pull/4355
Replace deprecated list with tuple indexing in PPOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4356
Add support for Trackio completions logging in GRPOTrainer by @taha-yassine in https://github.com/huggingface/trl/pull/4359
Fix add_generation_prompt arg for paged transformers in GRPO and RLOO trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4370
Align make test_experimental with make test by @albertvillanova in https://github.com/huggingface/trl/pull/4371
🔥 docs: Add RapidFire AI integration guide by @kamran-rapidfireAI in https://github.com/huggingface/trl/pull/4340
👑 [experimental] GOLD Trainer by @kashif in https://github.com/huggingface/trl/pull/4349
Support chat_template_kwargs by @pramodith in https://github.com/huggingface/trl/pull/4350
[GOLD] Set teacher tokenizer name if using ULD loss by @kashif in https://github.com/huggingface/trl/pull/4389
Fix typo in GOLD docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4394
Hotfix CI for Python 3.9 by setting test as xfail until transformers release by @albertvillanova in https://github.com/huggingface/trl/pull/4388
[tests] Update rope_scaling configuration for tiny qwen-vl models by @kashif in https://github.com/huggingface/trl/pull/4405
[GOLD] Update code example for GOLD Trainer by @cmpatino in https://github.com/huggingface/trl/pull/4406
Hotfix CI with dev dependencies: xfail test_prepare_inputs_for_generation by @albertvillanova in https://github.com/huggingface/trl/pull/4372
Fix paper link for "Towards Efficient and Exact Optimization of Language Model Alignment" by @qgallouedec in https://github.com/huggingface/trl/pull/4409
Migrate experimental trl feature docs by @ethanknights in https://github.com/huggingface/trl/pull/4411
Update SFT QLoRA notebook with 14B model on free Colab by @sergiopaniego in https://github.com/huggingface/trl/pull/4336
Add PAPOTrainer for preference-based optimization by @SolarWindRider in https://github.com/huggingface/trl/pull/4334
Fix GKD Liger memory spike by @qgallouedec in https://github.com/huggingface/trl/pull/4140
Remove liger loss in favor of liger kernel by @sergiopaniego in https://github.com/huggingface/trl/pull/4364
Add license to test file and disable docstyle in GOLD script by @qgallouedec in https://github.com/huggingface/trl/pull/4412
Replace duplicate test with model_id parametrized test by @albertvillanova in https://github.com/huggingface/trl/pull/4415
Fix raising of deprecation warning for liger_loss by @albertvillanova in https://github.com/huggingface/trl/pull/4417
Consolidate slow tests into main test files by @ishitab02 in https://github.com/huggingface/trl/pull/4408
Fix CI experimental tests TypeError for GRPOWithReplayBufferTrainer.update_with_replay_buffer by @albertvillanova in https://github.com/huggingface/trl/pull/4366
Fix GRPO with replay buffer by inserting images in the prompt by @albertvillanova in https://github.com/huggingface/trl/pull/4391
GRPO: ScaleRL -> Support casting LM Head to FP32 by @pramodith in https://github.com/huggingface/trl/pull/4303
Create "Talks" subsection by @sergiopaniego in https://github.com/huggingface/trl/pull/4414
Openenv wordle example by @burtenshaw in https://github.com/huggingface/trl/pull/4357
docs: Remove outdated conversational dataset conversion guidance by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4422
docs: List all trainers that support Liger Kernel by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4432
fix: Remove chat template setting from non-SFT trainer scripts by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4437
Add On-Policy Distillation from thinking labs to paper index. by @pramodith in https://github.com/huggingface/trl/pull/4410
Upload notebook with T4 selected by @sergiopaniego in https://github.com/huggingface/trl/pull/4449
Support casting to fp32 when word embeddings are tied to lm_head by @pramodith in https://github.com/huggingface/trl/pull/4446
Update tokenizer apply_chat_template with return_dict=True default by @albertvillanova in https://github.com/huggingface/trl/pull/4448
Removed outdated warning about batch contamination by @Harras3 in https://github.com/huggingface/trl/pull/4423
🐍 Drop Python 3.9 by @qgallouedec in https://github.com/huggingface/trl/pull/4183
Removed Sentiment Tuning Examples by @Harras3 in https://github.com/huggingface/trl/pull/4424
docs: Remove outdated notebooks by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4435
docs: Move Multi-Adapter RL section to PEFT integration by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4436
Moved masked_mean, masked_var and masked_whiten to ppo_trainer.py by @Harras3 in https://github.com/huggingface/trl/pull/4444
Update max_length explanation for VLM in online trainers by @sergiopaniego in https://github.com/huggingface/trl/pull/4220
[fix] wordle model_id updates by @burtenshaw in https://github.com/huggingface/trl/pull/4453
Updated OpenEnv docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4418
add llasa-tutorial by @Deep-unlearning in https://github.com/huggingface/trl/pull/4456
💬 Add chat to vLLM client and server, update trainer calls by @qgallouedec in https://github.com/huggingface/trl/pull/4450
[GFPO] fix the GFPO loss calculation error caused by unmodified old_per_token_logps by @Peter-Chou in https://github.com/huggingface/trl/pull/4454
🖼️ Fix reporting images with vLLM by @qgallouedec in https://github.com/huggingface/trl/pull/4476
Release: v0.25 by @qgallouedec in https://github.com/huggingface/trl/pull/4478

New Contributors

@xxrjun made their first contribution in https://github.com/huggingface/trl/pull/4296
@taha-yassine made their first contribution in https://github.com/huggingface/trl/pull/4359
@kamran-rapidfireAI made their first contribution in https://github.com/huggingface/trl/pull/4340
@ethanknights made their first contribution in https://github.com/huggingface/trl/pull/4411
@SolarWindRider made their first contribution in https://github.com/huggingface/trl/pull/4334
@ishitab02 made their first contribution in https://github.com/huggingface/trl/pull/4408
@Harras3 made their first contribution in https://github.com/huggingface/trl/pull/4423
@Deep-unlearning made their first contribution in https://github.com/huggingface/trl/pull/4456

Full Changelog: https://github.com/huggingface/trl/compare/v0.24.0...v0.25.0

Features

Experimental

Fixes

Documentation and Examples

Deprecations

What's Changed

New Contributors

More from Hugging Face

More from Hugging Face