v0.25.0
Features
- Switch to sleep level=2 and split wake-ups in GRPO and RLOO trainers by @xxrjun in https://github.com/huggingface/trl/pull/4296
- Added custom `prepare_model_for_kbit_training` to save VRAM by @sergiopaniego in https://github.com/huggingface/trl/pull/4335
- Add `add_generation_prompt` to processor_kwargs in GRPO and RLOO trainer by @qgallouedec in https://github.com/huggingface/trl/pull/4361
- Add support for Trackio completions logging in GRPOTrainer by @taha-yassine in https://github.com/huggingface/trl/pull/4359
- Support chat_template_kwargs by @pramodith in https://github.com/huggingface/trl/pull/4350
- GRPO: ScaleRL -> Support casting LM Head to FP32 by @pramodith in https://github.com/huggingface/trl/pull/4303
- Support casting to fp32 when word embeddings are tied to lm_head by @pramodith in https://github.com/huggingface/trl/pull/4446
- Add chat to vLLM client and server, update trainer calls by @qgallouedec in https://github.com/huggingface/trl/pull/4450
Experimental
- Move BCO to `trl.experimental` by @qgallouedec in https://github.com/huggingface/trl/pull/4312
- [experimental] GOLD Trainer by @kashif in https://github.com/huggingface/trl/pull/4349
- Add PAPOTrainer for preference-based optimization by @SolarWindRider in https://github.com/huggingface/trl/pull/4334
- [GFPO] fix the GFPO loss calculation error caused by unmodified old_per_token_logps by @Peter-Chou in https://github.com/huggingface/trl/pull/4454
- Add rollout function for OpenEnv integration by @lewtun in https://github.com/huggingface/trl/pull/4310
Fixes
- [Activation-checkpointing] add tensor dedup and param offloading by @kashif in https://github.com/huggingface/trl/pull/4247
- Fix attn_implementation name in OnlineDPO for transformers v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4322
- Hotfix: Fall back to config.text_config._name_or_path if missing config._name_or_path by @albertvillanova in https://github.com/huggingface/trl/pull/4324
- Fix GRPO and RLOO trainers for continuous batching by @albertvillanova in https://github.com/huggingface/trl/pull/4348
- Fix: `add_generation_prompt=True` for conversational only by @qgallouedec in https://github.com/huggingface/trl/pull/4362
- Remove ignored max_length parameter from PRMTrainer data collator by @albertvillanova in https://github.com/huggingface/trl/pull/4355
- Fix add_generation_prompt arg for paged transformers in GRPO and RLOO trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4370
- Fix GKD Liger memory spike by @qgallouedec in https://github.com/huggingface/trl/pull/4140
- Fix GRPO with replay buffer by inserting images in the prompt by @albertvillanova in https://github.com/huggingface/trl/pull/4391
- fix: Remove chat template setting from non-SFT trainer scripts by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4437
- Fix reporting images with vLLM by @qgallouedec in https://github.com/huggingface/trl/pull/4476
Documentation and Examples
- Added SFT LoRA notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4244
- Update notebooks README with latest additions by @sergiopaniego in https://github.com/huggingface/trl/pull/4316
- Add notebooks to Examples docs and restructure by @sergiopaniego in https://github.com/huggingface/trl/pull/4317
- Highlight OpenEnv in landing docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4327
- Update OpenEnv docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4328
- Add OpenEnv blog to landing by @sergiopaniego in https://github.com/huggingface/trl/pull/4333
- Update "What's New" by @qgallouedec in https://github.com/huggingface/trl/pull/4338
- Update Reducing Memory Consumption guide with more details by @sergiopaniego in https://github.com/huggingface/trl/pull/4332
- Fixed links inside Tips in docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4360
- docs: Add RapidFire AI integration guide by @kamran-rapidfireAI in https://github.com/huggingface/trl/pull/4340
- Fix paper link for "Towards Efficient and Exact Optimization of Language Model Alignment" by @qgallouedec in https://github.com/huggingface/trl/pull/4409
- Migrate experimental trl feature docs by @ethanknights in https://github.com/huggingface/trl/pull/4411
- Update SFT QLoRA notebook with 14B model on free Colab by @sergiopaniego in https://github.com/huggingface/trl/pull/4336
- Create "Talks" subsection by @sergiopaniego in https://github.com/huggingface/trl/pull/4414
- Openenv wordle example by @burtenshaw in https://github.com/huggingface/trl/pull/4357
- docs: Remove outdated conversational dataset conversion guidance by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4422
- docs: List all trainers that support Liger Kernel by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4432
- Add On-Policy Distillation from thinking labs to paper index. by @pramodith in https://github.com/huggingface/trl/pull/4410
- Upload notebook with T4 selected by @sergiopaniego in https://github.com/huggingface/trl/pull/4449
- Removed outdated warning about batch contamination by @Harras3 in https://github.com/huggingface/trl/pull/4423
- Removed Sentiment Tuning Examples by @Harras3 in https://github.com/huggingface/trl/pull/4424
- docs: Remove outdated notebooks by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4435
- docs: Move Multi-Adapter RL section to PEFT integration by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4436
- Update `max_length` explanation for VLM in online trainers by @sergiopaniego in https://github.com/huggingface/trl/pull/4220
- Updated OpenEnv docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4418
- add llasa-tutorial by @Deep-unlearning in https://github.com/huggingface/trl/pull/4456
Deprecations
- Replace deprecated AutoModelForVision2Seq with AutoModelForImageTextToText by @albertvillanova in https://github.com/huggingface/trl/pull/4353
- Replace deprecated list with tuple indexing in PPOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4356
- Remove liger loss in favor of liger kernel by @sergiopaniego in https://github.com/huggingface/trl/pull/4364
- Drop Python 3.9 by @qgallouedec in https://github.com/huggingface/trl/pull/4183
What's Changed
- Bump dev version by @qgallouedec in https://github.com/huggingface/trl/pull/4293
- Update links to docs in README to latest packaged version by @sergiopaniego in https://github.com/huggingface/trl/pull/4084
- [4/N] Refactor `_generate` in GRPO/RLOO: Move `forward_kwargs` outside generation method by @qgallouedec in https://github.com/huggingface/trl/pull/4154
- Fix missing CI slow tests: ImportError: vLLM is not installed by @albertvillanova in https://github.com/huggingface/trl/pull/4304
- Added SFT LoRA notebook by @sergiopaniego in https://github.com/huggingface/trl/pull/4244
- Remove deprecated by @qgallouedec in https://github.com/huggingface/trl/pull/4301
- Silence TRL experimental warnings in CI by @albertvillanova in https://github.com/huggingface/trl/pull/4307
- Filter expected setup_chat_format deprecation warning in CI by @albertvillanova in https://github.com/huggingface/trl/pull/4306
- [Activation-checkpointing] add tensor dedup and param offloading by @kashif in https://github.com/huggingface/trl/pull/4247
- Remove parameterized as test extra dependency by @albertvillanova in https://github.com/huggingface/trl/pull/4315
- Update notebooks README with latest additions by @sergiopaniego in https://github.com/huggingface/trl/pull/4316
- Move BCO to `trl.experimental` by @qgallouedec in https://github.com/huggingface/trl/pull/4312
- [5/N] Refactor `_generate` in GRPO/RLOO: Insert images in the prompt by @qgallouedec in https://github.com/huggingface/trl/pull/4155
- Switch to sleep level=2 and split wake-ups in GRPO and RLOO trainers by @xxrjun in https://github.com/huggingface/trl/pull/4296
- Replace unittest skipTest from transformers with pytest.skip by @albertvillanova in https://github.com/huggingface/trl/pull/4297
- Add notebooks to Examples docs and restructure by @sergiopaniego in https://github.com/huggingface/trl/pull/4317
- Fix attn_implementation name in OnlineDPO for transformers v5 by @albertvillanova in https://github.com/huggingface/trl/pull/4322
- Add rollout function for OpenEnv integration by @lewtun in https://github.com/huggingface/trl/pull/4310
- Highlight OpenEnv in landing docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4327
- Update OpenEnv docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4328
- Move BCO tests to tests/experimental by @albertvillanova in https://github.com/huggingface/trl/pull/4326
- Hotfix: Fall back to config.text_config._name_or_path if missing config._name_or_path by @albertvillanova in https://github.com/huggingface/trl/pull/4324
- Add OpenEnv blog to landing by @sergiopaniego in https://github.com/huggingface/trl/pull/4333
- Update "What's New" by @qgallouedec in https://github.com/huggingface/trl/pull/4338
- Update Reducing Memory Consumption guide with more details by @sergiopaniego in https://github.com/huggingface/trl/pull/4332
- Added custom `prepare_model_for_kbit_training` to save VRAM by @sergiopaniego in https://github.com/huggingface/trl/pull/4335
- [vllm] update comment about communication group host ip by @kashif in https://github.com/huggingface/trl/pull/4337
- Fix GRPO and RLOO trainers for continuous batching by @albertvillanova in https://github.com/huggingface/trl/pull/4348
- Fixed links inside Tips in docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4360
- Fix CI issue for vlm_gemma_3n model by @kaixuanliu in https://github.com/huggingface/trl/pull/4278
- Add `add_generation_prompt` to processor_kwargs in GRPO and RLOO trainer by @qgallouedec in https://github.com/huggingface/trl/pull/4361
- Fix: `add_generation_prompt=True` for conversational only by @qgallouedec in https://github.com/huggingface/trl/pull/4362
- Use explicit tiny-Qwen2_5_VL model_id parameter in CI tests by @albertvillanova in https://github.com/huggingface/trl/pull/4325
- Move tests of experimental GRPO with replay buffer to tests/experimental by @albertvillanova in https://github.com/huggingface/trl/pull/4329
- Implement CI test workflow for experimental module by @albertvillanova in https://github.com/huggingface/trl/pull/4330
- Replace deprecated AutoModelForVision2Seq with AutoModelForImageTextToText by @albertvillanova in https://github.com/huggingface/trl/pull/4353
- Move tests of BCO trainer args to tests/experimental by @albertvillanova in https://github.com/huggingface/trl/pull/4354
- Remove ignored max_length parameter from PRMTrainer data collator by @albertvillanova in https://github.com/huggingface/trl/pull/4355
- Replace deprecated list with tuple indexing in PPOTrainer by @albertvillanova in https://github.com/huggingface/trl/pull/4356
- Add support for Trackio completions logging in GRPOTrainer by @taha-yassine in https://github.com/huggingface/trl/pull/4359
- Fix add_generation_prompt arg for paged transformers in GRPO and RLOO trainers by @albertvillanova in https://github.com/huggingface/trl/pull/4370
- Align make test_experimental with make test by @albertvillanova in https://github.com/huggingface/trl/pull/4371
- docs: Add RapidFire AI integration guide by @kamran-rapidfireAI in https://github.com/huggingface/trl/pull/4340
- [experimental] GOLD Trainer by @kashif in https://github.com/huggingface/trl/pull/4349
- Support chat_template_kwargs by @pramodith in https://github.com/huggingface/trl/pull/4350
- [GOLD] Set teacher tokenizer name if using ULD loss by @kashif in https://github.com/huggingface/trl/pull/4389
- Fix typo in GOLD docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4394
- Hotfix CI for Python 3.9 by setting test as xfail until transformers release by @albertvillanova in https://github.com/huggingface/trl/pull/4388
- [tests] Update rope_scaling configuration for tiny qwen-vl models by @kashif in https://github.com/huggingface/trl/pull/4405
- [GOLD] Update code example for GOLD Trainer by @cmpatino in https://github.com/huggingface/trl/pull/4406
- Hotfix CI with dev dependencies: xfail test_prepare_inputs_for_generation by @albertvillanova in https://github.com/huggingface/trl/pull/4372
- Fix paper link for "Towards Efficient and Exact Optimization of Language Model Alignment" by @qgallouedec in https://github.com/huggingface/trl/pull/4409
- Migrate experimental trl feature docs by @ethanknights in https://github.com/huggingface/trl/pull/4411
- Update SFT QLoRA notebook with 14B model on free Colab by @sergiopaniego in https://github.com/huggingface/trl/pull/4336
- Add PAPOTrainer for preference-based optimization by @SolarWindRider in https://github.com/huggingface/trl/pull/4334
- Fix GKD Liger memory spike by @qgallouedec in https://github.com/huggingface/trl/pull/4140
- Remove liger loss in favor of liger kernel by @sergiopaniego in https://github.com/huggingface/trl/pull/4364
- Add license to test file and disable docstyle in GOLD script by @qgallouedec in https://github.com/huggingface/trl/pull/4412
- Replace duplicate test with model_id parametrized test by @albertvillanova in https://github.com/huggingface/trl/pull/4415
- Fix raising of deprecation warning for liger_loss by @albertvillanova in https://github.com/huggingface/trl/pull/4417
- Consolidate slow tests into main test files by @ishitab02 in https://github.com/huggingface/trl/pull/4408
- Fix CI experimental tests TypeError for GRPOWithReplayBufferTrainer.update_with_replay_buffer by @albertvillanova in https://github.com/huggingface/trl/pull/4366
- Fix GRPO with replay buffer by inserting images in the prompt by @albertvillanova in https://github.com/huggingface/trl/pull/4391
- GRPO: ScaleRL -> Support casting LM Head to FP32 by @pramodith in https://github.com/huggingface/trl/pull/4303
- Create "Talks" subsection by @sergiopaniego in https://github.com/huggingface/trl/pull/4414
- Openenv wordle example by @burtenshaw in https://github.com/huggingface/trl/pull/4357
- docs: Remove outdated conversational dataset conversion guidance by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4422
- docs: List all trainers that support Liger Kernel by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4432
- fix: Remove chat template setting from non-SFT trainer scripts by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4437
- Add On-Policy Distillation from thinking labs to paper index. by @pramodith in https://github.com/huggingface/trl/pull/4410
- Upload notebook with T4 selected by @sergiopaniego in https://github.com/huggingface/trl/pull/4449
- Support casting to fp32 when word embeddings are tied to lm_head by @pramodith in https://github.com/huggingface/trl/pull/4446
- Update tokenizer apply_chat_template with return_dict=True default by @albertvillanova in https://github.com/huggingface/trl/pull/4448
- Removed outdated warning about batch contamination by @Harras3 in https://github.com/huggingface/trl/pull/4423
- Drop Python 3.9 by @qgallouedec in https://github.com/huggingface/trl/pull/4183
- Removed Sentiment Tuning Examples by @Harras3 in https://github.com/huggingface/trl/pull/4424
- docs: Remove outdated notebooks by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4435
- docs: Move Multi-Adapter RL section to PEFT integration by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4436
- Moved masked_mean, masked_var and masked_whiten to ppo_trainer.py by @Harras3 in https://github.com/huggingface/trl/pull/4444
- Update `max_length` explanation for VLM in online trainers by @sergiopaniego in https://github.com/huggingface/trl/pull/4220
- [fix] wordle model_id updates by @burtenshaw in https://github.com/huggingface/trl/pull/4453
- Updated OpenEnv docs by @sergiopaniego in https://github.com/huggingface/trl/pull/4418
- add llasa-tutorial by @Deep-unlearning in https://github.com/huggingface/trl/pull/4456
- Add chat to vLLM client and server, update trainer calls by @qgallouedec in https://github.com/huggingface/trl/pull/4450
- [GFPO] fix the GFPO loss calculation error caused by unmodified old_per_token_logps by @Peter-Chou in https://github.com/huggingface/trl/pull/4454
- Fix reporting images with vLLM by @qgallouedec in https://github.com/huggingface/trl/pull/4476
- Release: v0.25 by @qgallouedec in https://github.com/huggingface/trl/pull/4478
New Contributors
- @xxrjun made their first contribution in https://github.com/huggingface/trl/pull/4296
- @taha-yassine made their first contribution in https://github.com/huggingface/trl/pull/4359
- @kamran-rapidfireAI made their first contribution in https://github.com/huggingface/trl/pull/4340
- @ethanknights made their first contribution in https://github.com/huggingface/trl/pull/4411
- @SolarWindRider made their first contribution in https://github.com/huggingface/trl/pull/4334
- @ishitab02 made their first contribution in https://github.com/huggingface/trl/pull/4408
- @Harras3 made their first contribution in https://github.com/huggingface/trl/pull/4423
- @Deep-unlearning made their first contribution in https://github.com/huggingface/trl/pull/4456
Full Changelog: https://github.com/huggingface/trl/compare/v0.24.0...v0.25.0
