v0.9.3 RLOO / PPOv2 Trainer, RM Visualization
We are excited to introduce the new v0.9.3 release, which brings many new features and algorithms. The highlights are the RLOO and PPOv2 trainers and reward model (RM) visualization.
https://github.com/huggingface/trl/assets/5555347/6575a879-cb2f-4e2e-bb84-a76707f9de84
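As a rough illustration of the idea behind RLOO (REINFORCE Leave-One-Out), the sketch below computes leave-one-out advantages for several completions sampled from one prompt: each sample's baseline is the mean reward of the other samples. The function name `rloo_advantages` and the example rewards are hypothetical and are not taken from the TRL API; this is a minimal sketch of the baseline computation only, not the trainer's implementation.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for k completions sampled from one prompt.

    Hypothetical helper for illustration; not part of the TRL API.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("a leave-one-out baseline needs at least two samples")
    total = sum(rewards)
    # The baseline for sample i is the mean reward of the other k - 1 samples.
    return [r - (total - r) / (k - 1) for r in rewards]


# Made-up rewards for three completions of the same prompt.
advantages = rloo_advantages([1.0, 2.0, 6.0])
print(advantages)
```

Because every sample is compared against its peers, the advantages for a prompt sum to zero, so no separately learned value model is needed to form the baseline.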
[SFTTrainer] Add warning in SFTTrainer when dataset already processed by @younesbelkada in https://github.com/huggingface/trl/pull/1577
SftArgumentParser by @younesbelkada in https://github.com/huggingface/trl/pull/1602
[KTOTrainer] add BCO (reward shift and underlying distribution matching) by @seanexp in https://github.com/huggingface/trl/pull/1599
tests/ from package data by @jamesbraza in https://github.com/huggingface/trl/pull/1607
evaluation_strategy by @muellerzr in https://github.com/huggingface/trl/pull/1559
enable_input_require_grads issues with PPO models by @younesbelkada in https://github.com/huggingface/trl/pull/1664
args=None by @younesbelkada in https://github.com/huggingface/trl/pull/1678
Full Changelog: https://github.com/huggingface/trl/compare/v0.8.6...v0.9.2
Fetched April 7, 2026