v0.9.6 release
We are excited to introduce the v0.9.6 release, which brings many new features and algorithms. The highlights are as follows:
- Support for SimPO by @fe1ixxu, a reference-free preference-optimization method that also regularizes output length. To use this loss, pass loss_type="simpo" and cpo_alpha=0 in the CPOConfig and train with the CPOTrainer.
- Added AlignProp by @mihirp1998, a method for finetuning Stable Diffusion models using reward gradients.
- Added Efficient Exact Optimization (EXO) by @haozheji.
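For intuition, here is a minimal, self-contained sketch of the SimPO objective itself (not TRL's actual implementation): the implicit reward is the length-normalized sequence log-probability, and the loss is the negative log-sigmoid of the reward margin minus a target margin. The function name and the beta/gamma defaults below are illustrative assumptions, not TRL API.

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """Sketch of the SimPO pairwise loss for a single preference pair.

    logp_* : total log-probability of the chosen/rejected completion
    len_*  : number of tokens in the chosen/rejected completion
    beta   : reward scaling factor
    gamma  : target reward margin
    """
    # Length-normalized implicit rewards -- no reference model is needed.
    r_chosen = beta * logp_chosen / len_chosen
    r_rejected = beta * logp_rejected / len_rejected
    margin = r_chosen - r_rejected - gamma
    # Negative log-sigmoid of the margin (Bradley-Terry style objective).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss shrinks as the chosen completion becomes more likely per token than the rejected one; the length normalization is what regularizes output length.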
We also included many important fixes and improvements, such as fixing prints in the CLI with GCP containers by @alvarobartt. Enjoy the release!
- numpy to !=2.0.0 for CI and to users by @younesbelkada in https://github.com/huggingface/trl/pull/1747
- TrlParser: Add ignore extra args option by @younesbelkada in https://github.com/huggingface/trl/pull/1748
- KTOTrainer: Remove old tests by @younesbelkada in https://github.com/huggingface/trl/pull/1750
- process function in the example of DPO by @AIR-hl in https://github.com/huggingface/trl/pull/1753
- evaluation_strategy to eval_strategy by @qgallouedec in https://github.com/huggingface/trl/pull/1771
- torch_dtype handling in {DPO,SFT}Trainer when provided via CLI by @alvarobartt in https://github.com/huggingface/trl/pull/1807
- TRL_USE_RICH environment variable handling by @alvarobartt in https://github.com/huggingface/trl/pull/1808

Full Changelog: https://github.com/huggingface/trl/compare/v0.9.4...v0.9.6