This release provides minor bugfixes and a smoother user experience for all public classes. We also clarified the documentation on how to use Flash Attention with SFTTrainer.
- [SFTTrainer:Docs] Fix sft mistakes by @younesbelkada in https://github.com/huggingface/trl/pull/717
- [core] Bump peft to 0.4.0 by @younesbelkada in https://github.com/huggingface/trl/pull/720
- [SFTTrainer] Check correctly for condition by @younesbelkada in https://github.com/huggingface/trl/pull/668
- [core] Fix import of randn_tensor by @younesbelkada in https://github.com/huggingface/trl/pull/751
- prepare_model_for_kbit_training by @mnoukhov in https://github.com/huggingface/trl/pull/728
- PPOTrainer by @davidberenstein1957 in https://github.com/huggingface/trl/pull/665
- RewardConfig is backwards compatible by @lewtun in https://github.com/huggingface/trl/pull/748
- log_with argument by @filippobistaffa in https://github.com/huggingface/trl/pull/792
- [DPO] Revert "Add default Optim to DPO example (#759)" by @younesbelkada in https://github.com/huggingface/trl/pull/799
- [Docs] Clarify PEFT docs by @younesbelkada in https://github.com/huggingface/trl/pull/797
- [PPOTrainer] Fixes ppo trainer generate nit by @younesbelkada in https://github.com/huggingface/trl/pull/798
- create_reference_model() when ZeRO-3 is enabled by @lewtun in https://github.com/huggingface/trl/pull/840
- lewtun power by @lvwerra in https://github.com/huggingface/trl/pull/856
- [core] Fix import issues by @younesbelkada in https://github.com/huggingface/trl/pull/859

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.1...v0.7.2
Fetched April 7, 2026