v0.7.11: IPO & DPO fixes, faster data processing for multi-GPU, Automatic tagging for all models
We fixed issues with the IPO loss, leading to consistent results according to the newest experiments.
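To make the IPO change concrete, here is a minimal sketch of the pairwise IPO objective (Azar et al.) that `DPOTrainer` optimizes with `loss_type="ipo"`. This is illustrative toy code with made-up numbers, not TRL's actual implementation:

```python
# Sketch of the IPO pairwise loss: regress the policy-vs-reference log-ratio
# margin toward 1/(2*beta). Plain Python, toy values; not TRL's code.

def ipo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """IPO loss for one (chosen, rejected) pair.

    Note: for IPO, per-sequence log-probs should be length-averaged
    (mean over tokens), not summed -- this detail was part of the fix.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = policy_margin - ref_margin
    return (logits - 1.0 / (2.0 * beta)) ** 2

# Token-level log-probs are averaged over sequence length before use:
chosen_token_logps = [-0.2, -0.4, -0.3]   # toy values
rejected_token_logps = [-0.9, -1.1]
chosen_avg = sum(chosen_token_logps) / len(chosen_token_logps)
rejected_avg = sum(rejected_token_logps) / len(rejected_token_logps)
loss = ipo_loss(chosen_avg, rejected_avg,
                ref_chosen_logp=-0.5, ref_rejected_logp=-0.8, beta=0.1)
```

Averaging rather than summing the token log-probs keeps the margin on a per-token scale, so sequences of different lengths are treated comparably.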
We also fixed important bugs with respect to DPO / PEFT and Flash Attention:
- [DPOTrainer] Fix DPO trainer + mistral + FA2 by @younesbelkada in https://github.com/huggingface/trl/pull/1290

Data processing is now faster for multi-GPU environments:
- [DPOTrainer] Load data only on main process + fix dpo example test by @younesbelkada in https://github.com/huggingface/trl/pull/1291

Other DPO bugfixes:
- [PEFT + DPO] Raise a value error if one passes a ref_model and a peft_config by @younesbelkada in https://github.com/huggingface/trl/pull/1289

Models now get tagged correctly even if users do not call trainer.push_to_hub():
- [core / xxxTrainer] Automatic tagging by @younesbelkada in https://github.com/huggingface/trl/pull/1329

Other changes:

- DPOTrainer docstrings by @alvarobartt in https://github.com/huggingface/trl/pull/1298
- [core / DDPO] Fix diffusers import issue by @younesbelkada in https://github.com/huggingface/trl/pull/1314
- [CI] Add tests on transformers peft main on push main by @younesbelkada in https://github.com/huggingface/trl/pull/1328

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.10...v0.7.11
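As an illustration of the PEFT + DPO validation change (#1289): with PEFT, the base model with adapters disabled already serves as the implicit reference model, so passing a separate `ref_model` alongside a `peft_config` is ambiguous and now raises an error. A minimal sketch of such a guard — hypothetical function name and message, not TRL's actual code:

```python
# Hypothetical sketch of the ref_model/peft_config guard added in #1289.
# With PEFT, the reference policy is the base model with adapters disabled,
# so supplying an explicit ref_model as well is contradictory.

def check_ref_model_args(ref_model, peft_config):
    """Raise if both an explicit ref_model and a peft_config are given."""
    if ref_model is not None and peft_config is not None:
        raise ValueError(
            "You passed both a ref_model and a peft_config. For PEFT training, "
            "pass only the peft_config; the reference model is obtained from "
            "the base model with adapters disabled."
        )

check_ref_model_args(ref_model=None, peft_config={"r": 16})  # valid: PEFT only
```

Failing fast at trainer construction surfaces the misconfiguration immediately instead of silently training against an unintended reference.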