v0.9.6 release
We are excited to introduce the v0.9.6 release, which brings many new features and algorithms. The highlights are as follows:
- Support for SimPO by @fe1ixxu, a reference-free preference-optimization method that also regularizes output length. To use this loss, pass loss_type="simpo" and cpo_alpha=0 in the CPOConfig and train with the CPOTrainer.
- Added AlignProp by @mihirp1998, a method for finetuning Stable Diffusion models using reward gradients.
- Added Efficient Exact Optimization (EXO) by @haozheji.
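For intuition, here is a minimal, self-contained sketch of the SimPO objective itself (not TRL's actual implementation): the implicit reward is the length-normalized sequence log-probability, and the loss is the negative log-sigmoid of the reward margin minus a target margin. The function name and the beta/gamma defaults below are illustrative assumptions, not TRL API.

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """Sketch of the SimPO pairwise loss for a single preference pair.

    logp_* : total log-probability of the chosen/rejected completion
    len_*  : number of tokens in the chosen/rejected completion
    beta   : reward scaling factor
    gamma  : target reward margin
    """
    # Length-normalized implicit rewards -- no reference model is needed.
    r_chosen = beta * logp_chosen / len_chosen
    r_rejected = beta * logp_rejected / len_rejected
    margin = r_chosen - r_rejected - gamma
    # Negative log-sigmoid of the margin (Bradley-Terry style objective).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss shrinks as the chosen completion becomes more likely per token than the rejected one; the length normalization is what regularizes output length.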
We also included many important fixes and improvements, such as fixing prints in the CLI with GCP containers by @alvarobartt. Enjoy the release!
- numpy to !=2.0.0 for CI and to users by @younesbelkada in https://github.com/huggingface/trl/pull/1747
- TrlParser: Add ignore extra args option by @younesbelkada in https://github.com/huggingface/trl/pull/1748
- KTOTrainer: Remove old tests by @younesbelkada in https://github.com/huggingface/trl/pull/1750
- process function in the example of DPO by @AIR-hl in https://github.com/huggingface/trl/pull/1753
- evaluation_strategy to eval_strategy by @qgallouedec in https://github.com/huggingface/trl/pull/1771
- torch_dtype handling in {DPO,SFT}Trainer when provided via CLI by @alvarobartt in https://github.com/huggingface/trl/pull/1807
- TRL_USE_RICH environment variable handling by @alvarobartt in https://github.com/huggingface/trl/pull/1808

Full Changelog: https://github.com/huggingface/trl/compare/v0.9.4...v0.9.6