DDPO for diffusion models

We are excited to welcome the first algorithm that combines RLHF with diffusion models to refine the images they generate. Read more about it in the docs.

Before DDPO fine-tuning	After DDPO fine-tuning
<div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/pre_squirrel.png"/></div>	<div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/post_squirrel.png"/></div>
<div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/pre_starfish.png"/></div>	<div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/post_starfish.png"/></div>

Denoising Diffusion Policy Optimization by @metric-space in https://github.com/huggingface/trl/pull/508

Bug fixes and other enhancements

The release also includes multiple bug fixes reported and/or led by the community; check out the commit history below.

What's Changed

Release: v0.5.0 by @younesbelkada in https://github.com/huggingface/trl/pull/607
Set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/608
[Modeling] Add token support for hf_hub_download by @younesbelkada in https://github.com/huggingface/trl/pull/604
Add docs explaining logged metrics by @vwxyzjn in https://github.com/huggingface/trl/pull/616
[DPO] stack-llama-2 training scripts by @kashif in https://github.com/huggingface/trl/pull/611
Use log_with argument in SFT example by @hitorilabs in https://github.com/huggingface/trl/pull/620
Allow already tokenized sequences for response_template in DataCollatorForCompletionOnlyLM by @ivsanro1 in https://github.com/huggingface/trl/pull/622
Improve docs by @lvwerra in https://github.com/huggingface/trl/pull/612
Move repo by @lvwerra in https://github.com/huggingface/trl/pull/628
Add score scaling/normalization/clipping by @zfang in https://github.com/huggingface/trl/pull/560
Disable dropout in DPO Training by @NouamaneTazi in https://github.com/huggingface/trl/pull/639
Add checks on backward batch size by @vwxyzjn in https://github.com/huggingface/trl/pull/651
Resolve various typos throughout the docs by @tomaarsen in https://github.com/huggingface/trl/pull/654
Update README.md by @Santosh-Gupta in https://github.com/huggingface/trl/pull/657
Allow for ref_model=None in DPOTrainer by @vincentmin in https://github.com/huggingface/trl/pull/640
Add more args to SFT example by @photomz in https://github.com/huggingface/trl/pull/642
Handle potentially long sequences with DataCollatorForCompletionOnlyLM by @tannonk in https://github.com/huggingface/trl/pull/644
[sft_llama2] Add check of arguments by @younesbelkada in https://github.com/huggingface/trl/pull/660
Fix DPO blogpost thumbnail by @lvwerra in https://github.com/huggingface/trl/pull/673
Propagate eval_batch_size to TrainingArguments by @rahuljha in https://github.com/huggingface/trl/pull/675
[CI] Fix immutable TrainingArguments issue by @younesbelkada in https://github.com/huggingface/trl/pull/676
Update sft_llama2.py by @msaad02 in https://github.com/huggingface/trl/pull/678
Fix PeftConfig loading from a remote repo by @w32zhong in https://github.com/huggingface/trl/pull/649
Simplify immutable TrainingArgs fix using dataclasses.replace by @tomaarsen in https://github.com/huggingface/trl/pull/682

New Contributors

@hitorilabs made their first contribution in https://github.com/huggingface/trl/pull/620
@ivsanro1 made their first contribution in https://github.com/huggingface/trl/pull/622
@zfang made their first contribution in https://github.com/huggingface/trl/pull/560
@NouamaneTazi made their first contribution in https://github.com/huggingface/trl/pull/639
@Santosh-Gupta made their first contribution in https://github.com/huggingface/trl/pull/657
@vincentmin made their first contribution in https://github.com/huggingface/trl/pull/640
@photomz made their first contribution in https://github.com/huggingface/trl/pull/642
@tannonk made their first contribution in https://github.com/huggingface/trl/pull/644
@rahuljha made their first contribution in https://github.com/huggingface/trl/pull/675
@msaad02 made their first contribution in https://github.com/huggingface/trl/pull/678
@w32zhong made their first contribution in https://github.com/huggingface/trl/pull/649

Full Changelog: https://github.com/huggingface/trl/compare/v0.5.0...v0.6.0