We are excited to welcome DDPO, the first algorithm in the library that combines RLHF with diffusion models to refine their generations. Read more about it in the docs.
| Before | After DDPO finetuning |
|---|---|
| <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/pre_squirrel.png"/></div> | <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/post_squirrel.png"/></div> |
| <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/pre_starfish.png"/></div> | <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/post_starfish.png"/></div> |
The release also comes with multiple bug fixes reported and/or led by the community. Check out the commit history below:
- [Modeling] Add token support for `hf_hub_download` by @younesbelkada in https://github.com/huggingface/trl/pull/604
- `response_template` in `DataCollatorForCompletionOnlyLM` by @ivsanro1 in https://github.com/huggingface/trl/pull/622
- [sft_llama2] Add check of arguments by @younesbelkada in https://github.com/huggingface/trl/pull/660
- [CI] Fix unmutable `TrainingArguments` issue by @younesbelkada in https://github.com/huggingface/trl/pull/676
- `dataclasses.replace` by @tomaarsen in https://github.com/huggingface/trl/pull/682

Full Changelog: https://github.com/huggingface/trl/compare/v0.5.0...v0.6.0