0.7.2: Flash Attention documentation and Minor bugfixes

In this release we provide minor bugfixes and a smoother user experience for all public classes. We also clarified the documentation on how to use Flash Attention with `SFTTrainer`.

How to use Flash Attention with `SFTTrainer`:

Update sft_trainer.mdx to highlight Flash Attention features by @younesbelkada in https://github.com/huggingface/trl/pull/807
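For context, the updated `SFTTrainer` docs pair Flash Attention 1 (PyTorch's `scaled_dot_product_attention` kernels) with `packing=True`, because those kernels do not support padding tokens during training. The snippet below is a simplified, pure-Python sketch of the idea behind packing. It is not TRL's actual implementation, and the already-tokenized inputs and the EOS id of 2 are illustrative assumptions. Examples are concatenated with an EOS separator and cut into fixed-length blocks, so every block is full and no padding is needed.

```python
from typing import List


def pack_sequences(
    tokenized_examples: List[List[int]],
    eos_token_id: int,
    seq_length: int,
) -> List[List[int]]:
    """Concatenate examples (EOS-separated) and split them into blocks of seq_length.

    A trailing block shorter than seq_length is dropped, so every returned
    block has the same length and contains no padding tokens.
    """
    stream: List[int] = []
    for ids in tokenized_examples:
        # Append the example followed by an EOS separator.
        stream.extend(ids)
        stream.append(eos_token_id)

    blocks: List[List[int]] = []
    for start in range(0, len(stream), seq_length):
        block = stream[start : start + seq_length]
        if len(block) == seq_length:  # keep only full-length blocks
            blocks.append(block)
    return blocks


# Hypothetical token ids for three short examples; EOS id 2 is an assumption.
examples = [[10, 11, 12], [20, 21], [30, 31, 32, 33]]
print(pack_sequences(examples, eos_token_id=2, seq_length=4))
# → [[10, 11, 12, 2], [20, 21, 2, 30], [31, 32, 33, 2]]
```

Note that packed blocks can span example boundaries (the second block above mixes two examples). That is the trade-off packing makes in exchange for padding-free batches.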

What's Changed

Release: v0.7.1 by @younesbelkada in https://github.com/huggingface/trl/pull/709
set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/710
fix device issue by @backpropper in https://github.com/huggingface/trl/pull/681
Update docs on gms8k by @vwxyzjn in https://github.com/huggingface/trl/pull/711
[Docs] Fix sft mistakes by @younesbelkada in https://github.com/huggingface/trl/pull/717
Fix: RuntimeError: 'weight' must be 2-D issue by @jp1924 in https://github.com/huggingface/trl/pull/687
Add pyproject.toml by @mnoukhov in https://github.com/huggingface/trl/pull/690
[core] Bump peft to 0.4.0 by @younesbelkada in https://github.com/huggingface/trl/pull/720
Refactor RewardTrainer hyperparameters into dedicated dataclass by @lewtun in https://github.com/huggingface/trl/pull/726
Fix DeepSpeed ZeRO-3 in PPOTrainer by @lewtun in https://github.com/huggingface/trl/pull/730
[SFTTrainer] Check correctly for condition by @younesbelkada in https://github.com/huggingface/trl/pull/668
Add epsilon to score normalization by @zfang in https://github.com/huggingface/trl/pull/727
Enable gradient checkpointing to be disabled for reward modelling by @lewtun in https://github.com/huggingface/trl/pull/725
[DPO] fixed metrics typo by @kashif in https://github.com/huggingface/trl/pull/743
Seq2Seq model support for DPO by @gaetanlop in https://github.com/huggingface/trl/pull/586
[DPO] fix ref_model by @i4never in https://github.com/huggingface/trl/pull/745
[core] Fix import of randn_tensor by @younesbelkada in https://github.com/huggingface/trl/pull/751
Add benchmark CI by @vwxyzjn in https://github.com/huggingface/trl/pull/752
update to prepare_model_for_kbit_training by @mnoukhov in https://github.com/huggingface/trl/pull/728
benchmark CI fix by @vwxyzjn in https://github.com/huggingface/trl/pull/755
EOS token processing for multi-turn DPO by @natolambert in https://github.com/huggingface/trl/pull/741
Extend DeepSpeed integration to ZeRO-{1,2,3} by @lewtun in https://github.com/huggingface/trl/pull/758
Improve benchmark CI by @vwxyzjn in https://github.com/huggingface/trl/pull/760
[PPOTrainer] - add comment of zero masking (from second query token) by @zuoxingdong in https://github.com/huggingface/trl/pull/763
Refactor and benchmark by @vwxyzjn in https://github.com/huggingface/trl/pull/662
Benchmark CI (actual) by @vwxyzjn in https://github.com/huggingface/trl/pull/754
docs: add initial version of docs for PPOTrainer by @davidberenstein1957 in https://github.com/huggingface/trl/pull/665
Support fork in benchmark CI by @vwxyzjn in https://github.com/huggingface/trl/pull/764
Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/773
Benchmark CI fix by @vwxyzjn in https://github.com/huggingface/trl/pull/775
Benchmark CI fix by @vwxyzjn in https://github.com/huggingface/trl/pull/776
Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/777
Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/778
Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/779
Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/780
Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/781
Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/782
Ensure RewardConfig is backwards compatible by @lewtun in https://github.com/huggingface/trl/pull/748
Temp benchmark ci dir by @vwxyzjn in https://github.com/huggingface/trl/pull/765
Changed the default value of the log_with argument by @filippobistaffa in https://github.com/huggingface/trl/pull/792
Add default Optim to DPO example by @natolambert in https://github.com/huggingface/trl/pull/759
Add margin to RM training by @jvhoffbauer in https://github.com/huggingface/trl/pull/719
[DPO] Revert "Add default Optim to DPO example (#759)" by @younesbelkada in https://github.com/huggingface/trl/pull/799
Add deepspeed experiment by @vwxyzjn in https://github.com/huggingface/trl/pull/795
[Docs] Clarify PEFT docs by @younesbelkada in https://github.com/huggingface/trl/pull/797
Fix docs bug on sft_trainer.mdx by @younesbelkada in https://github.com/huggingface/trl/pull/808
[PPOTrainer] Fixes ppo trainer generate nit by @younesbelkada in https://github.com/huggingface/trl/pull/798
Allow passing the token_ids as instruction_template in DataCollatorForCompletionOnlyLM by @devxpy in https://github.com/huggingface/trl/pull/749
init custom eval loop for further DPO evals by @natolambert in https://github.com/huggingface/trl/pull/766
Add RMSProp back to DPO by @natolambert in https://github.com/huggingface/trl/pull/821
[DPO] add option for compute_metrics in DPOTrainer by @kashif in https://github.com/huggingface/trl/pull/822
Small fixes to the PPO trainer doc and script. by @namin in https://github.com/huggingface/trl/pull/811
Unify sentiment documentation by @vwxyzjn in https://github.com/huggingface/trl/pull/803
Fix DeepSpeed ZeRO-{1,2} for DPOTrainer by @lewtun in https://github.com/huggingface/trl/pull/825
Set trust remote code to false by default by @lewtun in https://github.com/huggingface/trl/pull/833
[MINOR:TYPOS] Update README.md by @cakiki in https://github.com/huggingface/trl/pull/829
Clarify docstrings, help messages, assert messages in merge_peft_adapter.py by @larekrow in https://github.com/huggingface/trl/pull/838
add DDPO to index by @lvwerra in https://github.com/huggingface/trl/pull/826
Raise error in create_reference_model() when ZeRO-3 is enabled by @lewtun in https://github.com/huggingface/trl/pull/840
Use uniform config by @vwxyzjn in https://github.com/huggingface/trl/pull/817
Give lewtun power by @lvwerra in https://github.com/huggingface/trl/pull/856
Standardise example scripts by @lewtun in https://github.com/huggingface/trl/pull/842
Fix version check in import_utils.py by @adampauls in https://github.com/huggingface/trl/pull/853
Don't use get_peft_model if model is already peft by @abhishekkrthakur in https://github.com/huggingface/trl/pull/857
[core] Fix import issues by @younesbelkada in https://github.com/huggingface/trl/pull/859
Support both old and new diffusers import path by @osanseviero in https://github.com/huggingface/trl/pull/843

New Contributors

@backpropper made their first contribution in https://github.com/huggingface/trl/pull/681
@jp1924 made their first contribution in https://github.com/huggingface/trl/pull/687
@i4never made their first contribution in https://github.com/huggingface/trl/pull/745
@zuoxingdong made their first contribution in https://github.com/huggingface/trl/pull/763
@davidberenstein1957 made their first contribution in https://github.com/huggingface/trl/pull/665
@filippobistaffa made their first contribution in https://github.com/huggingface/trl/pull/792
@devxpy made their first contribution in https://github.com/huggingface/trl/pull/749
@namin made their first contribution in https://github.com/huggingface/trl/pull/811
@cakiki made their first contribution in https://github.com/huggingface/trl/pull/829
@larekrow made their first contribution in https://github.com/huggingface/trl/pull/838
@adampauls made their first contribution in https://github.com/huggingface/trl/pull/853
@abhishekkrthakur made their first contribution in https://github.com/huggingface/trl/pull/857
@osanseviero made their first contribution in https://github.com/huggingface/trl/pull/843

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.1...v0.7.2