

v0.7.3:`IterativeTrainer`, NEFTune and major bugfixes for `DPOTrainer` and Distributed Training

November 10, 2023 · TRL


In this release we introduce two new features, IterativeTrainer from @gaetanlop and NEFTune, together with important bugfixes for distributed training.

IterativeTrainer

Iterative fine-tuning is a training method that lets you perform custom actions (generation and filtering, for example) between optimization steps. TRL provides an easy-to-use API to fine-tune your models iteratively in just a few lines of code.

Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer
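The generate → filter → optimize loop can be sketched in plain Python. This is an illustrative toy, not the actual TRL API: `generate`, `filter_samples`, and `optimization_step` are hypothetical stand-ins for a real model's generation, your custom filtering logic, and a call like `IterativeSFTTrainer.step`:

```python
import random

# Toy stand-ins; all names here are illustrative, not the actual TRL API.
def generate(prompts):
    # Pretend "generation": echo each prompt with a completion.
    return [p + " -> completion" for p in prompts]

def filter_samples(samples):
    # Custom filtering between optimization steps, e.g. keep valid outputs.
    return [s for s in samples if "completion" in s]

def optimization_step(batch):
    # Stand-in for an optimization step on the kept samples; returns a fake loss.
    return random.uniform(0.0, 1.0)

prompts = ["Explain NEFTune", "What is DPO?"]
losses = []
for iteration in range(3):
    samples = generate(prompts)             # 1. generate
    kept = filter_samples(samples)          # 2. filter
    losses.append(optimization_step(kept))  # 3. optimize on kept samples

print(len(losses))  # 3 optimization steps performed
```

The point is the structure: because each optimization step is an explicit call, arbitrary logic can run between steps, which is not possible with a monolithic `train()` loop.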

NEFTune

NEFTune is a technique to boost the performance of chat models, introduced in the paper “NEFTune: Noisy Embeddings Improve Instruction Finetuning” by Jain et al. It consists of adding noise to the embedding vectors during training.

Read more about it here
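In the paper, the noise is sampled uniformly from [-1, 1] and scaled by α/√(L·d), where L is the sequence length, d the embedding dimension, and α a tunable noise level. A minimal plain-Python sketch of that scaling (the function names are ours, not TRL's):

```python
import math
import random

def neftune_noise(seq_len, hidden_dim, alpha=5.0):
    """Uniform noise in [-1, 1] scaled by alpha / sqrt(seq_len * hidden_dim),
    following the NEFTune paper's formulation."""
    scale = alpha / math.sqrt(seq_len * hidden_dim)
    return [[random.uniform(-1.0, 1.0) * scale for _ in range(hidden_dim)]
            for _ in range(seq_len)]

def add_neftune_noise(embeddings, alpha=5.0):
    # embeddings: seq_len rows of hidden_dim floats; noise is added elementwise
    # during training only (at inference, embeddings are left untouched).
    seq_len, hidden_dim = len(embeddings), len(embeddings[0])
    noise = neftune_noise(seq_len, hidden_dim, alpha)
    return [[e + n for e, n in zip(row, noise_row)]
            for row, noise_row in zip(embeddings, noise)]

emb = [[0.0] * 8 for _ in range(4)]      # toy 4x8 embedding matrix
noisy = add_neftune_noise(emb, alpha=5.0)
```

Note the scale shrinks as sequences get longer or embeddings get wider, so the per-element perturbation stays small relative to the overall embedding.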

Major bugfixes

Several major bugs have been fixed, addressing many issues with distributed training and gradient checkpointing.

DPOTrainer enhancements and fixes

The DPOTrainer now comes with multiple enhancements and bugfixes! Check them out below.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.2...v0.7.3
