

v0.7.3:`IterativeTrainer`, NEFTune and major bugfixes for `DPOTrainer` and Distributed Training

November 10, 2023 · TRL


In this release we introduce two new features, IterativeTrainer from @gaetanlop and NEFTune, together with important bugfixes for distributed training.

IterativeTrainer

Iterative fine-tuning is a training method that lets you perform custom actions (generation and filtering, for example) between optimization steps. TRL provides an easy-to-use API to fine-tune your models iteratively in just a few lines of code.

Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer
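The generate → filter → optimize loop can be sketched in plain Python. This is an illustrative toy, not the actual TRL API: `generate`, `filter_samples`, and `optimization_step` are hypothetical stand-ins for a real model's generation, your custom filtering logic, and a call like `IterativeSFTTrainer.step`:

```python
import random

# Toy stand-ins; all names here are illustrative, not the actual TRL API.
def generate(prompts):
    # Pretend "generation": echo each prompt with a completion.
    return [p + " -> completion" for p in prompts]

def filter_samples(samples):
    # Custom filtering between optimization steps, e.g. keep valid outputs.
    return [s for s in samples if "completion" in s]

def optimization_step(batch):
    # Stand-in for an optimization step on the kept samples; returns a fake loss.
    return random.uniform(0.0, 1.0)

prompts = ["Explain NEFTune", "What is DPO?"]
losses = []
for iteration in range(3):
    samples = generate(prompts)             # 1. generate
    kept = filter_samples(samples)          # 2. filter
    losses.append(optimization_step(kept))  # 3. optimize on kept samples

print(len(losses))  # 3 optimization steps performed
```

The point is the structure: because each optimization step is an explicit call, arbitrary logic can run between steps, which is not possible with a monolithic `train()` loop.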

NEFTune

NEFTune is a technique to boost the performance of chat models, introduced in the paper “NEFTune: Noisy Embeddings Improve Instruction Finetuning” by Jain et al. It consists of adding noise to the embedding vectors during training.

Read more about it here
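In the paper, the noise is sampled uniformly from [-1, 1] and scaled by α/√(L·d), where L is the sequence length, d the embedding dimension, and α a tunable noise level. A minimal plain-Python sketch of that scaling (the function names are ours, not TRL's):

```python
import math
import random

def neftune_noise(seq_len, hidden_dim, alpha=5.0):
    """Uniform noise in [-1, 1] scaled by alpha / sqrt(seq_len * hidden_dim),
    following the NEFTune paper's formulation."""
    scale = alpha / math.sqrt(seq_len * hidden_dim)
    return [[random.uniform(-1.0, 1.0) * scale for _ in range(hidden_dim)]
            for _ in range(seq_len)]

def add_neftune_noise(embeddings, alpha=5.0):
    # embeddings: seq_len rows of hidden_dim floats; noise is added elementwise
    # during training only (at inference, embeddings are left untouched).
    seq_len, hidden_dim = len(embeddings), len(embeddings[0])
    noise = neftune_noise(seq_len, hidden_dim, alpha)
    return [[e + n for e, n in zip(row, noise_row)]
            for row, noise_row in zip(embeddings, noise)]

emb = [[0.0] * 8 for _ in range(4)]      # toy 4x8 embedding matrix
noisy = add_neftune_noise(emb, alpha=5.0)
```

Note the scale shrinks as sequences get longer or embeddings get wider, so the per-element perturbation stays small relative to the overall embedding.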

Major bugfixes

Several major bugs have been fixed, addressing many issues with distributed training and gradient checkpointing.

DPOTrainer enhancements and fixes

The DPOTrainer now comes with multiple enhancements and bugfixes! Check them out below.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.2...v0.7.3
