v0.9.3 — TRL — releases.sh

We are excited to introduce the new v0.9.3 release. Many new exciting features and algorithms. The highlights are as follows:

RLOO Trainer: RLOO (Reinforce Leave-one-out) is a new online RL algorithm for RLHF, proposed by Ahmadian et al from Cohere. Check out our docs here to get started
PPOv2 Trainer: We are introducing a new experimental PPOv2 trainer which is more aligned with OpenAI's PPO implementation based on https://arxiv.org/abs/2403.17031. Check out our docs here to get started
Reward model visualization: the reward model training now includes visualization on the eval dataset, as shown below.

https://github.com/huggingface/trl/assets/5555347/6575a879-cb2f-4e2e-bb84-a76707f9de84

New losses in the DPO Trainer: DPOTrainer now includes losses / support for Self-play Preference Optimization, Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment
New losses in the KTO Trainer: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO)

What's Changed

set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/1568
fix add_special_tokens issue for data with template by @edixiong in https://github.com/huggingface/trl/pull/1509
[DPO] add 'bco_pair' loss_type by @seanexp in https://github.com/huggingface/trl/pull/1524
[DPO] DPOConfig class by @kashif in https://github.com/huggingface/trl/pull/1554
[SFT] add SFT Trainer Config dataclass by @kashif in https://github.com/huggingface/trl/pull/1530
FIX: Fix CI on transformers main by @younesbelkada in https://github.com/huggingface/trl/pull/1576
[SFTTrainer] Add warning in SFTTrainer when dataset already processed by @younesbelkada in https://github.com/huggingface/trl/pull/1577
Fix typo detoxifying doc by @qgallouedec in https://github.com/huggingface/trl/pull/1594
Core: removed unexisting SftArgumentParser by @younesbelkada in https://github.com/huggingface/trl/pull/1602
[KTOTrainer] add BCO (reward shift and underlying distribution matching) by @seanexp in https://github.com/huggingface/trl/pull/1599
[CLI] Use auto device map for model load by @lewtun in https://github.com/huggingface/trl/pull/1596
Removing tests/ from package data by @jamesbraza in https://github.com/huggingface/trl/pull/1607
Docs: Fix build main documentation by @younesbelkada in https://github.com/huggingface/trl/pull/1604
support loss function for Self-play Preference Optimization by @winglian in https://github.com/huggingface/trl/pull/1612
Update HH dataset on helpful only subset by @vwxyzjn in https://github.com/huggingface/trl/pull/1613
corrects loss function for Self-play Preference Optimization hard label version by @angelahzyuan in https://github.com/huggingface/trl/pull/1615
Fix ZeRO-3 generation context manager by @lewtun in https://github.com/huggingface/trl/pull/1617
fixed adding bos and eos token unconditionally by @jasonyux in https://github.com/huggingface/trl/pull/1591
visualize rm prediction by @vwxyzjn in https://github.com/huggingface/trl/pull/1636
[ORPO] Correct label mask for pad tokens by @IlyaGusev in https://github.com/huggingface/trl/pull/1625
Update sft_llama2.py to work with the latest API by @xianbaoqian in https://github.com/huggingface/trl/pull/1637
Fixed wrong logs prefixes in KTOTrainer by @bartoszzuk in https://github.com/huggingface/trl/pull/1641
Pairwise Noise Contrastive Alignment by @winglian in https://github.com/huggingface/trl/pull/1632
don't cast the trainable lora layers to half precision by @pacman100 in https://github.com/huggingface/trl/pull/1644
PPO / Reinforce Trainers by @vwxyzjn in https://github.com/huggingface/trl/pull/1540
Apply deprecated evaluation_strategy by @muellerzr in https://github.com/huggingface/trl/pull/1559
FEAT: Add support for training collator in PPOTrainer by @younesbelkada in https://github.com/huggingface/trl/pull/1658
Correct Documentation for cDPO Usage by @AliBakly in https://github.com/huggingface/trl/pull/1655
Fix inheritance order in PPOv2Config by @Nicolinho in https://github.com/huggingface/trl/pull/1659
[DPO] Add 'robust' loss_type by @Abilityguy in https://github.com/huggingface/trl/pull/1653
🤫 TR-DPO implementation by @syrn1k in https://github.com/huggingface/trl/pull/1593
Do not upcast adapters when using FSDP+QLoRA by @pacman100 in https://github.com/huggingface/trl/pull/1654
[Tests] update eval_strategy API by @kashif in https://github.com/huggingface/trl/pull/1662
Fix ppov2 test case by @vwxyzjn in https://github.com/huggingface/trl/pull/1661
FIX / PPO: Fix enable_input_require_grads issues with PPO models by @younesbelkada in https://github.com/huggingface/trl/pull/1664
fix dataset load error by @sywangyi in https://github.com/huggingface/trl/pull/1670
FIX / SFTTrainer: Fix SFTTrainer with args=None by @younesbelkada in https://github.com/huggingface/trl/pull/1678
Fix max_completion_length for encoder_decoder models in KTO Trainer by @samuki in https://github.com/huggingface/trl/pull/1588
intial RPO loss by @kashif in https://github.com/huggingface/trl/pull/1686
Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig by @alexisrozhkov in https://github.com/huggingface/trl/pull/1690
Skip packing validation by @alex-jw-brooks in https://github.com/huggingface/trl/pull/1673
Fix typo in DPOTrainer's warnings by @qgallouedec in https://github.com/huggingface/trl/pull/1688
Quick fix on GPT4-eval by @vwxyzjn in https://github.com/huggingface/trl/pull/1696
Release 0.9.2 by @vwxyzjn in https://github.com/huggingface/trl/pull/1697

New Contributors

@edixiong made their first contribution in https://github.com/huggingface/trl/pull/1509
@seanexp made their first contribution in https://github.com/huggingface/trl/pull/1524
@jamesbraza made their first contribution in https://github.com/huggingface/trl/pull/1607
@winglian made their first contribution in https://github.com/huggingface/trl/pull/1612
@angelahzyuan made their first contribution in https://github.com/huggingface/trl/pull/1615
@jasonyux made their first contribution in https://github.com/huggingface/trl/pull/1591
@IlyaGusev made their first contribution in https://github.com/huggingface/trl/pull/1625
@xianbaoqian made their first contribution in https://github.com/huggingface/trl/pull/1637
@bartoszzuk made their first contribution in https://github.com/huggingface/trl/pull/1641
@muellerzr made their first contribution in https://github.com/huggingface/trl/pull/1559
@AliBakly made their first contribution in https://github.com/huggingface/trl/pull/1655
@Nicolinho made their first contribution in https://github.com/huggingface/trl/pull/1659
@Abilityguy made their first contribution in https://github.com/huggingface/trl/pull/1653
@syrn1k made their first contribution in https://github.com/huggingface/trl/pull/1593
@alexisrozhkov made their first contribution in https://github.com/huggingface/trl/pull/1690
@alex-jw-brooks made their first contribution in https://github.com/huggingface/trl/pull/1673

Full Changelog: https://github.com/huggingface/trl/compare/v0.8.6...v0.9.2