ORPO Trainer & Vision LLMs support for SFTTrainer, KTO fixes

This release includes two new trainers: ORPO (from KAIST) and CPO.
It also adds support for Vision LLMs such as Llava to SFTTrainer. See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.

ORPO Trainer

ORPO trainer by @kashif in https://github.com/huggingface/trl/pull/1435
[ORPO] use log1p for loss by @kashif in https://github.com/huggingface/trl/pull/1491
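
To illustrate why log1p matters for the ORPO loss, here is a minimal pure-Python sketch of the odds-ratio term as described in the ORPO paper. It is not the code from the PRs above: the function names are hypothetical, and it assumes one scalar log-probability per response, whereas the trainer works on batched tensors.

```python
import math


def log_odds_ratio(logp_chosen: float, logp_rejected: float) -> float:
    """Return log(odds(chosen) / odds(rejected)), with odds(p) = p / (1 - p).

    Both inputs are log-probabilities and must be strictly negative.
    """
    # log odds(p) = log p - log(1 - p).
    # log1p(-exp(logp)) evaluates log(1 - p) without first rounding 1 - p to 1.0.
    log_odds_chosen = logp_chosen - math.log1p(-math.exp(logp_chosen))
    log_odds_rejected = logp_rejected - math.log1p(-math.exp(logp_rejected))
    return log_odds_chosen - log_odds_rejected


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """Odds-ratio penalty: -log sigmoid(log odds ratio). Lower is better."""
    return -log_sigmoid(log_odds_ratio(logp_chosen, logp_rejected))


# Hypothetical example: the chosen response is more likely than the rejected one.
loss = odds_ratio_loss(math.log(0.8), math.log(0.2))
```

For very small probabilities, `1 - math.exp(logp)` rounds to exactly 1.0 in double precision, so a plain `math.log` returns 0 and the `log(1 - p)` term is lost. `math.log1p` keeps it.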

CPO Trainer

Add CPOTrainer by @fe1ixxu in https://github.com/huggingface/trl/pull/1382
Add use_cache=False in {ORPO,CPO}Trainer.concatenated_forward by @alvarobartt in https://github.com/huggingface/trl/pull/1478
[ORPO] Update NLL loss to use input_ids instead by @alvarobartt in https://github.com/huggingface/trl/pull/1516

Vision LLM (VLM) support for SFTTrainer

You can now use SFTTrainer to fine-tune Vision LLMs (VLMs) such as Llava! See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.

Adds VLM Training support to SFTTrainer + VSFT script by @edbeeching in https://github.com/huggingface/trl/pull/1518

KTO Fixes

Many fixes were introduced for the KTOTrainer:

Update KTO example to use better model and ChatML support by @lewtun in https://github.com/huggingface/trl/pull/1485
[KTO] Use batching to speed up data processing by @lewtun in https://github.com/huggingface/trl/pull/1470
Update KTO example with good dataset & chat format by @lewtun in https://github.com/huggingface/trl/pull/1481
[KTO] fix interleaving, reporting, and hanging bugs by @kawine and @claralp in https://github.com/huggingface/trl/pull/1499
[KTO] fix metric logging by @claralp in https://github.com/huggingface/trl/pull/1514

10x faster PPO!

Speed up PPO with ZeRO-3 by 10x 🔥 by @lewtun in https://github.com/huggingface/trl/pull/1483

Other fixes

set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/1463
Use the standard dataset for DPO CLI by @vwxyzjn in https://github.com/huggingface/trl/pull/1456
[peft] Update test_reward_trainer.py to fix tests by @kashif in https://github.com/huggingface/trl/pull/1471
Fix hyperparameters in KTO example by @lewtun in https://github.com/huggingface/trl/pull/1474
docs: add missing Trainer classes and sort alphabetically by @anakin87 in https://github.com/huggingface/trl/pull/1479
hackey update to ModelConfig to allow lora_target_modules="all-linear" by @galtay in https://github.com/huggingface/trl/pull/1488
Ignore chat files by @lewtun in https://github.com/huggingface/trl/pull/1486
Add DPO link in README by @qgallouedec in https://github.com/huggingface/trl/pull/1502
Fix typo in how_to_train.md by @ftorres16 in https://github.com/huggingface/trl/pull/1503
Fix DPO Unsloth example in Docs by @arnavgarg1 in https://github.com/huggingface/trl/pull/1494
Correct ppo_epochs usage by @muhammed-shihebi in https://github.com/huggingface/trl/pull/1480
Fix RichProgressCallback by @eggry in https://github.com/huggingface/trl/pull/1496
Change the device index to device:index by @yuanwu2017 in https://github.com/huggingface/trl/pull/1490
FIX: use kwargs for RMTrainer by @younesbelkada in https://github.com/huggingface/trl/pull/1515
Allow streaming (datasets.IterableDataset) by @BramVanroy in https://github.com/huggingface/trl/pull/1468
Allow pre-tokenized datasets in SFTTrainer by @BramVanroy in https://github.com/huggingface/trl/pull/1520
[DOC] Add data description for sfttrainer doc by @BramVanroy in https://github.com/huggingface/trl/pull/1521
Release: v0.8.2 by @younesbelkada in https://github.com/huggingface/trl/pull/1522

New Contributors

@fe1ixxu made their first contribution in https://github.com/huggingface/trl/pull/1382
@anakin87 made their first contribution in https://github.com/huggingface/trl/pull/1479
@galtay made their first contribution in https://github.com/huggingface/trl/pull/1488
@qgallouedec made their first contribution in https://github.com/huggingface/trl/pull/1502
@ftorres16 made their first contribution in https://github.com/huggingface/trl/pull/1503
@arnavgarg1 made their first contribution in https://github.com/huggingface/trl/pull/1494
@muhammed-shihebi made their first contribution in https://github.com/huggingface/trl/pull/1480
@eggry made their first contribution in https://github.com/huggingface/trl/pull/1496
@claralp made their first contribution in https://github.com/huggingface/trl/pull/1514

Full Changelog: https://github.com/huggingface/trl/compare/v0.8.1...v0.8.2