TRL v0.8.0

New Trainer: KTOTrainer:

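We recently introduced the KTOTrainer, which lets you align LLMs with KTO (Kahneman-Tversky Optimization)! Unlike DPO, KTO does not need paired preferences: it learns from individual completions labeled as desirable or undesirable.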

fix bugs in KTO implementation by @kawine in https://github.com/huggingface/trl/pull/1380
[KTO] merge eval dataset only if it exists by @kashif in https://github.com/huggingface/trl/pull/1383
[KTO] prevent nans from appearing in metrics by @kawine in https://github.com/huggingface/trl/pull/1386
Kto trainer by @kashif in https://github.com/huggingface/trl/pull/1181
[KTO] fix tokenization bugs by @kawine in https://github.com/huggingface/trl/pull/1418
[KTO] model init when args are given by @kashif in https://github.com/huggingface/trl/pull/1413
[KTO] fix various bugs by @kawine in https://github.com/huggingface/trl/pull/1402
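KTO's unpaired format can be produced from an existing paired preference dataset. The sketch below assumes the `prompt` / `completion` / `label` column layout described in the KTOTrainer documentation (check the docs for your installed version). It splits each prompt/chosen/rejected row into one desirable and one undesirable example. The helper `paired_to_unpaired` is hypothetical and not part of TRL.

```python
from typing import Dict, List


def paired_to_unpaired(paired: List[Dict[str, str]]) -> Dict[str, list]:
    """Flatten prompt/chosen/rejected pairs into KTO-style unpaired columns.

    Hypothetical helper: each pair yields one desirable example (label=True)
    built from `chosen` and one undesirable example (label=False) built from
    `rejected`, both sharing the same prompt.
    """
    columns: Dict[str, list] = {"prompt": [], "completion": [], "label": []}
    for row in paired:
        for key, label in (("chosen", True), ("rejected", False)):
            columns["prompt"].append(row["prompt"])
            columns["completion"].append(row[key])
            columns["label"].append(label)
    return columns


pairs = [
    {"prompt": "Hey, hello", "chosen": "hi nice to meet you", "rejected": "leave me alone"},
    {"prompt": "How is the weather?", "chosen": "It is sunny today.", "rejected": "Why do you care?"},
]
kto_columns = paired_to_unpaired(pairs)
print(kto_columns["label"])  # [True, False, True, False]
```

The resulting dict of lists can be wrapped with `datasets.Dataset.from_dict` before being handed to the trainer. Because KTO does not require pairs, you can also mix in examples that only have a desirable or only an undesirable completion.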

TRL Command Line Interfaces (CLIs):

Run SFT and DPO training, or chat with your aligned model, directly from the terminal:

SFT:

trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb

DPO:

trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-dpo-hh-rlhf

Chat:

trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat

Read more about the CLIs in the relevant section of the documentation, or pass --help to any command for more details.

FEAT: Add CLIs in TRL ! by @younesbelkada in https://github.com/huggingface/trl/pull/1419
CI / CLI: Properly raise error when CLI tests failed by @younesbelkada in https://github.com/huggingface/trl/pull/1446
chat cli by @lvwerra in https://github.com/huggingface/trl/pull/1431
Fix yaml parsing issue by @younesbelkada in https://github.com/huggingface/trl/pull/1450
model --> model_name_or_path by @lvwerra in https://github.com/huggingface/trl/pull/1452
FEAT: Update README to add DPO + CLIs by @younesbelkada in https://github.com/huggingface/trl/pull/1448

FSDP + QLoRA:

SFTTrainer now supports QLoRA training with FSDP, as well as with DeepSpeed ZeRO-3.

Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by @pacman100 in https://github.com/huggingface/trl/pull/1416
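As a rough illustration of why the combination matters: QLoRA stores the frozen base weights in 4 bits, and FSDP (or ZeRO-3) shards those weights across GPUs, so the per-GPU weight footprint shrinks with both. The back-of-envelope sketch below counts only the quantized base weights and deliberately ignores quantization constants, LoRA adapters, optimizer state, activations and framework overhead, so real memory use will be higher. The function `sharded_weight_gb` is hypothetical.

```python
def sharded_weight_gb(num_params: float, bits_per_param: int, num_gpus: int) -> float:
    """Approximate per-GPU memory (decimal GB) for the base weights alone.

    Assumes the weights are split evenly across `num_gpus`, as full sharding
    does; ignores quantization constants, adapters, optimizer state and
    activations.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / num_gpus / 1e9


# A 70B-parameter model: 16-bit weights on one GPU vs. 4-bit weights sharded over two.
print(sharded_weight_gb(70e9, 16, 1))  # 140.0
print(sharded_weight_gb(70e9, 4, 2))   # 17.5
```

Quantization alone cuts the weights by 4x relative to 16-bit, and sharding divides what remains by the number of GPUs, which is what makes large models fit on several smaller cards.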

Other fixes

set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/1332
Update stack llama 2 example to reflect #aa35fec by @nautsimon in https://github.com/huggingface/trl/pull/1333
FIX: More user friendly error when users don't have PEFT by @younesbelkada in https://github.com/huggingface/trl/pull/1350
fix 8-bit multi-gpu training bug by @fancyerii in https://github.com/huggingface/trl/pull/1353
set seed in sft/dpo/reward_modeling to make results reproducible by @sywangyi in https://github.com/huggingface/trl/pull/1357
Fix transformers version checking for Python < 3.8 by @samuki in https://github.com/huggingface/trl/pull/1363
Add some arguments to support XPU by @yuanwu2017 in https://github.com/huggingface/trl/pull/1366
ENH: Send Docker and transformers main CI results on slack after merging on main by @younesbelkada in https://github.com/huggingface/trl/pull/1370
FEAT: [SFTTrainer] Add eval_packing by @younesbelkada in https://github.com/huggingface/trl/pull/1369
FEAT: force_use_ref_model for power users by @younesbelkada in https://github.com/huggingface/trl/pull/1367
FIX: fix after #1370 by @younesbelkada in https://github.com/huggingface/trl/pull/1372
FIX: Change ci to fail-fast=False by @younesbelkada in https://github.com/huggingface/trl/pull/1373
FIX: Fix the CI again .. by @younesbelkada in https://github.com/huggingface/trl/pull/1374
Log ddpo reward as float to fix numpy conversion during bf16 training by @skavulya in https://github.com/huggingface/trl/pull/1391
Fix the pad_token_id error by @yuanwu2017 in https://github.com/huggingface/trl/pull/1394
FIX [RewardModeling] Fix RM script for PEFT by @younesbelkada in https://github.com/huggingface/trl/pull/1393
Fix import error from deprecation in transformers by @lewtun in https://github.com/huggingface/trl/pull/1415
CI: Fix CI on main by @younesbelkada in https://github.com/huggingface/trl/pull/1422
[Kto] torch_dtype kwargs fix by @kashif in https://github.com/huggingface/trl/pull/1429
Create standard dataset for TRL by @vwxyzjn in https://github.com/huggingface/trl/pull/1424
FIX: fix doc build on main by @younesbelkada in https://github.com/huggingface/trl/pull/1437
Fix PPOTrainer README example by @nikihowe in https://github.com/huggingface/trl/pull/1441
Before updating tr_loss, make sure tr_loss_step is on the same device by @pengwei715 in https://github.com/huggingface/trl/pull/1439
Release: v0.8.0 by @younesbelkada in https://github.com/huggingface/trl/pull/1453
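#1369 adds an eval_packing option to SFTTrainer so that packing can be configured separately for the evaluation dataset. As a sketch of what packing generally means, assuming the common approach of concatenating tokenized examples separated by an EOS token and cutting the stream into fixed-length blocks (an illustration, not the SFTTrainer implementation; `pack_sequences`, the token IDs and the EOS id are all made up):

```python
from typing import List


def pack_sequences(examples: List[List[int]], eos_token_id: int, seq_length: int) -> List[List[int]]:
    """Concatenate token-ID lists (EOS-separated) and cut them into equal-length blocks.

    Illustrative only: a trailing remainder shorter than `seq_length` is dropped.
    """
    stream: List[int] = []
    for tokens in examples:
        stream.extend(tokens)
        stream.append(eos_token_id)  # mark the boundary between examples
    num_blocks = len(stream) // seq_length
    return [stream[i * seq_length:(i + 1) * seq_length] for i in range(num_blocks)]


# Three short "tokenized" examples with a made-up EOS id of 0.
blocks = pack_sequences([[5, 6, 7], [8, 9], [10, 11, 12, 13]], eos_token_id=0, seq_length=4)
print(blocks)  # [[5, 6, 7, 0], [8, 9, 0, 10], [11, 12, 13, 0]]
```

Packing avoids spending compute on padding tokens, at the cost of blocks that can span example boundaries. Dropping the short remainder is one design choice; an implementation could instead pad it.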

New Contributors

@nautsimon made their first contribution in https://github.com/huggingface/trl/pull/1333
@fancyerii made their first contribution in https://github.com/huggingface/trl/pull/1353
@samuki made their first contribution in https://github.com/huggingface/trl/pull/1363
@yuanwu2017 made their first contribution in https://github.com/huggingface/trl/pull/1366
@kawine made their first contribution in https://github.com/huggingface/trl/pull/1380
@skavulya made their first contribution in https://github.com/huggingface/trl/pull/1391
@pengwei715 made their first contribution in https://github.com/huggingface/trl/pull/1439

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.11...v0.8.0