v0.7.10: Minor fixes, Automatic templating, `setup_chat_format` API, stronger tests

This Patch release adds a new feature in TRL for dealing with chat datasets - you can load a directly formatted dataset without the need of formatting it beforehand.

The release also introduces a new API setup_chat_format to correctly resize the model embeddings with the target size when adding new tokens to comply with the chat format. Currently we only support chatml format and we can add more formats in the future

We also extensively test SFTTrainer and DPOTrainer and the example scripts, dpo.py and sft.py should be well -battletested. If you see any issue with the script, please let us know on GitHub.

What's Changed

set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/1207
Check tokenize params on DPOTrainer by @pablovicente in https://github.com/huggingface/trl/pull/1197
Fix shape descriptions in calculate_loss method by @yuta0x89 in https://github.com/huggingface/trl/pull/1204
Fix FSDP error by @mgerstgrasser in https://github.com/huggingface/trl/pull/1196
Update Unsloth SFT, DPO docs by @danielhanchen in https://github.com/huggingface/trl/pull/1213
Fix args type by @zspo in https://github.com/huggingface/trl/pull/1214
[core / Docker] Add workflow to build TRL docker images by @younesbelkada in https://github.com/huggingface/trl/pull/1215
Refactor RewardConfig to own module by @lewtun in https://github.com/huggingface/trl/pull/1221
Add support for ChatML dataset format in by @philschmid in https://github.com/huggingface/trl/pull/1208
Add slow test workflow file by @younesbelkada in https://github.com/huggingface/trl/pull/1223
Remove a repeating line in how_to_train.md by @kykim0 in https://github.com/huggingface/trl/pull/1226
Logs metrics on all distributed processes when using DPO & FSDP by @AjayP13 in https://github.com/huggingface/trl/pull/1160
fix: improve error message when pad_token_id is not configured by @yumemio in https://github.com/huggingface/trl/pull/1152
[core / tests ] v1 slow tests by @younesbelkada in https://github.com/huggingface/trl/pull/1218
[core / SFTTrainer] Fix breaking change by @younesbelkada in https://github.com/huggingface/trl/pull/1229
Fixes slow tests by @younesbelkada in https://github.com/huggingface/trl/pull/1241
Fix weird doc bug by @younesbelkada in https://github.com/huggingface/trl/pull/1244
Add setup_chat_format for adding new special tokens to model for training chat models by @philschmid in https://github.com/huggingface/trl/pull/1242
Fix chatml template by @philschmid in https://github.com/huggingface/trl/pull/1248
fix: fix loss_type and some args desc by @zspo in https://github.com/huggingface/trl/pull/1247
Release: v0.7.10 by @younesbelkada in https://github.com/huggingface/trl/pull/1253

New Contributors

@yuta0x89 made their first contribution in https://github.com/huggingface/trl/pull/1204
@danielhanchen made their first contribution in https://github.com/huggingface/trl/pull/1213
@zspo made their first contribution in https://github.com/huggingface/trl/pull/1214
@philschmid made their first contribution in https://github.com/huggingface/trl/pull/1208
@kykim0 made their first contribution in https://github.com/huggingface/trl/pull/1226
@AjayP13 made their first contribution in https://github.com/huggingface/trl/pull/1160
@yumemio made their first contribution in https://github.com/huggingface/trl/pull/1152

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.9...v0.7.10

v0.7.10

v0.7.10: Minor fixes, Automatic templating, setup_chat_format API, stronger tests

What's Changed

New Contributors

v0.7.10: Minor fixes, Automatic templating, `setup_chat_format` API, stronger tests