v0.7.5: IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`

December 22, 2023 (TRL)

Important enhancements for `DPOTrainer`

This release introduces many new features in TRL for DPOTrainer:

IPO loss, for better generalization of the DPO algorithm
KTO and cDPO losses
DPOTrainer can also use pre-computed reference-model log-probabilities when they are present in the data
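
To make the difference between these losses concrete, here is a minimal pure-Python sketch of the per-pair sigmoid DPO loss, its label-smoothed cDPO variant, and the IPO loss, written from the commonly published formulations. The function names and the example log-probabilities are illustrative assumptions, not TRL's API, and the KTO loss is left out because it needs additional KL terms.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_margin(pi_chosen, pi_rejected, ref_chosen, ref_rejected):
    """Policy log-ratio minus reference log-ratio for one preference pair."""
    return (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)


def dpo_loss(margin, beta=0.1, label_smoothing=0.0):
    """Sigmoid DPO loss; label_smoothing > 0 gives the cDPO variant."""
    return (
        -log_sigmoid(beta * margin) * (1.0 - label_smoothing)
        - log_sigmoid(-beta * margin) * label_smoothing
    )


def ipo_loss(margin, beta=0.1):
    """IPO loss: squared distance of the margin from 1 / (2 * beta)."""
    return (margin - 1.0 / (2.0 * beta)) ** 2


# Made-up sequence log-probabilities for a single (chosen, rejected) pair.
margin = preference_margin(-12.0, -15.0, -13.0, -14.5)
print("DPO :", dpo_loss(margin))
print("cDPO:", dpo_loss(margin, label_smoothing=0.1))
print("IPO :", ipo_loss(margin))
```

The sigmoid loss keeps rewarding a larger margin, while the IPO loss is minimized at a finite margin of 1 / (2 * beta), which is the regularizing behavior the IPO item refers to.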

[DPO] Refactor eval logging of dpo trainer by @mnoukhov in https://github.com/huggingface/trl/pull/954
Fixes reward and text gathering in distributed training by @edbeeching in https://github.com/huggingface/trl/pull/850
remove spurious optimize_cuda_cache deprecation warning on init by @ChanderG in https://github.com/huggingface/trl/pull/1045
Revert "[DPO] Refactor eval logging of dpo trainer (#954)" by @lvwerra in https://github.com/huggingface/trl/pull/1047
Fix DPOTrainer + PEFT 2 by @rdk31 in https://github.com/huggingface/trl/pull/1049
[DPO] IPO Training loss by @kashif in https://github.com/huggingface/trl/pull/1022
[DPO] cDPO loss by @kashif in https://github.com/huggingface/trl/pull/1035
[DPO] use ref model logprobs if it exists in the data by @kashif in https://github.com/huggingface/trl/pull/885
[DPO] save eval_dataset for subsequent calls by @kashif in https://github.com/huggingface/trl/pull/1125
[DPO] rename kto loss by @kashif in https://github.com/huggingface/trl/pull/1127
[DPO] add KTO loss by @kashif in https://github.com/huggingface/trl/pull/1075
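
The "use ref model logprobs if it exists in the data" change listed above follows a simple fallback pattern: reuse stored reference log-probabilities when an example carries them, and only otherwise run the reference model. The sketch below illustrates that pattern in plain Python. The column names, the `reference_logps` helper, and the stand-in reference model are hypothetical and are not taken from TRL's code.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical column names, used here for illustration only.
REF_CHOSEN_KEY = "reference_chosen_logps"
REF_REJECTED_KEY = "reference_rejected_logps"


def reference_logps(
    example: Dict[str, Any],
    compute_with_ref_model: Callable[[Dict[str, Any]], Tuple[float, float]],
) -> Tuple[float, float]:
    """Return (chosen, rejected) reference log-probs, preferring stored values."""
    if REF_CHOSEN_KEY in example and REF_REJECTED_KEY in example:
        return example[REF_CHOSEN_KEY], example[REF_REJECTED_KEY]
    # No stored values: fall back to a forward pass through the reference model.
    return compute_with_ref_model(example)


def fake_ref_model(example: Dict[str, Any]) -> Tuple[float, float]:
    """Stand-in for a real reference-model forward pass (made-up numbers)."""
    return (-13.0, -14.5)


cached = {"prompt": "hi", REF_CHOSEN_KEY: -3.0, REF_REJECTED_KEY: -4.0}
uncached = {"prompt": "hi"}
print(reference_logps(cached, fake_ref_model))
print(reference_logps(uncached, fake_ref_model))
```

Storing the values once means the reference model does not have to be kept in memory during training, at the cost of a one-time pass over the dataset.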

Automatic `xxxTrainer` tagging on the Hub

TRL trainers now automatically add the tags `trl-sft`, `trl-dpo`, and `trl-ddpo` when pushing models to the Hub.

[xxxTrainer] Add tags to all trainers in TRL by @younesbelkada in https://github.com/huggingface/trl/pull/1120
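
As a rough illustration of what automatic tagging involves, the sketch below merges a trainer-specific tag into a model's existing tags without creating duplicates. The `merge_tags` helper is hypothetical and only illustrates the idea; it is not the function TRL uses.

```python
from typing import Iterable, List, Optional


def merge_tags(existing: Optional[Iterable[str]], trainer_tag: str) -> List[str]:
    """Return the existing tags plus trainer_tag, keeping order and dropping duplicates."""
    merged: List[str] = []
    for tag in list(existing or []) + [trainer_tag]:
        if tag not in merged:
            merged.append(tag)
    return merged


print(merge_tags(["text-generation"], "trl-sft"))  # ['text-generation', 'trl-sft']
print(merge_tags(["trl-dpo"], "trl-dpo"))          # ['trl-dpo']
```

Keeping the merge idempotent matters because a model may be pushed several times during training, and repeated pushes should not accumulate duplicate tags.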

unsloth 🤝 TRL

We encourage users to try out the unsloth library for faster LLM fine-tuning with PEFT and TRL's SFTTrainer and DPOTrainer.

[Docs] Add unsloth optimizations in TRL's documentation by @younesbelkada in https://github.com/huggingface/trl/pull/1119

What's Changed

set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/970
[Tests] Add non optional packages tests by @younesbelkada in https://github.com/huggingface/trl/pull/974
[DOCS] Fix outdated references to examples/ by @alvarobartt in https://github.com/huggingface/trl/pull/977
Update README.md by @GeekDream-x in https://github.com/huggingface/trl/pull/994
[DataCollatorForCompletionOnlyLM] Warn on identical eos_token_id and pad_token_id by @MustSave in https://github.com/huggingface/trl/pull/988
[DataCollatorForCompletionOnlyLM] Add more clarification / guidance in the case tokenizer.pad_token_id == tokenizer.eos_token_id by @younesbelkada in https://github.com/huggingface/trl/pull/992
make distributed true for multiple process by @allanj in https://github.com/huggingface/trl/pull/997
Fixed wrong trigger for warning by @zabealbe in https://github.com/huggingface/trl/pull/971
Update how_to_train.md by @halfrot in https://github.com/huggingface/trl/pull/1003
Adds requires_grad to input for non-quantized peft models by @younesbelkada in https://github.com/huggingface/trl/pull/1006
[Multi-Adapter PPO] Fix and Refactor reward model adapter by @mnoukhov in https://github.com/huggingface/trl/pull/982
Remove duplicate data loading in rl_training.py by @viethoangtranduong in https://github.com/huggingface/trl/pull/1020
[Document] Minor fixes of sft_trainer document by @mutichung in https://github.com/huggingface/trl/pull/1029
Update utils.py by @ZihanWang314 in https://github.com/huggingface/trl/pull/1012
spelling is hard by @grahamannett in https://github.com/huggingface/trl/pull/1043
Fixing accelerator version function call. by @ParthaEth in https://github.com/huggingface/trl/pull/1056
[SFT Trainer] precompute packed iterable into a dataset by @lvwerra in https://github.com/huggingface/trl/pull/979
Update doc CI by @lewtun in https://github.com/huggingface/trl/pull/1060
Improve PreTrainedModelWrapper._get_current_device by @billvsme in https://github.com/huggingface/trl/pull/1048
Update doc for the compute_metrics argument of SFTTrainer by @albertauyeung in https://github.com/huggingface/trl/pull/1062
[core] Fix failing tests on main by @younesbelkada in https://github.com/huggingface/trl/pull/1065
[SFTTrainer] Fix Trainer when args is None by @younesbelkada in https://github.com/huggingface/trl/pull/1064
enable multiple eval datasets by @peter-sk in https://github.com/huggingface/trl/pull/1052
Add missing loss_type in ValueError message by @alvarobartt in https://github.com/huggingface/trl/pull/1067
Add args to SFT example by @lewtun in https://github.com/huggingface/trl/pull/1079
add local folder support as input for rl_training. by @sywangyi in https://github.com/huggingface/trl/pull/1078
Make CI happy by @younesbelkada in https://github.com/huggingface/trl/pull/1080
Removing tyro in sft_llama2.py by @vwxyzjn in https://github.com/huggingface/trl/pull/1081
Log arg consistency by @tcapelle in https://github.com/huggingface/trl/pull/1084
Updated documentation for docs/source/reward_trainer.mdx to import th… by @cm2435 in https://github.com/huggingface/trl/pull/1092
[Feature] Add Ascend NPU accelerator support by @statelesshz in https://github.com/huggingface/trl/pull/1096
peft_module_casting_to_bf16 util method, append_concat_token flag, remove callback PeftSavingCallback by @pacman100 in https://github.com/huggingface/trl/pull/1110
Make prepending of bos token configurable. by @pacman100 in https://github.com/huggingface/trl/pull/1114
fix gradient checkpointing when using PEFT by @pacman100 in https://github.com/huggingface/trl/pull/1118
Update description in setup.py by @alvarobartt in https://github.com/huggingface/trl/pull/1101

New Contributors

@alvarobartt made their first contribution in https://github.com/huggingface/trl/pull/977
@GeekDream-x made their first contribution in https://github.com/huggingface/trl/pull/994
@MustSave made their first contribution in https://github.com/huggingface/trl/pull/988
@allanj made their first contribution in https://github.com/huggingface/trl/pull/997
@zabealbe made their first contribution in https://github.com/huggingface/trl/pull/971
@viethoangtranduong made their first contribution in https://github.com/huggingface/trl/pull/1020
@mutichung made their first contribution in https://github.com/huggingface/trl/pull/1029
@ZihanWang314 made their first contribution in https://github.com/huggingface/trl/pull/1012
@grahamannett made their first contribution in https://github.com/huggingface/trl/pull/1043
@ChanderG made their first contribution in https://github.com/huggingface/trl/pull/1045
@rdk31 made their first contribution in https://github.com/huggingface/trl/pull/1049
@ParthaEth made their first contribution in https://github.com/huggingface/trl/pull/1056
@billvsme made their first contribution in https://github.com/huggingface/trl/pull/1048
@albertauyeung made their first contribution in https://github.com/huggingface/trl/pull/1062
@peter-sk made their first contribution in https://github.com/huggingface/trl/pull/1052
@sywangyi made their first contribution in https://github.com/huggingface/trl/pull/1078
@tcapelle made their first contribution in https://github.com/huggingface/trl/pull/1084
@cm2435 made their first contribution in https://github.com/huggingface/trl/pull/1092
@statelesshz made their first contribution in https://github.com/huggingface/trl/pull/1096
@pacman100 made their first contribution in https://github.com/huggingface/trl/pull/1110

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.4...v0.7.5
