## v0.7.6: `xxxTrainer` multi-tag support

This is a patch release to push multiple tags (e.g. `trl` & `sft`) instead of one tag.

- [`xxxTrainer`] multi-tags support for tagging by @younesbelkada in https://github.com/huggingface/trl/pull/1133

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.5...v0.7.6
## v0.7.5: DPOTrainer enhancements, automatic tags for `xxxTrainer`

### DPOTrainer

This release introduces many new features in TRL for `DPOTrainer`:
### Automatic `xxxTrainer` tagging on the Hub

Trainers from TRL now automatically push the tags `trl-sft`, `trl-dpo`, or `trl-ddpo` when pushing models to the Hub:
- [`xxxTrainer`] Add tags to all trainers in TRL by @younesbelkada in https://github.com/huggingface/trl/pull/1120

We encourage users to try out the unsloth library for faster LLM fine-tuning using PEFT & TRL's `SFTTrainer` and `DPOTrainer`.
- [Docs] Add unsloth optimizations in TRL's documentation by @younesbelkada in https://github.com/huggingface/trl/pull/1119
- [Tests] Add non optional packages tests by @younesbelkada in https://github.com/huggingface/trl/pull/974
- `examples/` by @alvarobartt in https://github.com/huggingface/trl/pull/977
- `eos_token_id` and `pad_token_id` by @MustSave in https://github.com/huggingface/trl/pull/988
- [`DataCollatorForCompletionOnlyLM`] Add more clarification / guidance in the case `tokenizer.pad_token_id == tokenizer.eos_token_id` by @younesbelkada in https://github.com/huggingface/trl/pull/992
- `requires_grad` to input for non-quantized peft models by @younesbelkada in https://github.com/huggingface/trl/pull/1006
- [core] Fix failing tests on main by @younesbelkada in https://github.com/huggingface/trl/pull/1065
- [`SFTTrainer`] Fix Trainer when args is None by @younesbelkada in https://github.com/huggingface/trl/pull/1064
- `loss_type` in ValueError message by @alvarobartt in https://github.com/huggingface/trl/pull/1067
- `tyro` in `sft_llama2.py` by @vwxyzjn in https://github.com/huggingface/trl/pull/1081
- `peft_module_casting_to_bf16` util method, `append_concat_token` flag, remove callback `PeftSavingCallback` by @pacman100 in https://github.com/huggingface/trl/pull/1110
- `description` in setup.py by @alvarobartt in https://github.com/huggingface/trl/pull/1101

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.4...v0.7.5
## v0.7.4

This release is a patch release that addresses an issue for users that have TRL installed without PEFT.

- [core] Fix peft config typehint by @younesbelkada in https://github.com/huggingface/trl/pull/967

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.3...v0.7.4
## v0.7.3: IterativeTrainer, NEFTune and major bugfixes for DPOTrainer and distributed training

In this release we introduce two new features, `IterativeTrainer` from @gaetanlop and NEFTune, together with important bugfixes for distributed training.

### IterativeTrainer
Iterative fine-tuning is a training method that enables you to perform custom actions (for example, generation and filtering) between optimization steps. TRL provides an easy-to-use API to fine-tune your models iteratively in just a few lines of code.
Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer
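The loop above can be sketched in a few lines. Note that `generate`, `keep`, and `step` below are hypothetical stand-ins for model generation, a custom filtering rule, and the trainer's optimization step (e.g. what `IterativeSFTTrainer` performs); this is an illustration of the pattern, not TRL's API:

```python
def iterative_finetune(prompts, generate, keep, step, num_rounds=3):
    """Alternate custom actions (generation, filtering) with optimization steps."""
    for _ in range(num_rounds):
        completions = [generate(p) for p in prompts]    # custom action: generation
        selected = [c for c in completions if keep(c)]  # custom action: filtering
        step(selected)                                  # optimize on the kept samples
```

Any callable can be plugged in for each stage, which is what makes the iterative setup flexible.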
### NEFTune

NEFTune is a technique to boost the performance of chat models, introduced in the paper “NEFTune: Noisy Embeddings Improve Instruction Finetuning” by Jain et al. It consists of adding noise to the embedding vectors during training.
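As a rough illustration of the idea (not TRL's actual implementation, which injects the noise through forward hooks on the embedding layer), the paper draws each noise component from Uniform(-1, 1) and scales it by `alpha / sqrt(seq_len * dim)`:

```python
import math
import random

def neftune_noise(embeddings, alpha=5.0):
    """Add NEFTune-style uniform noise to one sequence of token embeddings.

    `embeddings` is a (seq_len x dim) list of lists of floats. Each noise
    component is Uniform(-1, 1) scaled by alpha / sqrt(seq_len * dim).
    """
    seq_len, dim = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + random.uniform(-1.0, 1.0) * scale for x in row]
            for row in embeddings]
```

The noise is applied only during training; at inference time the embeddings are used unchanged.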
- [`SFTTrainer`] Adds NEFTune into `SFTTrainer` by @younesbelkada in https://github.com/huggingface/trl/pull/871
- [NEFTune] Make use of forward hooks instead by @younesbelkada in https://github.com/huggingface/trl/pull/889

Read more about it in the TRL documentation.
Major bugfixes have been addressed to tackle many issues with distributed training and gradient checkpointing.
- [DPO] fix DPO + GC issues by @younesbelkada in https://github.com/huggingface/trl/pull/927
- [core / DDP] Fix RM trainer + DDP + quantization + propagate `gradient_checkpointing_kwargs` in SFT & DPO by @younesbelkada in https://github.com/huggingface/trl/pull/912

The `DPOTrainer` now comes with multiple enhancements and bugfixes! Check them out below.
- `reward_modeling.py` by @vwxyzjn in https://github.com/huggingface/trl/pull/890
- `tyro` version by @brentyi in https://github.com/huggingface/trl/pull/928
- [`SFTTrainer`] Make sure to not conflict between transformers and TRL implementation by @younesbelkada in https://github.com/huggingface/trl/pull/933
- [CI] Fix CI with new transformers release by @younesbelkada in https://github.com/huggingface/trl/pull/946

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.2...v0.7.3
## v0.7.2

In this release we provide minor bugfixes and a smoother user experience for all public classes. We also added clarifications in the documentation on how to use Flash Attention with `SFTTrainer`.
- [Docs] Fix sft mistakes by @younesbelkada in https://github.com/huggingface/trl/pull/717
- [core] Bump peft to 0.4.0 by @younesbelkada in https://github.com/huggingface/trl/pull/720
- [`SFTTrainer`] Check correctly for condition by @younesbelkada in https://github.com/huggingface/trl/pull/668
- [core] Fix import of `randn_tensor` by @younesbelkada in https://github.com/huggingface/trl/pull/751
- `prepare_model_for_kbit_training` by @mnoukhov in https://github.com/huggingface/trl/pull/728
- `PPOTrainer` by @davidberenstein1957 in https://github.com/huggingface/trl/pull/665
- `RewardConfig` is backwards compatible by @lewtun in https://github.com/huggingface/trl/pull/748
- `log_with` argument by @filippobistaffa in https://github.com/huggingface/trl/pull/792
- [DPO] Revert "Add default Optim to DPO example (#759)" by @younesbelkada in https://github.com/huggingface/trl/pull/799
- [Docs] Clarify PEFT docs by @younesbelkada in https://github.com/huggingface/trl/pull/797
- [`PPOTrainer`] Fixes ppo trainer generate nit by @younesbelkada in https://github.com/huggingface/trl/pull/798
- `create_reference_model()` when ZeRO-3 is enabled by @lewtun in https://github.com/huggingface/trl/pull/840
- lewtun power by @lvwerra in https://github.com/huggingface/trl/pull/856
- [core] Fix import issues by @younesbelkada in https://github.com/huggingface/trl/pull/859

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.1...v0.7.2
## v0.7.1: `PPOTrainer` and `log_stats`

Fixed a bug with `log_stats` of `PPOTrainer` to avoid breaking behaviour.

- [`PPOTrainer`] A workaround for failing `log_stats` by @younesbelkada in https://github.com/huggingface/trl/pull/708

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.0...v0.7.1
## v0.7.0: Text environments

Text environments provide a learning ground for language agents. They allow a language model to use tools to accomplish a task, such as using a Python interpreter to answer math questions or using a search index for trivia questions. Having access to tools allows language models to solve tasks that would be very hard for the model itself but can be trivial with the appropriate tools.
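As a toy illustration of the dispatch step at the heart of a text environment (the names here are illustrative, not TRL's actual `TextEnvironment` API): the model emits a `"tool: argument"` request, the environment runs the matching tool, and the textual result is fed back into the model's context.

```python
def use_tool(query, tools):
    """Route a 'tool: argument' query to the matching tool and return its text result."""
    name, _, arg = query.partition(":")
    result = tools[name.strip()](arg.strip())
    return str(result)

# Hypothetical tools: a calculator for math questions, a stub search index for trivia.
tools = {
    "calculator": lambda expr: eval(expr, {"__builtins__": {}}),
    "search": lambda q: f"top hit for '{q}'",
}
```

The key point is that the tool output is ordinary text, so it can simply be appended to the conversation before the model continues generating.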
We are excited to bring to the community a complete set of functionalities and full examples to train LLMs to use tools!
Check out the documentation page for more details and examples.
**Full Changelog**: https://github.com/huggingface/trl/compare/v0.6.0...v0.7.0
## v0.6.0: DDPO for diffusion models

We are excited to welcome the first RLHF + diffusion models algorithm, DDPO, to refine the generations from diffusion models. Read more about it directly in the docs.
| Before | After DDPO finetuning |
|---|---|
| <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/pre_squirrel.png"/></div> | <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/post_squirrel.png"/></div> |
| <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/pre_starfish.png"/></div> | <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/post_starfish.png"/></div> |
The release also comes with multiple bug fixes reported and/or led by the community; check out the commit history below.
- [Modeling] Add token support for `hf_hub_download` by @younesbelkada in https://github.com/huggingface/trl/pull/604
- `response_template` in `DataCollatorForCompletionOnlyLM` by @ivsanro1 in https://github.com/huggingface/trl/pull/622
- [`sft_llama2`] Add check of arguments by @younesbelkada in https://github.com/huggingface/trl/pull/660
- [CI] Fix unmutable `TrainingArguments` issue by @younesbelkada in https://github.com/huggingface/trl/pull/676
- `dataclasses.replace` by @tomaarsen in https://github.com/huggingface/trl/pull/682

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.5.0...v0.6.0
## v0.5.0

This release includes multiple important bugfixes (`SFTTrainer`, `PPOTrainer`); it also extends the current `DataCollatorForCompletionOnlyLM` to support chat-like training.
### DPOTrainer

The DPO algorithm (Direct Preference Optimization) was introduced by Rafailov et al. in this paper and provides a way of performing RL training without having to rely on a reward model. The `DPOTrainer` is now part of the TRL library for anyone who wants to use it, thanks to the amazing contributors!
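For intuition, here is a minimal standard-library sketch of the sigmoid DPO loss for a single preference pair; `DPOTrainer` computes a batched, tensorized version of essentially this quantity:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) preference pair.

    Inputs are summed log-probabilities of each completion under the
    trained policy and the frozen reference model; `beta` controls the
    strength of the implicit KL penalty.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return math.log1p(math.exp(-logits))  # -log(sigmoid(logits))
```

At initialization (policy equal to the reference), the loss is exactly log 2, and it decreases as the policy learns to prefer the chosen completion more strongly than the reference does.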
- [DPO] Resolve logging for DPOTrainer by @tomaarsen in https://github.com/lvwerra/trl/pull/570
- `_get_current_device()` by @lewtun in https://github.com/lvwerra/trl/pull/515

### DataCollatorForCompletionOnlyLM

You can now mask out the user prompts in the `DataCollatorForCompletionOnlyLM` data collator and train only on chat completions. Check out the PR below or the appropriate section of the documentation to learn more about it!
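Conceptually, the collator masks the prompt tokens in the labels so the loss covers only the completion. A per-example sketch (simplified relative to the real `DataCollatorForCompletionOnlyLM`, which also handles batching, padding, and tokenizer edge cases):

```python
IGNORE_INDEX = -100  # label value excluded from the language-modeling loss

def mask_prompt_labels(input_ids, response_template_ids):
    """Copy `input_ids` into labels, masking everything up to and including
    the first occurrence of the response template, so the loss is computed
    on the completion tokens only."""
    labels = list(input_ids)
    n = len(response_template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == response_template_ids:
            labels[:start + n] = [IGNORE_INDEX] * (start + n)
            break
    return labels
```

In this sketch, if the template is absent the labels are left unmasked; the real collator handles that case explicitly.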
Multiple bugs in the supported trainers have been raised by the community and fixed in the PRs below.
- [core] Fix offline case by @younesbelkada in https://github.com/lvwerra/trl/pull/538
- [`SFTTrainer`] Add warning for wrong padding_side by @younesbelkada in https://github.com/lvwerra/trl/pull/550
- [`SFTTrainer`] Add epochs and num steps on CLI by @younesbelkada in https://github.com/lvwerra/trl/pull/562
- `DataCollatorForCompletionOnlyLM` in the docs by @younesbelkada in https://github.com/lvwerra/trl/pull/565
- [PPO] fix corner cases with PPO batch size and `forward_batch_size` by @younesbelkada in https://github.com/lvwerra/trl/pull/563

The examples and documentation have been refactored; check the PRs below for more details.
- [examples] Big refactor of examples and documentation by @younesbelkada in https://github.com/lvwerra/trl/pull/509
- [examples] Fix sentiment nit by @younesbelkada in https://github.com/lvwerra/trl/pull/517
- [examples] make the sft script more modulable by @younesbelkada in https://github.com/lvwerra/trl/pull/543
- `use_auth_token` arg to `sft_trainer` example by @corey-lambda in https://github.com/lvwerra/trl/pull/544

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.7...v0.5.0
## v0.4.7: SFTTrainer and PPOTrainer bug fixes

- `_prepare_dataset` function by @BeibinLi in https://github.com/lvwerra/trl/pull/464
- [CI] Fix CI RM by @younesbelkada in https://github.com/lvwerra/trl/pull/468
- `float` instead of `double` to avoid issues with MPS device by @younesbelkada in https://github.com/lvwerra/trl/pull/499
- [`PPOTrainer`] Add prefix tuning support by @younesbelkada in https://github.com/lvwerra/trl/pull/501
- [`PPOTrainer`] Add prompt tuning support on TRL by @younesbelkada in https://github.com/lvwerra/trl/pull/500
- [`SFTTrainer`] Fix the sequence length check of `SFTTrainer` by @younesbelkada in https://github.com/lvwerra/trl/pull/512

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.6...v0.4.7
## v0.4.6

Patch release to fix a bug on Google Colab with `PPOTrainer` & `PPOConfig` + wandb.

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.5...v0.4.6
## v0.4.5: SFTTrainer enhancements and fixes

This patch release adds multiple fixes and enhancements for the `SFTTrainer`. Another patch release is coming to fix an issue with `PPOTrainer` and Google Colab combined with wandb logging.
- [`SFTTrainer`] Relax dataset constraints by @younesbelkada in https://github.com/lvwerra/trl/pull/442
- [`SFTTrainer`] Fix non packed dataset by @younesbelkada in https://github.com/lvwerra/trl/pull/444
- [core] Add stale bot by @younesbelkada in https://github.com/lvwerra/trl/pull/447
- [`SFTTrainer`] Introducing `DataCollatorForCompletionOnlyLM` by @younesbelkada in https://github.com/lvwerra/trl/pull/445
- [`ConstantLengthDataset`] Fix packed dataset issue by @younesbelkada in https://github.com/lvwerra/trl/pull/452

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.4...v0.4.5
## v0.4.4

- [core] unpin accelerate by @younesbelkada in https://github.com/lvwerra/trl/pull/418

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.3...v0.4.4
## v0.4.3

Patch release - pin accelerate version.

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.2...v0.4.3
## v0.4.2: QLoRA, `RewardTrainer` and `SFTTrainer`

A new version of TRL that includes training larger models using QLoRA (4-bit quantization through `bitsandbytes`), plus brand new classes `RewardTrainer` and `SFTTrainer` to easily conduct your RLHF projects end-to-end!
### SFTTrainer and RewardTrainer

Use the brand new trainers to easily train your reward model and supervised fine-tuned (SFT) model with a few lines of code!
- [core] officially support SFT (Supervised Finetuning) by @younesbelkada in https://github.com/lvwerra/trl/pull/323
- [SFT] Fix sft issues by @younesbelkada in https://github.com/lvwerra/trl/pull/336
- [docs] fix SFT doc by @younesbelkada in https://github.com/lvwerra/trl/pull/367
- [core] Officially Support Reward Modeling by @younesbelkada in https://github.com/lvwerra/trl/pull/303

Pass 4-bit models directly into `PPOTrainer` for more memory-efficient training:
- [core] Add 4bit QLora by @younesbelkada in https://github.com/lvwerra/trl/pull/383
- [bnb] fix 4 bit SFT by @younesbelkada in https://github.com/lvwerra/trl/pull/396

Great work by @mnoukhov, who managed to fix the issues related to StackLlama and the new versions of accelerate, peft and transformers, with completely reproducible examples.
- [core] refactor peft API by @younesbelkada in https://github.com/lvwerra/trl/pull/231
- [core] Add warning when negative KL by @younesbelkada in https://github.com/lvwerra/trl/pull/239
- pip cache by @SauravMaheshkar in https://github.com/lvwerra/trl/pull/198
- [core] Fix DeepSpeed zero-3 issue by @younesbelkada in https://github.com/lvwerra/trl/pull/182
- [distributed] Fix early stopping and DP by @younesbelkada in https://github.com/lvwerra/trl/pull/254
- [core] Fix ds issue by @younesbelkada in https://github.com/lvwerra/trl/pull/260
- `create_reference_model` by @younesbelkada in https://github.com/lvwerra/trl/pull/261
- [t5] Fix negative kl issue by @younesbelkada in https://github.com/lvwerra/trl/pull/262
- [CI] Fix broken tests by @younesbelkada in https://github.com/lvwerra/trl/pull/318
- [Docs] Add details on multi-GPU / multi-node by @younesbelkada in https://github.com/lvwerra/trl/pull/320
- [PPO] Relax negative KL constraint by @younesbelkada in https://github.com/lvwerra/trl/pull/352
- [`PPOTrainer`] Fix tensorboard issue by @younesbelkada in https://github.com/lvwerra/trl/pull/330
- [core] Fix warning issue by @younesbelkada in https://github.com/lvwerra/trl/pull/377

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.1...v0.4.2
## v0.4.1: peft Data Parallelism support and distributed training bug fixes

This release includes a set of features and bug fixes to scale up your RLHF experiments for much larger models, leveraging peft and bitsandbytes.
We introduce a new paradigm in trl, termed Naive Pipeline Parallelism, to fit large-scale models on your training setup and apply RLHF to them. This feature uses peft to train adapters and bitsandbytes to reduce the memory footprint of your active model.
### peft Data Parallelism support

There were some bugs with respect to peft integration and DP. This release includes the bug fixes to enable multi-GPU training using accelerate + DDP (Distributed Data Parallel).

- [peft] Fix DP issues by @younesbelkada in https://github.com/lvwerra/trl/pull/221
- [core] fix DP issue by @younesbelkada in https://github.com/lvwerra/trl/pull/222
Your training runs can now be much more memory efficient thanks to a few tricks / bug fixes:
`PPOConfig` now also supports the flag `optimize_cuda_cache` (set to `False` by default) to avoid increasing CUDA memory issues.
This release also includes minor fixes related to PyTorch 2.0 release
- [test] attempt to fix CI test for PT 2.0 by @younesbelkada in https://github.com/lvwerra/trl/pull/225

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.0...v0.4.1
## v0.4.0: peft integration

Apply RLHF and fine-tune your favorite large model on a consumer GPU using peft and trl! You can also easily share your trained RLHF adapters on the Hub with a few lines of code.
With this integration you can train gpt-neo-x (20B parameter model - 40GB in bfloat16) on a 24GB consumer GPU!
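A quick back-of-the-envelope check of those numbers: 20B parameters at 2 bytes each (bfloat16) is 40 GB of weights alone, which is why adapter-based training is needed to fit a 24 GB consumer GPU.

```python
def model_weight_gb(num_params, bytes_per_param=2):
    """Memory needed just to store the weights, in GB (2 bytes/param for bfloat16)."""
    return num_params * bytes_per_param / 1e9

print(model_weight_gb(20e9))  # 40.0
```

Optimizer states and gradients for full fine-tuning would add several times this amount on top, while peft adapters keep the trainable footprint small.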
- [core] Fix quality issue by @younesbelkada in https://github.com/lvwerra/trl/pull/197
- peft integration by @edbeeching in https://github.com/lvwerra/trl/pull/163
- [core] Update dependency by @younesbelkada in https://github.com/lvwerra/trl/pull/206

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.3.1...v0.4.0
## v0.3.1

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.3.0...v0.3.1
## v0.3.0

- `set_seed` to init.py by @lvwerra in https://github.com/lvwerra/trl/pull/127
- [bug] Update gpt2-sentiment.py by @younesbelkada in https://github.com/lvwerra/trl/pull/132
- [core] Small refactor of forward pass by @younesbelkada in https://github.com/lvwerra/trl/pull/136
- [tests] Add correct repo name by @younesbelkada in https://github.com/lvwerra/trl/pull/138
- [core] Add torch_dtype support by @younesbelkada in https://github.com/lvwerra/trl/pull/147
- [core] Fix dataloader issue by @younesbelkada in https://github.com/lvwerra/trl/pull/154
- [core] enable bf16 training by @younesbelkada in https://github.com/lvwerra/trl/pull/156
- [core] fix saving multi-gpu by @younesbelkada in https://github.com/lvwerra/trl/pull/157
- [core] Add max_grad_norm support by @younesbelkada in https://github.com/lvwerra/trl/pull/177
- [Docs] Fix barplot by @younesbelkada in https://github.com/lvwerra/trl/pull/181

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.2.1...v0.3.0
## v0.2.1

- `datasets` as a dependency by @lvwerra in https://github.com/lvwerra/trl/pull/110
- `Mapping` in core for Python 3.10 by @lvwerra in https://github.com/lvwerra/trl/pull/112

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.2.0...v0.2.1