releases.sh preview

TRL

$ npx -y @buildinternet/releases show trl
Releases: 10 · Avg: 3/mo · Versions: v0.27.0 → v1.2.0
Oct 10, 2024

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.11.2...v0.11.3

Oct 7, 2024

What's Changed

Full Changelog: https://github.com/huggingface/trl/compare/v0.11.1...v0.11.2

Sep 24, 2024

Bug fix

  • allow parse-args as list of floats for Online DPO, XPO and Nash-MD configs by @kashif in #2108

Full Changelog: https://github.com/huggingface/trl/compare/v0.11.0...v0.11.1

Sep 19, 2024

We are excited to introduce the new v0.11.0 release, with many new features and post-training algorithms. The highlights are as follows:

New post-training methods

Generalized Knowledge Distillation

<img width="992" alt="Screenshot 2024-09-19 at 10 01 02" src="https://github.com/user-attachments/assets/97afd65d-1a2c-484b-b6dd-b02a2cbe6430">

Generalized Knowledge Distillation (GKD) is a post-training method from Google DeepMind that extends standard knowledge distillation by allowing the student to generate outputs during training and receive online feedback from the teacher. It consistently outperforms SFT and in some cases enables the student model to match the performance of the teacher, but with far fewer parameters.

To train models with this method, check out the GKDTrainer.
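The objective GKD optimizes can be sketched on toy next-token distributions. This is a didactic rendering of the generalized Jensen-Shannon divergence from the GKD paper, not the library code (the GKDTrainer itself works on model logits):

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def generalized_jsd(p_teacher, p_student, beta=0.5):
    """Generalized Jensen-Shannon divergence used as the GKD objective:
    beta * KL(P || M) + (1 - beta) * KL(Q || M), with M = beta*P + (1-beta)*Q.
    beta = 0.5 recovers the standard symmetric JSD; other values trade off
    mode-covering vs. mode-seeking behaviour between teacher and student."""
    m = [beta * pt + (1 - beta) * ps for pt, ps in zip(p_teacher, p_student)]
    return beta * kl(p_teacher, m) + (1 - beta) * kl(p_student, m)
```

Because the student also generates its own samples during training, this divergence is evaluated on on-policy student outputs, which is what distinguishes GKD from standard distillation on a fixed dataset.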

Exploratory Preference Optimization

<img width="1224" alt="Screenshot 2024-09-19 at 10 13 27" src="https://github.com/user-attachments/assets/36decb24-ef01-41f1-84e8-53b491eb6c86">

Exploratory Preference Optimization is an online post-training method from researchers at Microsoft, MIT, and Wisconsin that extends DPO to incorporate online feedback from reward models or LLM judges. It is similar to online DPO, but has a slightly different theoretical basis concerning sample efficiency.

To train models with this method, check out the XPOTrainer.

Nash Learning with Human Feedback

<img width="476" alt="Screenshot 2024-09-19 at 10 32 04" src="https://github.com/user-attachments/assets/8e68263f-bf5a-4f68-b451-110c78e27bb6">

Nash Learning with Human Feedback is a novel post-training method from Google DeepMind that uses pairwise preference models which are conditioned on two inputs, instead of the single one used in reward models. These preference models are then used to train a policy that consistently produces responses that are preferred over those from competing policies, thus approximating a Nash equilibrium (i.e. a two player game where actions are responses and payoffs are given by the preference model).

To train models with this method, check out the NashMDTrainer.
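The two-player game in parentheses above can be made concrete with a toy pairwise preference model. This dictionary-based sketch is purely illustrative (a real preference model is a learned network conditioned on both responses):

```python
def expected_preference(policy_a, policy_b, pref):
    """Payoff of the two-player game: the probability that a response sampled
    from policy_a is preferred to one sampled from policy_b, under a pairwise
    preference model pref[(y1, y2)] = P(y1 beats y2)."""
    return sum(
        pa * pb * pref[(ya, yb)]
        for ya, pa in policy_a.items()
        for yb, pb in policy_b.items()
    )
```

At a Nash equilibrium of this symmetric game, no alternative policy achieves an expected preference above 1/2 against the equilibrium policy.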

New trainer features

Deprecations 🚨

  • The PPOTrainer is deprecated in favour of PPOv2Trainer to provide a consistent API across TRL's trainers. It will be removed in v0.12.0. By @qgallouedec in https://github.com/huggingface/trl/pull/2016
  • The RichProgressCallback has been removed from the example scripts as it caused a variety of problems with logging in distributed environments. You can still use it by adding it manually to the trainer callbacks. By @lewtun in https://github.com/huggingface/trl/pull/2053

Bugfixes and improvements

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.9.6...v0.11.0

Aug 29, 2024

We are excited to introduce the new v0.10.1 release, with many new exciting features and post-training algorithms. The highlights are as follows:

Online DPO

<img width="1210" alt="Screenshot 2024-08-29 at 15 53 29" src="https://github.com/user-attachments/assets/c11863ca-434c-47d7-8436-dc096683075a">

Online DPO is a new alignment method from DeepMind to boost the performance of LLMs. With Online DPO, data is generated on the fly by the trained model (instead of pre-collected). For each prompt, two completions are generated, with a reward model selecting the preferred one. This approach:

  • Eliminates the need for a pre-collected preference dataset (it's generated online)
  • Enables continuous model improvement
  • Yields better results than traditional DPO

To train models with this method, use the OnlineDPOTrainer.
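The data flow described above can be sketched with toy stand-ins for the policy and the reward model (toy_generate and toy_reward are hypothetical placeholders for illustration, not TRL APIs):

```python
import random

CANDIDATES = ["good answer", "ok answer", "bad answer"]

def toy_generate(prompt, rng):
    # stand-in for sampling a completion from the policy being trained
    return prompt + " " + rng.choice(CANDIDATES)

def toy_reward(completion):
    # stand-in for a reward model score; higher is better
    scores = {"good answer": 2.0, "ok answer": 1.0, "bad answer": 0.0}
    return next(v for k, v in scores.items() if completion.endswith(k))

def online_dpo_pair(prompt, rng):
    """Online DPO data flow: generate two completions on the fly for each
    prompt, then let the reward model pick the preferred one."""
    a, b = toy_generate(prompt, rng), toy_generate(prompt, rng)
    chosen, rejected = (a, b) if toy_reward(a) >= toy_reward(b) else (b, a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

Each resulting (chosen, rejected) pair feeds the DPO loss, which is why no pre-collected preference dataset is needed.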

Liger Triton kernels for supercharged SFT

  • We've integrated LinkedIn's Liger Triton kernels into the SFTTrainer for faster throughput and lower memory usage. To use them, set use_liger_kernel in SFTConfig.

DPO for VLMs

  • We've added support for aligning vision-language models with DPO, covering the LLaVA-1.5, PaliGemma, and Idefics2 architectures. To train VLMs with DPO, use the dpo_visual.py script as follows:
accelerate launch examples/scripts/dpo_visual.py \
    --dataset_name HuggingFaceH4/rlaif-v_formatted \
    --model_name_or_path google/paligemma-3b-pt-224 \
    --trust_remote_code \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --output_dir dpo_paligemma_rlaif-v \
    --bf16 \
    --torch_dtype bfloat16

WinRate callback for LLM as a judge

  • We've added support for computing win rates over the reference model for methods like DPO. To do so, configure the callback to point to the LLM-as-judge API (OpenAI or Hugging Face Inference API) and then add:
trainer = DPOTrainer(...)
win_rate_callback = WinRateCallback(..., trainer=trainer)
trainer.add_callback(win_rate_callback)
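After training, the judge's verdicts reduce to a single statistic. A minimal sketch of the computation (counting ties as half a win is an assumed convention here, not necessarily the callback's):

```python
def win_rate(outcomes):
    """Reduce LLM-judge verdicts to a win rate. `outcomes` holds one
    "win"/"loss"/"tie" verdict per prompt, comparing the policy's response
    against the reference model's; ties count as half a win."""
    if not outcomes:
        return 0.0
    score = sum(1.0 if o == "win" else 0.5 if o == "tie" else 0.0
                for o in outcomes)
    return score / len(outcomes)
```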

Anchored Preference Optimisation (APO) for fine-grained human/AI feedback

  • Added the APO method, which is an "anchored" version of the alignment objective. There are two variants: apo_zero and apo_down. The apo_zero loss increases the likelihood of winning outputs while decreasing the likelihood of losing outputs, making it suitable when the model is less performant than the winning outputs. On the other hand, apo_down decreases the likelihood of both winning and losing outputs, but with a stronger emphasis on reducing the likelihood of losing outputs. This variant is more effective when the model is better than the winning outputs. To use these losses, set loss_type="apo_zero" or loss_type="apo_down" in the DPOConfig
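Following the description above, the two variants can be written as a plain-Python sketch. This is a didactic rendering of the losses, not the library code, and the beta value is illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def apo_zero_loss(chosen_logratio, rejected_logratio, beta=0.1):
    """apo_zero: push the winning output's policy/reference log-ratio up
    (first term -> 0) and the losing output's log-ratio down (second term -> 0)."""
    return (1 - sigmoid(beta * chosen_logratio)) + sigmoid(beta * rejected_logratio)

def apo_down_loss(chosen_logratio, rejected_logratio, beta=0.1):
    """apo_down: decrease the likelihood of both outputs while keeping a large
    margin between the winning and losing outputs."""
    return sigmoid(beta * chosen_logratio) + (
        1 - sigmoid(beta * (chosen_logratio - rejected_logratio))
    )
```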

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.9.6...v0.10

Jul 8, 2024
v0.9.6 release

We are excited to introduce the new v0.9.6 release, with many new features and algorithms. The highlights are as follows:

  • Support for SimPO by @fe1ixxu, a reference-free method that also regularizes output length. To use this loss, set loss_type="simpo" and cpo_alpha=0 in the CPOConfig and train with the CPOTrainer.

    <img width="880" alt="image" src="https://github.com/huggingface/trl/assets/5555347/87551147-3f58-4c6a-9a78-70b513dea76e">
  • Added AlignProp by @mihirp1998, a method for fine-tuning Stable Diffusion models using reward gradients.

  • Added Efficient Exact Optimization (EXO) by @haozheji
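The SimPO objective in the first bullet is reference-free and normalizes rewards by output length. A minimal sketch of the published objective, with illustrative beta and gamma values:

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """SimPO: reference-free preference loss with length-normalized rewards
    r(y) = (beta / |y|) * log pi(y | x) and a target reward margin gamma:
        loss = -log sigmoid(r_w - r_l - gamma)
    No reference model is needed, unlike DPO."""
    r_w = beta * logp_chosen / len_chosen
    r_l = beta * logp_rejected / len_rejected
    return -math.log(1.0 / (1.0 + math.exp(-(r_w - r_l - gamma))))
```

Dividing by length keeps the implicit reward comparable across short and long completions, which is the regularization the bullet refers to.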

We also included many important fixes and improvements such as fixing prints in the CLI with GCP containers by @alvarobartt. Enjoy the release!

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.9.4...v0.9.6

Jun 6, 2024

Mainly backward compatibility fixes with SFTTrainer.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.9.3...v0.9.4

Jun 5, 2024
v0.9.3 RLOO / PPOv2 Trainer, RM Visualization

We are excited to introduce the new v0.9.3 release, with many new features and algorithms. The highlights are as follows:

  1. RLOO Trainer: RLOO (REINFORCE Leave-One-Out) is a new online RL algorithm for RLHF, proposed by Ahmadian et al. from Cohere. Check out our docs here to get started.
  2. PPOv2 Trainer: We are introducing a new experimental PPOv2 trainer which is more closely aligned with OpenAI's PPO implementation, based on https://arxiv.org/abs/2403.17031. Check out our docs here to get started.
  3. Reward model visualization: reward model training now includes visualization on the eval dataset, as shown below.

https://github.com/huggingface/trl/assets/5555347/6575a879-cb2f-4e2e-bb84-a76707f9de84

  4. New losses in the DPO Trainer: DPOTrainer now includes losses / support for Self-play Preference Optimization, Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment
  5. New losses in the KTO Trainer: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO)
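The leave-one-out baseline at the heart of RLOO (item 1 above) can be sketched in a few lines: each of the k completions sampled per prompt is baselined against the mean reward of the other k-1, giving a variance-reduced REINFORCE advantage without a learned value function:

```python
def rloo_advantages(rewards):
    """RLOO advantages for k completions sampled from one prompt: each
    completion's baseline is the mean reward of the remaining k-1 samples,
    so the advantages always sum to zero."""
    k = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]
```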

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.8.6...v0.9.2

Apr 22, 2024
v0.8.6: Fixes for CLI

What's Changed

Full Changelog: https://github.com/huggingface/trl/compare/v0.8.5...v0.8.6

Apr 18, 2024
v0.8.5: Important fixes for CLIs

What's Changed

Full Changelog: https://github.com/huggingface/trl/compare/v0.8.4...v0.8.5

Apr 17, 2024
v0.8.4: CLI / CPO / KTO important fixes

This patch release includes important fixes for the CLI and the KTO & CPO trainers.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.8.3...v0.8.4

Apr 12, 2024
v0.8.3: Patch release for CLI

What's Changed

This is a patch release that includes an import fix for CLIs.

Full Changelog: https://github.com/huggingface/trl/compare/v0.8.2...v0.8.3

Apr 11, 2024
v0.8.2: ORPO & CPO Trainer / Vision LLMs support for `SFTTrainer`, KTO fixes


This release includes two new trainers: ORPO from KAIST and CPO.
The release also adds support for vision LLMs such as LLaVA in SFTTrainer; see https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.

ORPO Trainer

CPO Trainer

VLLMs support for SFTTrainer

You can now use SFTTrainer to fine-tune vision LLMs such as LLaVA! See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.

KTO Fixes

Many fixes were introduced for the KTOTrainer:

10x PPO!

Other fixes

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.8.1...v0.8.2

Mar 20, 2024
v0.8.1: Patch release for CLIs

This patch release includes some important fixes for CLIs

What's Changed

Full Changelog: https://github.com/huggingface/trl/compare/v0.8.0...v0.8.1

Mar 19, 2024
v0.8.0: KTOTrainer, TRL CLIs, QLoRA + FSDP!

New Trainer: KTOTrainer:

We recently introduced the KTOTrainer to run the KTO algorithm on LLMs!

TRL Command Line Interfaces (CLIs):

Run SFT, DPO and chat with your aligned model directly from the terminal:

SFT:

trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb

DPO:

trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf 

Chat:

trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat

Read more about CLI in the relevant documentation section or use --help for more details.

FSDP + QLoRA:

SFTTrainer now supports FSDP + QLoRA

Other fixes

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.11...v0.8.0

Feb 16, 2024
v0.7.11: IPO & DPO fixes, faster data processing for multi-GPU, Automatic tagging for all models

DPO important fixes

We fixed issues with the IPO loss, leading to consistent results according to the latest experiments:

We also fixed important bugs affecting DPO with PEFT and Flash Attention.

Data processing is now faster for multi-GPU environments.

Other DPO bugfixes:

Faster data processing and other enhancements:

Automatic tagging for all models

Models now get tagged correctly even if users do not call trainer.push_to_hub().

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.10...v0.7.11

Jan 19, 2024
v0.7.10: Automatic templating, `setup_chat_format` API, stronger tests


This patch release adds a new feature in TRL for dealing with chat datasets: you can load a pre-formatted dataset directly, without needing to format it beforehand.

Read more about it here: https://huggingface.co/docs/trl/sft_trainer#dataset-format-support

The release also introduces a new API, setup_chat_format, to correctly resize the model embeddings to the target size when adding new tokens to comply with the chat format. Currently only the chatml format is supported; more formats may be added in the future.

Read more about it here: https://huggingface.co/docs/trl/sft_trainer#add-special-tokens-for-chat-format

We also extensively tested SFTTrainer and DPOTrainer, and the example scripts dpo.py and sft.py should be well battle-tested. If you see any issue with the scripts, please let us know on GitHub.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.9...v0.7.10

Jan 9, 2024
v0.7.9: Patch release for DPO & SFTTrainer

This is a patch release that fixes critical issues with SFTTrainer & DPOTrainer, together with minor fixes for PPOTrainer and DataCollatorForCompletionOnlyLM.

What's Changed

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.8...v0.7.9

v0.7.8: Unsloth tag, DPO fixes, PEFT support for DDPO


Unsloth tag for xxxTrainer

If you use the Unsloth library, the unsloth tag is automatically pushed to the Hub.

DPO fixes

Some important fixes for DPO have been introduced to address https://twitter.com/jon_durbin/status/1743575483365699809 and to make DPO faster.

DDPO + PEFT

Now DDPO supports PEFT

Other fixes

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.7...v0.7.8

Dec 26, 2023

v0.7.7: Patch release PPO & DDPO tags

A fix has been introduced for a breaking change with PPOTrainer.push_to_hub() and DDPOTrainer.push_to_hub().

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.6...v0.7.7

Latest: v1.2.0 · Tracking since: Jan 25, 2023 · Last fetched: Apr 19, 2026