## v0.7.6: `xxxTrainer` multi-tag support

This is a patch release to push multiple tags (e.g. `trl` & `sft`) instead of one tag.

- [`xxxTrainer`] multi-tags support for tagging by @younesbelkada in https://github.com/huggingface/trl/pull/1133

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.5...v0.7.6
## v0.7.5: DPOTrainer enhancements, automatic tags for `xxxTrainer`

### DPOTrainer

This release introduces many new features in TRL for `DPOTrainer`:
### Automatic `xxxTrainer` tagging on the Hub

Trainers from TRL now automatically push the tags `trl-sft`, `trl-dpo`, or `trl-ddpo` when pushing models to the Hub:
- [`xxxTrainer`] Add tags to all trainers in TRL by @younesbelkada in https://github.com/huggingface/trl/pull/1120

We encourage users to try out the unsloth library for faster LLM fine-tuning using PEFT & TRL's `SFTTrainer` and `DPOTrainer`.
- [Docs] Add unsloth optimizations in TRL's documentation by @younesbelkada in https://github.com/huggingface/trl/pull/1119
- [Tests] Add non optional packages tests by @younesbelkada in https://github.com/huggingface/trl/pull/974
- `examples/` by @alvarobartt in https://github.com/huggingface/trl/pull/977
- `eos_token_id` and `pad_token_id` by @MustSave in https://github.com/huggingface/trl/pull/988
- [`DataCollatorForCompletionOnlyLM`] Add more clarification / guidance in the case `tokenizer.pad_token_id == tokenizer.eos_token_id` by @younesbelkada in https://github.com/huggingface/trl/pull/992
- `requires_grad` to input for non-quantized peft models by @younesbelkada in https://github.com/huggingface/trl/pull/1006
- [core] Fix failing tests on main by @younesbelkada in https://github.com/huggingface/trl/pull/1065
- [`SFTTrainer`] Fix Trainer when args is None by @younesbelkada in https://github.com/huggingface/trl/pull/1064
- `loss_type` in ValueError message by @alvarobartt in https://github.com/huggingface/trl/pull/1067
- `tyro` in `sft_llama2.py` by @vwxyzjn in https://github.com/huggingface/trl/pull/1081
- `peft_module_casting_to_bf16` util method, `append_concat_token` flag, remove callback `PeftSavingCallback` by @pacman100 in https://github.com/huggingface/trl/pull/1110
- `description` in setup.py by @alvarobartt in https://github.com/huggingface/trl/pull/1101

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.4...v0.7.5
## v0.7.4

This release is a patch release that addresses an issue for users that have TRL installed without PEFT.

- [core] Fix peft config typehint by @younesbelkada in https://github.com/huggingface/trl/pull/967

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.3...v0.7.4
## v0.7.3: IterativeTrainer, NEFTune and major bugfixes for DPOTrainer and distributed training

In this release we introduce two new features, `IterativeTrainer` from @gaetanlop and NEFTune, together with important bugfixes for distributed training.

### IterativeTrainer
Iterative fine-tuning is a training method that enables you to perform custom actions (for example, generation and filtering) between optimization steps. TRL provides an easy-to-use API to fine-tune your models iteratively in just a few lines of code.
Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer
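The loop above can be sketched in a few lines. Note that `generate`, `keep`, and `step` below are hypothetical stand-ins for model generation, a custom filtering rule, and the trainer's optimization step (e.g. what `IterativeSFTTrainer` performs); this is an illustration of the pattern, not TRL's API:

```python
def iterative_finetune(prompts, generate, keep, step, num_rounds=3):
    """Alternate custom actions (generation, filtering) with optimization steps."""
    for _ in range(num_rounds):
        completions = [generate(p) for p in prompts]    # custom action: generation
        selected = [c for c in completions if keep(c)]  # custom action: filtering
        step(selected)                                  # optimize on the kept samples
```

Any callable can be plugged in for each stage, which is what makes the iterative setup flexible.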
### NEFTune

NEFTune is a technique to boost the performance of chat models, introduced in the paper “NEFTune: Noisy Embeddings Improve Instruction Finetuning” by Jain et al. It consists of adding noise to the embedding vectors during training.
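As a rough illustration of the idea (not TRL's actual implementation, which injects the noise through forward hooks on the embedding layer), the paper draws each noise component from Uniform(-1, 1) and scales it by `alpha / sqrt(seq_len * dim)`:

```python
import math
import random

def neftune_noise(embeddings, alpha=5.0):
    """Add NEFTune-style uniform noise to one sequence of token embeddings.

    `embeddings` is a (seq_len x dim) list of lists of floats. Each noise
    component is Uniform(-1, 1) scaled by alpha / sqrt(seq_len * dim).
    """
    seq_len, dim = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + random.uniform(-1.0, 1.0) * scale for x in row]
            for row in embeddings]
```

The noise is applied only during training; at inference time the embeddings are used unchanged.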
- [`SFTTrainer`] Adds NEFTune into `SFTTrainer` by @younesbelkada in https://github.com/huggingface/trl/pull/871
- [NEFTune] Make use of forward hooks instead by @younesbelkada in https://github.com/huggingface/trl/pull/889

Read more about it in the TRL documentation.
Major bugfixes have been addressed to tackle many issues with distributed training and gradient checkpointing.
- [DPO] fix DPO + GC issues by @younesbelkada in https://github.com/huggingface/trl/pull/927
- [core / DDP] Fix RM trainer + DDP + quantization + propagate `gradient_checkpointing_kwargs` in SFT & DPO by @younesbelkada in https://github.com/huggingface/trl/pull/912

The `DPOTrainer` now comes with multiple enhancements and bugfixes! Check them out below.
- `reward_modeling.py` by @vwxyzjn in https://github.com/huggingface/trl/pull/890
- `tyro` version by @brentyi in https://github.com/huggingface/trl/pull/928
- [`SFTTrainer`] Make sure to not conflict between transformers and TRL implementation by @younesbelkada in https://github.com/huggingface/trl/pull/933
- [CI] Fix CI with new transformers release by @younesbelkada in https://github.com/huggingface/trl/pull/946

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.2...v0.7.3
## v0.7.2

In this release we provide minor bugfixes and a smoother user experience for all public classes. We also added clarifications in the documentation on how to use Flash Attention with `SFTTrainer`.
- [Docs] Fix sft mistakes by @younesbelkada in https://github.com/huggingface/trl/pull/717
- [core] Bump peft to 0.4.0 by @younesbelkada in https://github.com/huggingface/trl/pull/720
- [`SFTTrainer`] Check correctly for condition by @younesbelkada in https://github.com/huggingface/trl/pull/668
- [core] Fix import of `randn_tensor` by @younesbelkada in https://github.com/huggingface/trl/pull/751
- `prepare_model_for_kbit_training` by @mnoukhov in https://github.com/huggingface/trl/pull/728
- `PPOTrainer` by @davidberenstein1957 in https://github.com/huggingface/trl/pull/665
- `RewardConfig` is backwards compatible by @lewtun in https://github.com/huggingface/trl/pull/748
- `log_with` argument by @filippobistaffa in https://github.com/huggingface/trl/pull/792
- [DPO] Revert "Add default Optim to DPO example (#759)" by @younesbelkada in https://github.com/huggingface/trl/pull/799
- [Docs] Clarify PEFT docs by @younesbelkada in https://github.com/huggingface/trl/pull/797
- [`PPOTrainer`] Fixes ppo trainer generate nit by @younesbelkada in https://github.com/huggingface/trl/pull/798
- `create_reference_model()` when ZeRO-3 is enabled by @lewtun in https://github.com/huggingface/trl/pull/840
- lewtun power by @lvwerra in https://github.com/huggingface/trl/pull/856
- [core] Fix import issues by @younesbelkada in https://github.com/huggingface/trl/pull/859

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.1...v0.7.2
## v0.7.1: `PPOTrainer` and `log_stats`

Fixed a bug with `log_stats` of `PPOTrainer` to avoid breaking behaviour.

- [`PPOTrainer`] A workaround for failing `log_stats` by @younesbelkada in https://github.com/huggingface/trl/pull/708

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.0...v0.7.1
## v0.7.0: Text environments

Text environments provide a learning ground for language agents. They allow a language model to use tools to accomplish a task, such as using a Python interpreter to answer math questions or using a search index for trivia questions. Having access to tools allows language models to solve tasks that would be very hard for the model itself but can be trivial with the appropriate tools.
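As a toy illustration of the dispatch step at the heart of a text environment (the names here are illustrative, not TRL's actual `TextEnvironment` API): the model emits a `"tool: argument"` request, the environment runs the matching tool, and the textual result is fed back into the model's context.

```python
def use_tool(query, tools):
    """Route a 'tool: argument' query to the matching tool and return its text result."""
    name, _, arg = query.partition(":")
    result = tools[name.strip()](arg.strip())
    return str(result)

# Hypothetical tools: a calculator for math questions, a stub search index for trivia.
tools = {
    "calculator": lambda expr: eval(expr, {"__builtins__": {}}),
    "search": lambda q: f"top hit for '{q}'",
}
```

The key point is that the tool output is ordinary text, so it can simply be appended to the conversation before the model continues generating.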
We are excited to bring to the community a complete set of functionalities and full examples to train LLMs to use tools!
Check out the documentation page for more details and examples.
**Full Changelog**: https://github.com/huggingface/trl/compare/v0.6.0...v0.7.0
## v0.6.0: DDPO for diffusion models

We are excited to welcome the first RLHF + diffusion models algorithm, DDPO, to refine the generations from diffusion models. Read more about it directly in the docs.
| Before | After DDPO finetuning |
|---|---|
| <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/pre_squirrel.png"/></div> | <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/post_squirrel.png"/></div> |
| <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/pre_starfish.png"/></div> | <div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/post_starfish.png"/></div> |
The release also comes with multiple bug fixes reported and/or led by the community; check out the commit history below.
- [Modeling] Add token support for `hf_hub_download` by @younesbelkada in https://github.com/huggingface/trl/pull/604
- `response_template` in `DataCollatorForCompletionOnlyLM` by @ivsanro1 in https://github.com/huggingface/trl/pull/622
- [`sft_llama2`] Add check of arguments by @younesbelkada in https://github.com/huggingface/trl/pull/660
- [CI] Fix unmutable `TrainingArguments` issue by @younesbelkada in https://github.com/huggingface/trl/pull/676
- `dataclasses.replace` by @tomaarsen in https://github.com/huggingface/trl/pull/682

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.5.0...v0.6.0
## v0.5.0

This release includes multiple important bugfixes (`SFTTrainer`, `PPOTrainer`); it also extends the current `DataCollatorForCompletionOnlyLM` to support chat-like training.
### DPOTrainer

The DPO algorithm (Direct Preference Optimization) was introduced by Rafailov et al. in this paper and provides a way of performing RL training without having to rely on a reward model. The `DPOTrainer` is now part of the TRL library for anyone who wants to use it, thanks to the amazing contributors!
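For intuition, here is a minimal standard-library sketch of the sigmoid DPO loss for a single preference pair; `DPOTrainer` computes a batched, tensorized version of essentially this quantity:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) preference pair.

    Inputs are summed log-probabilities of each completion under the
    trained policy and the frozen reference model; `beta` controls the
    strength of the implicit KL penalty.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return math.log1p(math.exp(-logits))  # -log(sigmoid(logits))
```

At initialization (policy equal to the reference), the loss is exactly log 2, and it decreases as the policy learns to prefer the chosen completion more strongly than the reference does.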
- [DPO] Resolve logging for DPOTrainer by @tomaarsen in https://github.com/lvwerra/trl/pull/570
- `_get_current_device()` by @lewtun in https://github.com/lvwerra/trl/pull/515

### DataCollatorForCompletionOnlyLM

You can now mask out the user prompts in the `DataCollatorForCompletionOnlyLM` data collator and train only on chat completions. Check out the PR below or the appropriate section of the documentation to learn more about it!
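Conceptually, the collator masks the prompt tokens in the labels so the loss covers only the completion. A per-example sketch (simplified relative to the real `DataCollatorForCompletionOnlyLM`, which also handles batching, padding, and tokenizer edge cases):

```python
IGNORE_INDEX = -100  # label value excluded from the language-modeling loss

def mask_prompt_labels(input_ids, response_template_ids):
    """Copy `input_ids` into labels, masking everything up to and including
    the first occurrence of the response template, so the loss is computed
    on the completion tokens only."""
    labels = list(input_ids)
    n = len(response_template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == response_template_ids:
            labels[:start + n] = [IGNORE_INDEX] * (start + n)
            break
    return labels
```

In this sketch, if the template is absent the labels are left unmasked; the real collator handles that case explicitly.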
Multiple bugs in the supported trainers have been raised by the community and fixed in the PRs below.
- [core] Fix offline case by @younesbelkada in https://github.com/lvwerra/trl/pull/538
- [`SFTTrainer`] Add warning for wrong padding_side by @younesbelkada in https://github.com/lvwerra/trl/pull/550
- [`SFTTrainer`] Add epochs and num steps on CLI by @younesbelkada in https://github.com/lvwerra/trl/pull/562
- `DataCollatorForCompletionOnlyLM` in the docs by @younesbelkada in https://github.com/lvwerra/trl/pull/565
- [PPO] fix corner cases with PPO batch size and `forward_batch_size` by @younesbelkada in https://github.com/lvwerra/trl/pull/563

The examples and documentation have been refactored; check the PRs below for more details.
- [examples] Big refactor of examples and documentation by @younesbelkada in https://github.com/lvwerra/trl/pull/509
- [examples] Fix sentiment nit by @younesbelkada in https://github.com/lvwerra/trl/pull/517
- [examples] make the sft script more modulable by @younesbelkada in https://github.com/lvwerra/trl/pull/543
- `use_auth_token` arg to `sft_trainer` example by @corey-lambda in https://github.com/lvwerra/trl/pull/544

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.7...v0.5.0
## v0.4.7: SFTTrainer and PPOTrainer bug fixes

- `_prepare_dataset` function by @BeibinLi in https://github.com/lvwerra/trl/pull/464
- [CI] Fix CI RM by @younesbelkada in https://github.com/lvwerra/trl/pull/468
- `float` instead of `double` to avoid issues with MPS device by @younesbelkada in https://github.com/lvwerra/trl/pull/499
- [`PPOTrainer`] Add prefix tuning support by @younesbelkada in https://github.com/lvwerra/trl/pull/501
- [`PPOTrainer`] Add prompt tuning support on TRL by @younesbelkada in https://github.com/lvwerra/trl/pull/500
- [`SFTTrainer`] Fix the sequence length check of `SFTTrainer` by @younesbelkada in https://github.com/lvwerra/trl/pull/512

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.6...v0.4.7
## v0.4.6

Patch release to fix a bug on Google Colab with `PPOTrainer` & `PPOConfig` + wandb.

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.5...v0.4.6
## v0.4.5: SFTTrainer enhancements and fixes

This patch release adds multiple fixes and enhancements for the `SFTTrainer`. Another patch release is coming to fix an issue with `PPOTrainer` and Google Colab combined with wandb logging.
- [`SFTTrainer`] Relax dataset constraints by @younesbelkada in https://github.com/lvwerra/trl/pull/442
- [`SFTTrainer`] Fix non packed dataset by @younesbelkada in https://github.com/lvwerra/trl/pull/444
- [core] Add stale bot by @younesbelkada in https://github.com/lvwerra/trl/pull/447
- [`SFTTrainer`] Introducing `DataCollatorForCompletionOnlyLM` by @younesbelkada in https://github.com/lvwerra/trl/pull/445
- [`ConstantLengthDataset`] Fix packed dataset issue by @younesbelkada in https://github.com/lvwerra/trl/pull/452

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.4...v0.4.5
## v0.4.4

- [core] unpin accelerate by @younesbelkada in https://github.com/lvwerra/trl/pull/418

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.3...v0.4.4
## v0.4.3

Patch release - pin accelerate version.

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.2...v0.4.3
## v0.4.2: QLoRA, `RewardTrainer` and `SFTTrainer`

A new version of TRL that includes training larger models using QLoRA (4-bit quantization through `bitsandbytes`), plus brand new classes `RewardTrainer` and `SFTTrainer` to easily conduct your RLHF projects end-to-end!
### SFTTrainer and RewardTrainer

Use the brand new trainers to easily train your reward model and supervised fine-tuned (SFT) model with a few lines of code!
- [core] officially support SFT (Supervised Finetuning) by @younesbelkada in https://github.com/lvwerra/trl/pull/323
- [SFT] Fix sft issues by @younesbelkada in https://github.com/lvwerra/trl/pull/336
- [docs] fix SFT doc by @younesbelkada in https://github.com/lvwerra/trl/pull/367
- [core] Officially Support Reward Modeling by @younesbelkada in https://github.com/lvwerra/trl/pull/303

Pass 4-bit models directly into `PPOTrainer` for more memory-efficient training:
- [core] Add 4bit QLora by @younesbelkada in https://github.com/lvwerra/trl/pull/383
- [bnb] fix 4 bit SFT by @younesbelkada in https://github.com/lvwerra/trl/pull/396

Great work by @mnoukhov, who managed to fix the issues related to StackLlama and the new versions of accelerate, peft and transformers, with completely reproducible examples.
- [core] refactor peft API by @younesbelkada in https://github.com/lvwerra/trl/pull/231
- [core] Add warning when negative KL by @younesbelkada in https://github.com/lvwerra/trl/pull/239
- pip cache by @SauravMaheshkar in https://github.com/lvwerra/trl/pull/198
- [core] Fix DeepSpeed zero-3 issue by @younesbelkada in https://github.com/lvwerra/trl/pull/182
- [distributed] Fix early stopping and DP by @younesbelkada in https://github.com/lvwerra/trl/pull/254
- [core] Fix ds issue by @younesbelkada in https://github.com/lvwerra/trl/pull/260
- `create_reference_model` by @younesbelkada in https://github.com/lvwerra/trl/pull/261
- [t5] Fix negative kl issue by @younesbelkada in https://github.com/lvwerra/trl/pull/262
- [CI] Fix broken tests by @younesbelkada in https://github.com/lvwerra/trl/pull/318
- [Docs] Add details on multi-GPU / multi-node by @younesbelkada in https://github.com/lvwerra/trl/pull/320
- [PPO] Relax negative KL constraint by @younesbelkada in https://github.com/lvwerra/trl/pull/352
- [`PPOTrainer`] Fix tensorboard issue by @younesbelkada in https://github.com/lvwerra/trl/pull/330
- [core] Fix warning issue by @younesbelkada in https://github.com/lvwerra/trl/pull/377

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.1...v0.4.2
## v0.4.1: peft Data Parallelism support and distributed training bug fixes

This release includes a set of features and bug fixes to scale up your RLHF experiments for much larger models, leveraging peft and bitsandbytes.
We introduce a new paradigm in trl, termed Naive Pipeline Parallelism, to fit large-scale models on your training setup and apply RLHF to them. This feature uses peft to train adapters and bitsandbytes to reduce the memory footprint of your active model.
### peft Data Parallelism support

There were some bugs with respect to peft integration and DP. This release includes the bug fixes to enable multi-GPU training using accelerate + DDP (Distributed Data Parallel).

- [peft] Fix DP issues by @younesbelkada in https://github.com/lvwerra/trl/pull/221
- [core] fix DP issue by @younesbelkada in https://github.com/lvwerra/trl/pull/222
Your training runs can now be much more memory efficient thanks to a few tricks / bug fixes:
`PPOConfig` now also supports the flag `optimize_cuda_cache` (set to `False` by default) to avoid increasing CUDA memory issues.
This release also includes minor fixes related to PyTorch 2.0 release
- [test] attempt to fix CI test for PT 2.0 by @younesbelkada in https://github.com/lvwerra/trl/pull/225

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.0...v0.4.1
## v0.4.0: peft integration

Apply RLHF and fine-tune your favorite large model on a consumer GPU using peft and trl! You can also easily share your trained RLHF adapters on the Hub with a few lines of code.
With this integration you can train gpt-neo-x (20B parameter model - 40GB in bfloat16) on a 24GB consumer GPU!
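A quick back-of-the-envelope check of those numbers: 20B parameters at 2 bytes each (bfloat16) is 40 GB of weights alone, which is why adapter-based training is needed to fit a 24 GB consumer GPU.

```python
def model_weight_gb(num_params, bytes_per_param=2):
    """Memory needed just to store the weights, in GB (2 bytes/param for bfloat16)."""
    return num_params * bytes_per_param / 1e9

print(model_weight_gb(20e9))  # 40.0
```

Optimizer states and gradients for full fine-tuning would add several times this amount on top, while peft adapters keep the trainable footprint small.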
- [core] Fix quality issue by @younesbelkada in https://github.com/lvwerra/trl/pull/197
- peft integration by @edbeeching in https://github.com/lvwerra/trl/pull/163
- [core] Update dependency by @younesbelkada in https://github.com/lvwerra/trl/pull/206

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.3.1...v0.4.0
## v0.3.1

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.3.0...v0.3.1
## v0.3.0

- `set_seed` to init.py by @lvwerra in https://github.com/lvwerra/trl/pull/127
- [bug] Update gpt2-sentiment.py by @younesbelkada in https://github.com/lvwerra/trl/pull/132
- [core] Small refactor of forward pass by @younesbelkada in https://github.com/lvwerra/trl/pull/136
- [tests] Add correct repo name by @younesbelkada in https://github.com/lvwerra/trl/pull/138
- [core] Add torch_dtype support by @younesbelkada in https://github.com/lvwerra/trl/pull/147
- [core] Fix dataloader issue by @younesbelkada in https://github.com/lvwerra/trl/pull/154
- [core] enable bf16 training by @younesbelkada in https://github.com/lvwerra/trl/pull/156
- [core] fix saving multi-gpu by @younesbelkada in https://github.com/lvwerra/trl/pull/157
- [core] Add max_grad_norm support by @younesbelkada in https://github.com/lvwerra/trl/pull/177
- [Docs] Fix barplot by @younesbelkada in https://github.com/lvwerra/trl/pull/181

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.2.1...v0.3.0
## v0.2.1

- `datasets` as a dependency by @lvwerra in https://github.com/lvwerra/trl/pull/110
- `Mapping` in core for Python 3.10 by @lvwerra in https://github.com/lvwerra/trl/pull/112

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.2.0...v0.2.1