TRL

Dec 22, 2023
v0.7.6: Patch release - Multi-tag instead of single tags for `xxxTrainer`

This is a patch release to push multiple tags (e.g. trl and sft) instead of a single tag when pushing models to the Hub

What's Changed

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.5...v0.7.6

v0.7.5: IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`

Important enhancements for DPOTrainer

This release introduces many new features in TRL for DPOTrainer:

  • IPO loss for better generalization of the DPO algorithm
  • KTO & cDPO losses
  • You can also pass pre-computed logits to DPOTrainer
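To see how these loss variants relate, here is a minimal NumPy sketch of the sigmoid (original DPO), IPO, and cDPO losses computed from the policy/reference log-ratio margin. This mirrors the idea behind the trainer's loss options but is not TRL's implementation; the function name and signature are illustrative, and the KTO variant is omitted for brevity.

```python
import numpy as np

def preference_loss(margin, beta=0.1, loss_type="sigmoid", label_smoothing=0.0):
    """Toy per-pair preference losses.

    margin = (logpi(y_w) - logpi_ref(y_w)) - (logpi(y_l) - logpi_ref(y_l)),
    i.e. how much more the policy prefers the chosen answer than the
    reference model does.
    """
    sig = 1.0 / (1.0 + np.exp(-beta * margin))
    if loss_type == "sigmoid":   # original DPO: -log sigmoid(beta * margin)
        return -np.log(sig)
    if loss_type == "ipo":       # IPO: squared distance from the 1/(2*beta) target
        return (margin - 1.0 / (2.0 * beta)) ** 2
    if loss_type == "cdpo":      # conservative DPO: label-smoothed sigmoid loss
        eps = label_smoothing
        return -(1.0 - eps) * np.log(sig) - eps * np.log(1.0 - sig)
    raise ValueError(f"unknown loss_type: {loss_type}")
```

Note how cDPO with label_smoothing=0 reduces exactly to the sigmoid loss, while IPO is minimized at a fixed target margin rather than pushed to infinity.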

Automatic xxxTrainer tagging on the Hub

TRL trainers now automatically push the tags trl-sft, trl-dpo, and trl-ddpo when pushing models to the Hub

unsloth 🤝 TRL

We encourage users to try out the unsloth library for faster LLM fine-tuning using PEFT together with TRL's SFTTrainer and DPOTrainer

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.4...v0.7.5

Nov 10, 2023
v0.7.4: Patch Release

This patch release addresses an issue for users who have TRL installed without PEFT

What's Changed

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.3...v0.7.4

v0.7.3: `IterativeTrainer`, NEFTune and major bugfixes for `DPOTrainer` and Distributed Training

In this release we introduce two new features, IterativeTrainer from @gaetanlop and NEFTune, together with important bugfixes for distributed training.

IterativeTrainer

Iterative fine-tuning is a training method that lets you perform custom actions (for example, generation and filtering) between optimization steps. In TRL we provide an easy-to-use API to fine-tune your models iteratively in just a few lines of code.
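The overall pattern can be sketched as a plain loop. Here, generate, filter_top_k, and step are toy stand-ins for model generation, a custom filtering action, and the trainer's optimization step; they exist only to illustrate the generate-filter-optimize cycle, not the real API.

```python
import random

def generate(state, n=8):
    # stand-in for model.generate(): produce n candidate completions with scores
    rng = random.Random(state["step"])
    return [rng.random() for _ in range(n)]

def filter_top_k(candidates, k=4):
    # the "custom action" between optimization steps: keep only the best candidates
    return sorted(candidates, reverse=True)[:k]

def step(state, batch):
    # stand-in for the trainer's step(): one optimization step on the batch
    state["seen"] += len(batch)
    state["step"] += 1

state = {"step": 0, "seen": 0}
for _ in range(3):            # iterate: generate -> filter -> optimize
    batch = filter_top_k(generate(state))
    step(state, batch)
```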

Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer

NEFTune

NEFTune is a technique to boost the performance of chat models. It was introduced in the paper “NEFTune: Noisy Embeddings Improve Instruction Finetuning” by Jain et al. and consists of adding noise to the embedding vectors during training.
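A minimal sketch of the noise injection itself, following the paper's scaling rule: uniform noise scaled by alpha / sqrt(L * d) for a sequence of length L and embedding dimension d. In TRL this is enabled through SFTTrainer's neftune_noise_alpha argument rather than by hand; the function below is only an illustration of what that does.

```python
import numpy as np

def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Add NEFTune-style uniform noise to a (seq_len, dim) embedding matrix.

    Noise is sampled from Uniform(-1, 1) and scaled by alpha / sqrt(L * d),
    as described in the NEFTune paper.
    """
    rng = rng or np.random.default_rng(0)
    L, d = embeddings.shape
    scale = alpha / np.sqrt(L * d)
    noise = rng.uniform(-1.0, 1.0, size=(L, d)) * scale
    return embeddings + noise
```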

Read more about it in the TRL documentation

Major bugfixes

Major bugfixes have been addressed to tackle many issues with distributed training and gradient checkpointing.

DPOTrainer enhancements and fixes

The DPOTrainer now comes with multiple enhancements and bugfixes! Check them out below

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.2...v0.7.3

Oct 12, 2023

v0.7.2: Flash Attention documentation and minor bugfixes

In this release we provide minor bugfixes and a smoother user experience for all public classes. We also added clarifications to the documentation on how to use Flash Attention with SFTTrainer

How to use Flash Attention with SFTTrainer:
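One common pattern, sketched under the assumption of a Flash Attention 2-capable GPU and a recent transformers version: enable Flash Attention when loading the model, then pass that model to SFTTrainer. The model id, dataset variable, and sequence length below are placeholders, and the exact flag name has changed across transformers versions (use_flash_attention_2=True in earlier releases), so check the docs for your release.

```python
import torch
from transformers import AutoModelForCausalLM
from trl import SFTTrainer

# Load the model with Flash Attention enabled, then train as usual with SFTTrainer.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                        # placeholder model id
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",    # older releases: use_flash_attention_2=True
)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,   # placeholder: your SFT dataset
    max_seq_length=2048,
)
trainer.train()
```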

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.1...v0.7.2

Aug 30, 2023
v0.7.1: Patch release

Patch release: fix bug with PPOTrainer and log_stats

Fixed a bug in PPOTrainer's log_stats to avoid breaking behaviour

What's Changed

Full Changelog: https://github.com/huggingface/trl/compare/v0.7.0...v0.7.1

v0.7.0: Text Environments, Agents & Tools

Text environments, LLMs with tools and agents!

<div style="text-align: center"> <img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/textenv.png"> </div>

Text environments provide a learning ground for language agents. They allow a language model to use tools to accomplish a task, such as using a Python interpreter to answer math questions or a search index for trivia questions. Access to tools lets language models solve tasks that would be very hard for the model itself but are trivial with the appropriate tool.

We are excited to bring to the community a complete set of functionalities and full examples to train LLMs to use tools!

Check out the documentation page here and a few examples below:
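To make the tool-calling loop concrete, here is a toy, self-contained sketch of a request/call/response convention like the one text environments use. The real TextEnvironment class additionally handles generation, batching, and rewards; the regex-based tool dispatch and the Calculator tool below are purely illustrative.

```python
import re

# Toy tool registry: a "Calculator" that evaluates arithmetic expressions.
TOOLS = {"Calculator": lambda query: str(eval(query, {"__builtins__": {}}))}

def run_episode(model_output):
    """Find <request><Tool>query<call> segments in the model's text, run the
    named tool, and splice the result back in followed by a <response> marker,
    mimicking the kind of protocol a text environment uses."""
    def call(match):
        tool, query = match.group(1), match.group(2)
        result = TOOLS[tool](query)
        return match.group(0) + result + "<response>"
    return re.sub(r"<request><(\w+)>(.*?)<call>", call, model_output)
```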

What's Changed

Full Changelog: https://github.com/huggingface/trl/compare/v0.6.0...v0.7.0

Aug 25, 2023

DDPO for diffusion models

We are excited to welcome the first RLHF + diffusion models algorithm to refine the generations from diffusion models. Read more about it directly in the docs.

Before vs. after DDPO fine-tuning:
<div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/pre_squirrel.png"/></div><div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/post_squirrel.png"/></div>
<div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/pre_starfish.png"/></div><div style="text-align: center"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/post_starfish.png"/></div>

Bug fixes and other enhancements

The release also comes with multiple bug fixes reported and/or led by the community; check out the commit history below

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.5.0...v0.6.0

Aug 2, 2023

v0.5.0: DPOTrainer and multiple bug fixes on PPOTrainer and SFTTrainer

This release includes multiple important bugfixes (SFTTrainer, PPOTrainer) and extends the current DataCollatorForCompletionOnlyLM to support chat-like training.

DPO Trainer

The DPO (Direct Preference Optimization) algorithm was introduced by Rafailov et al. in this paper and offers a way of performing RL training without having to rely on a reward model. The DPOTrainer is now part of the TRL library for anyone who wants to use it, thanks to the amazing contributors!
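The key idea fits in a few lines: the loss compares how much the policy prefers the chosen completion over the rejected one, relative to a frozen reference model, with no reward model in the loop. A minimal NumPy sketch (illustrative, not TRL's implementation; inputs are summed token log-probabilities per completion):

```python
import numpy as np

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * margin), where margin measures how much more the
    policy prefers the chosen completion than the reference model does."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))
```

When the policy and reference agree, the margin is zero and the loss sits at log 2; it falls as the policy learns to prefer the chosen completion more strongly than the reference does.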

What's Changed

Extending the DataCollatorForCompletionOnlyLM

You can now mask out the user prompts in the DataCollatorForCompletionOnlyLM data collator and train only on chat completions. Check out the PR below or the appropriate section of the documentation to learn more!
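What the collator does can be sketched in plain Python: find the response template inside the tokenized example and mask every label up to the start of the completion with -100, the index that PyTorch's cross-entropy loss ignores. A toy single-example version of the idea (function name is hypothetical, not TRL's API):

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores

def mask_prompt_labels(input_ids, response_template_ids):
    """Toy version of what DataCollatorForCompletionOnlyLM does: copy input
    ids to labels, then set everything up to and including the response
    template to IGNORE_INDEX so the loss covers only the completion."""
    labels = list(input_ids)
    n = len(response_template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == response_template_ids:
            for i in range(start + n):
                labels[i] = IGNORE_INDEX
            break
    return labels
```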

Important bug fixes

Multiple bugs on the supported trainers have been raised by the community and fixed in the below PRs

Big refactor of examples and documentation

The examples and documentation have been refactored; check the PRs below for more details

New Contributors

Full Changelog: https://github.com/lvwerra/trl/compare/v0.4.7...v0.5.0

Jul 13, 2023

Patch release: SFTTrainer and PPOTrainer bug fixes

What's Changed

New Contributors

Full Changelog: https://github.com/lvwerra/trl/compare/v0.4.6...v0.4.7

Jun 23, 2023

Patch release

Patch release to fix a bug on Google Colab with PPOTrainer & PPOConfig + wandb

What's Changed

Full Changelog: https://github.com/lvwerra/trl/compare/v0.4.5...v0.4.6

Patch release 1 - SFTTrainer enhancements and fixes

This patch release adds multiple fixes and enhancements for the SFTTrainer. Another patch release is coming to fix an issue with PPOTrainer on Google Colab combined with wandb logging

What's Changed

New Contributors

Full Changelog: https://github.com/lvwerra/trl/compare/v0.4.4...v0.4.5

Jun 8, 2023

Patch release

Full Changelog: https://github.com/lvwerra/trl/compare/v0.4.3...v0.4.4

v0.4.3: Patch release

Patch release - pin accelerate version

Full Changelog: https://github.com/lvwerra/trl/compare/v0.4.2...v0.4.3

Jun 7, 2023

QLoRA RLHF, SFT Trainer and RewardTrainer

This new version of TRL includes support for training larger models using QLoRA (4-bit quantization through bitsandbytes), plus the brand-new RewardTrainer and SFTTrainer classes to easily conduct your RLHF projects end-to-end!

Introducing SFTTrainer and RewardTrainer

Use the brand-new trainers to train your reward model and supervised fine-tuned (SFT) model in a few lines of code!

QLoRA integration

Pass 4-bit models directly into PPOTrainer for more memory-efficient training
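A hedged sketch of the pattern, with placeholder model id and hyperparameters; the 4-bit loading kwargs have evolved across transformers/trl versions, so consult the docs for your release rather than treating this as the exact API.

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Load the base model in 4 bits (via bitsandbytes) and attach LoRA adapters,
# then hand the wrapped model to PPOTrainer as usual.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "edbeeching/gpt-neo-125M-imdb",       # placeholder model id
    load_in_4bit=True,                    # 4-bit quantization via bitsandbytes
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
ppo_trainer = PPOTrainer(config=PPOConfig(batch_size=16), model=model)
```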

Updated StackLlama example

Great work by @mnoukhov, who fixed the issues related to StackLlama and the new versions of accelerate, peft and transformers. Fully reproducible examples are below:

Bug fixes and improvements

New Contributors

Full Changelog: https://github.com/lvwerra/trl/compare/v0.4.1...v0.4.2

Mar 17, 2023

Large models training, Naive Pipeline Parallelism, peft Data Parallelism support and distributed training bug fixes

This release includes a set of features and bug fixes to scale up your RLHF experiments for much larger models leveraging peft and bitsandbytes.

Naive Pipeline Parallelism support

We introduce a new paradigm in trl, termed Naive Pipeline Parallelism, to fit large-scale models on your training setup and apply RLHF to them. This feature uses peft to train adapters and bitsandbytes to reduce the memory footprint of your active model

peft Data Parallelism support

There were some bugs with respect to the peft integration and data parallelism. This release includes the bug fixes needed to enable multi-GPU training using accelerate + DDP (Distributed Data Parallel)

Memory optimization

Your training runs can now be much more memory-efficient thanks to a few tricks and bug fixes: PPOConfig now also supports the flag optimize_cuda_cache (set to False by default) to mitigate growing CUDA memory issues

Pytorch 2.0 fixes

This release also includes minor fixes related to PyTorch 2.0 release

What's Changed

New Contributors

Full Changelog: https://github.com/lvwerra/trl/compare/v0.4.0...v0.4.1

Mar 9, 2023

v0.4.0: peft integration

Apply RLHF and fine-tune your favorite large model on a consumer GPU using peft and trl! Also easily share your trained RLHF adapters on the Hub in a few lines of code

With this integration you can train gpt-neo-x (20B parameter model - 40GB in bfloat16) on a 24GB consumer GPU!
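The arithmetic behind that claim can be checked back-of-the-envelope: 20B parameters at 2 bytes each is 40 GB of weights, while an 8-bit quantized base model needs about 20 GB and so fits a 24GB card, leaving headroom for the small trainable adapters. A sketch (decimal GB, weights only; activation and optimizer overheads are ignored):

```python
def model_gb(n_params, bytes_per_param):
    """Approximate weight memory in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

N = 20e9                          # gpt-neo-x: 20B parameters
full_bf16 = model_gb(N, 2)        # 40.0 GB of weights: too big for a 24GB GPU
quant_8bit = model_gb(N, 1)       # 20.0 GB: fits, with headroom for adapters
# peft trains only the small adapter weights on top of the frozen quantized
# base model, so gradients and optimizer state stay tiny as well.
```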

What's Changed

New Contributors

Full Changelog: https://github.com/lvwerra/trl/compare/v0.3.1...v0.4.0

Mar 2, 2023

What's Changed

New Contributors

Full Changelog: https://github.com/lvwerra/trl/compare/v0.3.0...v0.3.1

Mar 1, 2023

What's Changed

New Contributors

Full Changelog: https://github.com/lvwerra/trl/compare/v0.2.1...v0.3.0

Jan 25, 2023

What's Changed

Full Changelog: https://github.com/lvwerra/trl/compare/v0.2.0...v0.2.1

Latest: v1.2.0 · Tracking since: Jan 25, 2023 · Last checked: Apr 19, 2026