v0.4.2

QLoRA RLHF, SFT Trainer and RewardTrainer

A new version of TRL that supports training larger models with QLoRA (4-bit quantization through bitsandbytes) and adds two brand new classes, RewardTrainer and SFTTrainer, so you can easily run your RLHF projects end-to-end!

Introducing SFTTrainer and RewardTrainer

Use the brand new trainers to easily train your reward model and your supervised fine-tuned (SFT) model with just a few lines of code!
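A reward model is trained on pairs of a preferred ("chosen") and a less-preferred ("rejected") response. As a hedged illustration, assuming RewardTrainer optimizes the standard pairwise ranking objective, -log(sigmoid(r_chosen - r_rejected)) averaged over the batch, the loss can be sketched in plain Python. `pairwise_reward_loss` is a hypothetical helper, not part of TRL's API; in a real run the scores come from the reward model rather than being passed in as lists.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_reward_loss(rewards_chosen, rewards_rejected):
    """Hypothetical helper: mean of -log(sigmoid(r_chosen - r_rejected)).

    Uses the identity -log(sigmoid(m)) == softplus(-m) so that large
    negative margins do not overflow math.exp.
    """
    if len(rewards_chosen) != len(rewards_rejected) or not rewards_chosen:
        raise ValueError("need one rejected score per chosen score")
    losses = [softplus(rr - rc) for rc, rr in zip(rewards_chosen, rewards_rejected)]
    return sum(losses) / len(losses)


# Usage: a correctly ranked pair, a tie, and a mis-ranked pair.
print(pairwise_reward_loss([2.0], [0.0]))
print(pairwise_reward_loss([0.0], [0.0]))
print(pairwise_reward_loss([0.0], [2.0]))
```

A pair whose chosen response already scores well above the rejected one contributes almost nothing, while a mis-ranked pair is penalized roughly linearly in the size of the margin.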

QLoRA integration

Pass 4-bit models directly into PPOTrainer for more memory-efficient training.
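To see why 4-bit loading matters, a back-of-the-envelope sketch of weight storage helps. This is a rough estimate under stated assumptions: it counts model weights only and ignores quantization constants, activations, optimizer states, the trainable LoRA adapters and any value head. The 7B parameter count is an illustrative size, not a figure from this release, and `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Hypothetical helper: approximate storage for weights only, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Usage: compare precisions for an illustrative 7B-parameter model.
for bits, label in [(32, "fp32"), (16, "fp16/bf16"), (8, "int8"), (4, "4-bit")]:
    print(f"7B params, {label:>9}: {weight_memory_gib(7e9, bits):5.1f} GiB")
```

Going from 16-bit to 4-bit weights cuts weight storage by a factor of four, which is what lets a larger frozen base model fit on the same GPU while only small adapters are trained.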

Updated StackLlama example

Great work by @mnoukhov, who fixed the issues with StackLlama related to the new versions of accelerate, peft and transformers. The completely reproducible examples are below:

Bug fixes and improvements

New Contributors

Full Changelog: https://github.com/lvwerra/trl/compare/v0.4.1...v0.4.2