This release of TRL adds support for training larger models with QLoRA (4-bit quantization through bitsandbytes), along with the brand new RewardTrainer and SFTTrainer classes for running your RLHF projects end to end!
SFTTrainer and RewardTrainer

Use the brand new trainers to easily train your reward model and supervised fine-tuned (SFT) model with a few lines of code!
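As a rough illustration of the "few lines of code" claim, here is a minimal sketch of a supervised fine-tuning run. The helper names `build_text_records` and `run_sft`, the prompt template, and the `facebook/opt-350m` checkpoint are illustrative assumptions, not part of the release. The `dataset_text_field` and `max_seq_length` arguments follow the SFTTrainer signature as of this release; later TRL versions may name or place them differently.

```python
"""Sketch: supervised fine-tuning with trl.SFTTrainer (0.4.x-style arguments)."""


def build_text_records(pairs, text_field="text"):
    """Join (prompt, response) pairs into single-string records.

    Hypothetical helper: SFTTrainer reads raw text from one dataset column,
    so each example is flattened into one string under `text_field`.
    The "### Question / ### Answer" template is an arbitrary choice.
    """
    return [
        {text_field: f"### Question: {prompt}\n### Answer: {response}"}
        for prompt, response in pairs
    ]


def run_sft(records, model_name="facebook/opt-350m", text_field="text",
            max_seq_length=512):
    """Fine-tune `model_name` on `records` and return the trainer."""
    # Imported lazily so build_text_records works without the heavy
    # dependencies (datasets, transformers, trl) installed.
    from datasets import Dataset
    from trl import SFTTrainer

    dataset = Dataset.from_list(records)
    trainer = SFTTrainer(
        model_name,                      # a model id string or a loaded model
        train_dataset=dataset,
        dataset_text_field=text_field,   # column holding the training text
        max_seq_length=max_seq_length,   # longer sequences are truncated
    )
    trainer.train()
    return trainer
```

Calling `run_sft(build_text_records([("What is TRL?", "A library for RLHF training.")]))` would download the checkpoint and start training, so it is best tried on a machine with a GPU and the dependencies installed.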
- [core] officially support SFT (Supervised Finetuning) by @younesbelkada in https://github.com/lvwerra/trl/pull/323
- [SFT] Fix sft issues by @younesbelkada in https://github.com/lvwerra/trl/pull/336
- [docs] fix SFT doc by @younesbelkada in https://github.com/lvwerra/trl/pull/367
- [core] Officially Support Reward Modeling by @younesbelkada in https://github.com/lvwerra/trl/pull/303

Pass 4bit models directly into PPOTrainer for more memory efficient training
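To make the memory argument for 4-bit models concrete, the sketch below estimates weight memory at different precisions and shows one way a 4-bit policy model might be loaded for PPO. The helper names `weight_memory_gb` and `load_4bit_policy`, the `EleutherAI/gpt-neo-125m` checkpoint, and the LoRA hyperparameters are illustrative assumptions. Passing `load_in_4bit=True` together with `peft_config` to `from_pretrained` is assumed from TRL's PEFT integration at this release, and it needs a CUDA GPU with bitsandbytes installed.

```python
"""Sketch: 4-bit weight memory estimate and loading a 4-bit policy for PPO."""


def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes.

    Ignores quantization constants, activations, gradients and optimizer
    state, so real usage will be higher.
    """
    return n_params * bits_per_param / 8 / 1e9


def load_4bit_policy(model_name="EleutherAI/gpt-neo-125m"):
    """Load a value-head model in 4-bit with LoRA adapters (assumed setup)."""
    # Imported lazily: peft, trl and bitsandbytes are only needed here.
    from peft import LoraConfig
    from trl import AutoModelForCausalLMWithValueHead

    # Illustrative LoRA settings; only the adapters are trained, while the
    # 4-bit base weights stay frozen.
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    return AutoModelForCausalLMWithValueHead.from_pretrained(
        model_name,
        peft_config=lora_config,
        load_in_4bit=True,
    )
```

Under this estimate, a 7-billion-parameter model needs about 14 GB for 16-bit weights (`weight_memory_gb(7_000_000_000, 16)`) and about 3.5 GB at 4 bits. The model returned by `load_4bit_policy()` can then be handed to PPOTrainer as the policy model.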
- [core] Add 4bit QLora by @younesbelkada in https://github.com/lvwerra/trl/pull/383
- [bnb] fix 4 bit SFT by @younesbelkada in https://github.com/lvwerra/trl/pull/396

Great work by @mnoukhov, who managed to fix the issues related to StackLlama and the new versions of accelerate, peft and transformers. The examples are now completely reproducible.
- [core] refactor peft API by @younesbelkada in https://github.com/lvwerra/trl/pull/231
- [core] Add warning when negative KL by @younesbelkada in https://github.com/lvwerra/trl/pull/239
- pip cache by @SauravMaheshkar in https://github.com/lvwerra/trl/pull/198
- [core] Fix DeepSpeed zero-3 issue by @younesbelkada in https://github.com/lvwerra/trl/pull/182
- [distributed] Fix early stopping and DP by @younesbelkada in https://github.com/lvwerra/trl/pull/254
- [core] Fix ds issue by @younesbelkada in https://github.com/lvwerra/trl/pull/260
- create_reference_model by @younesbelkada in https://github.com/lvwerra/trl/pull/261
- [t5] Fix negative kl issue by @younesbelkada in https://github.com/lvwerra/trl/pull/262
- [CI] Fix broken tests by @younesbelkada in https://github.com/lvwerra/trl/pull/318
- [Docs] Add details on multi-GPU / multi-node by @younesbelkada in https://github.com/lvwerra/trl/pull/320
- [PPO] Relax negative KL constraint by @younesbelkada in https://github.com/lvwerra/trl/pull/352
- [PPOTrainer] Fix tensorboard issue by @younesbelkada in https://github.com/lvwerra/trl/pull/330
- [core] Fix warning issue by @younesbelkada in https://github.com/lvwerra/trl/pull/377

Full Changelog: https://github.com/lvwerra/trl/compare/v0.4.1...v0.4.2
Fetched April 7, 2026