# peft Data Parallelism support and distributed training bug fixes

This release includes a set of features and bug fixes to scale up your RLHF experiments to much larger models, leveraging peft and bitsandbytes.

We introduce a new paradigm in trl, termed Naive Pipeline Parallelism, to fit large-scale models on your training setup and apply RLHF to them. This feature uses peft to train adapters and bitsandbytes to reduce the memory footprint of your active model.
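A minimal sketch of this setup, assuming a hypothetical placeholder model name and standard peft/transformers APIs (the exact hyperparameters below are illustrative, not from this release):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_id = "facebook/opt-6.7b"  # placeholder: any large causal LM

# Load the frozen base model in 8-bit via bitsandbytes to shrink its memory footprint.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto",
)
model = prepare_model_for_int8_training(model)

# Only the small LoRA adapter weights are trained; the 8-bit base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The resulting adapter-wrapped model can then be passed to trl's value-head wrappers for PPO training; the adapter parameters are a tiny fraction of the base model, which is what makes RLHF on large models tractable here.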

## peft Data Parallelism support

* [peft] Fix DP issues by @younesbelkada in https://github.com/lvwerra/trl/pull/221
* [core] fix DP issue by @younesbelkada in https://github.com/lvwerra/trl/pull/222

There were some bugs with respect to the peft integration and DP. This release includes the bug fixes needed to enable multi-GPU training using accelerate + DDP (Distributed Data Parallel).
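A typical way to run such a multi-GPU DDP job with accelerate (the script name is a placeholder; adjust `--num_processes` to your GPU count):

```shell
# One-time interactive setup: choose multi-GPU / DDP when prompted.
accelerate config

# Launch the training script across 2 processes (one per GPU).
accelerate launch --num_processes 2 ppo_training_script.py
```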
Your training runs can now be much more memory efficient thanks to a few tricks and bug fixes:

* PPOConfig now also supports the flag optimize_cuda_cache (set to False by default) to mitigate growing CUDA memory usage during training
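A short configuration sketch showing how the flag might be enabled (the model name and batch size are illustrative placeholders):

```python
from trl import PPOConfig

config = PPOConfig(
    model_name="gpt2",         # placeholder model name
    batch_size=16,             # illustrative value
    optimize_cuda_cache=True,  # False by default; enable if CUDA memory keeps growing
)
```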
This release also includes minor fixes related to the PyTorch 2.0 release.
* [test] attempt to fix CI test for PT 2.0 by @younesbelkada in https://github.com/lvwerra/trl/pull/225

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.0...v0.4.1