SFT now supports Context Parallelism (CP) for training large language models on very long sequences, making arbitrarily long sequence lengths feasible.
<img width="844" height="336" alt="Screenshot 2025-09-09 at 10 39 30 PM" src="https://github.com/user-attachments/assets/f1dfc349-440a-4e05-aac9-439a3c286f08" />

by @kashif in https://github.com/huggingface/trl/pull/3994
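A minimal sketch of what this enables (the config values below are hypothetical, and the CP degree itself is configured on the Accelerate launcher side, not in `SFTConfig`):

```python
from trl import SFTConfig

# Hypothetical illustration: with CP, each sequence is sharded across GPUs,
# so context lengths far beyond a single device's memory become trainable.
training_args = SFTConfig(
    max_length=131072,  # example long-context value, not a recommendation
    ...
)
```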
Dynamic Fine-Tuning (DFT) is now supported in TRL.
```python
from trl import SFTConfig

training_args = SFTConfig(
    loss_type="dft",
    ...
)
```
<img width="692" height="472" alt="Screenshot 2025-09-09 at 10 37 36โฏPM" src="https://github.com/user-attachments/assets/4ee2b4ab-7cc6-4578-bfac-c38124891510" />
by @qgallouedec in https://github.com/huggingface/trl/pull/4042
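To build intuition for what the DFT loss does, here is a minimal NumPy sketch, under the assumption that it follows the paper's formulation: each token's cross-entropy is scaled by the model's own (detached) probability of the target token.

```python
import numpy as np

def dft_token_loss(logits, target):
    """Sketch of a DFT-style per-token loss: cross-entropy scaled by the
    model's probability of the target token. In a real implementation the
    scaling factor is detached so no gradient flows through it."""
    logits = logits - logits.max()               # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    ce = -np.log(probs[target])                  # standard cross-entropy
    return probs[target] * ce                    # DFT: p(y) acts as a weight

# Relative to plain CE, very unlikely targets contribute almost nothing,
# since their tiny probability suppresses their (large) CE term.
loss_confident = dft_token_loss(np.array([5.0, 0.0, 0.0]), 0)
loss_uncertain = dft_token_loss(np.array([0.1, 0.0, 0.0]), 0)
```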
Different implementations are used for rollout generation (vLLM) and model training. This implementation gap implicitly turns on-policy RL into off-policy RL. Truncated Importance Sampling (TIS) is a simple yet effective importance-sampling technique for handling this discrepancy, and it is now implemented in GRPO.
```python
from trl import GRPOConfig

training_args = GRPOConfig(
    ...,
    use_vllm=True,
    vllm_importance_sampling_correction=True,  # default True
    vllm_importance_sampling_cap=2.0,  # hyper-parameter C
)
```
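The core idea can be sketched in a few lines of NumPy (this illustrates the technique itself, not TRL's internal implementation):

```python
import numpy as np

def truncated_importance_weight(logp_train, logp_rollout, cap=2.0):
    """Sketch of TIS: per-token importance ratio between the training
    policy and the (vLLM) rollout policy, truncated at a cap C to keep
    the variance of the correction bounded."""
    ratio = np.exp(logp_train - logp_rollout)
    return np.minimum(ratio, cap)

# Tokens where the training policy is much more likely than the rollout
# policy are clipped at C; otherwise the raw ratio is used.
weights = truncated_importance_weight(
    logp_train=np.array([-1.0, -0.5, -2.0]),
    logp_rollout=np.array([-1.0, -2.0, -1.0]),
    cap=2.0,
)
```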
by @LeonEricsson in https://github.com/huggingface/trl/pull/3867
Mixture of Experts (MoE) models require an auxiliary loss to ensure that the different experts are used evenly. This auxiliary loss is now supported in SFTTrainer.
```python
from trl import SFTConfig

training_args = SFTConfig(
    model_init_kwargs={"output_router_logits": True},
    ...
)
```
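The exact auxiliary loss depends on the model architecture, but a common Switch-Transformer-style load-balancing loss can be sketched as follows (illustrative only, not TRL's implementation):

```python
import numpy as np

def load_balancing_loss(router_probs, expert_assignment, num_experts):
    """Sketch of a Switch-Transformer-style auxiliary loss.
    router_probs: (tokens, experts) softmax outputs of the router.
    expert_assignment: (tokens,) index of the expert each token was routed to.
    The loss is minimized when tokens are spread evenly across experts."""
    # fraction of tokens dispatched to each expert
    f = np.bincount(expert_assignment, minlength=num_experts) / len(expert_assignment)
    # mean router probability assigned to each expert
    p = router_probs.mean(axis=0)
    return num_experts * np.sum(f * p)
```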
by @pramodith in https://github.com/huggingface/trl/pull/4012
When running GRPO (or RLOO) with vLLM in colocated mode, the vLLM server consumes VRAM during optimization while not being used. There is now an option to put the vLLM server to sleep during optimization to free up that VRAM.
```python
from trl import GRPOConfig

training_args = GRPOConfig(..., vllm_sleep_enabled=True)
```
by @edbeeching in https://github.com/huggingface/trl/pull/3968
You can now use vLLM server mode with OnlineDPOTrainer. Additionally, vision-language models (VLMs) are now supported.
by @vaelev in https://github.com/huggingface/trl/pull/3783
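A hypothetical sketch of how this fits together, assuming the parameter names mirror the GRPO equivalents (`use_vllm` / `vllm_mode`); check the OnlineDPOConfig documentation for the exact API:

```python
# Start a vLLM server in a separate process first, e.g.:
#   trl vllm-serve --model <model_name>
from trl import OnlineDPOConfig

# Hypothetical: parameter names assumed to match the GRPO equivalents.
training_args = OnlineDPOConfig(
    ...,
    use_vllm=True,
    vllm_mode="server",
)
```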
The paper index has been significantly enhanced with the addition of 9+ new algorithm implementations, providing a more comprehensive resource for users.
by @behroozazarkhalili in https://github.com/huggingface/trl/pull/3990
- `get_soft_overlong_punishment` by @qgallouedec in https://github.com/huggingface/trl/pull/3972
- `args.gradient_checkpointing = False` instead of `args = dataclasses.replace(args, gradient_checkpointing=False)` by @qgallouedec in https://github.com/huggingface/trl/pull/3981
- `torch_dtype` to `dtype` everywhere by @sergiopaniego in https://github.com/huggingface/trl/pull/4000
- `average_tokens_across_devices` default replacement by @qgallouedec in https://github.com/huggingface/trl/pull/4039
- `AutoModelForImageTextToText` for DPO and Online DPO by @sergiopaniego in https://github.com/huggingface/trl/pull/4049
- `SFTTrainer` for compatibility with CP by @qgallouedec in https://github.com/huggingface/trl/pull/4038
- `quantization_config=None` by @qgallouedec in https://github.com/huggingface/trl/pull/4019

Full Changelog: https://github.com/huggingface/trl/compare/v0.22.0...v0.23.0