SFTTrainer now supports training with tools! Just add a tools column to your dataset containing a list of tool definitions as JSON schemas. The tools are registered automatically and can be used during training.
from datasets import Dataset
from transformers.utils import get_json_schema
from trl import SFTTrainer

# Fictitious functions to simulate tool calls
def start_timer(duration: int) -> int:
    """
    Starts a timer for the specified duration in seconds.

    Args:
        duration: Duration in seconds to set the timer for.

    Returns:
        The duration set for the timer.
    """
    return duration

def create_reminder(time: str, note: str) -> str:
    """
    Creates a reminder for the specified time and note.

    Args:
        time: The time for the reminder.
        note: The note for the reminder.

    Returns:
        A confirmation message indicating that the reminder has been set.
    """
    return "I'll remind you to call mom at 7 PM."

# Define the JSON schemas for the tools
start_timer = get_json_schema(start_timer)
create_reminder = get_json_schema(create_reminder)

dataset = Dataset.from_dict({
    "messages": [
        [
            {"role": "user", "content": "Set a timer for 10 minutes."},
            {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "start_timer", "arguments": {"duration": 600}}}]},
            {"role": "tool", "name": "start_timer", "content": "600"},
            {"role": "assistant", "content": "Timer set for 10 minutes."},
        ],
        ...,
    ],
    "tools": [
        [start_timer, create_reminder],
        ...,
    ]
})

# Initialize the trainer
trainer = SFTTrainer(model="Qwen/Qwen3-0.6B", train_dataset=dataset)

# Train the model
trainer.train()
by @qgallouedec in https://github.com/huggingface/trl/pull/3597
We introduce a new packing method: FFD (First Fit Decreasing) packing. Previously, we used a wrapped packing method, which often truncated sequences even when they fit within the maximum sequence length. FFD avoids this unnecessary truncation by grouping sequences more intelligently, reducing the size of the packed training dataset more effectively. This strategy is now the default when packing is enabled.
training_args = SFTConfig(..., packing=True)
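FFD is the classic bin-packing heuristic: sort items by size in decreasing order, then place each into the first bin with enough remaining room. Below is a toy stdlib sketch of the idea operating on sequence lengths; it is an illustration only, not TRL's actual implementation, which works on tokenized examples.

```python
def ffd_pack(lengths, max_seq_len):
    """First Fit Decreasing: place each sequence (longest first) into the
    first group that still has room, opening a new group when none does."""
    bins = []  # each entry: [remaining_capacity, [sequence indices]]
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        for b in bins:
            if b[0] >= lengths[i]:
                b[0] -= lengths[i]   # shrink the bin's remaining capacity
                b[1].append(i)
                break
        else:
            # No existing bin fits this sequence: open a new one.
            bins.append([max_seq_len - lengths[i], [i]])
    return [indices for _, indices in bins]

# Four sequences of lengths 5, 3, 4, 2 packed into windows of 8 tokens:
print(ffd_pack([5, 3, 4, 2], 8))  # [[0, 1], [2, 3]]
```

Because no sequence is split across bins, nothing shorter than max_seq_len ever needs to be truncated, unlike wrapped packing.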

by @qgallouedec in https://github.com/huggingface/trl/pull/3521 and accelerated by @mariosasko in https://github.com/huggingface/trl/pull/3537
The DPOTrainer now supports the Liger-powered DPO loss, enabling faster training with lower memory usage.
training_args = DPOConfig(..., use_liger_loss=True)
by @kashif in https://github.com/huggingface/trl/pull/2568
Deprecate setup_chat_format and add clone_chat_template

We introduce clone_chat_template, a more convenient and flexible function for setting up chat templates from any tokenizer that already includes one. It handles EOS tokens and copies all added tokens from the source tokenizer, preserving their "special" status.
You can either use this function directly:
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import clone_chat_template
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model, tokenizer = clone_chat_template(model, tokenizer, "Qwen/Qwen3-4B")
or use the chat_template_path parameter in SFTConfig to specify a chat template, which will be automatically cloned when the SFTTrainer is initialized.
from trl import SFTConfig
training_args = SFTConfig(chat_template_path="Qwen/Qwen3-4B")
by @qgallouedec in https://github.com/huggingface/trl/pull/3404 and https://github.com/huggingface/trl/pull/3599
SFTTrainer now supports passing additional keyword arguments to the chat template, allowing more flexibility in customizing the chat format during training. To enable it, just add a chat_template_kwargs column to your dataset.
example = {
    "messages": [
        {"content": "What is better than ugly?", "role": "user"},
        {"content": "Beautiful.", "role": "assistant"},
    ],
    "chat_template_kwargs": {"my_template_arg": "my_value"},
}
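These per-example kwargs are forwarded as extra keyword arguments when the chat template is rendered. A minimal stdlib sketch of that forwarding pattern, where render stands in for the real template renderer (both functions here are illustrative, not TRL's API):

```python
def apply_template(example, render):
    """Pass any per-example chat_template_kwargs through to the renderer."""
    kwargs = example.get("chat_template_kwargs") or {}
    return render(example["messages"], **kwargs)

# Toy renderer that reacts to a custom template argument.
def render(messages, uppercase=False, **kwargs):
    text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return text.upper() if uppercase else text

example = {
    "messages": [{"role": "user", "content": "hi"}],
    "chat_template_kwargs": {"uppercase": True},
}
print(apply_template(example, render))  # USER: HI
```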
by @qgallouedec in https://github.com/huggingface/trl/pull/3609
The SFTTrainer now supports training on assistant messages only:
example = {'messages': [
    {'role': 'user', 'content': 'What is better than ugly?'},         # masked in the loss
    {'role': 'assistant', 'content': 'Beautiful.'},                   # used in the loss
    {'role': 'user', 'content': 'And what is better than implicit?'}, # masked in the loss
    {'role': 'assistant', 'content': 'Explicit.'},                    # used in the loss
]}
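Conceptually, this works by replacing the labels of non-assistant tokens with -100, the index that PyTorch's cross-entropy loss ignores. A minimal sketch of that masking step (mask_non_assistant and the per-token roles list are illustrative, not TRL's internals):

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy skips

def mask_non_assistant(labels, roles):
    """Keep labels only for tokens belonging to assistant turns;
    every other token is masked out of the loss."""
    return [lab if role == "assistant" else IGNORE_INDEX
            for lab, role in zip(labels, roles)]

labels = [11, 12, 13, 14]
roles = ["user", "user", "assistant", "assistant"]
print(mask_non_assistant(labels, roles))  # [-100, -100, 13, 14]
```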
by @qgallouedec in https://github.com/huggingface/trl/pull/3586
Add generation_kwargs as a property of GRPOConfig to support additional generation arguments

The GRPOConfig now includes a generation_kwargs property, allowing users to specify additional generation arguments for the GRPOTrainer. This allows for further customization of the generation behavior, such as setting suppress_tokens, num_beams, etc.
Depending on the generation backend used (transformers or vLLM), this property will be passed either to transformers.GenerationConfig (if using transformers) or vllm.SamplingParams (if using vLLM).
from trl import GRPOConfig
training_args = GRPOConfig(..., generation_kwargs={"length_penalty": -0.1})
by @pramodith in https://github.com/huggingface/trl/pull/3617
- beta=0.0 for GRPO by @qgallouedec in https://github.com/huggingface/trl/pull/3516
- logging_steps=10 by @qgallouedec in https://github.com/huggingface/trl/pull/3514
- bf16=True by @qgallouedec in https://github.com/huggingface/trl/pull/3515
- IterableDataset in DPO Trainer by @h-tonywu in https://github.com/huggingface/trl/pull/3559
- labels are retained in self._signature_columns by @sxndqc in https://github.com/huggingface/trl/pull/3589
- vllm_gpu_memory_utilization recommendation script by @toslali-ibm in https://github.com/huggingface/trl/pull/3554
- setup.cfg by @qgallouedec in https://github.com/huggingface/trl/pull/3511
- getattr to get gradient_checkpointing by @qgallouedec in https://github.com/huggingface/trl/pull/3535
- _VALID_DICT_FIELDS by @qgallouedec in https://github.com/huggingface/trl/pull/3553
- torch.autocast and make it cover XPU by @yao-matrix in https://github.com/huggingface/trl/pull/3541
- setup_chat_format and add clone_chat_template by @qgallouedec in https://github.com/huggingface/trl/pull/3404
- logging_steps parameter for simpler setup by @qgallouedec in https://github.com/huggingface/trl/pull/3612
- Trainer::create_model_card by @LeonEricsson in https://github.com/huggingface/trl/pull/3613
- enforce_eager default value in vLLM server by @LeonEricsson in https://github.com/huggingface/trl/pull/3607
- max_prompt_length with vLLM by @LeonEricsson in https://github.com/huggingface/trl/pull/3601
- generation_kwargs as a property of GRPOConfig to support additional generation arguments by @pramodith in https://github.com/huggingface/trl/pull/3617
- chat_template_path parameter to SFTConfig by @qgallouedec in https://github.com/huggingface/trl/pull/3599

Full Changelog: https://github.com/huggingface/trl/compare/v0.18.0...v0.19.0