SFTTrainer now supports training with tools! Just add a tools column to your dataset containing a list of tool definitions as JSON schemas. The tools are registered automatically and can be used during training.
from datasets import Dataset
from transformers.utils import get_json_schema
from trl import SFTTrainer

# Fictitious functions to simulate tool calls
def start_timer(duration: int) -> int:
    """
    Starts a timer for the specified duration in seconds.

    Args:
        duration: Duration in seconds to set the timer for.

    Returns:
        The duration set for the timer.
    """
    return duration

def create_reminder(time: str, note: str) -> str:
    """
    Creates a reminder for the specified time and note.

    Args:
        time: The time for the reminder.
        note: The note for the reminder.

    Returns:
        A confirmation message indicating that the reminder has been set.
    """
    return "I'll remind you to call mom at 7 PM."

# Define the JSON schemas for the tools
start_timer = get_json_schema(start_timer)
create_reminder = get_json_schema(create_reminder)

dataset = Dataset.from_dict({
    "messages": [
        [
            {"role": "user", "content": "Set a timer for 10 minutes."},
            {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "start_timer", "arguments": {"duration": 600}}}]},
            {"role": "tool", "name": "start_timer", "content": "600"},
            {"role": "assistant", "content": "Timer set for 10 minutes."},
        ],
        ...,
    ],
    "tools": [
        [start_timer, create_reminder],
        ...,
    ]
})

# Initialize the trainer
trainer = SFTTrainer(model="Qwen/Qwen3-0.6B", train_dataset=dataset)

# Train the model
trainer.train()
by @qgallouedec in https://github.com/huggingface/trl/pull/3597
We introduce a new packing method: FFD (First Fit Decreasing) packing. Previously, we used a wrapped packing method, which often truncated sequences even when they fit within the maximum sequence length. FFD avoids this unnecessary truncation by grouping sequences more intelligently, reducing the size of the packed training dataset more effectively. This strategy is now the default when packing is enabled.
training_args = SFTConfig(..., packing=True)
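FFD is the classic bin-packing heuristic: sort items by size in decreasing order, then place each into the first bin with enough remaining room. Below is a toy stdlib sketch of the idea operating on sequence lengths; it is an illustration only, not TRL's actual implementation, which works on tokenized examples.

```python
def ffd_pack(lengths, max_seq_len):
    """First Fit Decreasing: place each sequence (longest first) into the
    first group that still has room, opening a new group when none does."""
    bins = []  # each entry: [remaining_capacity, [sequence indices]]
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        for b in bins:
            if b[0] >= lengths[i]:
                b[0] -= lengths[i]   # shrink the bin's remaining capacity
                b[1].append(i)
                break
        else:
            # No existing bin fits this sequence: open a new one.
            bins.append([max_seq_len - lengths[i], [i]])
    return [indices for _, indices in bins]

# Four sequences of lengths 5, 3, 4, 2 packed into windows of 8 tokens:
print(ffd_pack([5, 3, 4, 2], 8))  # [[0, 1], [2, 3]]
```

Because no sequence is split across bins, nothing shorter than max_seq_len ever needs to be truncated, unlike wrapped packing.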

by @qgallouedec in https://github.com/huggingface/trl/pull/3521 and accelerated by @mariosasko in https://github.com/huggingface/trl/pull/3537
The DPOTrainer now supports the Liger-powered DPO loss, enabling faster training with lower memory usage.
training_args = DPOConfig(..., use_liger_loss=True)
by @kashif in https://github.com/huggingface/trl/pull/2568
Deprecate setup_chat_format and add clone_chat_template

We introduce clone_chat_template, a more convenient and flexible function for setting up chat templates from any tokenizer that already includes one. It handles EOS tokens and copies all added tokens from the source tokenizer, preserving their "special" status.
You can either use this function directly:
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import clone_chat_template
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model, tokenizer = clone_chat_template(model, tokenizer, "Qwen/Qwen3-4B")
or use the chat_template_path parameter in SFTConfig to specify a chat template, which will be automatically cloned when the SFTTrainer is initialized.
from trl import SFTConfig
training_args = SFTConfig(chat_template_path="Qwen/Qwen3-4B")
by @qgallouedec in https://github.com/huggingface/trl/pull/3404 and https://github.com/huggingface/trl/pull/3599
SFTTrainer now supports passing additional keyword arguments to the chat template, allowing more flexibility in customizing the chat format during training. To enable it, just add a chat_template_kwargs column to your dataset.
example = {
    "messages": [
        {"content": "What is better than ugly?", "role": "user"},
        {"content": "Beautiful.", "role": "assistant"},
    ],
    "chat_template_kwargs": {"my_template_arg": "my_value"},
}
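These per-example kwargs are forwarded as extra keyword arguments when the chat template is rendered. A minimal stdlib sketch of that forwarding pattern, where render stands in for the real template renderer (both functions here are illustrative, not TRL's API):

```python
def apply_template(example, render):
    """Pass any per-example chat_template_kwargs through to the renderer."""
    kwargs = example.get("chat_template_kwargs") or {}
    return render(example["messages"], **kwargs)

# Toy renderer that reacts to a custom template argument.
def render(messages, uppercase=False, **kwargs):
    text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return text.upper() if uppercase else text

example = {
    "messages": [{"role": "user", "content": "hi"}],
    "chat_template_kwargs": {"uppercase": True},
}
print(apply_template(example, render))  # USER: HI
```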
by @qgallouedec in https://github.com/huggingface/trl/pull/3609
The SFTTrainer now supports training on assistant messages only:
example = {'messages': [
    {'role': 'user', 'content': 'What is better than ugly?'},         # masked in the loss
    {'role': 'assistant', 'content': 'Beautiful.'},                   # used in the loss
    {'role': 'user', 'content': 'And what is better than implicit?'}, # masked in the loss
    {'role': 'assistant', 'content': 'Explicit.'},                    # used in the loss
]}
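Conceptually, this works by replacing the labels of non-assistant tokens with -100, the index that PyTorch's cross-entropy loss ignores. A minimal sketch of that masking step (mask_non_assistant and the per-token roles list are illustrative, not TRL's internals):

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy skips

def mask_non_assistant(labels, roles):
    """Keep labels only for tokens belonging to assistant turns;
    every other token is masked out of the loss."""
    return [lab if role == "assistant" else IGNORE_INDEX
            for lab, role in zip(labels, roles)]

labels = [11, 12, 13, 14]
roles = ["user", "user", "assistant", "assistant"]
print(mask_non_assistant(labels, roles))  # [-100, -100, 13, 14]
```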
by @qgallouedec in https://github.com/huggingface/trl/pull/3586
Add generation_kwargs as a property of GRPOConfig to support additional generation arguments

The GRPOConfig now includes a generation_kwargs property, allowing users to specify additional generation arguments for the GRPOTrainer. This allows for further customization of the generation behavior, such as setting suppress_tokens, num_beams, etc.
Depending on the generation backend used (transformers or vLLM), this property will be passed either to transformers.GenerationConfig (if using transformers) or vllm.SamplingParams (if using vLLM).
from trl import GRPOConfig
training_args = GRPOConfig(..., generation_kwargs={"length_penalty": -0.1})
by @pramodith in https://github.com/huggingface/trl/pull/3617
- beta=0.0 for GRPO by @qgallouedec in https://github.com/huggingface/trl/pull/3516
- logging_steps=10 by @qgallouedec in https://github.com/huggingface/trl/pull/3514
- bf16=True by @qgallouedec in https://github.com/huggingface/trl/pull/3515
- IterableDataset in DPO Trainer by @h-tonywu in https://github.com/huggingface/trl/pull/3559
- labels are retained in self._signature_columns by @sxndqc in https://github.com/huggingface/trl/pull/3589
- vllm_gpu_memory_utilization recommendation script by @toslali-ibm in https://github.com/huggingface/trl/pull/3554
- setup.cfg by @qgallouedec in https://github.com/huggingface/trl/pull/3511
- getattr to get gradient_checkpointing by @qgallouedec in https://github.com/huggingface/trl/pull/3535
- _VALID_DICT_FIELDS by @qgallouedec in https://github.com/huggingface/trl/pull/3553
- torch.autocast and make it cover XPU by @yao-matrix in https://github.com/huggingface/trl/pull/3541
- setup_chat_format and add clone_chat_template by @qgallouedec in https://github.com/huggingface/trl/pull/3404
- logging_steps parameter for simpler setup by @qgallouedec in https://github.com/huggingface/trl/pull/3612
- Trainer::create_model_card by @LeonEricsson in https://github.com/huggingface/trl/pull/3613
- enforce_eager default value in vLLM server by @LeonEricsson in https://github.com/huggingface/trl/pull/3607
- max_prompt_length with vLLM by @LeonEricsson in https://github.com/huggingface/trl/pull/3601
- generation_kwargs as a property of GRPOConfig to support additional generation arguments by @pramodith in https://github.com/huggingface/trl/pull/3617
- chat_template_path parameter to SFTConfig by @qgallouedec in https://github.com/huggingface/trl/pull/3599

Full Changelog: https://github.com/huggingface/trl/compare/v0.18.0...v0.19.0