environment_factory in GRPOTrainer

GRPOTrainer now accepts an environment_factory argument that takes a custom environment class to use during training. You define the environment's dynamics as methods on that class, and reward functions receive the environment instances, so rewards can depend on environment state. In the example below, the reward is the final value of each environment's counter.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

dataset = Dataset.from_dict({
    "prompt": [[{"role": "user", "content": f"Increment the counter by {i}."}] for i in range(1, 7)]
})

def reward_func(environments, **kwargs):
    # Reward each completion with the final counter value of its environment.
    return [env.counter for env in environments]

class IncrementEnv:
    def reset(self):
        self.counter = 0

    def increment(self, step: int) -> int:
        """
        Increment the internal counter.

        Args:
            step: Value to add to the counter.

        Returns:
            The updated counter value.
        """
        self.counter += step
        return self.counter

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    args=GRPOConfig(chat_template_kwargs={"enable_thinking": False}),
    train_dataset=dataset,
    reward_funcs=reward_func,
    environment_factory=IncrementEnv,
)
trainer.train()
by @qgallouedec in https://github.com/huggingface/trl/pull/5093
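To see how the environment and the reward function fit together without running a training job, the sketch below drives IncrementEnv by hand. It assumes the trainer builds one environment per rollout, calls reset() on it, and lets the model invoke increment(); the simulate_rollouts helper is hypothetical and only stands in for that loop, it is not part of TRL.

```python
class IncrementEnv:
    def reset(self):
        self.counter = 0

    def increment(self, step: int) -> int:
        """Increment the internal counter and return the updated value."""
        self.counter += step
        return self.counter


def reward_func(environments, **kwargs):
    # Same reward as in the release example: the final counter value.
    return [env.counter for env in environments]


def simulate_rollouts(environment_factory, steps_per_rollout):
    """Hypothetical stand-in for the trainer's rollout loop (an assumption)."""
    environments = []
    for steps in steps_per_rollout:
        env = environment_factory()  # one fresh environment per rollout
        env.reset()
        for step in steps:
            env.increment(step)  # in training, the model would issue these calls
        environments.append(env)
    return environments


# Three simulated rollouts: one call, two calls, and no calls at all.
environments = simulate_rollouts(IncrementEnv, [[1], [2, 2], []])
rewards = reward_func(environments)
print(rewards)  # [1, 4, 0]
```

A rollout that never calls increment() ends with a counter of 0, so under this reward function the model only earns reward by actually using the environment.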
TRL introduces an agent-native CLI integration: trl-training, a first-class Agent Skill that exposes TRL’s training workflows (SFT, DPO, GRPO, etc.) in a structured, agent-readable format. The skill ships with the trl library and can be installed via the CLI:
# Install into the project's agent directory (default scope=project), by agent name: claude, codex, opencode
trl skills install trl-training --target <agent>
This enables AI agents to safely and reproducibly execute TRL training workflows using a well-defined interface.
Skills can be installed at the project or global scope, and support explicit targets and overwrite controls.
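If the install step needs to be scripted, for example in a project setup script, a small wrapper can build the command shown above. This is a minimal sketch: build_install_command and KNOWN_AGENTS are hypothetical names, and the only CLI arguments used are the ones in the command above. Scope and overwrite options are left out because their exact flags are not given here.

```python
import shlex

# Agent names listed in the install comment above.
KNOWN_AGENTS = ("claude", "codex", "opencode")


def build_install_command(agent: str, skill: str = "trl-training") -> list[str]:
    """Build the argv list for `trl skills install <skill> --target <agent>`."""
    if agent not in KNOWN_AGENTS:
        raise ValueError(f"Unknown agent {agent!r}; expected one of {KNOWN_AGENTS}")
    return ["trl", "skills", "install", skill, "--target", agent]


command = build_install_command("claude")
print(shlex.join(command))  # trl skills install trl-training --target claude

# To actually run it (requires trl to be installed and on PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting issues when it is passed to subprocess.run.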
grpo_trainer.md by @casinca in https://github.com/huggingface/trl/pull/5047
launch_args for all trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5059
num_labels to 1 in causal model initialization for RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5066
RewardTrainer tests by @qgallouedec in https://github.com/huggingface/trl/pull/5060
get_training_chat_template by @qgallouedec in https://github.com/huggingface/trl/pull/5108
None in get_trackio_space_url() to prevent errors by @qgallouedec in https://github.com/huggingface/trl/pull/5115
trl <command> --help TypeError caused by unescaped % in TrainingArguments help strings by @albertvillanova in https://github.com/huggingface/trl/pull/5135
environment_factory to GRPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5093
SFTTrainer support for single-image data by @qgallouedec in https://github.com/huggingface/trl/pull/5132
train_dataset is required by @albertvillanova in https://github.com/huggingface/trl/pull/5171
RewardTrainer collator from chosen/rejected_input_ids to chosen/rejected_ids by @qgallouedec in https://github.com/huggingface/trl/pull/5179
Full Changelog: https://github.com/huggingface/trl/compare/v0.28.0...v0.29.0
Fetched April 7, 2026