v0.29.0

Features

Add `environment_factory` to `GRPOTrainer`

`GRPOTrainer` now accepts an `environment_factory` argument that specifies a custom environment class for training. Users can define their own environments with specific dynamics and reward structures, and reward functions can read the state of those environments.

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

dataset = Dataset.from_dict({
    "prompt": [[{"role": "user", "content": f"Increment the counter by {i}."}] for i in range(1, 7)]
})

def reward_func(environments, **kwargs):
    return [env.counter for env in environments]

class IncrementEnv:
    def reset(self):
        self.counter = 0

    def increment(self, step: int) -> int:
        """
        Increment the internal counter.

        Args:
            step: Value to add to the counter.

        Returns:
            The updated counter value.
        """
        self.counter += step
        return self.counter

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    args=GRPOConfig(chat_template_kwargs={"enable_thinking": False}),
    train_dataset=dataset,
    reward_funcs=reward_func,
    environment_factory=IncrementEnv,
)
trainer.train()

by @qgallouedec in https://github.com/huggingface/trl/pull/5093
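
To see what the reward function above computes, the environment class can be exercised on its own, without a model or trainer. This is a minimal sketch: the manual loop that creates, resets, and steps the environments is an assumption made for illustration and is not how `GRPOTrainer` drives them internally.

```python
class IncrementEnv:
    def reset(self):
        # Start every episode from a zeroed counter.
        self.counter = 0

    def increment(self, step: int) -> int:
        # Add `step` to the counter and return the new value.
        self.counter += step
        return self.counter


def reward_func(environments, **kwargs):
    # The reward for each completion is the final counter of its environment.
    return [env.counter for env in environments]


# Assumption for illustration: one environment per completion, built by
# calling the factory with no arguments and reset before use.
environments = [IncrementEnv() for _ in range(3)]
for env in environments:
    env.reset()

# Stand-in for the model's tool calls: increment each counter by a fixed step.
for env, step in zip(environments, [1, 2, 3]):
    env.increment(step)

rewards = reward_func(environments)
print(rewards)  # [1, 2, 3]
```

Because `counter` is only created in `reset`, an environment that is never reset would raise `AttributeError` in `reward_func`; the sketch resets each one first for that reason.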

Skills

TRL introduces agent-native CLI integration with `trl-training`, a first-class Agent Skill that exposes TRL’s training workflows (SFT, DPO, GRPO, etc.) in a structured, agent-readable format. The skill is packaged directly with the `trl` library and can be installed via the CLI:

# Install into the project's agent directory (default scope=project), by agent name: claude, codex, opencode
trl skills install trl-training --target <agent>

This enables AI agents to safely and reproducibly execute TRL training workflows using a well-defined interface.

Skills can be installed at the project or global scope, and support explicit targets and overwrite controls.

Implement Agent Skills [1/N]: Create training skill (MVP) by @albertvillanova in https://github.com/huggingface/trl/pull/5096
Implement Agent Skills [2/N]: Create skills module by @albertvillanova in https://github.com/huggingface/trl/pull/5097
Implement Agent Skills [3/N]: Create skills installer by @albertvillanova in https://github.com/huggingface/trl/pull/5100
Implement Agent Skills [4/N]: Create skills CLI by @albertvillanova in https://github.com/huggingface/trl/pull/5103

Other

Pass vllm_is_ratio to LigerFusedLinearGRPOLoss in compute_liger_loss by @yukiu00 in https://github.com/huggingface/trl/pull/5031
feature: top_k selective_log_softmax by @LeonEricsson in https://github.com/huggingface/trl/pull/5104
Add Trackio integration for model card visualization by @qgallouedec in https://github.com/huggingface/trl/pull/5101
Update tool handling to support JSON string schemas in trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5118
Refactor DPO by @qgallouedec in https://github.com/huggingface/trl/pull/3906
Add support for Python 3.14 by @albertvillanova in https://github.com/huggingface/trl/pull/4225
Fix default learning_rate in PPO according to paper by @albertvillanova in https://github.com/huggingface/trl/pull/5174
Fix default learning_rate in BCO according to paper by @albertvillanova in https://github.com/huggingface/trl/pull/5173
feature: Configurable num logprobs in vLLM generation by @LeonEricsson in https://github.com/huggingface/trl/pull/5107

Fixes

[GRPO] fix: remove SAPO temperature check by @LeonEricsson in https://github.com/huggingface/trl/pull/5042
fix: Use launch_args for all trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5059
Fix GRPO multi-turn training with liger kernels by @albertvillanova in https://github.com/huggingface/trl/pull/4975
fix: Set num_labels to 1 in causal model initialization for RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5066
[SFT] Fix high vRAM consumption during eval with liger kernel by @LoganVegnaSHOP in https://github.com/huggingface/trl/pull/5069
Fix BFD packing for SFT datasets by @albertvillanova in https://github.com/huggingface/trl/pull/5076
Fix DPO and RLOO incompatibility with FSDP2 by @flutist in https://github.com/huggingface/trl/pull/4838
Fix SFT loss type rewards being overwritten in dpo_loss() by @Mr-Neutr0n in https://github.com/huggingface/trl/pull/5079
Fix Qwen3 schema by @qgallouedec in https://github.com/huggingface/trl/pull/5111
Add check for None in get_trackio_space_url() to prevent errors by @qgallouedec in https://github.com/huggingface/trl/pull/5115
Fix trl <command> --help TypeError caused by unescaped % in TrainingArguments help strings by @albertvillanova in https://github.com/huggingface/trl/pull/5135
Fix PPOTrainer.save_model by @albertvillanova in https://github.com/huggingface/trl/pull/5151
Fix SFTTrainer support for single-image data by @qgallouedec in https://github.com/huggingface/trl/pull/5132
Fix structured_outputs handling and tool normalization in vLLM backend by @ehofm in https://github.com/huggingface/trl/pull/5155
fix: wake up vLLM weights before sync to prevent writes to freed memory by @bledden in https://github.com/huggingface/trl/pull/5147
Accept mm_token_type_ids in GRPO/RLOO _get_per_token_logps_and_entropies by @albertvillanova in https://github.com/huggingface/trl/pull/5176

Documentation and Examples

[minor] docs: typo in grpo_trainer.md by @casinca in https://github.com/huggingface/trl/pull/5047
docs: add DeepSeek-R1 training dynamics and GRPO example by @JenWei0312 in https://github.com/huggingface/trl/pull/5053
docs: Add INTELLECT-2 (2505.07291) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5061
docs: Add REINFORCE++ (2501.03262) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5062
docs: Add XPO (2405.21046) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5068
docs: Add RPO paper (2405.16436) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5070
docs: Add SimPO paper (2405.14734) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5071
docs: Add TR-DPO paper (2404.09656) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5078
docs: Add ORPO paper (2403.07691) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5080
docs: Add CPO paper (2401.08417) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5081
docs: Add GKD paper (2306.13649) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5082
docs: Add PRM paper (2211.14275) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5083
docs: Add T5 packing paper (1910.10683) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5084
docs: Add PPO paper (1707.06347) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5085
docs: Add MPO paper (2411.10442) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5089
docs: add Multi-Node Training subsection (#4384) by @nabin2004 in https://github.com/huggingface/trl/pull/5091
docs: Unify model examples to use trl-lib namespace by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4431
Add Tiny Aya tool calling examples (script/notebook) by @sergiopaniego in https://github.com/huggingface/trl/pull/5123
Fix wording in DPO and SFT trainer documentation for clarity by @qgallouedec in https://github.com/huggingface/trl/pull/5140
Fix type of TrainingArguments.logging_steps in docs by @albertvillanova in https://github.com/huggingface/trl/pull/5149
Fix Liquid syntax error in DPO trainer docs caused by double braces in LaTeX by @albertvillanova in https://github.com/huggingface/trl/pull/5153
Document parameters with differing default values in experimental configs by @albertvillanova in https://github.com/huggingface/trl/pull/5172

Deprecations

Remove deprecated BCO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5045
Remove deprecated CPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5046
Remove deprecated Judges after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5048
Remove deprecated ORPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5050
Remove deprecated PPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5051
Remove deprecated PRM after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5052
Remove deprecated XPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5055
Remove deprecated RLOOConfig.max_prompt_length by @albertvillanova in https://github.com/huggingface/trl/pull/5056
Remove deprecated classes moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5044
Remove deprecated mergekit_utils moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5057
Rename input keys in RewardTrainer collator from chosen/rejected_input_ids to chosen/rejected_ids by @qgallouedec in https://github.com/huggingface/trl/pull/5179
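
For the collator key rename above, data that still carries the old key names could be adapted with a small mapping step. This is a rough sketch only: `rename_reward_keys` and `KEY_RENAMES` are hypothetical names, not part of TRL, and whether your pipeline needs this depends on whether it produces those keys itself.

```python
# Hypothetical helper, not part of TRL: maps the old collator input keys
# to the names introduced in this release.
KEY_RENAMES = {
    "chosen_input_ids": "chosen_ids",
    "rejected_input_ids": "rejected_ids",
}


def rename_reward_keys(example: dict) -> dict:
    # Rename known keys and leave every other key untouched.
    return {KEY_RENAMES.get(key, key): value for key, value in example.items()}


old_example = {"chosen_input_ids": [1, 2, 3], "rejected_input_ids": [1, 2, 4]}
new_example = rename_reward_keys(old_example)
print(new_example)
```

A function of this shape can also be passed to `Dataset.map` from the `datasets` library, which applies it to each example.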

CI Improvements

Upgrade GitHub Actions to latest versions by @salmanmkc in https://github.com/huggingface/trl/pull/4893
Remove duplicated tests for SFT and add gradient checkpointing tests by @qgallouedec in https://github.com/huggingface/trl/pull/5054
Update model from SequenceClassification to CausalLM in RewardTrainer tests by @qgallouedec in https://github.com/huggingface/trl/pull/5060
Fix CI ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?) by @albertvillanova in https://github.com/huggingface/trl/pull/5074
Add more tests for get_training_chat_template by @qgallouedec in https://github.com/huggingface/trl/pull/5108
Add test for Cohere2 models by @qgallouedec in https://github.com/huggingface/trl/pull/5116
Remove revision references in dataset loading for toolcall tests by @qgallouedec in https://github.com/huggingface/trl/pull/5133
Fix NameError: name 'importlib' is not defined by @albertvillanova in https://github.com/huggingface/trl/pull/5134
Fix CI by removing liger-kernel from dev deps by @qgallouedec in https://github.com/huggingface/trl/pull/5163
Fix experimental TestUpdateWithReplayBuffer: ValueError: train_dataset is required by @albertvillanova in https://github.com/huggingface/trl/pull/5171
Update upstream tracking info about CI PyTorch JIT deprecation warnings by @albertvillanova in https://github.com/huggingface/trl/pull/5166

Miscellaneous

Fix logging warning suppression with scoped override for seq-clf head key by @qgallouedec in https://github.com/huggingface/trl/pull/5058
Fix logging warning suppression for transformers 4.56.2 by @albertvillanova in https://github.com/huggingface/trl/pull/5077
Validate reward model has 1 num_labels by @albertvillanova in https://github.com/huggingface/trl/pull/5087
Fix style by @albertvillanova in https://github.com/huggingface/trl/pull/5106
Remove outdated liger-kernel compatibility checks and warnings in tests and SFTTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5105
Add validation for conversational prompts in multimodal training by @qgallouedec in https://github.com/huggingface/trl/pull/5067
Update version check for transformers to 5.2.0 in online_dpo_trainer.py by @qgallouedec in https://github.com/huggingface/trl/pull/5110
Add GLM-4.5 model to tests by @qgallouedec in https://github.com/huggingface/trl/pull/5114
Fix import latency [1/N]: Extract _LazyModule to dedicated module by @albertvillanova in https://github.com/huggingface/trl/pull/5128
Fix import latency [2/N]: Implement native _is_package_available by @albertvillanova in https://github.com/huggingface/trl/pull/5129
refactor(gkd_trainer): small optim by @casinca in https://github.com/huggingface/trl/pull/5143
Move common fields from stable trainer configs to BaseConfig by @albertvillanova in https://github.com/huggingface/trl/pull/5136
Use BaseConfig in all experimental configs by @albertvillanova in https://github.com/huggingface/trl/pull/5148
Raise ValueError for None train_dataset in core trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5157
Revert changes in vLLM client/server by @qgallouedec in https://github.com/huggingface/trl/pull/5165

Refactor CLI

Refactor CLI [1/N]: Refactor into modular command architecture by @albertvillanova in https://github.com/huggingface/trl/pull/5124
Refactor CLI [2/N]: Move accelerate concerns into TrainingCommand by @albertvillanova in https://github.com/huggingface/trl/pull/5159
Refactor CLI [3/N]: Self-contain VllmServeCommand argument parsing by @albertvillanova in https://github.com/huggingface/trl/pull/5160

What's Changed

[minor] docs: typo in grpo_trainer.md by @casinca in https://github.com/huggingface/trl/pull/5047
⬆️ Bump dev version by @albertvillanova in https://github.com/huggingface/trl/pull/5049
[GRPO] fix: remove SAPO temperature check by @LeonEricsson in https://github.com/huggingface/trl/pull/5042
Remove deprecated BCO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5045
Remove deprecated CPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5046
Remove deprecated Judges after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5048
Upgrade GitHub Actions to latest versions by @salmanmkc in https://github.com/huggingface/trl/pull/4893
Remove deprecated ORPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5050
Remove deprecated PPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5051
Remove deprecated PRM after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5052
Remove deprecated XPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5055
Remove deprecated RLOOConfig.max_prompt_length by @albertvillanova in https://github.com/huggingface/trl/pull/5056
Remove deprecated classes moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5044
Remove duplicated tests for SFT and add gradient checkpointing tests by @qgallouedec in https://github.com/huggingface/trl/pull/5054
Remove deprecated mergekit_utils moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5057
docs: add DeepSeek-R1 training dynamics and GRPO example by @JenWei0312 in https://github.com/huggingface/trl/pull/5053
docs: Add INTELLECT-2 (2505.07291) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5061
docs: Add REINFORCE++ (2501.03262) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5062
docs: Add XPO (2405.21046) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5068
docs: Add RPO paper (2405.16436) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5070
docs: Add SimPO paper (2405.14734) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5071
Fix logging warning suppression with scoped override for seq-clf head key by @qgallouedec in https://github.com/huggingface/trl/pull/5058
fix: Use launch_args for all trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5059
Fix GRPO multi-turn training with liger kernels by @albertvillanova in https://github.com/huggingface/trl/pull/4975
fix: Set num_labels to 1 in causal model initialization for RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5066
Fix logging warning suppression for transformers 4.56.2 by @albertvillanova in https://github.com/huggingface/trl/pull/5077
Update model from SequenceClassification to CausalLM in RewardTrainer tests by @qgallouedec in https://github.com/huggingface/trl/pull/5060
Fix CI ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?) by @albertvillanova in https://github.com/huggingface/trl/pull/5074
[SFT] Fix high vRAM consumption during eval with liger kernel by @LoganVegnaSHOP in https://github.com/huggingface/trl/pull/5069
docs: Add TR-DPO paper (2404.09656) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5078
docs: Add ORPO paper (2403.07691) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5080
docs: Add CPO paper (2401.08417) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5081
docs: Add GKD paper (2306.13649) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5082
docs: Add PRM paper (2211.14275) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5083
docs: Add T5 packing paper (1910.10683) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5084
docs: Add PPO paper (1707.06347) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5085
Fix BFD packing for SFT datasets by @albertvillanova in https://github.com/huggingface/trl/pull/5076
Validate reward model has 1 num_labels by @albertvillanova in https://github.com/huggingface/trl/pull/5087
docs: Add MPO paper (2411.10442) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5089
docs: add Multi-Node Training subsection (#4384) by @nabin2004 in https://github.com/huggingface/trl/pull/5091
docs: Unify model examples to use trl-lib namespace by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4431
Implement Agent Skills [1/N]: Create training skill (MVP) by @albertvillanova in https://github.com/huggingface/trl/pull/5096
Pass vllm_is_ratio to LigerFusedLinearGRPOLoss in compute_liger_loss by @yukiu00 in https://github.com/huggingface/trl/pull/5031
Fix DPO and RLOO incompatibility with FSDP2 by @flutist in https://github.com/huggingface/trl/pull/4838
feature: top_k selective_log_softmax by @LeonEricsson in https://github.com/huggingface/trl/pull/5104
Implement Agent Skills [2/N]: Create skills module by @albertvillanova in https://github.com/huggingface/trl/pull/5097
Fix style by @albertvillanova in https://github.com/huggingface/trl/pull/5106
Add Trackio integration for model card visualization by @qgallouedec in https://github.com/huggingface/trl/pull/5101
Fix SFT loss type rewards being overwritten in dpo_loss() by @Mr-Neutr0n in https://github.com/huggingface/trl/pull/5079
Remove outdated liger-kernel compatibility checks and warnings in tests and SFTTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5105
Implement Agent Skills [3/N]: Create skills installer by @albertvillanova in https://github.com/huggingface/trl/pull/5100
Add validation for conversational prompts in multimodal training by @qgallouedec in https://github.com/huggingface/trl/pull/5067
Update version check for transformers to 5.2.0 in online_dpo_trainer.py by @qgallouedec in https://github.com/huggingface/trl/pull/5110
Add more tests for get_training_chat_template by @qgallouedec in https://github.com/huggingface/trl/pull/5108
Add test for Cohere2 models by @qgallouedec in https://github.com/huggingface/trl/pull/5116
Fix Qwen3 schema by @qgallouedec in https://github.com/huggingface/trl/pull/5111
Add check for None in get_trackio_space_url() to prevent errors by @qgallouedec in https://github.com/huggingface/trl/pull/5115
Add GLM-4.5 model to tests by @qgallouedec in https://github.com/huggingface/trl/pull/5114
Add Tiny Aya tool calling examples (script/notebook) by @sergiopaniego in https://github.com/huggingface/trl/pull/5123
Update tool handling to support JSON string schemas in trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5118
Implement Agent Skills [4/N]: Create skills CLI by @albertvillanova in https://github.com/huggingface/trl/pull/5103
Refactor CLI [1/N]: Refactor into modular command architecture by @albertvillanova in https://github.com/huggingface/trl/pull/5124
Remove revision references in dataset loading for toolcall tests by @qgallouedec in https://github.com/huggingface/trl/pull/5133
Refactor DPO by @qgallouedec in https://github.com/huggingface/trl/pull/3906
Fix import latency [1/N]: Extract _LazyModule to dedicated module by @albertvillanova in https://github.com/huggingface/trl/pull/5128
Fix import latency [2/N]: Implement native _is_package_available by @albertvillanova in https://github.com/huggingface/trl/pull/5129
Fix NameError: name 'importlib' is not defined by @albertvillanova in https://github.com/huggingface/trl/pull/5134
Fix trl <command> --help TypeError caused by unescaped % in TrainingArguments help strings by @albertvillanova in https://github.com/huggingface/trl/pull/5135
Add environment_factory to GRPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5093
refactor(gkd_trainer): small optim by @casinca in https://github.com/huggingface/trl/pull/5143
Move common fields from stable trainer configs to BaseConfig by @albertvillanova in https://github.com/huggingface/trl/pull/5136
Fix wording in DPO and SFT trainer documentation for clarity by @qgallouedec in https://github.com/huggingface/trl/pull/5140
Fix PPOTrainer.save_model by @albertvillanova in https://github.com/huggingface/trl/pull/5151
Use BaseConfig in all experimental configs by @albertvillanova in https://github.com/huggingface/trl/pull/5148
Fix type of TrainingArguments.logging_steps in docs by @albertvillanova in https://github.com/huggingface/trl/pull/5149
Add support for Python 3.14 by @albertvillanova in https://github.com/huggingface/trl/pull/4225
Fix SFTTrainer support for single-image data by @qgallouedec in https://github.com/huggingface/trl/pull/5132
Fix CI by removing liger-kernel from dev deps by @qgallouedec in https://github.com/huggingface/trl/pull/5163
Fix structured_outputs handling and tool normalization in vLLM backend by @ehofm in https://github.com/huggingface/trl/pull/5155
fix: wake up vLLM weights before sync to prevent writes to freed memory by @bledden in https://github.com/huggingface/trl/pull/5147
Fix Liquid syntax error in DPO trainer docs caused by double braces in LaTeX by @albertvillanova in https://github.com/huggingface/trl/pull/5153
Raise ValueError for None train_dataset in core trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5157
Refactor CLI [2/N]: Move accelerate concerns into TrainingCommand by @albertvillanova in https://github.com/huggingface/trl/pull/5159
Refactor CLI [3/N]: Self-contain VllmServeCommand argument parsing by @albertvillanova in https://github.com/huggingface/trl/pull/5160
Revert changes in vLLM client/server by @qgallouedec in https://github.com/huggingface/trl/pull/5165
Fix experimental TestUpdateWithReplayBuffer: ValueError: train_dataset is required by @albertvillanova in https://github.com/huggingface/trl/pull/5171
Fix default learning_rate in PPO according to paper by @albertvillanova in https://github.com/huggingface/trl/pull/5174
Accept mm_token_type_ids in GRPO/RLOO _get_per_token_logps_and_entropies by @albertvillanova in https://github.com/huggingface/trl/pull/5176
Fix default learning_rate in BCO according to paper by @albertvillanova in https://github.com/huggingface/trl/pull/5173
Document parameters with differing default values in experimental configs by @albertvillanova in https://github.com/huggingface/trl/pull/5172
Update upstream tracking info about CI PyTorch JIT deprecation warnings by @albertvillanova in https://github.com/huggingface/trl/pull/5166
Rename input keys in RewardTrainer collator from chosen/rejected_input_ids to chosen/rejected_ids by @qgallouedec in https://github.com/huggingface/trl/pull/5179
feature: Configurable num logprobs in vLLM generation by @LeonEricsson in https://github.com/huggingface/trl/pull/5107
Release: v0.29 by @qgallouedec in https://github.com/huggingface/trl/pull/5181

New Contributors

@LoganVegnaSHOP made their first contribution in https://github.com/huggingface/trl/pull/5069
@yukiu00 made their first contribution in https://github.com/huggingface/trl/pull/5031
@flutist made their first contribution in https://github.com/huggingface/trl/pull/4838
@Mr-Neutr0n made their first contribution in https://github.com/huggingface/trl/pull/5079
@ehofm made their first contribution in https://github.com/huggingface/trl/pull/5155
@bledden made their first contribution in https://github.com/huggingface/trl/pull/5147

Full Changelog: https://github.com/huggingface/trl/compare/v0.28.0...v0.29.0
