v0.29.0
Features
Add environment_factory to GRPOTrainer
GRPOTrainer now accepts an environment_factory argument: a callable, such as a class, that the trainer uses to create environment instances for training. In the example below, the environment's increment method is available to the model as a tool, reset initializes the counter, and the reward function receives the environments and scores each one by its final counter value.
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = Dataset.from_dict({
        "prompt": [[{"role": "user", "content": f"Increment the counter by {i}."}] for i in range(1, 7)]
    })

    def reward_func(environments, **kwargs):
        # One reward per environment: the final counter value.
        return [env.counter for env in environments]

    class IncrementEnv:
        def reset(self):
            self.counter = 0

        def increment(self, step: int) -> int:
            """
            Increment the internal counter.

            Args:
                step: Value to add to the counter.

            Returns:
                The updated counter value.
            """
            self.counter += step
            return self.counter

    trainer = GRPOTrainer(
        model="Qwen/Qwen3-0.6B",
        args=GRPOConfig(chat_template_kwargs={"enable_thinking": False}),
        train_dataset=dataset,
        reward_funcs=reward_func,
        environment_factory=IncrementEnv,
    )
    trainer.train()
by @qgallouedec in https://github.com/huggingface/trl/pull/5093
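To make the moving parts of the example concrete, here is a minimal sketch, without TRL or a model, of how the environment and the reward function fit together. The rollout loop is an assumption made for illustration (one fresh environment per sample, reset called before any tool call); it is not TRL's actual implementation, and simulate_rollouts is a hypothetical helper.

```python
class IncrementEnv:
    def reset(self):
        self.counter = 0

    def increment(self, step: int) -> int:
        """Increment the internal counter and return the new value."""
        self.counter += step
        return self.counter


def reward_func(environments, **kwargs):
    # Same reward as in the release example: the final counter value.
    return [env.counter for env in environments]


def simulate_rollouts(environment_factory, tool_calls_per_sample):
    """Hypothetical stand-in for a rollout loop (assumption, not TRL code).

    Each inner list holds the `step` arguments a model might pass to
    `increment` while answering one prompt.
    """
    environments = []
    for tool_calls in tool_calls_per_sample:
        env = environment_factory()  # one environment per sample
        env.reset()                  # start every episode from a clean state
        for step in tool_calls:
            env.increment(step)
        environments.append(env)
    return environments


envs = simulate_rollouts(IncrementEnv, [[1], [2, 2], []])
rewards = reward_func(envs)
print(rewards)  # [1, 4, 0]
```

A sample that never calls the tool ends with a counter of 0, so under this reward it scores lowest.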
Skills
TRL introduces agent-native CLI integration with trl-training, a first-class Agent Skill that exposes TRL’s training workflows (SFT, DPO, GRPO, etc.) in a structured, agent-readable format. The skill is packaged directly with the trl library and can be installed via the CLI:
    # Install into the project's agent directory (default scope=project), by agent name: claude, codex, opencode
    trl skills install trl-training --target <agent>
This enables AI agents to safely and reproducibly execute TRL training workflows using a well-defined interface.
Skills can be installed at the project or global scope, and support explicit targets and overwrite controls.
- Implement Agent Skills [1/N]: Create training skill (MVP) by @albertvillanova in https://github.com/huggingface/trl/pull/5096
- Implement Agent Skills [2/N]: Create skills module by @albertvillanova in https://github.com/huggingface/trl/pull/5097
- Implement Agent Skills [3/N]: Create skills installer by @albertvillanova in https://github.com/huggingface/trl/pull/5100
- Implement Agent Skills [4/N]: Create skills CLI by @albertvillanova in https://github.com/huggingface/trl/pull/5103
Other
- Pass vllm_is_ratio to LigerFusedLinearGRPOLoss in compute_liger_loss by @yukiu00 in https://github.com/huggingface/trl/pull/5031
- feature: top_k selective_log_softmax by @LeonEricsson in https://github.com/huggingface/trl/pull/5104
- Add Trackio integration for model card visualization by @qgallouedec in https://github.com/huggingface/trl/pull/5101
- Update tool handling to support JSON string schemas in trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5118
- Refactor DPO by @qgallouedec in https://github.com/huggingface/trl/pull/3906
- Add support for Python 3.14 by @albertvillanova in https://github.com/huggingface/trl/pull/4225
- Fix default learning_rate in PPO according to paper by @albertvillanova in https://github.com/huggingface/trl/pull/5174
- Fix default learning_rate in BCO according to paper by @albertvillanova in https://github.com/huggingface/trl/pull/5173
- feature: Configurable num logprobs in vLLM generation by @LeonEricsson in https://github.com/huggingface/trl/pull/5107
Fixes
- [GRPO] fix: remove SAPO temperature check by @LeonEricsson in https://github.com/huggingface/trl/pull/5042
- fix: Use launch_args for all trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5059
- Fix GRPO multi-turn training with liger kernels by @albertvillanova in https://github.com/huggingface/trl/pull/4975
- fix: Set num_labels to 1 in causal model initialization for RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5066
- [SFT] Fix high vRAM consumption during eval with liger kernel by @LoganVegnaSHOP in https://github.com/huggingface/trl/pull/5069
- Fix BFD packing for SFT datasets by @albertvillanova in https://github.com/huggingface/trl/pull/5076
- Fix DPO and RLOO incompatibility with FSDP2 by @flutist in https://github.com/huggingface/trl/pull/4838
- Fix SFT loss type rewards being overwritten in dpo_loss() by @Mr-Neutr0n in https://github.com/huggingface/trl/pull/5079
- Fix Qwen3 schema by @qgallouedec in https://github.com/huggingface/trl/pull/5111
- Add check for None in get_trackio_space_url() to prevent errors by @qgallouedec in https://github.com/huggingface/trl/pull/5115
- Fix trl <command> --help TypeError caused by unescaped % in TrainingArguments help strings by @albertvillanova in https://github.com/huggingface/trl/pull/5135
- Fix PPOTrainer.save_model by @albertvillanova in https://github.com/huggingface/trl/pull/5151
- Fix SFTTrainer support for single-image data by @qgallouedec in https://github.com/huggingface/trl/pull/5132
- Fix structured_outputs handling and tool normalization in vLLM backend by @ehofm in https://github.com/huggingface/trl/pull/5155
- fix: wake up vLLM weights before sync to prevent writes to freed memory by @bledden in https://github.com/huggingface/trl/pull/5147
- Accept mm_token_type_ids in GRPO/RLOO _get_per_token_logps_and_entropies by @albertvillanova in https://github.com/huggingface/trl/pull/5176
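The --help fix listed above comes from a general argparse behavior: help strings are expanded with %-formatting, so a literal percent sign has to be written as %%. The sketch below reproduces that behavior with the standard library only. The --warmup_ratio argument and the render_help helper are hypothetical and are not taken from TRL's code; the exact exception type and the point at which it is raised vary across Python versions, so the helper catches the possible types.

```python
import argparse


def render_help(help_text: str) -> str:
    """Build a tiny parser and render its help, reporting formatting errors."""
    try:
        parser = argparse.ArgumentParser(prog="demo")
        # Hypothetical argument, used only to carry the help string.
        parser.add_argument("--warmup_ratio", type=float, default=0.1, help=help_text)
        return parser.format_help()
    except (TypeError, ValueError, KeyError) as exc:
        # Depending on the Python version, the failure surfaces either when
        # the argument is added or when the help text is rendered.
        return f"error: {type(exc).__name__}"


# Escaped: "%%" renders as a literal "%", and %(default)s is expanded.
escaped = render_help("Warmup fraction, e.g. 0.1 for 10%% (default: %(default)s)")

# Unescaped: "% o" is parsed as a format specifier and the expansion fails.
unescaped = render_help("Warmup fraction, e.g. 0.1 for 10% of steps")

print(escaped)
print(unescaped)
```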
Documentation and Examples
- [minor] docs: typo in grpo_trainer.md by @casinca in https://github.com/huggingface/trl/pull/5047
- docs: add DeepSeek-R1 training dynamics and GRPO example by @JenWei0312 in https://github.com/huggingface/trl/pull/5053
- docs: Add INTELLECT-2 (2505.07291) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5061
- docs: Add REINFORCE++ (2501.03262) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5062
- docs: Add XPO (2405.21046) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5068
- docs: Add RPO paper (2405.16436) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5070
- docs: Add SimPO paper (2405.14734) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5071
- docs: Add TR-DPO paper (2404.09656) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5078
- docs: Add ORPO paper (2403.07691) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5080
- docs: Add CPO paper (2401.08417) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5081
- docs: Add GKD paper (2306.13649) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5082
- docs: Add PRM paper (2211.14275) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5083
- docs: Add T5 packing paper (1910.10683) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5084
- docs: Add PPO paper (1707.06347) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5085
- docs: Add MPO paper (2411.10442) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5089
- docs: add Multi-Node Training subsection (#4384) by @nabin2004 in https://github.com/huggingface/trl/pull/5091
- docs: Unify model examples to use trl-lib namespace by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4431
- Add Tiny Aya tool calling examples (script/notebook) by @sergiopaniego in https://github.com/huggingface/trl/pull/5123
- Fix wording in DPO and SFT trainer documentation for clarity by @qgallouedec in https://github.com/huggingface/trl/pull/5140
- Fix type of TrainingArguments.logging_steps in docs by @albertvillanova in https://github.com/huggingface/trl/pull/5149
- Fix Liquid syntax error in DPO trainer docs caused by double braces in LaTeX by @albertvillanova in https://github.com/huggingface/trl/pull/5153
- Document parameters with differing default values in experimental configs by @albertvillanova in https://github.com/huggingface/trl/pull/5172
Deprecations
- Remove deprecated BCO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5045
- Remove deprecated CPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5046
- Remove deprecated Judges after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5048
- Remove deprecated ORPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5050
- Remove deprecated PPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5051
- Remove deprecated PRM after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5052
- Remove deprecated XPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5055
- Remove deprecated RLOOConfig.max_prompt_length by @albertvillanova in https://github.com/huggingface/trl/pull/5056
- Remove deprecated classes moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5044
- Remove deprecated mergekit_utils moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5057
- Rename input keys in RewardTrainer collator from chosen/rejected_input_ids to chosen/rejected_ids by @qgallouedec in https://github.com/huggingface/trl/pull/5179
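For code that builds or inspects RewardTrainer batches by hand, the collator key rename above may require a small migration. The sketch below assumes the shorthand in the entry expands to chosen_input_ids / rejected_input_ids (old) and chosen_ids / rejected_ids (new); rename_collator_keys is a hypothetical helper, not part of TRL, and other_field is a placeholder for any key that is left alone.

```python
# Assumed expansion of "chosen/rejected_input_ids" -> "chosen/rejected_ids".
KEY_RENAMES = {
    "chosen_input_ids": "chosen_ids",
    "rejected_input_ids": "rejected_ids",
}


def rename_collator_keys(batch: dict) -> dict:
    """Return a copy of `batch` with old key names mapped to the new ones.

    Keys that are not in KEY_RENAMES are passed through unchanged.
    """
    return {KEY_RENAMES.get(key, key): value for key, value in batch.items()}


old_batch = {
    "chosen_input_ids": [[1, 2, 3]],
    "rejected_input_ids": [[1, 2, 4]],
    "other_field": [0.5],  # placeholder key, untouched by the rename
}
new_batch = rename_collator_keys(old_batch)
print(sorted(new_batch))  # ['chosen_ids', 'other_field', 'rejected_ids']
```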
CI Improvements
- Upgrade GitHub Actions to latest versions by @salmanmkc in https://github.com/huggingface/trl/pull/4893
- Remove duplicated tests for SFT and add gradient checkpointing tests by @qgallouedec in https://github.com/huggingface/trl/pull/5054
- Update model from SequenceClassification to CausalLM in RewardTrainer tests by @qgallouedec in https://github.com/huggingface/trl/pull/5060
- Fix CI ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?) by @albertvillanova in https://github.com/huggingface/trl/pull/5074
- Add more tests for get_training_chat_template by @qgallouedec in https://github.com/huggingface/trl/pull/5108
- Add test for Cohere2 models by @qgallouedec in https://github.com/huggingface/trl/pull/5116
- Remove revision references in dataset loading for toolcall tests by @qgallouedec in https://github.com/huggingface/trl/pull/5133
- Fix NameError: name 'importlib' is not defined by @albertvillanova in https://github.com/huggingface/trl/pull/5134
- Fix CI by removing liger-kernel from dev deps by @qgallouedec in https://github.com/huggingface/trl/pull/5163
- Fix experimental TestUpdateWithReplayBuffer: ValueError: train_dataset is required by @albertvillanova in https://github.com/huggingface/trl/pull/5171
- Update upstream tracking info about CI PyTorch JIT deprecation warnings by @albertvillanova in https://github.com/huggingface/trl/pull/5166
Miscellaneous
- Fix logging warning suppression with scoped override for seq-clf head key by @qgallouedec in https://github.com/huggingface/trl/pull/5058
- Fix logging warning suppression for transformers 4.56.2 by @albertvillanova in https://github.com/huggingface/trl/pull/5077
- Validate reward model has 1 num_labels by @albertvillanova in https://github.com/huggingface/trl/pull/5087
- Fix style by @albertvillanova in https://github.com/huggingface/trl/pull/5106
- Remove outdated liger-kernel compatibility checks and warnings in tests and SFTTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5105
- Add validation for conversational prompts in multimodal training by @qgallouedec in https://github.com/huggingface/trl/pull/5067
- Update version check for transformers to 5.2.0 in online_dpo_trainer.py by @qgallouedec in https://github.com/huggingface/trl/pull/5110
- Add GLM-4.5 model to tests by @qgallouedec in https://github.com/huggingface/trl/pull/5114
- Fix import latency [1/N]: Extract _LazyModule to dedicated module by @albertvillanova in https://github.com/huggingface/trl/pull/5128
- Fix import latency [2/N]: Implement native _is_package_available by @albertvillanova in https://github.com/huggingface/trl/pull/5129
- refactor(gkd_trainer): small optim by @casinca in https://github.com/huggingface/trl/pull/5143
- Move common fields from stable trainer configs to BaseConfig by @albertvillanova in https://github.com/huggingface/trl/pull/5136
- Use BaseConfig in all experimental configs by @albertvillanova in https://github.com/huggingface/trl/pull/5148
- Raise ValueError for None train_dataset in core trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5157
- Revert changes in vLLM client/server by @qgallouedec in https://github.com/huggingface/trl/pull/5165
Refactor CLI
- Refactor CLI [1/N]: Refactor into modular command architecture by @albertvillanova in https://github.com/huggingface/trl/pull/5124
- Refactor CLI [2/N]: Move accelerate concerns into TrainingCommand by @albertvillanova in https://github.com/huggingface/trl/pull/5159
- Refactor CLI [3/N]: Self-contain VllmServeCommand argument parsing by @albertvillanova in https://github.com/huggingface/trl/pull/5160
What's Changed
- [minor] docs: typo in grpo_trainer.md by @casinca in https://github.com/huggingface/trl/pull/5047
- ⬆️ Bump dev version by @albertvillanova in https://github.com/huggingface/trl/pull/5049
- [GRPO] fix: remove SAPO temperature check by @LeonEricsson in https://github.com/huggingface/trl/pull/5042
- Remove deprecated BCO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5045
- Remove deprecated CPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5046
- Remove deprecated Judges after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5048
- Upgrade GitHub Actions to latest versions by @salmanmkc in https://github.com/huggingface/trl/pull/4893
- Remove deprecated ORPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5050
- Remove deprecated PPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5051
- Remove deprecated PRM after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5052
- Remove deprecated XPO after moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5055
- Remove deprecated RLOOConfig.max_prompt_length by @albertvillanova in https://github.com/huggingface/trl/pull/5056
- Remove deprecated classes moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5044
- Remove duplicated tests for SFT and add gradient checkpointing tests by @qgallouedec in https://github.com/huggingface/trl/pull/5054
- Remove deprecated mergekit_utils moved to experimental by @albertvillanova in https://github.com/huggingface/trl/pull/5057
- docs: add DeepSeek-R1 training dynamics and GRPO example by @JenWei0312 in https://github.com/huggingface/trl/pull/5053
- docs: Add INTELLECT-2 (2505.07291) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5061
- docs: Add REINFORCE++ (2501.03262) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5062
- docs: Add XPO (2405.21046) to Paper Index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5068
- docs: Add RPO paper (2405.16436) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5070
- docs: Add SimPO paper (2405.14734) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5071
- Fix logging warning suppression with scoped override for seq-clf head key by @qgallouedec in https://github.com/huggingface/trl/pull/5058
- fix: Use launch_args for all trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5059
- Fix GRPO multi-turn training with liger kernels by @albertvillanova in https://github.com/huggingface/trl/pull/4975
- fix: Set num_labels to 1 in causal model initialization for RewardTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5066
- Fix logging warning suppression for transformers 4.56.2 by @albertvillanova in https://github.com/huggingface/trl/pull/5077
- Update model from SequenceClassification to CausalLM in RewardTrainer tests by @qgallouedec in https://github.com/huggingface/trl/pull/5060
- Fix CI ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?) by @albertvillanova in https://github.com/huggingface/trl/pull/5074
- [SFT] Fix high vRAM consumption during eval with liger kernel by @LoganVegnaSHOP in https://github.com/huggingface/trl/pull/5069
- docs: Add TR-DPO paper (2404.09656) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5078
- docs: Add ORPO paper (2403.07691) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5080
- docs: Add CPO paper (2401.08417) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5081
- docs: Add GKD paper (2306.13649) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5082
- docs: Add PRM paper (2211.14275) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5083
- docs: Add T5 packing paper (1910.10683) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5084
- docs: Add PPO paper (1707.06347) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5085
- Fix BFD packing for SFT datasets by @albertvillanova in https://github.com/huggingface/trl/pull/5076
- Validate reward model has 1 num_labels by @albertvillanova in https://github.com/huggingface/trl/pull/5087
- docs: Add MPO paper (2411.10442) to paper index by @behroozazarkhalili in https://github.com/huggingface/trl/pull/5089
- docs: add Multi-Node Training subsection (#4384) by @nabin2004 in https://github.com/huggingface/trl/pull/5091
- docs: Unify model examples to use trl-lib namespace by @behroozazarkhalili in https://github.com/huggingface/trl/pull/4431
- Implement Agent Skills [1/N]: Create training skill (MVP) by @albertvillanova in https://github.com/huggingface/trl/pull/5096
- Pass vllm_is_ratio to LigerFusedLinearGRPOLoss in compute_liger_loss by @yukiu00 in https://github.com/huggingface/trl/pull/5031
- Fix DPO and RLOO incompatibility with FSDP2 by @flutist in https://github.com/huggingface/trl/pull/4838
- feature: top_k selective_log_softmax by @LeonEricsson in https://github.com/huggingface/trl/pull/5104
- Implement Agent Skills [2/N]: Create skills module by @albertvillanova in https://github.com/huggingface/trl/pull/5097
- Fix style by @albertvillanova in https://github.com/huggingface/trl/pull/5106
- Add Trackio integration for model card visualization by @qgallouedec in https://github.com/huggingface/trl/pull/5101
- Fix SFT loss type rewards being overwritten in dpo_loss() by @Mr-Neutr0n in https://github.com/huggingface/trl/pull/5079
- Remove outdated liger-kernel compatibility checks and warnings in tests and SFTTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5105
- Implement Agent Skills [3/N]: Create skills installer by @albertvillanova in https://github.com/huggingface/trl/pull/5100
- Add validation for conversational prompts in multimodal training by @qgallouedec in https://github.com/huggingface/trl/pull/5067
- Update version check for transformers to 5.2.0 in online_dpo_trainer.py by @qgallouedec in https://github.com/huggingface/trl/pull/5110
- Add more tests for get_training_chat_template by @qgallouedec in https://github.com/huggingface/trl/pull/5108
- Add test for Cohere2 models by @qgallouedec in https://github.com/huggingface/trl/pull/5116
- Fix Qwen3 schema by @qgallouedec in https://github.com/huggingface/trl/pull/5111
- Add check for None in get_trackio_space_url() to prevent errors by @qgallouedec in https://github.com/huggingface/trl/pull/5115
- Add GLM-4.5 model to tests by @qgallouedec in https://github.com/huggingface/trl/pull/5114
- Add Tiny Aya tool calling examples (script/notebook) by @sergiopaniego in https://github.com/huggingface/trl/pull/5123
- Update tool handling to support JSON string schemas in trainers by @qgallouedec in https://github.com/huggingface/trl/pull/5118
- Implement Agent Skills [4/N]: Create skills CLI by @albertvillanova in https://github.com/huggingface/trl/pull/5103
- Refactor CLI [1/N]: Refactor into modular command architecture by @albertvillanova in https://github.com/huggingface/trl/pull/5124
- Remove revision references in dataset loading for toolcall tests by @qgallouedec in https://github.com/huggingface/trl/pull/5133
- Refactor DPO by @qgallouedec in https://github.com/huggingface/trl/pull/3906
- Fix import latency [1/N]: Extract _LazyModule to dedicated module by @albertvillanova in https://github.com/huggingface/trl/pull/5128
- Fix import latency [2/N]: Implement native _is_package_available by @albertvillanova in https://github.com/huggingface/trl/pull/5129
- Fix NameError: name 'importlib' is not defined by @albertvillanova in https://github.com/huggingface/trl/pull/5134
- Fix trl <command> --help TypeError caused by unescaped % in TrainingArguments help strings by @albertvillanova in https://github.com/huggingface/trl/pull/5135
- Add environment_factory to GRPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/5093
- refactor(gkd_trainer): small optim by @casinca in https://github.com/huggingface/trl/pull/5143
- Move common fields from stable trainer configs to BaseConfig by @albertvillanova in https://github.com/huggingface/trl/pull/5136
- Fix wording in DPO and SFT trainer documentation for clarity by @qgallouedec in https://github.com/huggingface/trl/pull/5140
- Fix PPOTrainer.save_model by @albertvillanova in https://github.com/huggingface/trl/pull/5151
- Use BaseConfig in all experimental configs by @albertvillanova in https://github.com/huggingface/trl/pull/5148
- Fix type of TrainingArguments.logging_steps in docs by @albertvillanova in https://github.com/huggingface/trl/pull/5149
- Add support for Python 3.14 by @albertvillanova in https://github.com/huggingface/trl/pull/4225
- Fix SFTTrainer support for single-image data by @qgallouedec in https://github.com/huggingface/trl/pull/5132
- Fix CI by removing liger-kernel from dev deps by @qgallouedec in https://github.com/huggingface/trl/pull/5163
- Fix structured_outputs handling and tool normalization in vLLM backend by @ehofm in https://github.com/huggingface/trl/pull/5155
- fix: wake up vLLM weights before sync to prevent writes to freed memory by @bledden in https://github.com/huggingface/trl/pull/5147
- Fix Liquid syntax error in DPO trainer docs caused by double braces in LaTeX by @albertvillanova in https://github.com/huggingface/trl/pull/5153
- Raise ValueError for None train_dataset in core trainers by @albertvillanova in https://github.com/huggingface/trl/pull/5157
- Refactor CLI [2/N]: Move accelerate concerns into TrainingCommand by @albertvillanova in https://github.com/huggingface/trl/pull/5159
- Refactor CLI [3/N]: Self-contain VllmServeCommand argument parsing by @albertvillanova in https://github.com/huggingface/trl/pull/5160
- Revert changes in vLLM client/server by @qgallouedec in https://github.com/huggingface/trl/pull/5165
- Fix experimental TestUpdateWithReplayBuffer: ValueError: train_dataset is required by @albertvillanova in https://github.com/huggingface/trl/pull/5171
- Fix default learning_rate in PPO according to paper by @albertvillanova in https://github.com/huggingface/trl/pull/5174
- Accept mm_token_type_ids in GRPO/RLOO _get_per_token_logps_and_entropies by @albertvillanova in https://github.com/huggingface/trl/pull/5176
- Fix default learning_rate in BCO according to paper by @albertvillanova in https://github.com/huggingface/trl/pull/5173
- Document parameters with differing default values in experimental configs by @albertvillanova in https://github.com/huggingface/trl/pull/5172
- Update upstream tracking info about CI PyTorch JIT deprecation warnings by @albertvillanova in https://github.com/huggingface/trl/pull/5166
- Rename input keys in RewardTrainer collator from chosen/rejected_input_ids to chosen/rejected_ids by @qgallouedec in https://github.com/huggingface/trl/pull/5179
- feature: Configurable num logprobs in vLLM generation by @LeonEricsson in https://github.com/huggingface/trl/pull/5107
- Release: v0.29 by @qgallouedec in https://github.com/huggingface/trl/pull/5181
New Contributors
- @LoganVegnaSHOP made their first contribution in https://github.com/huggingface/trl/pull/5069
- @yukiu00 made their first contribution in https://github.com/huggingface/trl/pull/5031
- @flutist made their first contribution in https://github.com/huggingface/trl/pull/4838
- @Mr-Neutr0n made their first contribution in https://github.com/huggingface/trl/pull/5079
- @ehofm made their first contribution in https://github.com/huggingface/trl/pull/5155
- @bledden made their first contribution in https://github.com/huggingface/trl/pull/5147
Full Changelog: https://github.com/huggingface/trl/compare/v0.28.0...v0.29.0
