* vllm_group_port argument to GRPO, RLOO and OnlineDPO configuration by @pointerhacker in https://github.com/huggingface/trl/pull/4545
* forward_masked_logits function by @qgallouedec in https://github.com/huggingface/trl/pull/4729
* AutoModelForCausalLMWithValueHead and AutoModelForSeq2SeqLMWithValueHead to experimental by @qgallouedec in https://github.com/huggingface/trl/pull/4654
* experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4667
* DataCollatorForChatML to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4668
* add_bos_token_if_needed and add_eos_token_if_needed to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4674
* truncate_right and SIMPLE_CHAT_TEMPLATE to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4677
* prepare_model_for_kbit_training, enable_gradient_checkpointing, prepare_peft_model to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4704
* get_reward function to experimental.utils by @qgallouedec in https://github.com/huggingface/trl/pull/4683
* num_generations_eval=1 in the calculation of the advantage by @qgallouedec in https://github.com/huggingface/trl/pull/4662
* num_generations_eval is specified and different than num_generations by @apalmas-saifh in https://github.com/huggingface/trl/pull/4682
* generation_config for tiny model uploads by @qgallouedec in https://github.com/huggingface/trl/pull/4643
* qwen3_schema by @mattbui in https://github.com/huggingface/trl/pull/4709
* HybridCache in Liger-Kernel with transformers v5 by @qgallouedec in https://github.com/huggingface/trl/pull/4798
* args by @carlyou in https://github.com/huggingface/trl/pull/4801
* grpo_trainer.md): Added Qwen SAPO details under Loss Types by @casinca in https://github.com/huggingface/trl/pull/4681
* MergeModelCallback from import structure by @qgallouedec in https://github.com/huggingface/trl/pull/4664
* ChatMlSpecialTokens by @qgallouedec in https://github.com/huggingface/trl/pull/4666
* _win_rate_completions_df function from callbacks by @qgallouedec in https://github.com/huggingface/trl/pull/4672
* DbrxForCausalLM support by @qgallouedec in https://github.com/huggingface/trl/pull/4799
* compute_accuracy to PRM Trainer file by @qgallouedec in https://github.com/huggingface/trl/pull/4656
* clone_chat_template to chat_template_utils by @qgallouedec in https://github.com/huggingface/trl/pull/4653
* GeometricMixtureWrapper to nash_md_trainer.py by @qgallouedec in https://github.com/huggingface/trl/pull/4670
* exact_div, print_rich_table, truncate_response, forward to ppo_trainer by @qgallouedec in https://github.com/huggingface/trl/pull/4676
* OnPolicyConfig and PPOConfig and move OnlineTrainerState by @qgallouedec in https://github.com/huggingface/trl/pull/4671
* AutoModelForCausalLMWithValueHead to test_ppo_trainer by @qgallouedec in https://github.com/huggingface/trl/pull/4678
* generate and batch_generation to ppo_trainer.py by @qgallouedec in https://github.com/huggingface/trl/pull/4675
* TrainerCallback from top-level transformers by @qgallouedec in https://github.com/huggingface/trl/pull/4694
* top_k parameter in OnlineDPOTrainer by @qgallouedec in https://github.com/huggingface/trl/pull/4714
* PeftModel + peft_config in trainers by @qgallouedec in https://github.com/huggingface/trl/pull/4713
* TestParseResponse by @qgallouedec in https://github.com/huggingface/trl/pull/4736
* GuidedDecodingParams with StructuredOutputsParams in sampling parameter configuration by @qgallouedec in https://github.com/huggingface/trl/pull/4797

Full Changelog: https://github.com/huggingface/trl/compare/v0.26.0...v0.27.0
Fetched April 7, 2026