v0.29.1
What's Changed
- Handle mm_token_type_ids in SFT/GRPO/RLOO to fix IndexError by @albertvillanova in https://github.com/huggingface/trl/pull/5178
- Fix `prepare_multimodal_messages` to support `tool_calls` and `tool` role by @alvarobartt in https://github.com/huggingface/trl/pull/5212
- Fix type for model_init_kwargs when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5230
- Decouple rollout dispatch from vLLM backend in GRPO _generate_single_turn by @albertvillanova in https://github.com/huggingface/trl/pull/5122
- Simplify logic for structured outputs across vLLM versions by @albertvillanova in https://github.com/huggingface/trl/pull/5215
- Add support for raw ids in `prompts` in vLLM client and server by @qgallouedec in https://github.com/huggingface/trl/pull/5225
- Add VLM support when passing raw token IDs to vLLM client by @qgallouedec in https://github.com/huggingface/trl/pull/5227
- Move `rollout_func` from `_generate_single_turn` to `_generate` by @qgallouedec in https://github.com/huggingface/trl/pull/5232
- [GRPO/RLOO] Tokenize before vLLM generation call by @qgallouedec in https://github.com/huggingface/trl/pull/5238
- Support JSON string parsing of teacher_model_init_kwargs in MiniLLMConfig by @albertvillanova in https://github.com/huggingface/trl/pull/5259
- [GRPO/RLOO] Unify tokenization across all generation backends in `_generate_single_turn` by @qgallouedec in https://github.com/huggingface/trl/pull/5239
- [GRPO/RLOO] Extract tokenize prompts from `_generate_single_turn` by @qgallouedec in https://github.com/huggingface/trl/pull/5240
- [CPO/ORPO] Fix handling of different length chosen/rejected prompts. by @davmels in https://github.com/huggingface/trl/pull/4639
- Fix type for teacher_model_init_kwargs when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5258
- Fix support for model_init_kwargs in GKD/GOLD when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5266
- Fix mm_token_type_ids silently dropped in DPO VLM training by @albertvillanova in https://github.com/huggingface/trl/pull/5279
- Fix support for model_init_kwargs in MiniLLM when passed as CLI JSON string by @albertvillanova in https://github.com/huggingface/trl/pull/5274
- Fix GRPOTrainer attribute access for vLLM model config by @falcondai in https://github.com/huggingface/trl/pull/5302
- [GRPO] Fix re-tokenization bug in tool-calling loop by concatenating token IDs by @qgallouedec in https://github.com/huggingface/trl/pull/5242
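
Several of the fixes above concern `model_init_kwargs` and `teacher_model_init_kwargs` arriving as a JSON string from the command line instead of a dict. The sketch below illustrates that general pattern only; it is not TRL's implementation, and the helper name `parse_init_kwargs` and the sample key are hypothetical.

```python
import json
from typing import Any, Optional, Union


def parse_init_kwargs(value: Union[str, dict, None]) -> Optional[dict]:
    """Hypothetical helper (not part of TRL): normalize init kwargs to a dict.

    A CLI can only pass strings, so a config field typed as a dict may
    actually hold a JSON string that needs decoding before use.
    """
    # Already a dict (Python API usage) or unset: nothing to convert.
    if value is None or isinstance(value, dict):
        return value
    # CLI usage: decode the JSON string into a Python object.
    parsed: Any = json.loads(value)
    if not isinstance(parsed, dict):
        raise ValueError(
            f"Expected a JSON object for init kwargs, got {type(parsed).__name__}"
        )
    return parsed


# Illustrative usage with a made-up key.
cli_value = '{"some_option": "some_value"}'
kwargs = parse_init_kwargs(cli_value)
```

Decoding once, when the config is built, means downstream code can always assume a dict regardless of how the value was supplied.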
New Contributors
- @davmels made their first contribution in https://github.com/huggingface/trl/pull/4639
- @falcondai made their first contribution in https://github.com/huggingface/trl/pull/5302
Full Changelog: https://github.com/huggingface/trl/compare/v0.29.0...v0.29.1
