Fine-tuning

Libraries for efficient model fine-tuning and alignment

Releases: 12 · Avg interval: 5d · Avg cadence: 6/mo

Jul 9, 2026

KTO now stable; environment rewards and GRPO entropy regularization added

TRL · v1.8.0

KTO trainer graduates from experimental to the top-level trl package with the same API as DPO/GRPO/SFT, and the experimental import path still works with a FutureWarning. Environment-owned rewards let agentic RL environments define their own reward via a reserved get_reward() method, and multi-environment support allows a single training run to handle multiple environments with environment-specific tool schemas. GRPO now supports both static and adaptive entropy regularization to encourage exploration and prevent policy collapse.
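The entropy regularization mentioned above can be illustrated with a minimal, library-free sketch. It is not TRL's implementation: the function names, the `coef` parameter, and the adaptive update rule are all assumptions chosen for illustration. The idea is that subtracting an entropy bonus from the loss favours less peaked policies, and an adaptive variant moves the coefficient toward whatever keeps entropy near a target.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(policy_loss, probs, coef):
    """Static variant: subtract a fixed-weight entropy bonus from the loss."""
    return policy_loss - coef * entropy(probs)


def adapt_coef(coef, current_entropy, target_entropy, step_size=0.1):
    """Hypothetical adaptive rule (an assumption, not TRL's):
    raise coef when entropy is below target, lower it when above,
    and never let it go negative."""
    return max(0.0, coef + step_size * (target_entropy - current_entropy))


peaked = [0.97, 0.01, 0.01, 0.01]    # near-collapsed policy
uniform = [0.25, 0.25, 0.25, 0.25]   # maximally exploratory policy

# With the same base loss, the higher-entropy policy gets the lower total loss.
loss_peaked = regularized_loss(1.0, peaked, coef=0.01)
loss_uniform = regularized_loss(1.0, uniform, coef=0.01)
```

In this sketch a collapsing policy pushes the adaptive coefficient up, which strengthens the exploration bonus on the next step.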

Jul 4, 2026

GRPO + vLLM hang fixed on non-NVLink; dataset fingerprinting corrected

TRL · v1.7.1

Fixed a hang in GRPO + vLLM colocate + PEFT on non-NVLink hardware and corrected dataset fingerprinting in DPO/SFT tokenization. Also integrated the new response parsing API, added a prompt-learning guard for PEFT with Liger in GRPO, and fixed activation offload storage deduplication.

Jun 25, 2026

SFT default loss flips to chunked_nll; GMPO trainer arrives

Breaking

TRL · v1.7.0

The default SFT loss_type is now "chunked_nll", delivering ~30% less peak VRAM on average with neutral or slightly faster wall-clock time. Also introduces an experimental GMPO trainer, transformers continuous batching, AsyncGRPO weight sync with vLLM 0.22+, and padding-free AsyncGRPO.

Jun 11, 2026

A2PO trainer debuts; VLM KTO support; Async GRPO spawns process

TRL · v1.6.0

The release introduces a new experimental A2POTrainer for optimal advantage regression and adds vision-language model support to the KTO trainer. The AsyncRolloutWorker now runs in a separate process to avoid GIL contention and potential NCCL watchdog timeouts; the release also fixes aiohttp retries and all-NaN reward columns. The Gold distillation trainer now aligns tokens via byte offsets, and SDFT/SDPO use the vLLM server for live teacher logprobs. Other features include bidirectional masked importance sampling for IcePop, support for NemotronH and Nemotron 3 Ultra, additional training chat templates, and decoupled self-distillation trainers.
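Aligning tokens via byte offsets can be sketched as follows. This is an illustrative two-pointer alignment, not the Gold trainer's code: `byte_spans` and `align_by_bytes` are hypothetical helpers, and the sketch assumes both tokenizations decode to the same text and that each token is a valid string (byte-level pieces that split a multi-byte character are out of scope).

```python
def byte_spans(tokens):
    """Return (start, end) UTF-8 byte offsets for each token string."""
    spans, pos = [], 0
    for tok in tokens:
        end = pos + len(tok.encode("utf-8"))
        spans.append((pos, end))
        pos = end
    return spans


def align_by_bytes(student_tokens, teacher_tokens):
    """Group token indices from two tokenizations into chunks that cover
    exactly the same byte range. Returns [(student_idxs, teacher_idxs), ...]."""
    s_spans, t_spans = byte_spans(student_tokens), byte_spans(teacher_tokens)
    s_total = s_spans[-1][1] if s_spans else 0
    t_total = t_spans[-1][1] if t_spans else 0
    if s_total != t_total:
        raise ValueError("tokenizations do not cover the same number of bytes")

    groups, s_group, t_group = [], [], []
    i = j = 0
    s_end = t_end = 0
    while i < len(s_spans) or j < len(t_spans):
        # Advance whichever side is behind (student first on ties).
        if s_end <= t_end and i < len(s_spans):
            s_group.append(i)
            s_end = s_spans[i][1]
            i += 1
        else:
            t_group.append(j)
            t_end = t_spans[j][1]
            j += 1
        # A shared byte boundary closes the current group.
        if s_end == t_end and s_group and t_group:
            groups.append((s_group, t_group))
            s_group, t_group = [], []
    return groups


student = ["Hel", "lo", " wor", "ld"]
teacher = ["Hello", " ", "world"]
alignment = align_by_bytes(student, teacher)
# alignment == [([0, 1], [0]), ([2, 3], [1, 2])]
```

Working in bytes rather than characters keeps the boundaries comparable when two tokenizers split multi-byte text differently.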

May 27, 2026

Trainer telemetry now allowlisted

TRL · v1.5.1

Trainer telemetry is now gated on an explicit class-name allowlist, restricting which trainer classes can send telemetry.
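A class-name allowlist of this kind can be sketched in a few lines. Everything here is hypothetical: the allowlist contents, the `telemetry_allowed` helper, and the stand-in classes are illustrative, and the release note does not say whether TRL matches exact names or walks the class hierarchy. This sketch assumes an exact-name match.

```python
# Hypothetical allowlist entries, for illustration only.
TELEMETRY_ALLOWLIST = frozenset({"SFTTrainer", "DPOTrainer", "GRPOTrainer"})


def telemetry_allowed(trainer):
    """Allow telemetry only when the trainer's exact class name is listed."""
    return type(trainer).__name__ in TELEMETRY_ALLOWLIST


class SFTTrainer:
    """Stand-in for a library trainer class (not the real TRL class)."""


class MyCustomTrainer(SFTTrainer):
    """Stand-in for a user-defined subclass."""


allowed = telemetry_allowed(SFTTrainer())        # True: name is on the list
blocked = telemetry_allowed(MyCustomTrainer())   # False: subclass name is not
```

Under the exact-name assumption, a user-defined subclass sends nothing unless its own name is added to the list.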

May 25, 2026

Response parsing hang fixed; CUDA memory leak patched

TRL · v1.5.0

Fixed an exponential backtracking bug in Qwen3/Qwen3.5/GLM4MoE response parsing that caused GRPOTrainer to hang indefinitely on truncated tool-call blocks, reducing worst-case complexity from O(2ⁿ) to O(n). Also fixed a CUDA memory leak in BNB dequantization buffers and stale state in OffloadActivations. Added training chat templates for Phi-3.5, Qwen3-VL, and Qwen3.5 Think/NoThink, and final logits softcapping support for AsyncGRPOTrainer on models like Gemma 2.
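The release note does not show the parser that was replaced, so the following is only a sketch of the general remedy: scan for delimiters in a single forward pass instead of letting a pattern retry alternatives on a block that never closes. The `<tool_call>` tag names and the `extract_tool_calls` helper are assumptions for illustration, not TRL's parser.

```python
OPEN_TAG = "<tool_call>"    # assumed delimiters, for illustration only
CLOSE_TAG = "</tool_call>"


def extract_tool_calls(text):
    """Return (complete_blocks, truncated_tail).

    Each str.find call resumes where the previous one stopped, so no
    character is revisited and the scan is linear in len(text).
    """
    blocks = []
    pos = 0
    while True:
        start = text.find(OPEN_TAG, pos)
        if start == -1:
            return blocks, None
        body_start = start + len(OPEN_TAG)
        end = text.find(CLOSE_TAG, body_start)
        if end == -1:
            # Truncated block: report the tail and stop, with no retries.
            return blocks, text[body_start:]
        blocks.append(text[body_start:end].strip())
        pos = end + len(CLOSE_TAG)


sample = 'ok <tool_call>{"x": 1}</tool_call> then <tool_call>{"y":'
complete, truncated = extract_tool_calls(sample)
# complete == ['{"x": 1}'], truncated == '{"y":'
```

A generation cut off mid-block, which is the case that triggered the hang, is returned as a truncated tail instead of being searched repeatedly.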

May 9, 2026

Chunked cross-entropy loss cuts SFT VRAM by 50%

TRL · v1.4.0

A new loss_type="chunked_nll" option for SFT drastically reduces peak activation memory by computing cross-entropy over tokens in checkpointed chunks instead of materializing the full [batch × seq × vocab] logits tensor, unlocking sequence lengths that previously caused out-of-memory errors. Also added OpenReward Standard environment adapter support, length-normalized DPO sigmoid loss, training chat templates for Cohere, Cohere2, Gemma 3, Qwen3, and Qwen2.5, and a training-invariance test suite to catch numerical drift across trainer configurations.
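The chunking idea can be shown with a small pure-Python sketch. It is not TRL's implementation: the helper names and toy data are assumptions, and it covers only the forward pass. In training, the per-chunk logits would also be recomputed during the backward pass (the "checkpointed" part), which this sketch omits.

```python
import math


def logsumexp(row):
    """Numerically stable log(sum(exp(x))) over one row of logits."""
    m = max(row)
    return m + math.log(sum(math.exp(x - m) for x in row))


def project(hidden, weight):
    """Logits for one token: weight is [vocab][dim], hidden is [dim]."""
    return [sum(w * h for w, h in zip(w_row, hidden)) for w_row in weight]


def full_nll(hiddens, weight, targets):
    """Baseline: materialize the whole [tokens x vocab] logits table first."""
    logits = [project(h, weight) for h in hiddens]
    return sum(logsumexp(l) - l[t] for l, t in zip(logits, targets)) / len(targets)


def chunked_nll(hiddens, weight, targets, chunk_size):
    """Same loss, but at most chunk_size rows of logits exist at a time."""
    total = 0.0
    for start in range(0, len(hiddens), chunk_size):
        chunk = hiddens[start:start + chunk_size]
        chunk_targets = targets[start:start + chunk_size]
        chunk_logits = [project(h, weight) for h in chunk]  # freed each iteration
        total += sum(logsumexp(l) - l[t] for l, t in zip(chunk_logits, chunk_targets))
    return total / len(targets)


# Toy data: 7 tokens, hidden size 3, vocabulary of 5.
hiddens = [[0.1 * (i + j) for j in range(3)] for i in range(7)]
weight = [[0.05 * (v - d) for d in range(3)] for v in range(5)]
targets = [i % 5 for i in range(7)]

reference = full_nll(hiddens, weight, targets)
chunked = chunked_nll(hiddens, weight, targets, chunk_size=2)
```

The two functions return the same mean loss up to floating-point rounding; only the peak size of the intermediate logits differs.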

Apr 26, 2026

TRL v1.3.0

Features

Qwen 3.6 integration

TRL v1.3 ships training support for the new Qwen 3.6…

Apr 17, 2026

TRL v1.2.0

Features

New `SSDTrainer` — Simple Self-Distillation

A new experimental SSDTrainer implements the…

Apr 16, 2026

PEFT v0.19.1

A small patch release containing these fixes:

#3161
#3165

Full Changelog: https://github.com/huggingface/peft/compare/v0.19.0...v0.19.1

Apr 14, 2026

PEFT v0.19.0

Highlights

This PEFT release contains no fewer than nine new PEFT methods, described below. It also contains numerous enhancements that should make PEFT more useful to many users.

Apr 12, 2026

TRL v1.1.0

Features

`DistillationTrainer` for efficient on-policy distillation

Read the blog post: https://huggingface.co/spaces/HuggingFaceTB/trl-distillation-trainer

Mar 31, 2026

TRL v1.0.0

Read our blog post for an overview of TRL v1.

Features

Asynchronous GRPO

Asynchronous GRPO…

Mar 20, 2026

Feb 25, 2026

TRL v0.29.0

Features

Add `environment_factory` to `GRPOTrainer`

GRPOTrainer now accepts an environment_factory argument, allowing users to specify a custom environment class for training. This enables more flexible and diverse training scenarios by letting users define…
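As a rough illustration of the factory pattern, the sketch below uses a hypothetical `CounterEnv` class and a stand-in `rollout` function. Neither comes from TRL, and how the trainer actually calls into the environment is not covered by the truncated note above. The point is only that a factory yields a fresh, isolated environment per rollout.

```python
class CounterEnv:
    """Hypothetical environment: holds per-rollout state."""

    def __init__(self):
        self.count = 0

    def increment(self, amount: int) -> int:
        """Add `amount` to the counter and return the new value."""
        self.count += amount
        return self.count


def rollout(environment_factory, actions):
    """Stand-in for a trainer loop: build a fresh environment, then act in it."""
    env = environment_factory()
    return [env.increment(a) for a in actions]


first = rollout(CounterEnv, [1, 2])   # [1, 3]
second = rollout(CounterEnv, [5])     # [5]: state from `first` does not leak
```

Passing the class itself works as a factory because calling it constructs a new instance; any zero-argument callable that returns an environment would fit the same pattern.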

Feb 10, 2026

TRL v0.28.0

Features

[GRPOTrainer]: Agent Training Supports Async Tool Calls by @pramodith in https://github.com/huggingface/trl/pull/4742
Add retry strategy to vLLM Client for increased robustness by @apalmas-saifh in https://github.com/huggingface/trl/pull/4845
Enable vLLM…

Feb 3, 2026

TRL v0.27.2

What's Changed

Remove access to warnings_issued by @qgallouedec in #4960
Fix SFTTrainer init logic: remove TrainingArguments.push_to_hub_token only for transformers < v5 by @albertvillanova in #4942
Fix extra EOS appended in DPO preprocessing for conversational…

Jan 24, 2026

TRL v0.27.1

What's Changed

Fix: undefined current_gradient_accumulation_steps by @qgallouedec in https://github.com/huggingface/trl/pull/4852
fix(DeepSeek OPSM): passing correct (vLLM) logprobs by @casinca in https://github.com/huggingface/trl/pull/4857
Fix SFT training for…

Jan 16, 2026

TRL v0.27.0

Features

Add vllm_group_port argument to GRPO, RLOO and OnlineDPO configuration by @pointerhacker in https://github.com/huggingface/trl/pull/4545
Preserve truncated tokens in BFD packing by @qgallouedec in https://github.com/huggingface/trl/pull/4632
Support…
