This patch contains fixes that are good to have as soon as possible, mostly for tokenizers:

* Fix Kimi-K2.5 tokenizer regression and `_patch_mistral_regex` Attribute… (#45305) by @ArthurZucker
For training:

* Fix #45305 + add regression test GAS (#45349) by @florian6973, @SunMarc
* Fix IndexError with DeepSpeed ZeRO-3 when kernels rotary is active (#…) by @ArthurZucker
And for Qwen2.5-VL:

* Fix Qwen2.5-VL temporal RoPE scaling applied to still images (#45330) by @Kash6, @zucchini-nlp
Small patch release to fix device_map support for Gemma4! It contains the following commit:
Small patch dedicated to optimizing Gemma4: it fixes inference with use_cache=False (caused by k/v state sharing between layers), as well as conversion mappings for some models that would inconsistently serialize their weight names. It contains the following PRs:
This patch is very small and focuses on vLLM and Gemma4!
* Fix export for gemma4 and add Integration tests (#45285) by @Cyrilvallez
* Fix vllm cis (#45139) by @ArthurZucker
Gemma 4 is a multimodal model with pretrained and instruction-tuned variants, available in 1B, 13B, and 27B parameters. The architecture is mostly the same as the previous Gemma versions. The key differences are a vision processor that can output images of a fixed token budget and a spatial 2D RoPE to encode vision-specific information across the height and width axes.
<img width="1478" height="1374" alt="image" src="https://github.com/user-attachments/assets/9d88bd1b-02ea-4829-b7d0-fac0e347d436" />

You can find all the original Gemma 4 checkpoints under the Gemma 4 release.
The key difference from previous Gemma releases is the new design to process images of different sizes using a fixed-budget number of tokens. Unlike many models that squash every image into a fixed square (like 224×224), Gemma 4 keeps the image's natural aspect ratio while resizing it to fit the token budget. There are a couple of constraints to follow:
> [!IMPORTANT]
> Gemma 4 does not apply the standard ImageNet mean/std normalization that many other vision models use. The model's own patch embedding layer handles the final scaling internally (shifting values to the [-1, 1] range).
The number of "soft tokens" (aka vision tokens) an image processor can produce is configurable. The supported options are outlined below and the default is 280 soft tokens per image.
| Soft Tokens | Patches (before pooling) | Approx. Image Area |
|---|---|---|
| 70 | 630 | ~161K pixels |
| 140 | 1,260 | ~323K pixels |
| 280 | 2,520 | ~645K pixels |
| 560 | 5,040 | ~1.3M pixels |
| 1,120 | 10,080 | ~2.6M pixels |
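The numbers in the table follow a simple relationship. As a back-of-the-envelope sketch (assuming 16×16-pixel patches and a 3×3 pooling that turns 9 patches into 1 soft token; these factors are inferred from the table, not stated explicitly in the release notes):

```python
# Sketch of the relationship in the table above. PATCH_SIZE and POOL_FACTOR
# are inferred from the table's ratios, not taken from the model config.
PATCH_SIZE = 16    # pixels per patch side (assumption)
POOL_FACTOR = 9    # 3x3 pooling: 9 patches -> 1 soft token (assumption)

def patches_for_budget(soft_tokens: int) -> int:
    """Number of ViT patches before pooling for a given soft-token budget."""
    return soft_tokens * POOL_FACTOR

def approx_image_area(soft_tokens: int) -> int:
    """Approximate image area in pixels covered by a soft-token budget."""
    return patches_for_budget(soft_tokens) * PATCH_SIZE * PATCH_SIZE

for budget in (70, 140, 280, 560, 1120):
    print(budget, patches_for_budget(budget), approx_image_area(budget))
```

For the default budget of 280 soft tokens, this gives 2,520 patches and roughly 645K pixels, matching the table.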
To encode positional information for each patch in the image, Gemma 4 uses a learned 2D position embedding table. The position table stores up to 10,240 positions per axis, which allows the model to handle very large images. Each position is a learned vector of the same dimensions as the patch embedding. The 2D RoPE that Gemma 4 uses independently rotates half of the attention head dimensions for the x-axis and the other half for the y-axis. This allows the model to understand spatial relationships like "above," "below," "left of," and "right of."
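The x/y split described above can be sketched in a few lines. This is a minimal illustration of the idea, not Gemma 4's actual implementation; the function names and the rotation-pair layout are assumptions:

```python
import numpy as np

def rope_1d(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE on a vector of even length: rotate pairs (x[2i], x[2i+1])
    by position-dependent angles. Layout of pairs is an assumption here."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)
    angles = pos * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(x: np.ndarray, pos_x: int, pos_y: int) -> np.ndarray:
    """2D RoPE sketch: rotate the first half of the head dims by the patch's
    x position and the second half by its y position."""
    half = x.shape[-1] // 2
    return np.concatenate([rope_1d(x[:half], pos_x), rope_1d(x[half:], pos_y)])
```

Because each half is a pure rotation, a patch at position (0, 0) is left unchanged and vector norms are preserved, which is what lets attention scores depend only on relative x/y offsets.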
NomicBERT is a BERT-inspired encoder model that applies Rotary Position Embeddings (RoPE) to create reproducible long context text embeddings. It is the first fully reproducible, open-source text embedding model with 8192 context length that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on short-context MTEB and long context LoCo benchmarks. The model generates dense vector embeddings for various tasks including search, clustering, and classification using specific instruction prefixes.
Links: Documentation | Paper
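Dense text-embedding models in this family typically produce one vector per input by pooling the encoder's token-level hidden states over non-padding positions. The sketch below shows only that generic pooling step, with illustrative data; it is not NomicBERT's actual pipeline:

```python
import numpy as np

def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mean-pool token embeddings over real (non-padding) tokens.
    hidden: (seq, dim) hidden states; mask: (seq,) with 1 for real tokens."""
    mask = mask[:, None].astype(hidden.dtype)
    return (hidden * mask).sum(axis=0) / mask.sum()

hidden = np.array([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]])  # last row is padding
mask = np.array([1, 1, 0])
print(mean_pool(hidden, mask))  # -> [2.0, 3.0]
```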
Music Flamingo is a fully open large audio–language model designed for robust understanding and reasoning over music. It builds upon the Audio Flamingo 3 architecture by including Rotary Time Embeddings (RoTE), which injects temporal position information to enable the model to handle audio sequences up to 20 minutes. The model features a unified audio encoder across speech, sound, and music with special sound boundary tokens for improved audio sequence modeling.
Links: Documentation | Paper
Mamba and hybrid model caches are now first-class native citizens in the library, so users working with Mamba-based or hybrid (Mamba + attention) models should update their code to use the new native cache classes instead of any previous workarounds.
Remote code execution support has been removed from the native LightGlue integration, so users who were loading LightGlue with trust_remote_code=True must remove that argument and use the model directly through the standard native API.
* [LightGlue] Remove remote code execution (#45122) by @vasqu

Several vision-related bugs were fixed in this release, including correcting the Gemma vision mask to support video inputs, resolving a dependency issue that incorrectly required torchvision for PIL-based image processors, and patching bugs in the Janus image generation model and image loading. Local code resolution for tokenizers and image processors was also corrected.
* `Image.open` failure (#44645) by @sywangyi in [#44645]

Improved the performance of repository checks (`check-repo`) by introducing file-level and AST-level disk caching, achieving up to a 27x speedup (from ~46s to ~1.6s with a warm cache), and fixed the mlinter cache location in `.gitignore`.
* janus model (#44739) by @kaixuanliu in [#44739]
* [FA] Fix BC support for a few versions + add deprecation cycle (#45061) by @vasqu in [#45061]
* model_type in AutoConfig.from_pretrained (#45058) by @hmellor in [#45058]
* SmolLM3IntegrationTest (#45048) by @Sai-Suraj-27 in [#45048]

The following contributors have made significant changes to the library over the last release:
Video Encoder-only Mask Transformer (VidEoMT) is a lightweight encoder-only model for online video segmentation built on a plain Vision Transformer (ViT). It eliminates the need for dedicated tracking modules by introducing a lightweight query propagation mechanism that carries information across frames and employs a query fusion strategy that combines propagated queries with temporally-agnostic learned queries. VidEoMT achieves competitive accuracy while being 5x-10x faster than existing approaches, running at up to 160 FPS with a ViT-L backbone.
Links: Documentation | Paper
UVDoc is a machine learning model designed for document image rectification and correction. The main purpose of this model is to carry out geometric transformation on images to correct document distortion, inclination, perspective deformation and other problems in document images. It provides both single input and batched inference capabilities for processing distorted document images.
Links: Documentation
Jina-Embeddings-v3 is a multilingual, multi-task text embedding model designed for a variety of NLP applications. Based on the XLM-RoBERTa architecture, this model uses Rotary Position Embeddings (RoPE) in place of absolute position embeddings to support long input sequences of up to 8192 tokens. Additionally, it features 5 built-in task-specific LoRA adapters that allow the model to generate task-specific embeddings (e.g., for retrieval vs. classification) without significantly increasing inference latency.
Links: Documentation | Paper
* Jina-Embeddings-V3 Model (#44251) by @Sai-Suraj-27 in #44251

Mistral 4 is a powerful hybrid model with the capability of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three different model families - Instruct, Reasoning (previously called Magistral), and Devstral - into a single, unified model. The model features a MoE architecture with 128 experts and 4 active, 119B parameters with 6.5B activated per token, 256k context length, and supports multimodal input with both text and image processing capabilities.
Links: Documentation
PI0 is a vision-language-action model for robotics manipulation that jointly processes visual observations and language instructions to generate robot actions. It uses a novel flow matching architecture built on top of a pre-trained vision-language model to inherit Internet-scale semantic knowledge. The model can perform complex dexterous tasks like laundry folding, table cleaning, and assembling boxes across multiple robot platforms including single-arm robots, dual-arm robots, and mobile manipulators.
Links: Documentation | Paper
SLANeXt is a series of dedicated lightweight models for table structure recognition, focusing on accurately recognizing table structures in documents and natural scenes. The SLANeXt series is a new generation of table structure recognition models independently developed by the Baidu PaddlePaddle Vision Team, with dedicated weights trained separately for wired and wireless tables. The recognition ability for all types of tables has been significantly improved, especially for wired tables.
Links: Documentation
PP-OCRv5_mobile_rec is a dedicated lightweight model for text recognition, focusing specifically on efficient recognition and understanding of text elements in multi-language documents and natural scenes. It is designed to efficiently and accurately support the recognition of Simplified Chinese, Traditional Chinese, English, Japanese, as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters with a single model. While maintaining recognition performance, it also balances inference speed and model robustness, providing efficient and accurate technical support for document understanding in various scenarios.
Links: Documentation
PP-OCRv5_server_rec is a dedicated server-side model for text recognition, focusing specifically on efficient recognition and understanding of text elements in multi-language documents and natural scenes. It is designed to efficiently and accurately support the recognition of Simplified Chinese, Traditional Chinese, English, Japanese, as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters with a single model. While maintaining recognition performance, it also balances inference speed and model robustness, providing efficient and accurate technical support for document understanding in various scenarios.
Links: Documentation
PP-OCRv5_mobile_det is a dedicated lightweight model for text detection, focusing specifically on efficient detection and understanding of text elements in multi-language documents and natural scenes. It is part of the latest generation of text detection models developed by the PaddleOCR team that efficiently and accurately supports the detection of text in diverse scenarios—including handwriting, vertical, rotated, and curved text—across multiple languages such as Simplified Chinese, Traditional Chinese, English, and Japanese. The model features robust handling of complex layouts, varying text sizes, and challenging backgrounds, making it suitable for practical applications like document analysis, license plate recognition, and scene text detection.
Links: Documentation
PP-LCNet is a family of efficient, lightweight convolutional neural networks designed for real-world document understanding and OCR tasks. It balances accuracy, speed, and model size, making it ideal for both server-side and edge deployment. The model has three main variants optimized for specific tasks: document image orientation classification, table classification, and text line orientation classification.
Links: Documentation
PPLCNetV3 is a lightweight CPU-optimized convolutional backbone designed for efficient image classification and downstream vision tasks. It builds on the PP-LCNet architecture with improved training strategies and structural refinements for better accuracy-latency tradeoffs on CPU hardware.
Links: Documentation | Paper
PP-OCRv5_server_det is a high-performance text detection model optimized for server-side applications, focusing on accurate detection of multi-language text in documents and natural scenes. It supports the detection of text in diverse scenarios—including handwriting, vertical, rotated, and curved text—across multiple languages such as Simplified Chinese, Traditional Chinese, English, and Japanese. The model features robust handling of complex layouts, varying text sizes, and challenging backgrounds, making it suitable for practical applications like document analysis, license plate recognition, and scene text detection.
Links: Documentation
CHMv2 is a global, meter-resolution canopy height mapping model that uses DINOv3 to estimate forest canopy heights from high-resolution optical satellite imagery. Building on the original canopy height maps released in 2024, CHMv2 delivers substantial improvements in accuracy, detail, and global consistency by leveraging Meta's self-supervised vision model. The model is trained against airborne laser scanning data and provides essential information for quantifying forest carbon, monitoring restoration and degradation, and assessing habitat structure.
Links: Documentation | Paper | Blog Post
The dual BaseImageProcessor/BaseImageProcessorFast design has been replaced with a unified backend architecture, and the image_processing_utils_fast module has been removed — users should migrate to the new unified image_processing_utils module.
PreTrainedConfig and model config classes have been refactored to use @dataclass and no longer accept positional arguments — users must update any config instantiation calls to use keyword arguments only.
Flash Attention 2 (FA2) support now requires version 2.3.3 or newer, and initial Flash Attention 4 (FA4) support has been added — users on older FA2 versions must upgrade to at least 2.3.3.
* [FA4] Initial support (#42435) by @vasqu

Weight tying behavior has changed so that weights are now tied even when both keys are already present in a checkpoint — users relying on the previous behavior (e.g., with .bin checkpoints containing duplicate keys) should verify their models load as expected.
The cache_position argument has been removed from the forward signatures of most major models — users passing cache_position directly to these models should remove it, as it is now handled internally by generate.
Several bug fixes and improvements were made to pipeline parallel (PP) and tensor parallel (TP) support, including fixing supports_tp/pp_plan detection, resolving attribute errors in PP for Qwen2VL-based models, correcting FSDP loading with meta devices, and ensuring TP weight sharding properly updates parent module attributes (e.g., in_features/out_features) to improve compatibility with libraries like PEFT.
* supports_{tp/pp}_plan (#44696) by @hmellor in [#44696]
* torch.distributed.fsdp in trainer_seq2seq.py (#44507) by @0xDELUXA in [#44507]

Quantization support was improved with up to 30x faster FP8 grouped and batched matmuls, static FP8 expert support for multi-GPU setups, and a torchao minimum version bump to 0.15.0. Additionally, MXFP4 dependency error messages were made more actionable, and AWQ tests were updated to align with the GPTQModel migration.
Several performance improvements were made to tokenizer loading and saving, including eliminating redundant file parsing and unnecessary deep copies of large vocabularies that caused significant overhead. Additionally, bug fixes were applied for incorrect tokenizer class names on the Hub (DeepSeek V2/V3, ModernBERT), a clean_up_tokenization_spaces misconfiguration in Llama 3 tokenizer conversion, and a string replacement issue in AutoTokenizer class name resolution.
* processing_utils.py: avoid deepcopying tokenizer in ProcessorMixin to improve performance (#44894) by @ydshieh in [#44894]
* clean_up_tokenization_spaces=False in Llama 3 tokenizer conversion (#44914) by @maxsloef-goodfire in [#44914]

Kernel support has been expanded with Flash Attention 4 fallback integration, a paged_attention kernel for continuous batching, and Neuron device support for custom kernels. Several stability fixes were also made, including bumping the kernels version dependency to prevent crashes and correcting the LFM2 kernel path.
* [FA4] Add kernels fallback (#44797) by @vasqu in [#44797]

Several cache-related fixes and improvements were made, including aligning LFM2's cache implementation with other Mamba caches, fixing a tensor indexing crash in KV cache continuation for the transformers serve streaming endpoint, and resolving a generation bug in Idefics3 when using use_cache=False. A caching layer was also added to the model linter to skip unchanged valid files and improve build performance.
Fixed backward compatibility for full-path imports of Fast Image Processors and resolved a Llama4 vision rotary embedding initialization error where freqs_ci was not registered as a buffer, causing failures when loading models with device_map="auto".
The cache_position argument has been fully removed from the generation pipeline, as all models have been updated to no longer use it (with a backward-compatibility path retained for remote code models). Additionally, integration tests for LASR with chunked decoding were added, and outdated references to deprecated pipeline tasks were cleaned up.
* cache_position anymore in generation (#44816) by @Cyrilvallez in [#44816]
* text2text-generation, summarization and translation pipeline tasks (#44510) by @math-hiyoko in [#44510]
* tests_hub if no tests found (#45014) by @ydshieh in [#45014]
* attention_chunk_size in Llama4TextConfig (#45002) by @hmellor in [#45002]
* maybe_autocast crashing on meta device tensors (#44984) by @Butanium in [#44984]
* mm_token_type be non-padded lists (#44563) by @zucchini-nlp in [#44563]
* Qwen2VL (#44976) by @hmellor in [#44976]
* check_auto_docstrings (#44803) by @yonigozlan in [#44803]
* [vllm x v5] nit (#44971) by @ArthurZucker in [#44971]
* T5ModelIntegrationTest (#44934) by @Sai-Suraj-27 in [#44934]
* Update Transformers metadata after #43514 (#44941) by @ydshieh in [#44941]
* from_pretrained (url input deprecated) (#44946) by @BSchilperoort in [#44946]
* image_processing_utils_fast (#44897) by @yonigozlan in [#44897]
* NemotronH is torch compiled (#44854) by @ydshieh in [#44854]
* SizeDict (#44884) by @hmellor in [#44884]
* layer_types type hint for AFMoE and Llama4 (#44874) by @hmellor in [#44874]
* PreTrainedModel (#44672) by @neo in [#44672]
* KeyError when patching mistral regex (#43376) by @LeonardoEmili in [#43376]
* position_ids keys when loading OwlViT models (#44508) by @KartikPawade in [#44508]
* .ai (#44489) by @tarekziade in [#44489]
* @strict (#44770) by @zucchini-nlp in [#44770]
* is_causal from EuroBertConfig (#44774) by @ydshieh in [#44774]
* mlcd auto config/model/mapping issues (#44730) by @ydshieh in [#44730]
* config class in some model class definitions (#44715) by @ydshieh in [#44715]
* [FA] Fix fa detection (#44703) by @vasqu in [#44703]
* set_encoder (#44698) by @hmellor in [#44698]
* parent issue (#44685) by @ydshieh in [#44685]
* ParallelInterface (#44640) by @michaelbenayoun in [#44640]
* [Chmv2] Fix conversion after capture refactor (#44665) by @vasqu in [#44665]
* dtype for subconfig when _from_config (#44629) by @zucchini-nlp in [#44629]
* cache_position in more models (2) (#44602) by @Cyrilvallez in [#44602]
* VibeVoiceAcousticTokenizer (#44628) by @ydshieh in [#44628]
* cache_position in more models (#44330) by @Cyrilvallez in [#44330]
* src/transformers/quantizers (#44412) by @tarekziade in [#44412]
* [fix] Prevent crash with Apertus without xielu installed (#44567) by @tomaarsen in [#44567]
* MusicgenStereo integration tests (#44527) by @Sai-Suraj-27 in [#44527]
* higgs_audio_v2 tests (#44482) by @kaixuanliu in [#44482]
* _prepare_input_fn and _prepare_output_fn instance methods (#44499) by @michaelbenayoun in [#44499]
* mps device (#44506) by @michaelbenayoun in [#44506]
* GPTNeoModelLanguageGenerationTest (#44515) by @Sai-Suraj-27 in [#44515]
* MarianIntegrationTests (#44519) by @Sai-Suraj-27 in [#44519]
* build_pr_documentation.yml (will be the new required job) (#44538) by @ydshieh in [#44538]
* build_pr_documentation workflow for merge_group event (#44532) by @ydshieh in [#44532]
* ty to 0.0.20 (#44494) by @tarekziade in [#44494]
* diffusers to CI docker file (#44480) by @ydshieh in [#44480]
* DepthProModelIntegrationTest (#44456) by @Sai-Suraj-27 in [#44456]
* ProphetNetModelIntegrationTest (#44439) by @Sai-Suraj-27 in [#44439]

The following contributors have made significant changes to the library over the last release:
* tests_hub if no tests found (#45014)
* Update Transformers metadata after #43514 (#44941)
* processing_utils.py: avoid deepcopying tokenizer in ProcessorMixin to improve performance (#44894)
* NemotronH is torch compiled (#44854)
* is_causal from EuroBertConfig (#44774)
* mlcd auto config/model/mapping issues (#44730)
* config class in some model class definitions (#44715)
* parent issue (#44685)
* VibeVoiceAcousticTokenizer (#44628)
* build_pr_documentation.yml (will be the new required job) (#44538)
* build_pr_documentation workflow for merge_group event (#44532)
* diffusers to CI docker file (#44480)
* .ai (#44489)
* src/transformers/quantizers (#44412)
* ty to 0.0.20 (#44494)
* T5ModelIntegrationTest (#44934)
* Jina-Embeddings-V3 Model (#44251)
* MusicgenStereo integration tests (#44527)
* GPTNeoModelLanguageGenerationTest (#44515)
* MarianIntegrationTests (#44519)
* DepthProModelIntegrationTest (#44456)
* ProphetNetModelIntegrationTest (#44439)
* [FA4] Add kernels fallback (#44797)
* [FA] Fix fa detection (#44703)
* [FA4] Initial support (#42435)
* [Chmv2] Fix conversion after capture refactor (#44665)
* higgs_audio_v2 tests (#44482)
* text2text-generation, summarization and translation pipeline tasks (#44510)

EuroBERT is a multilingual encoder model based on a refreshed transformer architecture, akin to Llama but with bidirectional attention. It supports a mixture of European and widely spoken languages, with sequences of up to 8192 tokens.
Links: Documentation | Paper | Blog Post
VibeVoice ASR is an automatic speech recognition model from Microsoft that combines acoustic and semantic audio tokenizers with a causal language model for robust speech-to-text transcription. The model uses VibeVoice's acoustic and semantic tokenizers that process audio at 24kHz, paired with a Qwen2-based language decoder for generating transcriptions. It can process up to 60 minutes of continuous audio input, supports customized hotwords, performs joint ASR/diarization/timestamping, and handles over 50 languages with code-switching support.
Links: Documentation | Paper
TimesFM 2.5 is a pretrained time-series foundation model that uses a decoder-only attention architecture with input patching for forecasting. The model is designed to provide accurate zero-shot forecasts across different domains, forecasting horizons and temporal granularities without requiring dataset-specific training. It builds on the original TimesFM architecture with enhancements including rotary attention, QK normalization, per-dimension attention scaling, and continuous quantile prediction.
Links: Documentation | Paper
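The "input patching" idea mentioned above can be sketched simply: a 1-D series is split into fixed-length patches that become the decoder's input tokens. The patch length of 32 here is illustrative only, not TimesFM 2.5's actual value:

```python
import numpy as np

def patchify(series: np.ndarray, patch_len: int) -> np.ndarray:
    """Split a (T,) series into (T // patch_len, patch_len) patches,
    dropping any trailing remainder."""
    n = (len(series) // patch_len) * patch_len
    return series[:n].reshape(-1, patch_len)

series = np.arange(100.0)      # a toy series of 100 steps
patches = patchify(series, 32)
print(patches.shape)           # -> (3, 32): 96 steps kept, 4 dropped
```

Each row of `patches` then plays the role of one "token" for the decoder-only attention stack.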
PP-DocLayoutV2 is a dedicated lightweight model for layout analysis, focusing specifically on element detection, classification, and reading order prediction. The model is composed of two sequentially connected networks: an RT-DETR-based detection model that performs layout element detection and classification, followed by a pointer network that orders these layout elements. It is designed to analyze document layouts by identifying and organizing various layout components in their proper reading sequence.
Links: Documentation
OLMo Hybrid is a hybrid architecture model from Ai2 that combines standard transformer attention layers with linear attention layers using the Gated Deltanet. This hybrid approach aims to improve efficiency while maintaining model quality by interleaving full attention layers with linear attention layers. The model uses a custom cache system that handles both KV cache for attention layers and recurrent state for linear attention layers.
Links: Documentation
ModernVBert is a Vision-Language encoder that combines ModernBert with a SigLIP vision encoder. It is optimized for visual document understanding and retrieval tasks, making it suitable for processing documents that contain both text and visual elements.
Links: Documentation | Paper
ColModernVBert is a model for efficient visual document retrieval that leverages ModernVBert to construct multi-vector embeddings directly from document images, following the ColPali approach. The model enables retrieval and scoring of visual documents by processing both text queries and document images to generate embeddings that can be compared for relevance scoring.
Links: Documentation | Paper
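The ColPali-style relevance scoring mentioned above is commonly implemented as late interaction ("MaxSim"): each query token embedding is matched to its most similar document embedding, and those maxima are summed. A minimal sketch with toy vectors (not ColModernVBert's actual code):

```python
import numpy as np

def maxsim_score(query: np.ndarray, doc: np.ndarray) -> float:
    """Late-interaction (MaxSim) score.
    query: (q, dim) and doc: (d, dim) L2-normalized token embeddings."""
    sims = query @ doc.T              # (q, d) cosine similarities
    return float(sims.max(axis=1).sum())  # best doc match per query token, summed

q = np.eye(2)                                       # two orthonormal query vectors
d = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # three doc vectors
print(maxsim_score(q, d))  # -> 2.0
```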
Higgs Audio V2 is a powerful audio foundation model developed by Boson AI that was pretrained on over 10 million hours of audio data and diverse text data. Despite having no post-training or fine-tuning, the model excels in expressive audio generation thanks to its deep language and acoustic understanding. The model supports various audio generation tasks including single-speaker and multi-speaker smart voice, zero-shot voice cloning, and multi-speaker voice cloning.
Links: Documentation
The Higgs Audio V2 Tokenizer is an audio tokenization model that operates at a low frame rate of 25 fps while maintaining high audio quality, effectively halving the frame rate of many baseline models. It uses unified 24 kHz training that mixes speech, music, and sound-event clips in one model to capture both semantic and acoustic details, facilitating the training of audio language models. The model enables fast inference by avoiding diffusion steps, with an encoder/decoder architecture that processes batches quickly for real-time or large-scale tasks.
Links: Documentation
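Quick arithmetic from the numbers above: at 24 kHz input and 25 frames per second, each tokenizer frame compresses 960 audio samples.

```python
# Derived directly from the figures quoted above (24 kHz audio, 25 fps frames).
SAMPLE_RATE = 24_000   # Hz
FRAME_RATE = 25        # tokenizer frames per second

samples_per_frame = SAMPLE_RATE // FRAME_RATE
print(samples_per_frame)   # -> 960 audio samples per frame

# A 10-second clip therefore becomes 250 frames.
print(10 * FRAME_RATE)     # -> 250
```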
Tensor parallelism (TP) support for dense and MoE decoder-only models has been fixed and stabilized, requiring users to update their TP configurations and conversion mappings accordingly.
The Ernie4.5 VL MoE model class and configuration names have been renamed to align with vLLM/SGLang conventions, requiring users to update any references to the old model names in their code.
* [Ernie 4.5 VL Moe] Fix up namings to vllm/sglang convention (#44299) by @vasqu

Several pipeline tasks have been removed or updated in the V5 cleanup (including question-answering, visual-question-answering, and image-to-image), requiring users to migrate to the replacement pipelines or updated task names.
3D position IDs for vision-language models have been unified under a common interface (sourced from qwen2-vl), requiring users of affected VLMs (e.g., Ernie, GLM4V) to update their processors and any code that manually constructs position IDs.
Unigram tokenizers were missing the spm precompiled charsmap support. We ran an overall v4 vs v5 regression test and fixed what we had missed.
This was done in:
Generation input preparation was significantly refactored to stop relying on cache_position and instead pass pre-sliced input_ids/inputs_embeds directly to prepare_inputs_for_generation, simplifying the generation loop and laying groundwork for broader cache_position removal. Several bug fixes were also applied, including correct sampling for HiggsAudioV2, flaky cache-equality test stabilization for Idefics, and restored generation integration tests.
* prepare_inputs_for_generation (#44226) by @Cyrilvallez in [#44226]
* cache_position to prepare inputs (#44130) by @Cyrilvallez in [#44130]

Several tokenization bugs were fixed in this release, including resolving an AttributeError in MLukeTokenizer caused by the v5 rename of additional_special_tokens, correcting the Fuyu tokenizer class mapping, fixing LayoutXLM tokenization test failures from the slow tokenizer removal refactor, and adding olmo_hybrid to the auto-tokenizer mapping. The tokenizer documentation was also updated to reflect the new unified v5 backend architecture and reorganized for clarity.
Fixed several kernel-related issues including a security vulnerability, corrected Mamba kernel loading to handle incompatible import structures, ensured Liger Kernel is properly enabled during hyperparameter search, and expanded Flash Attention to support multiple compatible implementations.
* [Mamba] Fix kernel loading (#44176) by @vasqu in [#44176]
* [Flash Attn] Enable compatible implementations (#44177) by @vasqu in [#44177]

This release adds several new quantization backends and fixes, including MLX quantization support for MPS devices, Four Over Six (4/6) NVFP4 quantization integration for NVIDIA Blackwell GPUs, and CPU support for MXFP4 models, alongside a bug fix for MXFP4 model saving using reverse_op.
Fixed backward compatibility for image processors loaded from older remote code that lack valid_kwargs definitions, and resolved test failures in AMD ROCm CI by adding the missing timm dependency to the Docker image.
* from_dict backward compatibility with old remote code (#44245) by @yonigozlan in [#44245]
* speaking_rate as an optionl forward argument (#43283) by @gau-nernst in [#43283]
* ProcessingKwargs ImagesKwargs etc. to docs (#44269) by @yonigozlan in [#44269]
* has_similar_generate_outputs assertions (#44166) by @tarekziade in [#44166]
* TokenizersBackend for Olmo3 to preserve custom pre_tokenizer (#44294) by @mario-sanz in [#44294]
* [Modular] Fix file type regression (#44283) by @vasqu in [#44283]
* Trainer class docs (compute_loss & hyperparameter_search) (#44268) by @ethanknights in [#44268]
* [fix] Set input_modalities on various architectures that aren't just text (#44078) by @tomaarsen in [#44078]
* VersionComparison.from_string return type mismatch (#43709) by @tarekziade in [#43709]
* AnyToAnyPipeline.__call__ docstring (#44229) by @alvarobartt in [#44229]
* test_generate_with_and_without_position_ids in GLM ORC (#44173) by @tarekziade in [#44173]
* Seq2SeqTrainingArguments documentation (#35258) by @qgallouedec in [#35258]
* __setitem__ on ModelOutput even if the parameter was previously None (#44080) by @tomaarsen in [#44080]
* [simple] Fix up __repr__ whitespace/brackets (#44048) by @tomaarsen in [#44048]
* [chore] Fix incorrect forward type hint for Gemma3n (#44051) by @tomaarsen in [#44051]
* get_audio_features (#44040) by @zucchini-nlp in [#44040]
* Kosmos2ModelTest test (#44061) by @tarekziade in [#44061]
* grouped_mm fallback (#44043) by @IlyasMoutawwakil in [#44043]

The following contributors have made significant changes to the library over the last release:
* has_similar_generate_outputs assertions (#44166)
* VersionComparison.from_string return type mismatch (#43709)
* test_generate_with_and_without_position_ids in GLM ORC (#44173)
* Kosmos2ModelTest test (#44061)
* [Ernie 4.5 VL Moe] Fix up namings to vllm/sglang convention (#44299)
* [Modular] Fix file type regression (#44283)
* [Mamba] Fix kernel loading (#44176)
* [Flash Attn] Enable compatible implementations (#44177)

VoxtralRealtime is a streaming speech-to-text model from Mistral AI, designed for real-time automatic speech recognition (ASR). Unlike the offline Voxtral model which processes complete audio files, VoxtralRealtime is architected for low-latency, incremental transcription by processing audio in chunks as they arrive.
The model combines an audio encoder with a Mistral-based language model decoder, using time conditioning embeddings and causal convolutions with padding caches to enable efficient streaming inference.
The zAI team launches GLM-5 and introduces it as follows:
GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.
Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is a challenge due to the RL training inefficiency. To this end, we developed slime, a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvement compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models in the world on reasoning, coding, and agentic tasks, closing the gap with frontier models.
The Qwen team launches Qwen 3.5, and introduces it as such:
We are delighted to announce the official release of Qwen3.5, introducing the open-weight of the first model in the Qwen3.5 series, namely Qwen3.5-397B-A17B. As a native vision-language model, Qwen3.5-397B-A17B demonstrates outstanding results across a full range of benchmark evaluations, including reasoning, coding, agent capabilities, and multimodal understanding, empowering developers and enterprises to achieve significantly greater productivity. Built on an innovative hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts, the model attains remarkable inference efficiency: although it comprises 397 billion total parameters, just 17 billion are activated per forward pass, optimizing both speed and cost without sacrificing capability. We have also expanded our language and dialect support from 119 to 201, providing broader accessibility and enhanced support to users around the world.
VibeVoice is a novel framework for synthesizing high-fidelity, long-form speech with multiple speakers by employing a next-token diffusion approach within a Large Language Model (LLM) structure. It's designed to capture the authentic conversational "vibe" and is particularly suited for generating audio content like podcasts and multi-participant audiobooks.
One key feature of VibeVoice is the use of two continuous audio tokenizers, one for extracting acoustic features and another for semantic features.
[Attn] New attn mask interface everywhere (#42848) :rotating_light: This one is quite breaking for super super super old models! :rotating_light: :rotating_light:
- `convert_rope_params_to_dict` so it uses `rope_theta` from the config (#43766) by @hmellor
- `AGENTS.md` (#43763) by @tarekziade
- [Modular Dependencies] Fixup qwen rms norms (#43772) by @vasqu
- [Repo Consistency] Fix rms norm (#43803) by @vasqu
- `check_model_inputs` implementation (#43765) by @Cyrilvallez
- `do_sample=False` to qwen2_5_vl model tests to stabilize the output (#43728) by @kaixuanliu
- [Jamba] Fallback to slow path and warn instead of error out (#43889) by @vasqu
- [fix] Use `last_hidden_state` key from `get_image_features` for llama4 (#43882) by @tomaarsen
- `check_model_inputs` into `capture_outputs` and `merge_with_config_defaults` + ensure correctness (#43862) by @Cyrilvallez
- `_keys_to_ignore_on_load_missing` for now (#43893) by @ArthurZucker
- `input_embeds` to `inputs_embeds` everywhere (#43916) by @Cyrilvallez
- `image_url` content support in `apply_chat_template` (#43786) by @kaixuanliu
- `generate` (#43734) by @zucchini-nlp
- `run_*_no-trainer.py` examples (#42769) by @casinca
- `run_*_no-trainer.py` examples (#43947) by @casinca
- `out_features` (#43886) by @zucchini-nlp
- `get_number_of_image_tokens` (#43948) by @zucchini-nlp
- `other_workflow_run_ids` for `issue_comment` in `utils/notification_service.py` (#44036) by @ydshieh

The following contributors have made significant changes to the library over the last release:
- [Jamba] Fallback to slow path and warn instead of error out (#43889)
- [Attn] New attn mask interface everywhere (#42848)
- [Repo Consistency] Fix rms norm (#43803)
- [Modular Dependencies] Fixup qwen rms norms (#43772)

K-EXAONE is a large-scale multilingual language model developed by LG AI Research. Built using a Mixture-of-Experts architecture, K-EXAONE features 236 billion total parameters, with 23 billion active during inference. Performance evaluations across various benchmarks demonstrate that K-EXAONE excels in reasoning, agentic capabilities, general knowledge, multilingual understanding, and long-context processing.
PP-DocLayoutV3 is a unified and high-efficiency model designed for comprehensive layout analysis. It addresses the challenges of complex physical distortions—such as skewing, curving, and adverse lighting—by integrating instance segmentation and reading order prediction into a single, end-to-end framework.
Youtu-LLM is a new, small, yet powerful LLM: it contains only 1.96B parameters, supports 128k long context, and has native agentic talents. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size in Commonsense, STEM, Coding, and Long Context capabilities; in agent-related testing, Youtu-LLM surpasses larger-sized leaders and is truly capable of completing multiple end-to-end agent tasks.
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.
🚨 T5Gemma2 model structure (#43633) - Makes sure that the attn implementation is set on all sub-configs. The config.encoder.text_config was not getting its attn implementation set because we weren't passing it to PreTrainedModel.__init__. We can't change the model structure without breaking, so a call to self.adjust_attn_implementation was manually re-added in the modeling code.
🚨 Generation cache preparation (#43679) - Refactors cache initialization in generation to ensure sliding window configurations are now properly respected. Previously, some models (like Afmoe) created caches without passing the model config, causing sliding window limits to be ignored. This is breaking because models with sliding window attention will now enforce their window size limits during generation, which may change generation behavior or require adjusting sequence lengths in existing code.
🚨 Delete duplicate code in backbone utils (#43323) - This PR cleans up backbone utilities. Specifically, we currently have 5 different config attributes to decide which backbone to load, most of which can be merged into one and seem redundant. After this PR, we'll have only one config.backbone_config as a single source of truth. The models will load the backbone from_config and load pretrained weights only if the checkpoint has any weights saved. The overall idea is the same as in other composite models. A few config arguments are removed as a result.
🚨 Refactor DETR to updated standards (#41549) - standardizes the DETR model to be closer to other vision models in the library.
🚨Fix floating-point precision in JanusImageProcessor resize (#43187) - replaces an int() with round(), expect light numerical differences
🚨 Remove deprecated AnnotionFormat (#42983) - removes a misnamed class in favour of AnnotationFormat.
- [feat] Allow loading T5Gemma2Encoder with AutoModel (#43559) by @tomaarsen
- `image_sizes` input param (#43678) by @kaixuanliu
- [Attn] Fixup interface usage after refactor (#43706) by @vasqu
- `num_frames` in ASR pipeline (#43546) by @jiqing-feng
- `PreTrainedTokenizerBase` (#43675) by @tarekziade
- `FP8Expert` for DeepSeek R1 (#43616) by @yiliu30
- [HunYuan] Fix RoPE init (#43411) by @vasqu
- [Sam] Fixup training flags (#43567) by @vasqu
- `process_bad_commit_report.py`: avoid items to appear in null author in the report (#43662) by @ydshieh
- `KeyError` in `check_bad_commit.py` (#43655) by @ydshieh
- `tied_weight_keys` in-place (#43619) by @zucchini-nlp
- [Rope] Revert #43410 and make inheritance implicit again (#43620) by @vasqu
- `make_batched_video` with 5D arrays (#43486) by @zucchini-nlp
- `utils/fetch_hub_objects_for_ci.py`: avoid too many requests and/or timeout (#43584) by @ydshieh
- `MistralConverter.extract_vocab_merges_from_model` (#43557) by @tarekziade
- `templates` folder (#43536) by @Cyrilvallez
- [Modular] Allow to add new bases that are not present in the inherited class (#43556) by @vasqu
- `pad_token_id` (#43453) by @Sai-Suraj-27
- [RoPE] Make explicit inheritance (#43410) by @vasqu
- `ShieldGemma2IntegrationTest::test_model` (#43343) by @sywangyi
- `SamHQModelIntegrationTest::test_inference_mask_generation_batched_points_batched_images` for XPU (#43511) by @sywangyi
- `super()` (#43280) by @zucchini-nlp
- `pytest-random-order` for reproducible test randomization (#43483) by @tarekziade
- `markuplm` & `perception_lm` integration tests (#43464) by @Sai-Suraj-27

The following contributors have made significant changes to the library over the last release:
- `PreTrainedTokenizerBase` (#43675)
- `MistralConverter.extract_vocab_merges_from_model` (#43557)
- `pytest-random-order` for reproducible test randomization (#43483)
- [Attn] Fixup interface usage after refactor (#43706)
- [HunYuan] Fix RoPE init (#43411)
- [Sam] Fixup training flags (#43567)
- [Rope] Revert #43410 and make inheritance implicit again (#43620)
- [Modular] Allow to add new bases that are not present in the inherited class (#43556)
- [RoPE] Make explicit inheritance (#43410)
- `process_bad_commit_report.py`: avoid items to appear in null author in the report (#43662)
- `KeyError` in `check_bad_commit.py` (#43655)
- `utils/fetch_hub_objects_for_ci.py`: avoid too many requests and/or timeout (#43584)

We have a migration guide, continuously updated on the main branch; please check it out in case you're facing issues: migration guide.
We are excited to announce the initial release of Transformers v5. This is the first major release in five years, and the release is significant: 1200 commits have been pushed to main since the latest minor release. This release removes a lot of long-due deprecations, introduces several refactors that significantly simplify our APIs and internals, and comes with a large number of bug fixes.
We give an overview of our focus for this release in the following blogpost. In these release notes, we'll focus directly on the refactors and new APIs coming with v5.
This release is the full V5 release. It sets in motion something bigger: going forward, starting with v5, we'll now release minor releases every week, rather than every 5 weeks. Expect v5.1 to follow next week, then v5.2 the week that follows, etc.
We're moving forward with this change to ensure you have access to models as soon as they're supported in the library, rather than a few weeks after.
In order to install this release, please do so with the following:
```
pip install transformers
```
For us to deliver the best package possible, it is imperative that we have feedback on how the toolkit is currently working for you. Please try it out, and open an issue in case you're facing something inconsistent/a bug.
Transformers version 5 is a community endeavor, and we couldn't have shipped such a massive release without the help of the entire community.
We introduce a new weight loading API in transformers, which significantly improves on the previous API. This
weight loading API is designed to apply operations to the checkpoints loaded by transformers.
Instead of loading the checkpoint exactly as it is serialized within the model, these operations can reshape, merge, and split the layers according to how they're defined in this new API. These operations are often a necessity when working with quantization or parallelism algorithms.
This new API is centered around the new WeightConverter class:
```python
class WeightConverter(WeightTransform):
    operations: list[ConversionOps]
    source_keys: Union[str, list[str]]
    target_keys: Union[str, list[str]]
```
The weight converter is designed to apply a list of operations on the source keys, resulting in target keys. A common operation done on the attention layers is to fuse the query, key, and value layers. Doing so with this API would amount to defining the following conversion:
```python
conversion = WeightConverter(
    ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],  # The input layers
    "self_attn.qkv_proj",  # The single layer as output
    operations=[Concatenate(dim=0)],
)
```
In this situation, we apply the Concatenate operation, which accepts a list of layers as input and returns a single
layer.
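To make the operation concrete, here is a tiny pure-Python sketch of a dim-0 concatenation over toy weight matrices. Plain nested lists stand in for tensors, and `concatenate_dim0` is a hypothetical helper; this is an illustration of the idea, not the transformers implementation:

```python
def concatenate_dim0(weights):
    """Stack 2D weight matrices along the first (row) dimension."""
    fused = []
    for w in weights:
        fused.extend(w)
    return fused

# Toy per-projection weights: 2 rows x 3 cols each.
state_dict = {
    "self_attn.q_proj": [[1, 1, 1], [1, 1, 1]],
    "self_attn.k_proj": [[2, 2, 2], [2, 2, 2]],
    "self_attn.v_proj": [[3, 3, 3], [3, 3, 3]],
}

# Apply the conversion: three source keys become one target key.
sources = ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"]
fused = concatenate_dim0([state_dict.pop(key) for key in sources])
state_dict["self_attn.qkv_proj"] = fused

print(len(fused))  # 6 rows: 2 from each projection
```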
This allows us to define a mapping from architecture to a list of weight conversions. Applying those weight conversions
can apply arbitrary transformations to the layers themselves. This significantly simplified the from_pretrained method
and helped us remove a lot of technical debt that we accumulated over the past few years.
This results in several improvements:
Linked PR: https://github.com/huggingface/transformers/pull/41580
Just as we moved towards a single backend library for model definition, we want our tokenizers, and the Tokenizer object to be a lot more intuitive. With v5, tokenizer definition is much simpler; one can now initialize an empty LlamaTokenizer and train it directly on your corpus.
Defining a new tokenizer object should be as simple as this:
```python
from transformers import TokenizersBackend, generate_merges
from tokenizers import pre_tokenizers, Tokenizer
from tokenizers.models import BPE


class Llama5Tokenizer(TokenizersBackend):
    def __init__(self, unk_token="<unk>", bos_token="<s>", eos_token="</s>", vocab=None, merges=None):
        if vocab is None:
            self._vocab = {
                str(unk_token): 0,
                str(bos_token): 1,
                str(eos_token): 2,
            }
        else:
            self._vocab = vocab
        self._merges = merges
        self._tokenizer = Tokenizer(
            BPE(vocab=self._vocab, merges=self._merges, fuse_unk=True)
        )
        self._tokenizer.pre_tokenizer = pre_tokenizers.Metaspace(
            replacement="▁", prepend_scheme=_get_prepend_scheme(self.add_prefix_space, self), split=False
        )
        super().__init__(
            tokenizer_object=self._tokenizer,
            unk_token=unk_token,
            bos_token=bos_token,
            eos_token=eos_token,
        )
```
Once the tokenizer is defined as above, you can load it with the following: Llama5Tokenizer(). Doing this returns you an empty, trainable tokenizer that follows the definition of the authors of Llama5 (it does not exist yet :wink:).
The above is the main motivation towards refactoring tokenization: we want tokenizers to behave similarly to models: trained or empty, and with exactly what is defined in their class definition.
Up to now, transformers maintained two parallel implementations for many tokenizers:
- Slow tokenizers (`tokenization_<model>.py`) - Python-based implementations, often using SentencePiece as the backend.
- Fast tokenizers (`tokenization_<model>_fast.py`) - Rust-based implementations using the 🤗 tokenizers library.

In v5, we consolidate to a single tokenizer file per model: `tokenization_<model>.py`. This file will use the most appropriate backend available:
- the `sentencepiece` library, for SentencePiece-based tokenizers; this backend inherits from `PythonBackend`.
- 🤗 `tokenizers`, which notably allows adding tokens.
- MistralCommon's tokenization library (previously known as the `MistralCommonTokenizer`).

The AutoTokenizer automatically selects the appropriate backend based on available files and dependencies. This is transparent: you continue to use `AutoTokenizer.from_pretrained()` as before. This allows transformers to be future-proof and modular, to easily support future backends.
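As a rough mental model, the backend selection can be sketched as below. The file checks, the `pick_backend` helper, and the `SentencePieceBackend`/`MistralCommonBackend` class names are illustrative assumptions; the actual AutoTokenizer logic also weighs config hints and other signals:

```python
def pick_backend(repo_files, installed):
    """Pick a tokenizer backend from the files present in a checkpoint repo.

    Simplified sketch: real selection in transformers is more involved.
    """
    if "tokenizer.json" in repo_files and "tokenizers" in installed:
        return "TokenizersBackend"
    if any(f.endswith(".model") for f in repo_files) and "sentencepiece" in installed:
        return "SentencePieceBackend"  # a SentencePiece vocab file is present
    if "tekken.json" in repo_files and "mistral-common" in installed:
        return "MistralCommonBackend"
    return "PythonBackend"  # pure-Python fallback

print(pick_backend({"tokenizer.json"}, {"tokenizers"}))  # TokenizersBackend
```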
We enable users and tokenizer builders to define their own tokenizers from top to bottom. Tokenizers are usually defined using a backend such as tokenizers, sentencepiece or mistral-common, but we offer the possibility to design the tokenizer at a higher-level, without relying on those backends.
To do so, you can import the PythonBackend (which was previously known as PreTrainedTokenizer). This class encapsulates all the logic related to added tokens, encoding, and decoding.
If you want something even higher up the stack, then PreTrainedTokenizerBase is what PythonBackend inherits from. It contains the very basic tokenizer API features:
- `encode`
- `decode`
- `vocab_size`
- `get_vocab`
- `convert_tokens_to_ids`
- `convert_ids_to_tokens`
- `from_pretrained`
- `save_pretrained`

Starting with v5, we now enable initializing blank, untrained `tokenizers`-backed tokenizers:
```python
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer()
```
This tokenizer will therefore follow the definition of the LlamaTokenizer as defined in its class definition. It can then be trained on a corpus as can be seen in the tokenizers documentation.
These tokenizers can also be initialized from vocab and merges (if necessary), like the previous "slow" tokenizers:
```python
from transformers import LlamaTokenizer

vocab = {"<unk>": 0, "<s>": 1, "</s>": 2, "hello": 3, "world": 4}
merges = [("h", "e"), ("l", "l"), ("o", " ")]
tokenizer = LlamaTokenizer(vocab=vocab, merges=merges)
```
This tokenizer will behave as a Llama-like tokenizer, with an updated vocabulary. This allows comparing different tokenizer classes with the same vocab; therefore enabling the comparison of different pre-tokenizers, normalizers, etc.
⚠️ The vocab_file (as in, a path towards a file containing the vocabulary) cannot be used to initialize the LlamaTokenizer as loading from files is reserved to the from_pretrained method.
The batch_decode and decode methods have been unified to reflect behavior of the encode method. Both single and batch decoding now use the same decode method. See an example of the new behavior below:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
inputs = ["hey how are you?", "fine"]
tokenizer.decode(tokenizer.encode(inputs))
```
Gives:

```diff
- 'hey how are you?</s> fine</s>'
+ ['hey how are you?</s>', 'fine</s>']
```
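Under the hood, the unified behavior amounts to dispatching on the nesting of the input. Here is a minimal pure-Python sketch of that dispatch (an illustration, not the actual transformers implementation):

```python
def decode(ids, id_to_token):
    """Decode a single sequence, or map over a batch of sequences."""
    if ids and isinstance(ids[0], list):  # batch: list[list[int]]
        return [decode(seq, id_to_token) for seq in ids]
    return "".join(id_to_token[i] for i in ids)  # single sequence: list[int]

vocab = {0: "hey", 1: "fine", 2: "</s>"}
print(decode([0, 2], vocab))            # hey</s>
print(decode([[0, 2], [1, 2]], vocab))  # ['hey</s>', 'fine</s>']
```

The same shape-based dispatch is why a `list[list[int]]` coming out of `generate` now decodes directly into a list of strings.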
We expect `encode` and `decode` to behave as two sides of the same coin: an encode, process, decode round-trip should just work.
[!NOTE] A common use-case would be:
`encode`, `model.generate`, `decode`. However, using `generate` would return `list[list[int]]`, which would then be incompatible with `decode`.
The encode_plus method is deprecated in favor of the single __call__ method.
**`apply_chat_template` returns `BatchEncoding`**

Previously, `apply_chat_template` returned `input_ids` for backward compatibility. Starting with v5, it now consistently returns a `BatchEncoding` dict like other tokenizer methods.
```python
# v5
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
]

# Now returns BatchEncoding with input_ids, attention_mask, etc.
outputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
print(outputs.keys())  # dict_keys(['input_ids', 'attention_mask'])
```
We simplify the serialization of tokenization attributes:
- `special_tokens_map.json` - special tokens are now stored in `tokenizer_config.json`.
- `added_tokens.json` - added tokens are now stored in `tokenizer.json`.
- `added_tokens_decoder` is only stored when there is no `tokenizer.json`.

When loading older tokenizers, these files are still read for backward compatibility, but new saves use the consolidated format. We're gradually moving towards consolidating attributes to fewer files so that other libraries and implementations may depend on them more reliably.
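Conceptually, the consolidation on save is just a merge of the legacy file contents into the main config. A toy sketch with a hypothetical `consolidate` helper (not the transformers serialization code):

```python
import json

def consolidate(tokenizer_config, special_tokens_map):
    """Fold legacy special_tokens_map.json content into tokenizer_config.json."""
    merged = dict(tokenizer_config)
    merged.update(special_tokens_map)  # special tokens now live in the config
    return merged

config = {"model_max_length": 4096}
legacy_map = {"bos_token": "<s>", "eos_token": "</s>"}
print(json.dumps(consolidate(config, legacy_map), sort_keys=True))
```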
Several models that had identical tokenizers now import from their base implementation:
These modules will eventually be removed altogether.
Removed T5-specific workarounds
The internal _eventually_correct_t5_max_length method has been removed. T5 tokenizers now handle max length consistently with other models.
A few testing changes specific to tokenizers have been applied:
- Common tokenizer method tests (`add_tokens`, `encode`, `decode`) are now centralized and automatically applied across all tokenizers. This reduces test duplication and ensures consistent behavior.

For legacy implementations, the original BERT Python tokenizer code (including `WhitespaceTokenizer`, `BasicTokenizer`, etc.) is preserved in `bert_legacy.py` for reference purposes.
Special Tokens Structure:
- `SpecialTokensMixin`: Merged into `PreTrainedTokenizerBase` to simplify the tokenizer architecture.
- `special_tokens_map`: Now only stores named special token attributes (e.g., `bos_token`, `eos_token`). Use `extra_special_tokens` for additional special tokens (formerly `additional_special_tokens`). `all_special_tokens` includes both named and extra tokens.

```python
# v4
tokenizer.special_tokens_map  # Included 'additional_special_tokens'

# v5
tokenizer.special_tokens_map    # Only named tokens
tokenizer.extra_special_tokens  # Additional tokens
```
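The named/extra split can be pictured with a small standalone sketch. This is a loose illustration with simplified semantics (e.g., `extra_special_tokens` is modeled as a plain list here), not the transformers attribute implementation:

```python
NAMED_SPECIAL_TOKENS = ("bos_token", "eos_token", "unk_token", "pad_token")

class SpecialTokensView:
    """Sketch of the v5 split between named and extra special tokens."""

    def __init__(self, **tokens):
        self._named = {k: v for k, v in tokens.items() if k in NAMED_SPECIAL_TOKENS}
        self._extra = [v for k, v in tokens.items() if k not in NAMED_SPECIAL_TOKENS]

    @property
    def special_tokens_map(self):
        return dict(self._named)  # only named tokens

    @property
    def extra_special_tokens(self):
        return list(self._extra)

    @property
    def all_special_tokens(self):
        return list(self._named.values()) + self._extra

view = SpecialTokensView(bos_token="<s>", eos_token="</s>", think_token="<think>")
print(view.special_tokens_map)    # {'bos_token': '<s>', 'eos_token': '</s>'}
print(view.extra_special_tokens)  # ['<think>']
```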
- `special_tokens_map_extended` and `all_special_tokens_extended`: Removed. Access `AddedToken` objects directly from `_special_tokens_map` or `_extra_special_tokens` if needed.
- `additional_special_tokens`: Still accepted for backward compatibility but is automatically converted to `extra_special_tokens`.

Deprecated Methods:
- `sanitize_special_tokens()`: Already deprecated in v4, removed in v5.
- `prepare_seq2seq_batch()`: Deprecated; use `__call__()` with the `text_target` parameter instead.

```python
# v4
model_inputs = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, max_length=128)

# v5
model_inputs = tokenizer(src_texts, text_target=tgt_texts, max_length=128, return_tensors="pt")
model_inputs["labels"] = model_inputs.pop("input_ids_target")
```
- `BatchEncoding.words()`: Deprecated; use `word_ids()` instead.

Removed Methods:
- `create_token_type_ids_from_sequences()`: Removed from base class. Subclasses that need custom token type ID creation should implement this method directly.
- `prepare_for_model()`, `build_inputs_with_special_tokens()`, `truncate_sequences()`: Moved from `tokenization_utils_base.py` to `tokenization_python.py` for `PythonBackend` tokenizers. `TokenizersBackend` provides model-ready input via `tokenize()` and `encode()`, so these methods are no longer needed in the base class.
- `_switch_to_input_mode()`, `_switch_to_target_mode()`, `as_target_tokenizer()`: Removed from base class. Use `__call__()` with the `text_target` parameter instead.

```python
# v4
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)

# v5
labels = tokenizer(text_target=tgt_texts, ...)
```
- `parse_response()`: Removed from base class.

The v5 release significantly improves the performance of the MoE models, as can be seen in the graphs below. We improve and optimize MoE performance through batched and grouped experts implementations, and we optimize them for decoding using `batched_mm`.
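The core idea behind grouped execution is to run each expert once over all of its routed tokens instead of once per token. A pure-Python sketch of the routing/grouping step (illustrative only; the `route_and_group` helper is hypothetical and real implementations operate on tensors):

```python
def route_and_group(token_ids, expert_of):
    """Group token indices by their routed expert so each expert runs once."""
    groups = {}
    for idx in token_ids:
        groups.setdefault(expert_of[idx], []).append(idx)
    return groups

# 6 tokens routed across 3 experts (toy top-1 routing table).
expert_of = {0: 2, 1: 0, 2: 2, 3: 1, 4: 0, 5: 2}
groups = route_and_group(range(6), expert_of)
print(sorted(groups.items()))  # [(0, [1, 4]), (1, [3]), (2, [0, 2, 5])]
```

Once tokens are grouped, each expert's matmul runs over its whole group in one batched call, which is where the decoding speedups come from.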
We focus on improving the performance of loading weights on device (which gives speedups up to 6x in tensor parallel situations); this is preliminary work that we'll continue to work on in the coming weeks. Some notable improvements:
**dtype update**

We have updated the default dtype for all models loaded with `from_pretrained` to be `auto`. This will lead to model instantiations respecting the dtype in which the model was saved, rather than forcing it to load in `float32`.
You can, of course, still specify the dtype in which you want to load your model by specifying it as an argument to the from_pretrained method.
The Hugging Face Hub infrastructure has gradually moved to an Xet backend. This will significantly simplify uploads and downloads, with higher download and upload speeds, partial uploads, and, most notably, a higher threshold for accepted file sizes on the Hugging Face Hub.
To reflect this, we're increasing the default shard size of models serialized on the Hub to 50GB (up from 5GB).
**`use_auth_token`**

The `use_auth_token` argument/parameter is deprecated in favor of `token` everywhere.
You should be able to search and replace use_auth_token with token and get the same logic.
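The rename is mechanical enough to script; a rough sketch using Python's `re` module (the `migrate_use_auth_token` helper is named here purely for illustration):

```python
import re

def migrate_use_auth_token(source: str) -> str:
    """Rewrite the deprecated keyword to its v5 name (word-boundary safe)."""
    return re.sub(r"\buse_auth_token\b", "token", source)

old = 'model = AutoModel.from_pretrained("gpt2", use_auth_token=True)'
print(migrate_use_auth_token(old))
# model = AutoModel.from_pretrained("gpt2", token=True)
```

The word boundaries avoid touching identifiers that merely contain the substring.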
Linked PR: https://github.com/huggingface/transformers/pull/41666
We decided to remove some features for the upcoming v5 as they are currently only supported in a few old models and no longer integrated in current model additions. It's recommended to stick to v4.x in case you need them. The following features are affected:
We dropped support for two torch APIs:
- torchscript, in https://github.com/huggingface/transformers/pull/41688
- torch.fx, in https://github.com/huggingface/transformers/pull/41683

Those APIs were deprecated by the PyTorch team, and we're instead focusing on the supported APIs `dynamo` and `export`.
We clean up the quantization API in transformers, and significantly refactor the weight loading as highlighted above.
We drop support for two quantization arguments that have been deprecated for some time:
- `load_in_4bit`
- `load_in_8bit`

We remove them in favor of the `quantization_config` argument, which is much more complete. As an example, here is how
you would load a 4-bit bitsandbytes model using this argument:
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    device_map="auto",
    quantization_config=quantization_config,
)
```
- `from_xxx_config` methods are deleted. Configs can be initialized from the `__init__` method in the same way. See #41314.
- RoPE parameters are consolidated under the model's `rope_parameters`, including `rope_theta` and `rope_type`. A model's `config.rope_parameters` is a simple dictionary in most cases, and can also be a nested dict in special cases (i.e. Gemma3 and ModernBert) with different RoPE parameterization for each layer type. Trying to get `config.rope_theta` will throw an attribute error from now on. See #39847 and #42255.
- Composite configs no longer forward sub-config attributes at the top level (e.g. `config.vocab_size`). Users are expected to access keys from their respective sub-configs (`config.text_config.vocab_size`).
- Models that cannot generate (i.e. cannot call `model.generate()`) will no longer have a `generation_config`, and `model.config.generation_config` will throw an attribute error.
- Slow tokenizer files (`tokenization_<model>.py`) will be removed in favor of using fast tokenizer files (`tokenization_<model>_fast.py`), which will be renamed to `tokenization_<model>.py`. As fast tokenizers are 🤗 `tokenizers`-backed, they include a wider range of features that are maintainable and reliable.
- `encode_plus` --> `__call__`
- `batch_decode` --> `decode`
- `apply_chat_template` by default returned naked `input_ids` rather than a `BatchEncoding` dict. This was inconvenient - it should return a `BatchEncoding` dict like `tokenizer.__call__()`, but we were stuck with it for backward compatibility. The method now returns a `BatchEncoding`.
Linked PRs:
- Processor attributes are now serialized in `processor_config.json` as a nested dict, instead of serializing attributes in their own config files. Loading will be supported for all old-format processors (https://github.com/huggingface/transformers/pull/41474).
- `XXXFeatureExtractor` classes are completely removed in favor of the `XXXImageProcessor` class for all vision models (https://github.com/huggingface/transformers/pull/41174).
- `XXXFastImageProcessorKwargs` is removed in favor of `XXXImageProcessorKwargs`, which will be shared between fast and slow processors (https://github.com/huggingface/transformers/pull/40931).
- `RotaryEmbeddings` layers will start returning a dict of tuples, in case the model uses several RoPE configurations (Gemma2, ModernBert). Each value will be a tuple of "cos, sin" per RoPE type.
- The RoPE config of the `RotaryEmbeddings` layer will be unified and accessed via `config.rope_parameters`. The config attr for `rope_theta` might not be accessible anymore for some models, and instead will be in `config.rope_parameters['rope_theta']`. BC will be supported for a while as much as possible, and in the near future we'll gradually move to the new RoPE format (https://github.com/huggingface/transformers/pull/39847).
- Multimodal models no longer expose `model.language_model` directly. It is recommended to either access the module with `model.model.language_model` or `model.get_decoder()`. See #42156.
- Models now accept `kwargs` in their forward methods.
- Legacy generation output classes are removed (e.g. `GreedySearchEncoderDecoderOutput`). We now only have 4 output classes built from the following matrix: decoder-only vs encoder-decoder, uses beams vs doesn't use beams (https://github.com/huggingface/transformers/pull/40998).
- When `generate` doesn't receive any KV Cache argument, the default cache class used is now defined by the model (as opposed to always being `DynamicCache`) (https://github.com/huggingface/transformers/pull/41505).
- If generation parameters are saved in `config.json` for any old model, they will be loaded back into the model's generation config. Users are expected to access or modify generation parameters only with `model.generation_config.do_sample = True`.

**`compute_loss_func` Handling**
- `compute_loss_func` now always takes priority over the model's built-in loss computation, giving users consistent control over custom loss functions.

**`num_items_in_batch` in Prediction Step**
- The `num_items_in_batch` argument is now passed to `compute_loss` during `prediction_step`, enabling proper loss scaling during evaluation.

**`report_to` now defaults to `"none"`**
The following arguments were removed from `TrainingArguments` due to low usage:

- `mp_parameters` -> legacy param that was later on added to the Sagemaker trainer
- `_n_gpu` -> not intended for users to set; we will initialize it correctly instead of putting it in the `TrainingArguments`
- `overwrite_output_dir` -> replaced by `resume_from_checkpoint`, and it was only used in the examples scripts, no impact on `Trainer`
- `logging_dir` -> only used for tensorboard; set the `TENSORBOARD_LOGGING_DIR` env var instead
- `jit_mode_eval` -> use `use_torch_compile` instead, as torchscript is not recommended anymore
- `tpu_num_cores` -> it is actually better to remove it, as it is not recommended to set the number of cores. By default, all TPU cores are used. Set the `TPU_NUM_CORES` env var instead
- `past_index` -> it was only used for a very small number of models with a special architecture like transformersxl, and it was not documented at all how to train those models
- `ray_scope` -> only a minor arg for the ray integration. Set the `RAY_SCOPE` env var instead
- `warmup_ratio` -> use `warmup_steps` instead. We combined both args together by allowing passing float values in `warmup_steps`.

Renamed in `TrainingArguments`:

- `fsdp_min_num_params` and `fsdp_transformer_layer_cls_to_wrap` -> use `fsdp_config`
- `tpu_metrics_debug` -> `debug`
- `push_to_hub_token` -> `hub_token`
- `push_to_hub_model_id` and `push_to_hub_organization` -> `hub_model_id`
- `include_inputs_for_metrics` -> `include_for_metrics`
- `per_gpu_train_batch_size` -> `per_device_train_batch_size`
- `per_gpu_eval_batch_size` -> `per_device_eval_batch_size`
- `use_mps_device` -> mps will be used by default if detected
- `fp16_backend` and `half_precision_backend` -> we will only rely on `torch.amp` as everything has been upstreamed to torch
- `no_cuda` -> `use_cpu`
- `include_tokens_per_second` -> `include_num_input_tokens_seen`
- `use_legacy_prediction_loop` -> we only use the `evaluation_loop` function from now on

Renamed in `Trainer`:

- `tokenizer` in initialization -> `processing_class`
- `model_path` in `train()` -> `resume_from_checkpoint`

During training with `Trainer`, `use_cache` in the model config will be set to `False`.
You can still change the cache value through the `TrainingArguments` `use_cache` argument if needed.

- Removed `organization` and `repo_url` from `PushToHubMixin`. You must pass a `repo_id` instead.
- Removed `ignore_metadata_errors` from `PushToHubMixin`. In practice, if we ignore errors while loading the model card, we won't be able to push the card back to the Hub, so it's better to fail early and not provide the option to fail later.
- `push_to_hub` does not accept `**kwargs` anymore. All accepted parameters are explicitly documented.
- Arguments of `push_to_hub` are now keyword-only to avoid confusion. Only `repo_id` can be positional since it's the main arg.
- Removed the `use_temp_dir` argument from `push_to_hub`. We now use a tmp dir in all cases.

Linked PR: https://github.com/huggingface/transformers/pull/42391.
The deprecated `transformers-cli ...` command has been removed; `transformers ...` is now the only CLI entry point.
The transformers CLI has been migrated to Typer, making it easier to maintain and adding some nice features out of the box (improved --help section, autocompletion).
The biggest breaking change is in transformers chat. This command starts a terminal UI to interact with a chat model.
It used to also be able to start a Chat Completion server powered by transformers and chat with it. In this revamped
version, this feature has been removed in favor of transformers serve. The goal of splitting transformers chat
and transformers serve is to define clear boundaries between client and server code. It helps with maintenance
but also makes the commands less bloated. The new signature of transformers chat is:
```
Usage: transformers chat [OPTIONS] BASE_URL MODEL_ID [GENERATE_FLAGS]...

  Chat with a model from the command line.
```
It works hand in hand with transformers serve, which means that if transformers serve is running on its default endpoint, transformers chat can be launched as follows:
```shell
transformers chat HuggingFaceTB/SmolLM3-3B
```
It can however use any OpenAI API compatible HTTP endpoint:
```shell
transformers chat HuggingFaceTB/SmolLM3-3B https://router.huggingface.co/v1
```
Linked PRs:
**`run` method**

The `transformers run` command (previously `transformers-cli run`) is an artefact of the past, was not documented nor tested,
and isn't part of any public documentation. We're removing it for now and ask you to please let us know in case
this is a method you are using; in which case we should bring it back with better support.
Linked PR: https://github.com/huggingface/transformers/pull/42447
- TRANSFORMERS_CACHE, PYTORCH_TRANSFORMERS_CACHE, and PYTORCH_PRETRAINED_BERT_CACHE have been removed. Please use HF_HOME instead.
- HUGGINGFACE_CO_EXAMPLES_TELEMETRY, HUGGINGFACE_CO_PREFIX, and HUGGINGFACE_CO_RESOLVE_ENDPOINT have been removed. Please use huggingface_hub.constants.ENDPOINT instead.

Linked PR: https://github.com/huggingface/transformers/pull/42391.
transformers v5 pins the huggingface_hub version to >=1.0.0. See this migration guide to learn more about this major release. Here are the main aspects to know about:
- Migration from requests to httpx. This change was made to improve performance and to support both synchronous and asynchronous requests the same way. If you are currently catching requests.HTTPError errors in your codebase, you'll need to switch to httpx.HTTPError.
- Proxies are now configured through the HTTP_PROXY / HTTPS_PROXY environment variables.
- hf_transfer (and therefore HF_HUB_ENABLE_HF_TRANSFER) has been completely dropped in favor of hf_xet. This should be transparent for most users. Please let us know if you notice any downside!
- typer-slim has been added as a required dependency, used to implement both the hf and transformers CLIs.
The Code World Model (CWM) model was proposed in CWM: An Open-Weights LLM for Research on Code Generation with World Models by Meta FAIR CodeGen Team. CWM is an LLM for code generation and reasoning about code that has, in particular, been trained to better represent and reason about how code and commands affect the state of a program or system. Specifically, we mid-trained CWM on a large number of observation-action trajectories from Python execution traces and agentic interactions in containerized environments. We post-trained with extensive multi-task RL in verifiable coding, math, and multi-turn software engineering environments.
SAM3 (Segment Anything Model 3) was introduced in SAM 3: Segment Anything with Concepts.
The SAM3 release adds four new architectures:
SAM3 performs Promptable Concept Segmentation (PCS) on images. PCS takes text and/or image exemplars as input (e.g., "yellow school bus"), and predicts instance and semantic masks for every single object matching the concept.
Sam3Tracker and Sam3TrackerVideo perform Promptable Visual Segmentation (PVS) on images. PVS takes interactive visual prompts (points, boxes, masks) or text inputs to segment a specific object instance per prompt. This is the task that SAM 1 and SAM 2 focused on, and SAM 3 improves upon it. Sam3Tracker and Sam3TrackerVideo are updated versions of SAM2 Video that maintain the same API while providing improved performance and capabilities.
SAM3 Video performs Promptable Concept Segmentation (PCS) on videos. PCS takes text as input (e.g., "yellow school bus"), and predicts instance and semantic masks for every single object matching the concept, while preserving object identities across video frames. The model combines a detection module (SAM3) with a tracking module (SAM2-style tracker) to enable robust object tracking across video frames using text prompts.
LFM2-MoE is a Mixture-of-Experts (MoE) variant of LFM2. The LFM2 family is optimized for on-device inference by combining short‑range, input‑aware gated convolutions with grouped‑query attention (GQA) in a layout tuned to maximize quality under strict speed and memory constraints.
LFM2‑MoE keeps this fast backbone and introduces sparse MoE feed‑forward networks to add representational capacity without significantly increasing the active compute path. The first LFM2-MoE release is LFM2-8B-A1B, with 8.3B total parameters and 1.5B active parameters. The model excels in quality (comparable to 3-4B dense models) and speed (faster than other 1.5B class models).
The VideoLLaMA3 model is a major update to VideoLLaMA2 from Alibaba DAMO Academy.
Audio Flamingo 3 (AF3) is a fully open large audio–language model designed for robust understanding and reasoning over speech, environmental sounds, and music. AF3 pairs a Whisper-style audio encoder with a causal language model and performs replace-in-place audio–text fusion: the processor aligns post-pool audio frames to a dedicated placeholder token and the model replaces those token slots with projected audio embeddings during the forward pass.
The model checkpoint is available at: nvidia/audio-flamingo-3-hf
Highlights:
NanoChat is a compact decoder-only transformer model designed for educational purposes and efficient training. The model features several fundamental architectural components common in modern transformer models, making it a good starting point for understanding the principles behind them. NanoChat is a variant of the Llama architecture, with a simplified attention mechanism and normalization layers.
FastVLM is an open-source vision-language model featuring a novel hybrid vision encoder, FastViTHD. Leveraging reparameterizable convolutional layers, scaled input resolution, and a reduced number of visual tokens, FastVLM delivers high accuracy with exceptional efficiency. Its optimized architecture enables deployment even on edge devices, achieving ultra-low TTFT (time to first token) without sacrificing performance.
PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios.
PE Audio (Perception Encoder Audio) is a state-of-the-art multimodal model that embeds audio and text into a shared (joint) embedding space. The model enables cross-modal retrieval and understanding between audio and text.
Text input
Audio input
The resulting embeddings can be used for:
Jais 2 is a next-generation Arabic open-weight LLM trained on the richest Arabic-first dataset to date. Built from the ground up with 8B and 70B parameters, Jais 2 understands Arabic the way it's truly spoken across dialects, culture, and modern expression. It is developed by MBZUAI, Inception, and Cerebras Systems and is based on the transformer architecture with modifications including:
Pixio is a vision foundation model that uses ViT as a feature extractor for multiple downstream tasks like depth estimation, semantic segmentation, feed-forward 3D reconstruction, robotics, and image classification. It is built on the Masked Autoencoder (MAE) pre-training framework, with four minimal yet critical updates: 1) deeper decoder, 2) larger masking granularity, 3) more class tokens, and 4) web-scale curated training data.
The Ernie 4.5 VL MoE model was released in the Ernie 4.5 Model Family release by Baidu. This family of models contains multiple different architectures and model sizes. The Vision-Language series in particular is composed of a novel multimodal heterogeneous structure, sharing parameters across modalities and dedicating parameters to specific modalities. This becomes especially apparent in the Mixture of Experts (MoE) which is composed of
This architecture has the advantage of enhancing multimodal understanding without compromising, and even improving, performance on text-related tasks. A more detailed breakdown is given in the Technical Report.
[Ernie 4.5] Ernie VL models by @vasqu in https://github.com/huggingface/transformers/pull/39585

GLM-ASR-Nano-2512 is a robust, open-source speech recognition model with 1.5B parameters. Designed for real-world complexity, it outperforms OpenAI Whisper V3 on multiple benchmarks while maintaining a compact size.
Key capabilities include:
Exceptional Dialect Support Beyond standard Mandarin and English, the model is highly optimized for Cantonese (粤语) and other dialects, effectively bridging the gap in dialectal speech recognition.
Low-Volume Speech Robustness Specifically trained for "Whisper/Quiet Speech" scenarios. It captures and accurately transcribes extremely low-volume audio that traditional models often miss.
SOTA Performance Achieves the lowest average error rate (4.10) among comparable open-source models, showing significant advantages in Chinese benchmarks (Wenet Meeting, Aishell-1, etc.).
This model was contributed by Eustache Le Bihan and Yuxuan Zhang. You can check the model card for more details, as well as our GitHub repo.
GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.
We present GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive, achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. We further introduce the GLM-4.6V series, open-source multimodal models with native tool use and a 128K context window. Code, models and more information are released at https://github.com/zai-org/GLM-V
LW-DETR proposes a light-weight Detection Transformer (DETR) architecture designed to compete with and surpass the dominant YOLO series for real-time object detection. It achieves a new state-of-the-art balance between speed (latency) and accuracy (mAP) by combining recent transformer advances with efficient design choices.
The LW-DETR architecture is characterized by its simple and efficient structure: a plain ViT Encoder, a Projector, and a shallow DETR Decoder. It enhances the DETR architecture for efficiency and speed using the following core modifications:
Efficient ViT Encoder: Uses a plain ViT with interleaved window/global attention and a window-major organization to drastically reduce attention complexity and latency.
Richer Input: Aggregates multi-level features from the encoder and uses a C2f Projector (YOLOv8) to pass two-scale features (1/8 and 1/32).
Faster Decoder: Employs a shallow 3-layer DETR decoder with deformable cross-attention for lower latency and faster convergence.
Optimized Queries: Uses a mixed-query scheme combining learnable content queries and generated spatial queries.
LightOnOcr combines a Vision Transformer encoder (Pixtral-based) with a lightweight text decoder (Qwen3-based) distilled from high-quality open VLMs. It is optimized for document parsing tasks, producing accurate, layout-aware text extraction from high-resolution pages.
- [JetMoe] Fix jetmoe after #40132 by @ArthurZucker in #41324
- gemma3 by @Sai-Suraj-27 in #41354
- PretrainedConfig to PreTrainedConfig by @Cyrilvallez in #41300
- [ModularChecker] QOL for the modular checker by @ArthurZucker in #41361
- [v5] Remove relative position embeddings (for bert like models) by @vasqu in #41170
- apply_chat_template by @Samoed in #41355
- test_longcat_generation_cpu by @ydshieh in #41368
- [CB] Refactors the way we access paged by @ArthurZucker in #41370
- [v5] Sync Bert and Bart eager attention by @vasqu in #41248
- TypeError exception for invalid type by @Sai-Suraj-27 in #41346
- update_device_map for GPTQ quantizer by @Sai-Suraj-27 in #41328
- prune_heads by @gante in #41417
- [JetMoe] Fix KV head repetition and padding free by @vasqu in #41423
- JetMoeIntegrationTest by @ydshieh in #41377
- past_key_value in BERT-like models by @zucchini-nlp in #41448
- utils/tf_ops/ by @gante in #41402
- [Attention Masks] Bidirectional masks for encoder and encoder-decoder models by @vasqu in #41265
- past_index by @SunMarc in #41384
- report_to default changed to "none" + cleaning deprecated env var by @SunMarc in #41375
- overwrite_output_dir by @SunMarc in #41323
- [CI] Fix copies on main by @vasqu in #41486
- jit_mode_eval by @SunMarc in #41376
- local_rank arg from TrainingArguments by @SunMarc in #41382
- pickle - BloomTokenizerFast by @ydshieh in #41466
- glm4v by @Sai-Suraj-27 in #41483
- truncation to False in Qwen3Omni to avoid default truncation by @BakerBunker in #41473
- local_rank deletion and some cleaning by @SunMarc in #41504
- tpu_num_cores by @SunMarc in #41383
- HunYuanMoEV1IntegrationTest:test_model_generation by @ydshieh in #41373
- generate delegates default cache initialization to the model by @gante in #41505
- [from_pretrained] Small refactor from_pretrained: move around unrelated stuff by @ArthurZucker in #41445
- transformers serve by @LysandreJik in #41446
- logits_to_keep to many older CausalLM models by @philiproeleveld in #41335
- torch.compile recompiled part of th… by @sywangyi in #41558
- [Docs] Fix changed references by @vasqu in #41614
- expand_device_map instead of redefining it by @Cyrilvallez in #41608
- tp_plan in from_pretrained directly by @Cyrilvallez in #41435
- [Executorch] Simplify for encoder models by @vasqu in #41627
- [Ernie 4.5 Moe] Fix Moe and offloading by @vasqu in #41385
- [Masks] Fix mask handling in eager for vision models by @vasqu in #41625
- utils/check_bad_commit.py by @ydshieh in #41658
- use_cache default to False by @SunMarc in #41585
- chat_extras.md to Korean by @Judy-Choi in #39863
- big_bird.md to Korean by @ssum21 in #40445
- code_llama.md to Korean by @Judy-Choi in #40558
- ko-LFM2.md to Korean by @ssum21 in #41502
- use_auth_token parameter by @Wauplin in #41666
- [Attn] Allow dynamic causality in SDPA via Kwargs by @vasqu in #41692
- run_name docs in TrainingArguments by @tobiasofsn in #41705
- utils/check_bad_commit.py by @ydshieh in #41658
- videos from image processing classes by @zucchini-nlp in #41607
- @staticmethod from module-level get_device_and_memory_breakdown by @albertvillanova in #41747
- [Onnx docs] Remove some traces by @vasqu in #41791
- utils/check_bad_commit.py by @ydshieh in #41658
- [Clip] Fix masking and enable flash attention on all model types by @vasqu in #41750
- test_tensor_parallel.py by @3outeille in #41918
- detectron2 installation in docker files by @ydshieh in #41975
- autoawq[kernels] installation in quantization docker file by @ydshieh in #41978
- torchcodec version in quantization docker file by @ydshieh in #41988
- run slow v2: empty report when there is only one model by @ydshieh in #42002
- torch+deepspeed docker file by @ydshieh in #41985
- logging_dir by @SunMarc in #42013
- deeepspeed in AMD docker file by @ydshieh in #42025
- huggingface_hub dependency version by @hanouticelina in #42033
- pr_slow_ci_suggestion.yml after #42023 by @ydshieh in #42049
- Argument list too long in pr_slow_ci_suggestion.yml by @ydshieh in #42061
- setattr as well by @zucchini-nlp in #41808
- [Attn Masks] Non-vmap default for attention masks by @vasqu in #41852
- image_transforms.py by @yaswanth19 in #42044
- prepare_inputs_for_generation cache slicing condition by @albertvillanova in #41764
- [T5Gemma] Fix cross attention cache by @vasqu in #41890
- streaming by @McPatate in #42102
- pytest<9 for now by @ydshieh in #42162
- [Pop2Piano] Fix cache usage by @vasqu in #42170
- [PEFT] Fix prefix tuning by @vasqu in #41696
- FqnToConfig by @jcaip in #41894
- [PEFT] Fix the general test for prefix tuning by @vasqu in #42185
- [Pop2Piano] Fix tied weights by @vasqu in #42193
- [BLT] Fix cache usage by @vasqu in #42188
- test_dynamic_cache_exportability_multiple_run (failing on torch 2.10 nightly) by @ydshieh in #42212
- AttentionMaskConverter._unmask_unattended for xpu device before by @kaixuanliu in #42230
- base_model by @zucchini-nlp in #41589
- batch_size by @ydshieh in #42213
- batch_size" by @ydshieh in #42258
- get_decoder() for multimodal and delete redundant code 🔪 by @zucchini-nlp in #42156
- cwm by @ydshieh in #42261
- torch.get_autocast_dtype instead of torch.get_autocast_gpu_dtype by @qgallouedec in #42055
- WhisperFeatureExtractor by @TopCoder2K in #42286
- [CI] Skip EfficientLoFTR test by @vasqu in #42327
- [Attn Masks] Lift bidirectional mask restriction on eager by @vasqu in #42325
- torch.distributed imports by @Cyrilvallez in #42361
- [Attn Masks] Add skip option for non-packed sequences by @vasqu in #42367
- [Mistral Tokenizers] Fix tokenizer detection by @vasqu in #42389
- get_encoder() by @zucchini-nlp in #42295
- [FA] Cleanup loading logic by @vasqu in #41427
- huggingface_hub constants + cleanup in PushToHubMixin by @Wauplin in #42391
- [CI] Add to run slow by @vasqu in #42459
- transformers chat launched without base_url has a direct tie to localhost:8000 by @LysandreJik in #42463
- rotary_partial_emb to RopeParams and delete unnecessary code 🔪 by @zucchini-nlp in #42255
- add_prefix_space default value by @SunMarc in #42481

The following contributors have made significant changes to the library over the last release:
- [JetMoe] Fix jetmoe after #40132 (#41324)
- [ModularChecker] QOL for the modular checker (#41361)
- [CB] Refactors the way we access paged (#41370)
- [from_pretrained] Small refactor from_pretrained: move around unrelated stuff (#41445)
- [v5] Remove relative position embeddings (for bert like models) (#41170)
- [v5] Sync Bert and Bart eager attention (#41248)
- [JetMoe] Fix KV head repetition and padding free (#41423)
- [Attention Masks] Bidirectional masks for encoder and encoder-decoder models (#41265)
- [CI] Fix copies on main (#41486)
- [Docs] Fix changed references (#41614)
- [Executorch] Simplify for encoder models (#41627)
- [Ernie 4.5 Moe] Fix Moe and offloading (#41385)
- [Masks] Fix mask handling in eager for vision models (#41625)
- [Attn] Allow dynamic causality in SDPA via Kwargs (#41692)
- [Onnx docs] Remove some traces (#41791)
- [Clip] Fix masking and enable flash attention on all model types (#41750)
- [Attn Masks] Non-vmap default for attention masks (#41852)
- [T5Gemma] Fix cross attention cache (#41890)
- [Pop2Piano] Fix cache usage (#42170)
- [PEFT] Fix prefix tuning (#41696)
- [PEFT] Fix the general test for prefix tuning (#42185)
- [Pop2Piano] Fix tied weights (#42193)
- [BLT] Fix cache usage (#42188)
- [CI] Skip EfficientLoFTR test (#42327)
- [Attn Masks] Lift bidirectional mask restriction on eager (#42325)
- [Attn Masks] Add skip option for non-packed sequences (#42367)
- [Mistral Tokenizers] Fix tokenizer detection (#42389)
- [FA] Cleanup loading logic (#41427)
- [CI] Add to run slow (#42459)
- test_longcat_generation_cpu (#41368)
- JetMoeIntegrationTest (#41377)
- pickle - BloomTokenizerFast (#41466)
- HunYuanMoEV1IntegrationTest:test_model_generation (#41373)
- utils/check_bad_commit.py (#41658)
- utils/check_bad_commit.py (#41658) (#41690)
- utils/check_bad_commit.py (#41658) (#41815)
- detectron2 installation in docker files (#41975)
- autoawq[kernels] installation in quantization docker file (#41978)
- torchcodec version in quantization docker file (#41988)
- run slow v2: empty report when there is only one model (#42002)
- torch+deepspeed docker file (#41985)
- deeepspeed in AMD docker file (#42025)
- pr_slow_ci_suggestion.yml after #42023 (#42049)
- Argument list too long in pr_slow_ci_suggestion.yml (#42061)
- pytest<9 for now (#42162)
- test_dynamic_cache_exportability_multiple_run (failing on torch 2.10 nightly) (#42212)
- batch_size (#42213)
- batch_size" (#42258)
- cwm (#42261)
- prune_heads (#41417)
- utils/tf_ops/ (#41402)
- generate delegates default cache initialization to the model (#41505)
- use_auth_token parameter (#41666)
- huggingface_hub constants + cleanup in PushToHubMixin (#42391)
- logits_to_keep to many older CausalLM models (#41335)

We are getting closer and closer to the official release! This RC is focused on removing more of the deprecated stuff, fixing some minor issues, and updating docs.
- _get_num_multimodal_tokens by @Abhinavexists in https://github.com/huggingface/transformers/pull/43137
- BartModelIntegrationTest by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43160
- auto_doctring in Processors by @yonigozlan in https://github.com/huggingface/transformers/pull/42101
- BitModelIntegrationTest by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43164
- [Fp8] Fix experts by @vasqu in https://github.com/huggingface/transformers/pull/43154
- salesforce-ctrl, xlm & gpt-neo model generation tests by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43180
- [Generate] Allow custom config values in generate config by @vasqu in https://github.com/huggingface/transformers/pull/43181
- Pix2StructIntegrationTest by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43229
- PhiIntegrationTests by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43214
- HF_TOKEN directly and remove require_read_token by @ydshieh in https://github.com/huggingface/transformers/pull/43233
- Owlv2ModelIntegrationTest & OwlViTModelIntegrationTest by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43182
- add_dates by @yonigozlan in https://github.com/huggingface/transformers/pull/43199
- Vip-llava model integration test by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43252
- position_ids in all apply_rotary_pos_emb by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43255
- _get_test_info in testing_utils.py by @ydshieh in https://github.com/huggingface/transformers/pull/43259
- Hiera, SwiftFormer & LED Model integration tests by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43225
- _toctree.yml by @Cyrilvallez in https://github.com/huggingface/transformers/pull/43264
- PegasusX, Mvp & LED model integration tests by @Sai-Suraj-27 in https://github.com/huggingface/transformers/pull/43245

Full Changelog: https://github.com/huggingface/transformers/compare/v5.0.0rc2...v5.0.0rc3
Another fix for Qwen VL models that prevented the associated model type from loading correctly - this works together with https://github.com/huggingface/transformers/pull/41808 from the previous patch release.
Full Changelog: https://github.com/huggingface/transformers/compare/v4.57.5...v4.57.6
Should not have said last patch :wink: These should be the last remaining fixes that got lost in between patches and the transition to v5.
Full Changelog: https://github.com/huggingface/transformers/compare/v4.57.4...v4.57.5
Last patch release for v4: We have a few small fixes for remote generation methods (e.g. group beam search), vLLM, and an offline tokenizer fix (if it's already been cached).
Full Changelog: https://github.com/huggingface/transformers/compare/v4.57.3...v4.57.4
This release candidate is focused on fixing AutoTokenizer, expanding the dynamic weight loading support, and improving performance with MoEs!
The main issue with the tokenization refactor was that tokenizer_class is now "enforced" even though in most cases it is wrong. This took a while to properly isolate, and we now try to use TokenizersBackend whenever we can. #42894 has a much more detailed description of the big changes!
- TokenizersBackend by @ArthurZucker in https://github.com/huggingface/transformers/pull/42894
- [Tokenizers] Change treatment of special tokens by @vasqu in https://github.com/huggingface/transformers/pull/42903

Here we focused on boosting the performance of loading weights on device!
- post_init and fix all of them by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42873
- _init_weights for ALL models by @Cyrilvallez in https://github.com/huggingface/transformers/pull/42309
- [Ernie 4.5] Ernie VL models by @vasqu in https://github.com/huggingface/transformers/pull/39585

Mostly around processors!
- convert_segmentation_map_to_binary_masks to EoMT by @simonreise in https://github.com/huggingface/transformers/pull/43073

Thanks again to everyone!
Full Changelog: https://github.com/huggingface/transformers/compare/v5.0.0rc1...v5.0.0rc2
This release candidate was focused mostly on quantization support with the new dynamic weight loader, and a few notable 🚨 breaking changes🚨:
from_pretrained is now auto!

Mostly QOL and fixes + support back CPU offloading.
Mostly added support for fbgemm and quanto.
The dynamic weight loader broke small things; this adds glue for all models but MoEs.
Tokenization needed more refactoring; this time it's a lot cleaner!
- rope_parameters to empty dict if there is something to put in it by @hmellor in https://github.com/huggingface/transformers/pull/42651

We omitted a lot of other commits for clarity, but thanks to everyone and the new contributors!
Full Changelog: https://github.com/huggingface/transformers/compare/v5.0.0rc0...v5.0.0rc1
We are excited to announce the initial release of Transformers v5. This is the first major release in five years, and the release is significant: 800 commits have been pushed to main since the latest minor release. This release removes a lot of long-due deprecations, introduces several refactors that significantly simplify our APIs and internals, and comes with a large number of bug fixes.
We give an overview of our focus for this release in the following blogpost. In these release notes, we'll focus directly on the refactors and new APIs coming with v5.
This release is a release candidate (RC). It is not the final v5 release, and we will publish it on PyPI as a pre-release. This means that the current release is purely opt-in: installing transformers without specifying this exact release will install the latest stable version instead (v4.57.3 as of writing).
In order to install this release, please do so with the following:
pip install transformers --pre
For us to deliver the best package possible, it is imperative that we have feedback on how the toolkit is currently working for you. Please try it out, and open an issue in case you're facing something inconsistent/a bug.
Transformers version 5 is a community endeavor, and this is the last mile. Let's ship this together!
[!NOTE] 👀 Nothing is final and things are still in active movement. We have a section dedicated to what is planned for future release candidates but is known not to work in RC0. Look for "Disclaimers for the RC0".
We'll be eagerly awaiting your feedback in our GitHub issues!
We introduce a new weight loading API in transformers, which significantly improves on the previous API. This
weight loading API is designed to apply operations to the checkpoints loaded by transformers.
Instead of loading the checkpoint exactly as it is serialized within the model, these operations can reshape, merge, and split the layers according to how they're defined in this new API. These operations are often a necessity when working with quantization or parallelism algorithms.
This new API is centered around the new WeightConverter class:
class WeightConverter(WeightTransform):
operations: list[ConversionOps]
source_keys: Union[str, list[str]]
target_keys: Union[str, list[str]]
The weight converter is designed to apply a list of operations on the source keys, resulting in target keys. A common operation done on the attention layers is to fuse the query, key, values layers. Doing so with this API would amount to defining the following conversion:
conversion = WeightConverter(
["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"], # The input layers
"self_attn.qkv_proj", # The single layer as output
operations=[Concatenate(dim=0)],
)
In this situation, we apply the Concatenate operation, which accepts a list of layers as input and returns a single
layer.
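For intuition, the fusion that Concatenate(dim=0) performs can be sketched with plain tensors. This is not the transformers API itself, just the underlying tensor operation the conversion applies to the serialized weights:

```python
import torch

hidden = 8
# Stand-ins for the three serialized projection weights
q = torch.randn(hidden, hidden)  # self_attn.q_proj
k = torch.randn(hidden, hidden)  # self_attn.k_proj
v = torch.randn(hidden, hidden)  # self_attn.v_proj

# Concatenate(dim=0): stack along the output dimension into one fused weight
qkv = torch.cat([q, k, v], dim=0)  # the single self_attn.qkv_proj weight
print(qkv.shape)  # torch.Size([24, 8])
```

The reverse mapping (splitting a fused checkpoint back into separate q/k/v layers) is simply the inverse slicing of the same tensor.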
This allows us to define a mapping from architecture to a list of weight conversions. Applying those weight conversions
can apply arbitrary transformations to the layers themselves. This significantly simplified the from_pretrained method
and helped us remove a lot of technical debt that we accumulated over the past few years.
This results in several improvements:
While this is being implemented, expect varying levels of support across different release candidates.
Linked PR: https://github.com/huggingface/transformers/pull/41580
Just as we moved towards a single backend library for model definition, we want our tokenizers, and the Tokenizer object to be a lot more intuitive. With v5, tokenizer definition is much simpler; one can now initialize an empty LlamaTokenizer and train it directly on your corpus.
Defining a new tokenizer object should be as simple as this:
from transformers import TokenizersBackend, generate_merges
from tokenizers import pre_tokenizers, Tokenizer
from tokenizers.models import BPE
class Llama5Tokenizer(TokenizersBackend):
def __init__(self, unk_token="<unk>",bos_token="<s>", eos_token="</s>", vocab=None, merges=None ):
if vocab is None:
self._vocab = {
str(unk_token): 0,
str(bos_token): 1,
str(eos_token): 2,
}
else:
self._vocab = vocab
if merges is not None:
self._merges = merges
else:
self._merges = generate_merges(self._vocab)
self._tokenizer = Tokenizer(
BPE(vocab=self._vocab, merges=self._merges, fuse_unk=True)
)
self._tokenizer.pre_tokenizer = pre_tokenizers.Metaspace(
replacement="▁", prepend_scheme="first", split=False  # "first" corresponds to add_prefix_space=True
)
super().__init__(
tokenizer_object=self._tokenizer,
unk_token=unk_token,
bos_token=bos_token,
eos_token=eos_token,
)
Once the tokenizer is defined as above, you can load it with the following: Llama5Tokenizer(). Doing this returns you an empty, trainable tokenizer that follows the definition of the authors of Llama5 (it does not exist yet :wink:).
The above is the main motivation towards refactoring tokenization: we want tokenizers to behave similarly to models: trained or empty, and with exactly what is defined in their class definition.
Up to now, transformers maintained two parallel implementations for many tokenizers:
- Slow tokenizers (tokenization_<model>.py) - Python-based implementations, often using SentencePiece as the backend.
- Fast tokenizers (tokenization_<model>_fast.py) - Rust-based implementations using the 🤗 tokenizers library.

In v5, we consolidate to a single tokenizer file per model: tokenization_<model>.py. This file will use the most appropriate backend available:
- A backend based on the sentencepiece library. It inherits from PythonBackend.
- A backend based on 🤗 tokenizers. Basically allows adding tokens.
- A backend based on MistralCommon's tokenization library. (Previously known as the MistralCommonTokenizer)

The AutoTokenizer automatically selects the appropriate backend based on available files and dependencies. This is transparent: you continue to use AutoTokenizer.from_pretrained() as before. This allows transformers to be future-proof and modular, easily supporting future backends.
We enable users and tokenizer builders to define their own tokenizers from top to bottom. Tokenizers are usually defined using a backend such as tokenizers, sentencepiece or mistral-common, but we offer the possibility to design the tokenizer at a higher-level, without relying on those backends.
To do so, you can import the PythonBackend (which was previously known as PreTrainedTokenizer). This class encapsulates all the logic related to added tokens, encoding, and decoding.
If you want something even higher up the stack, then PreTrainedTokenizerBase is what PythonBackend inherits from. It contains the very basic tokenizer API features:
- encode
- decode
- vocab_size
- get_vocab
- convert_tokens_to_ids
- convert_ids_to_tokens
- from_pretrained
- save_pretrained

Starting with v5, we now enable initializing blank, untrained tokenizers-backed tokenizers:
from transformers import LlamaTokenizer
tokenizer = LlamaTokenizer()
This tokenizer will therefore follow the definition of the LlamaTokenizer as defined in its class definition. It can then be trained on a corpus as can be seen in the tokenizers documentation.
These tokenizers can also be initialized from vocab and merges (if necessary), like the previous "slow" tokenizers:
from transformers import LlamaTokenizer
vocab = {"<unk>": 0, "<s>": 1, "</s>": 2, "hello": 3, "world": 4}
merges = [("h", "e"), ("l", "l"), ("o", " ")]
tokenizer = LlamaTokenizer(vocab=vocab, merges=merges)
This tokenizer will behave as a Llama-like tokenizer, with an updated vocabulary. This allows comparing different tokenizer classes with the same vocab; therefore enabling the comparison of different pre-tokenizers, normalizers, etc.
⚠️ The vocab_file (as in, a path towards a file containing the vocabulary) cannot be used to initialize the LlamaTokenizer as loading from files is reserved to the from_pretrained method.
The batch_decode and decode methods have been unified to reflect behavior of the encode method. Both single and batch decoding now use the same decode method. See an example of the new behavior below:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("t5-small")
inputs = ["hey how are you?", "fine"]
tokenizer.decode(tokenizer.encode(inputs))
Gives:
- 'hey how are you?</s> fine</s>'
+ ['hey how are you?</s>', 'fine</s>']
We expect `encode` and `decode` to behave as two sides of the same coin: an encode, process, decode round trip should just work.
[!NOTE] A common use-case would be: `encode`, `model.generate`, `decode`. However, `generate` returns `list[list[int]]`, which would previously have been incompatible with `decode`.
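The unified behavior boils down to dispatching on input nesting. A minimal pure-Python sketch (not the actual implementation; `decode_one` stands in for real token-to-text decoding):

```python
# Sketch: one `decode` entry point handling both single and batched inputs.
def decode(ids, decode_one=lambda seq: " ".join(str(i) for i in seq)):
    if ids and isinstance(ids[0], (list, tuple)):
        return [decode_one(seq) for seq in ids]  # batch: list of sequences
    return decode_one(ids)                       # single sequence
```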
The encode_plus method is deprecated in favor of the single __call__ method.
**`apply_chat_template` returns `BatchEncoding`**

Previously, `apply_chat_template` returned `input_ids` for backward compatibility. Starting with v5, it now consistently returns a `BatchEncoding` dict like other tokenizer methods.
# v5
messages = [
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hi there!"}
]
# Now returns BatchEncoding with input_ids, attention_mask, etc.
outputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
print(outputs.keys()) # dict_keys(['input_ids', 'attention_mask'])
We simplify the serialization of tokenization attributes:
- `special_tokens_map.json` is no longer written: special tokens are now stored in `tokenizer_config.json`.
- `added_tokens.json` is no longer written: added tokens are now stored in `tokenizer.json`.
- `added_tokens_decoder` is only stored when there is no `tokenizer.json`.

When loading older tokenizers, these files are still read for backward compatibility, but new saves use the consolidated format. We're gradually consolidating attributes into fewer files so that other libraries and implementations can depend on them more reliably.
Several models that had identical tokenizers now import from their base implementation:
These modules will eventually be removed altogether.
Removed T5-specific workarounds
The internal _eventually_correct_t5_max_length method has been removed. T5 tokenizers now handle max length consistently with other models.
A few testing changes specific to tokenizers have been applied:
- Common behavior tests (e.g. `add_tokens`, `encode`, `decode`) are now centralized and automatically applied across all tokenizers. This reduces test duplication and ensures consistent behavior.

For legacy implementations, the original BERT Python tokenizer code (including `WhitespaceTokenizer`, `BasicTokenizer`, etc.) is preserved in `bert_legacy.py` for reference purposes.
Special Tokens Structure:
- `SpecialTokensMixin`: merged into `PreTrainedTokenizerBase` to simplify the tokenizer architecture.
- `special_tokens_map`: now only stores named special token attributes (e.g., `bos_token`, `eos_token`). Use `extra_special_tokens` for additional special tokens (formerly `additional_special_tokens`). `all_special_tokens` includes both named and extra tokens.

# v4
tokenizer.special_tokens_map # Included 'additional_special_tokens'
# v5
tokenizer.special_tokens_map # Only named tokens
tokenizer.extra_special_tokens # Additional tokens
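The v4 -> v5 split can also be illustrated with plain dicts and lists; the token values below are made up for illustration:

```python
# Illustrative values only; real tokenizers define their own special tokens.
named = {"bos_token": "<s>", "eos_token": "</s>"}  # v5 special_tokens_map: named tokens only
extra = ["<tool_call>", "</tool_call>"]            # v5 extra_special_tokens

# all_special_tokens includes both named and extra tokens
all_special = list(named.values()) + extra
```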
- `special_tokens_map_extended` and `all_special_tokens_extended`: removed. Access `AddedToken` objects directly from `_special_tokens_map` or `_extra_special_tokens` if needed.
- `additional_special_tokens`: still accepted for backward compatibility, but automatically converted to `extra_special_tokens`.

Deprecated Methods:
- `sanitize_special_tokens()`: already deprecated in v4, removed in v5.
- `prepare_seq2seq_batch()`: deprecated; use `__call__()` with the `text_target` parameter instead.

# v4
model_inputs = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, max_length=128)
# v5
model_inputs = tokenizer(src_texts, text_target=tgt_texts, max_length=128, return_tensors="pt")
model_inputs["labels"] = model_inputs.pop("input_ids_target")
- `BatchEncoding.words()`: deprecated; use `word_ids()` instead.

Removed Methods:
- `create_token_type_ids_from_sequences()`: removed from base class. Subclasses that need custom token type ID creation should implement this method directly.
- `clean_up_tokenization()`: removed from base class. Now defined at model class level for models that need it (e.g., PLBart, CLVP, Wav2Vec2).
- `prepare_for_model()`, `build_inputs_with_special_tokens()`, `truncate_sequences()`: moved from `tokenization_utils_base.py` to `tokenization_python.py` for `PythonBackend` tokenizers. `TokenizersBackend` provides model-ready input via `tokenize()` and `encode()`, so these methods are no longer needed in the base class.
- `_switch_to_input_mode()`, `_switch_to_target_mode()`, `as_target_tokenizer()`: removed from base class. Use `__call__()` with the `text_target` parameter instead.

# v4
with tokenizer.as_target_tokenizer():
labels = tokenizer(tgt_texts, ...)
# v5
labels = tokenizer(text_target=tgt_texts, ...)
- `parse_response()`: removed from base class.

Because we are switching away from the naive MoE layout (`nn.ModuleList` for experts), we currently have an issue with MoEs that have adapters. For more details, see https://github.com/huggingface/transformers/issues/42491#issuecomment-3591485649.
We aim for this to be fixed and released in a following release candidate in the week that follows RC0.
We are streamlining the MoE support with vLLM; while this is being implemented, tensor parallelism and expert parallelism aren't working as expected. This is known and actively being worked on.
We aim for this to be fixed and released in a following release candidate in the week that follows RC0.
For anyone inheriting from a transformers PreTrainedModel, the weights are automatically initialized with the common scheme:
@torch.no_grad()
def _init_weights(self, module):
"""
Initialize the weights. This is quite general on purpose, in the spirit of what we usually do. For more complex
initialization scheme, it should be overridden by the derived `PreTrainedModel` class. In case a model adds an explicit
`nn.Parameter`, this method should also be overridden in order to initialize it correctly.
"""
if hasattr(self.config, "initializer_range"):
std = self.config.initializer_range or 0.02
elif hasattr(self.config, "init_std"):
std = self.config.init_std
elif hasattr(self.config, "initializer_factor"):
std = self.config.initializer_factor
else:
# 0.02 is the standard default value across the library
std = getattr(self.config.get_text_config(), "initializer_range", 0.02)
if isinstance(module, (nn.Linear, nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.ConvTranspose1d, nn.ConvTranspose2d)):
if getattr(module, "weight", None) is not None:
init.normal_(module.weight, mean=0.0, std=std)
if getattr(module, "bias", None) is not None:
init.zeros_(module.bias)
elif isinstance(module, nn.Embedding):
if getattr(module, "weight", None) is not None:
init.normal_(module.weight, mean=0.0, std=std)
            # Here we need the check explicitly, as we slice the weight in the `zeros_` call, so it loses the flag
if module.padding_idx is not None and not getattr(module.weight, "_is_hf_initialized", False):
init.zeros_(module.weight[module.padding_idx])
elif isinstance(module, nn.MultiheadAttention):
# This uses torch's original init
module._reset_parameters()
# We cannot use `isinstance` on the RMSNorms or LayerNorms, as they usually are custom modules which change names
# between modelings (because they are prefixed with the model name)
elif (
isinstance(module, (nn.GroupNorm, nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))
or "LayerNorm" in module.__class__.__name__
or "RMSNorm" in module.__class__.__name__
):
# Norms can exist without weights (in which case they are None from torch primitives)
if hasattr(module, "weight") and module.weight is not None:
init.ones_(module.weight)
if hasattr(module, "bias") and module.bias is not None:
init.zeros_(module.bias)
If you want to avoid that, for now you should just do:
class CustomModel(Qwen3VLForConditionalGeneration):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.action_head = nn.Linear(1024, 7)
self.positional_embedding = nn.Parameter(torch.randn(16, 1152))
self.post_init()
def _init_weights(self, module):
pass
There is a tracker for that here: https://github.com/huggingface/transformers/issues/42418.
**`use_auth_token`**

The `use_auth_token` argument is deprecated in favor of `token` everywhere.
You should be able to search and replace use_auth_token with token and get the same logic.
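As a sketch of what that search-and-replace amounts to, here is a hypothetical helper (not part of transformers) forwarding the deprecated kwarg to its replacement:

```python
# Hypothetical helper illustrating the rename; not part of transformers.
def migrate_kwargs(kwargs: dict) -> dict:
    if "use_auth_token" in kwargs:
        value = kwargs.pop("use_auth_token")
        kwargs.setdefault("token", value)  # an explicitly passed `token` wins
    return kwargs
```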
Linked PR: https://github.com/huggingface/transformers/pull/41666
We decided to remove some features for the upcoming v5, as they are currently only supported in a few old models and are no longer integrated into new model additions. It's recommended to stick to v4.x in case you need them. The following features are affected:
We dropped support for two torch APIs:
- torchscript in https://github.com/huggingface/transformers/pull/41688
- torch.fx in https://github.com/huggingface/transformers/pull/41683

Those APIs were deprecated by the PyTorch team, and we're instead focusing on the supported APIs, dynamo and export.
We clean up the quantization API in transformers, and significantly refactor the weight loading as highlighted above.
We drop support for two quantization arguments that have been deprecated for some time:
- `load_in_4bit`
- `load_in_8bit`

We remove them in favor of the `quantization_config` argument, which is much more complete. As an example, here is how
you would load a 4-bit bitsandbytes model using this argument:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model_4bit = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-3.2-3B",
device_map="auto",
quantization_config=quantization_config
)
- `from_xxx_config` methods are deleted. Configs can be initialized from the `__init__` method in the same way. See #41314.
- RoPE-related config attributes are grouped under `config.rope_parameters`, including `rope_theta` and `rope_type`. A model's `config.rope_parameters` is a simple dictionary in most cases, and can also be a nested dict in special cases (i.e. Gemma3 and ModernBert) with a different RoPE parameterization for each layer type. Trying to get `config.rope_theta` will throw an attribute error from now on. See #39847 and #42255.
- Composite configs no longer expose sub-config keys at the top level (e.g. `config.vocab_size`). Users are expected to access keys from their respective sub-configs (`config.text_config.vocab_size`).
- Models that cannot generate (i.e. have no `model.generate()`) will no longer have a `generation_config`, and `model.config.generation_config` will throw an attribute error.
- Slow tokenizer files (`tokenization_<model>.py`) will be removed in favor of the fast tokenizer files (`tokenization_<model>_fast.py`), which will be renamed to `tokenization_<model>.py`. As fast tokenizers are 🤗 `tokenizers`-backed, they include a wider range of features that are maintainable and reliable.
- `encode_plus` --> `__call__`
- `batch_decode` --> `decode`
- `apply_chat_template` previously returned naked `input_ids` by default rather than a `BatchEncoding` dict. This was inconvenient - it should return a `BatchEncoding` dict like `tokenizer.__call__()`, but we were stuck with
it for backward compatibility. The method now returns a `BatchEncoding`.
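To illustrate the `rope_parameters` layout, here is a sketch with made-up values; the per-layer-type keys mirror the Gemma3/ModernBert-style nesting, but the exact key names and numbers are assumptions:

```python
# Simple case: one flat dict of RoPE parameters (values are illustrative).
rope_parameters = {"rope_type": "default", "rope_theta": 10_000.0}

# Nested case (models with a different RoPE per layer type); key names assumed.
nested_rope_parameters = {
    "full_attention": {"rope_type": "default", "rope_theta": 1_000_000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 10_000.0},
}

# config.rope_theta is no longer accessible; read the value from the dict instead.
theta = nested_rope_parameters["sliding_attention"]["rope_theta"]
```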
Linked PRs:
- Processor attributes are now serialized in `processor_config.json` as a nested dict, instead of serializing attributes in their own config files. Loading will be supported for all old-format processors (https://github.com/huggingface/transformers/pull/41474)
- `XXXFeatureExtractor` classes are completely removed in favor of the `XXXImageProcessor` class for all vision models (https://github.com/huggingface/transformers/pull/41174)
- `XXXFastImageProcessorKwargs` is removed in favor of `XXXImageProcessorKwargs`, which will be shared between fast and slow processors (https://github.com/huggingface/transformers/pull/40931)
- `RotaryEmbeddings` layers will start returning a dict of tuples in case the model uses several RoPE configurations (Gemma2, ModernBert). Each value will be a tuple of "cos, sin" per RoPE type.
- Parameterization of the `RotaryEmbeddings` layer will be unified and accessed via `config.rope_parameters`. The config attribute `rope_theta` might not be accessible anymore for some models, and will instead live in `config.rope_parameters['rope_theta']`. BC will be supported for a while as much as possible, and in the near future we'll gradually move to the new RoPE format (https://github.com/huggingface/transformers/pull/39847)
- Multimodal models no longer expose `model.language_model` directly. It is recommended to either access the module with `model.model.language_model` or `model.get_decoder()`. See #42156
- Generate output classes have been consolidated (e.g., `GreedySearchEncoderDecoderOutput` is removed). We now only have 4 output classes built from the following matrix: decoder-only vs encoder-decoder, uses beams vs doesn't use beams (https://github.com/huggingface/transformers/pull/40998)
- If `generate` doesn't receive any KV cache argument, the default cache class used is now defined by the model (as opposed to always being `DynamicCache`) (https://github.com/huggingface/transformers/pull/41505)
- If generation parameters are stored in `config.json` for an old model, they will be loaded back into the model's generation config. Users are expected to access or modify generation parameters only through `model.generation_config`, e.g. `model.generation_config.do_sample = True`.

**`compute_loss_func` Handling**

- `compute_loss_func` now always takes priority over the model's built-in loss computation, giving users consistent control over custom loss functions.

**`num_items_in_batch` in Prediction Step**

- The `num_items_in_batch` argument is now passed to `compute_loss` during `prediction_step`, enabling proper loss scaling during evaluation.

**`report_to` now defaults to `"none"`**
The following arguments were removed from `TrainingArguments` due to low usage:

- `mp_parameters` -> legacy param that was later on added to the Sagemaker trainer
- `_n_gpu` -> not intended for users to set; we will initialize it correctly instead of putting it in the `TrainingArguments`
- `overwrite_output_dir` -> replaced by `resume_from_checkpoint`; it was only used in the example scripts, no impact on `Trainer`
- `logging_dir` -> only used for tensorboard; set the `TENSORBOARD_LOGGING_DIR` env var instead
- `jit_mode_eval` -> use `use_torch_compile` instead, as torchscript is not recommended anymore
- `tpu_num_cores` -> it is actually better to remove it, as it is not recommended to set the number of cores; by default, all TPU cores are used. Set the `TPU_NUM_CORES` env var instead
- `past_index` -> it was only used for a very small number of models with special architectures like TransformerXL, and it was not documented at all how to train those models
- `ray_scope` -> only a minor arg for the Ray integration. Set the `RAY_SCOPE` env var instead
- `warmup_ratio` -> use `warmup_steps` instead. We combined both args together by allowing float values in `warmup_steps`.

The following deprecated arguments were removed or renamed in `TrainingArguments`:

- `fsdp_min_num_params` and `fsdp_transformer_layer_cls_to_wrap` -> use `fsdp_config`
- `tpu_metrics_debug` -> `debug`
- `push_to_hub_token` -> `hub_token`
- `push_to_hub_model_id` and `push_to_hub_organization` -> `hub_model_id`
- `include_inputs_for_metrics` -> `include_for_metrics`
- `per_gpu_train_batch_size` -> `per_device_train_batch_size`
- `per_gpu_eval_batch_size` -> `per_device_eval_batch_size`
- `use_mps_device` -> mps will be used by default if detected
- `fp16_backend` and `half_precision_backend` -> we will only rely on `torch.amp` as everything has been upstreamed to torch
- `no_cuda` -> `use_cpu`
- `include_tokens_per_second` -> `include_num_input_tokens_seen`
- `use_legacy_prediction_loop` -> we only use the `evaluation_loop` function from now on

Renamed in `Trainer`:

- `tokenizer` in initialization -> `processing_class`
- `model_path` in `train()` -> `resume_from_checkpoint`

Other `Trainer` changes:

- `use_cache` in the model config will be set to `False`.
You can still change the cache value through the `TrainingArguments` `use_cache` argument if needed.

`PushToHubMixin` changes:

- Removed `organization` and `repo_url` from `PushToHubMixin`. You must pass a `repo_id` instead.
- Removed `ignore_metadata_errors` from `PushToHubMixin`. In practice, if we ignore errors while loading the model card, we won't be able to push the card back to the Hub, so it's better to fail early and not provide the option to fail later.
- `push_to_hub` does not accept `**kwargs` anymore. All accepted parameters are explicitly documented.
- Arguments to `push_to_hub` are now keyword-only to avoid confusion. Only `repo_id` can be positional since it's the main arg.
- Removed the `use_temp_dir` argument from `push_to_hub`. We now use a tmp dir in all cases.

Linked PR: https://github.com/huggingface/transformers/pull/42391.
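The merge of `warmup_ratio` into the step-based warmup argument can be sketched as a simple resolution rule. This is illustrative only, not the actual `Trainer` source; the function name is made up:

```python
# Hypothetical resolution rule: floats in (0, 1) are treated as a ratio of
# total training steps, anything else as an absolute step count.
def resolve_warmup_steps(warmup_steps, total_training_steps: int) -> int:
    if isinstance(warmup_steps, float) and 0.0 < warmup_steps < 1.0:
        return int(total_training_steps * warmup_steps)
    return int(warmup_steps)
```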
The previously deprecated `transformers-cli ...` command has been removed; `transformers ...` is now the only CLI entry point.
transformers CLI has been migrated to Typer, making it easier to maintain + adding some nice features out of
the box (improved --help section, autocompletion).
Biggest breaking change is in transformers chat. This command starts a terminal UI to interact with a chat model.
It used to also be able to start a Chat Completion server powered by transformers and chat with it. In this revamped
version, this feature has been removed in favor of transformers serve. The goal of splitting transformers chat
and transformers serve is to define clear boundaries between client and server code. It helps with maintenance
but also makes the commands less bloated. The new signature of transformers chat is:
Usage: transformers chat [OPTIONS] BASE_URL MODEL_ID [GENERATE_FLAGS]...
Chat with a model from the command line.
It works hand in hand with transformers serve, which means that if transformers serve is running on its default endpoint, transformers chat can be launched as follows:
transformers chat HuggingFaceTB/SmolLM3-3B
It can however use any OpenAI API compatible HTTP endpoint:
transformers chat HuggingFaceTB/SmolLM3-3B https://router.huggingface.co/v1
Linked PRs:
**Removed `run` command**

The `transformers run` command (previously `transformers-cli run`) is an artefact of the past: it was neither documented nor tested.
We're removing it for now; please let us know if this is a command you are using, in which case we should bring it back with better support.
Linked PR: https://github.com/huggingface/transformers/pull/42447
- `TRANSFORMERS_CACHE`, `PYTORCH_TRANSFORMERS_CACHE`, and `PYTORCH_PRETRAINED_BERT_CACHE` have been removed. Please use `HF_HOME` instead.
- `HUGGINGFACE_CO_EXAMPLES_TELEMETRY`, `HUGGINGFACE_CO_PREFIX`, and `HUGGINGFACE_CO_RESOLVE_ENDPOINT` have been removed. Please use `huggingface_hub.constants.ENDPOINT` instead.

Linked PR: https://github.com/huggingface/transformers/pull/42391.
transformers v5 pins the huggingface_hub version to >=1.0.0. See this migration guide to learn more about this major release. Here are the main aspects to know about:
- `huggingface_hub` switched its HTTP backend from `requests` to `httpx`. This change was made to improve performance and to support both synchronous and asynchronous requests the same way. If you are currently catching `requests.HTTPError` errors in your codebase, you'll need to switch to `httpx.HTTPError`.
- Proxies are configured through the `HTTP_PROXY` / `HTTPS_PROXY` environment variables.
- `hf_transfer`, and therefore `HF_HUB_ENABLE_HF_TRANSFER`, have been completely dropped in favor of `hf_xet`. This should be transparent for most users. Please let us know if you notice any downside!
- `typer-slim` has been added as a required dependency, used to implement both the `hf` and `transformers` CLIs.
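If your code catches download errors, the switch looks roughly like this. A minimal sketch: the `HubHTTPError` alias, the `ImportError` fallback, and `safe_call` are ours, added only so the snippet runs even without `httpx` installed:

```python
try:
    import httpx
    HubHTTPError = httpx.HTTPError  # v5: catch httpx errors instead of requests ones
except ImportError:
    class HubHTTPError(Exception):  # fallback so this sketch runs without httpx
        pass

def safe_call(fn):
    # Wrap a hub call and turn HTTP failures into a readable message.
    try:
        return fn()
    except HubHTTPError as err:
        return f"request failed: {err}"
```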
The Code World Model (CWM) model was proposed in CWM: An Open-Weights LLM for Research on Code Generation with World Models by Meta FAIR CodeGen Team. CWM is an LLM for code generation and reasoning about code that has, in particular, been trained to better represent and reason about how code and commands affect the state of a program or system. Specifically, we mid-trained CWM on a large number of observation-action trajectories from Python execution traces and agentic interactions in containerized environments. We post-trained with extensive multi-task RL in verifiable coding, math, and multi-turn software engineering environments.
SAM3 (Segment Anything Model 3) was introduced in SAM 3: Segment Anything with Concepts.
The SAM3 addition adds four new architectures:
SAM3 performs Promptable Concept Segmentation (PCS) on images. PCS takes text and/or image exemplars as input (e.g., "yellow school bus"), and predicts instance and semantic masks for every single object matching the concept.
Sam3Tracker and Sam3TrackerVideo perform Promptable Visual Segmentation (PVS) on images. PVS takes interactive visual prompts (points, boxes, masks) or text inputs to segment a specific object instance per prompt. This is the task that SAM 1 and SAM 2 focused on, and SAM 3 improves upon it. Sam3Tracker and Sam3TrackerVideo are updated versions of SAM2 Video that maintain the same API while providing improved performance and capabilities.
SAM3 Video performs Promptable Concept Segmentation (PCS) on videos. PCS takes text as input (e.g., "yellow school bus"), and predicts instance and semantic masks for every single object matching the concept, while preserving object identities across video frames. The model combines a detection module (SAM3) with a tracking module (SAM2-style tracker) to enable robust object tracking across video frames using text prompts.
LFM2-MoE is a Mixture-of-Experts (MoE) variant of LFM2. The LFM2 family is optimized for on-device inference by combining short‑range, input‑aware gated convolutions with grouped‑query attention (GQA) in a layout tuned to maximize quality under strict speed and memory constraints.
LFM2‑MoE keeps this fast backbone and introduces sparse MoE feed‑forward networks to add representational capacity without significantly increasing the active compute path. The first LFM2-MoE release is LFM2-8B-A1B, with 8.3B total parameters and 1.5B active parameters. The model excels in quality (comparable to 3-4B dense models) and speed (faster than other 1.5B class models).
The VideoLLaMA3 model is a major update to VideoLLaMA2 from Alibaba DAMO Academy.
Audio Flamingo 3 (AF3) is a fully open large audio–language model designed for robust understanding and reasoning over speech, environmental sounds, and music. AF3 pairs a Whisper-style audio encoder with a causal language model and performs replace-in-place audio–text fusion: the processor aligns post-pool audio frames to a dedicated placeholder token and the model replaces those token slots with projected audio embeddings during the forward pass.
The model checkpoint is available at: nvidia/audio-flamingo-3-hf
Highlights:
NanoChat is a compact decoder-only transformer model designed for educational purposes and efficient training. The model features several fundamental architectural innovations which are common in modern transformer models. Therefore, it is a good model to use as a starting point to understand the principles of modern transformer models. NanoChat is a variant of the Llama architecture, with simplified attention mechanism and normalization layers.
- JetMoe Fix jetmoe after #40132 by @ArthurZucker in #41324
- gemma3 by @Sai-Suraj-27 in #41354
- PretrainedConfig to PreTrainedConfig by @Cyrilvallez in #41300
- [ModularChecker] QOL for the modular checker by @ArthurZucker in #41361
- [v5] Remove relative position embeddings (for bert like models) by @vasqu in #41170
- apply_chat_template by @Samoed in #41355
- test_longcat_generation_cpu by @ydshieh in #41368
- [CB] Refactors the way we access paged by @ArthurZucker in #41370
- [v5] Sync Bert and Bart eager attention by @vasqu in #41248
- TypeError exception for invalid type by @Sai-Suraj-27 in #41346
- update_device_map for GPTQ quantizer by @Sai-Suraj-27 in #41328
- prune_heads by @gante in #41417
- [JetMoe] Fix KV head repetition and padding free by @vasqu in #41423
- JetMoeIntegrationTest by @ydshieh in #41377
- past_key_value in BERT-like models by @zucchini-nlp in #41448
- utils/tf_ops/ by @gante in #41402
- [Attention Masks] Bidirectional masks for encoder and encoder-decoder models by @vasqu in #41265
- past_index by @SunMarc in #41384
- report_to default changed to "none" + cleaning deprecated env var by @SunMarc in #41375
- overwrite_output_dir by @SunMarc in #41323
- [CI] Fix copies on main by @vasqu in #41486
- jit_mode_eval by @SunMarc in #41376
- local_rank arg from TrainingArguments by @SunMarc in #41382
- pickle - BloomTokenizerFast by @ydshieh in #41466
- glm4v by @Sai-Suraj-27 in #41483
- truncation to False in Qwen3Omni to avoid default truncation by @BakerBunker in #41473
- local_rank deletion and some cleaning by @SunMarc in #41504
- tpu_num_cores by @SunMarc in #41383
- HunYuanMoEV1IntegrationTest:test_model_generation by @ydshieh in #41373
- generate delegates default cache initialization to the model by @gante in #41505
- [from_pretrained] Small refactor from_pretrained: move around unrelated stuff by @ArthurZucker in #41445
- transformers serve by @LysandreJik in #41446
- logits_to_keep to many older CausalLM models by @philiproeleveld in #41335
- torch.compile recompiled part of th… by @sywangyi in #41558
- [Docs] Fix changed references by @vasqu in #41614
- expand_device_map instead of redefining it by @Cyrilvallez in #41608
- tp_plan in from_pretrained directly by @Cyrilvallez in #41435
- [Executorch] Simplify for encoder models by @vasqu in #41627
- [Ernie 4.5 Moe] Fix Moe and offloading by @vasqu in #41385
- [Masks] Fix mask handling in eager for vision models by @vasqu in #41625
- utils/check_bad_commit.py by @ydshieh in #41658
- use_cache default to False by @SunMarc in #41585
- chat_extras.md to Korean by @Judy-Choi in #39863
- big_bird.md to Korean by @ssum21 in #40445
- code_llama.md to Korean by @Judy-Choi in #40558
- ko-LFM2.md to Korean by @ssum21 in #41502
- use_auth_token parameter by @Wauplin in #41666
- [Attn] Allow dynamic causality in SDPA via Kwargs by @vasqu in #41692
- run_name docs in TrainingArguments by @tobiasofsn in #41705
- utils/check_bad_commit.py by @ydshieh in #41658
- videos from image processing classes by @zucchini-nlp in #41607
- @staticmethod from module-level get_device_and_memory_breakdown by @albertvillanova in #41747
- [Onnx docs] Remove some traces by @vasqu in #41791
- utils/check_bad_commit.py by @ydshieh in #41658
- [Clip] Fix masking and enable flash attention on all model types by @vasqu in #41750
- test_tensor_parallel.py by @3outeille in #41918
- detectron2 installation in docker files by @ydshieh in #41975
- autoawq[kernels] installation in quantization docker file by @ydshieh in #41978
- torchcodec version in quantization docker file by @ydshieh in #41988
- run slow v2: empty report when there is only one model by @ydshieh in #42002
- torch+deepspeed docker file by @ydshieh in #41985
- logging_dir by @SunMarc in #42013
- deeepspeed in AMD docker file by @ydshieh in #42025
- huggingface_hub dependency version by @hanouticelina in #42033
- pr_slow_ci_suggestion.yml after #42023 by @ydshieh in #42049
- Argument list too long in pr_slow_ci_suggestion.yml by @ydshieh in #42061
- setattr as well by @zucchini-nlp in #41808
- [Attn Masks] Non-vmap default for attention masks by @vasqu in #41852
- image_transforms.py by @yaswanth19 in #42044
- prepare_inputs_for_generation cache slicing condition by @albertvillanova in #41764
- [T5Gemma] Fix cross attention cache by @vasqu in #41890
- streaming by @McPatate in #42102
- pytest<9 for now by @ydshieh in #42162
- [Pop2Piano] Fix cache usage by @vasqu in #42170
- [PEFT] Fix prefix tuning by @vasqu in #41696
- FqnToConfig by @jcaip in #41894
- [PEFT] Fix the general test for prefix tuning by @vasqu in #42185
- [Pop2Piano] Fix tied weights by @vasqu in #42193
- [BLT] Fix cache usage by @vasqu in #42188
- test_dynamic_cache_exportability_multiple_run (failing on torch 2.10 nightly) by @ydshieh in #42212
- AttentionMaskConverter._unmask_unattended for xpu device before by @kaixuanliu in #42230
- base_model by @zucchini-nlp in #41589
- batch_size by @ydshieh in #42213
- batch_size" by @ydshieh in #42258
- get_decoder() for multimodal and delete redundant code 🔪 by @zucchini-nlp in #42156
- cwm by @ydshieh in #42261
- torch.get_autocast_dtype instead of torch.get_autocast_gpu_dtype by @qgallouedec in #42055
- WhisperFeatureExtractor by @TopCoder2K in #42286
- [CI] Skip EfficientLoFTR test by @vasqu in #42327
- [Attn Masks] Lift bidirectional mask restriction on eager by @vasqu in #42325
- torch.distributed imports by @Cyrilvallez in #42361
- [Attn Masks] Add skip option for non-packed sequences by @vasqu in #42367
- [Mistral Tokenizers] Fix tokenizer detection by @vasqu in #42389
- get_encoder() by @zucchini-nlp in #42295
- [FA] Cleanup loading logic by @vasqu in #41427
- huggingface_hub constants + cleanup in PushToHubMixin by @Wauplin in #42391
- [CI] Add to run slow by @vasqu in #42459
- transformers chat launched without base_url has a direct tie to localhost:8000 by @LysandreJik in #42463
- rotary_partial_emb to RopeParams and delete unnecessary code 🔪 by @zucchini-nlp in #42255
- add_prefix_space default value by @SunMarc in #42481

The following contributors have made significant changes to the library over the last release:
- JetMoe Fix jetmoe after #40132 (#41324)
- [ModularChecker] QOL for the modular checker (#41361)
- [CB] Refactors the way we access paged (#41370)
- [from_pretrained] Small refactor from_pretrained: move around unrelated stuff (#41445)
- [v5] Remove relative position embeddings (for bert like models) (#41170)
- [v5] Sync Bert and Bart eager attention (#41248)
- [JetMoe] Fix KV head repetition and padding free (#41423)
- [Attention Masks] Bidirectional masks for encoder and encoder-decoder models (#41265)
- [CI] Fix copies on main (#41486)
- [Docs] Fix changed references (#41614)
- [Executorch] Simplify for encoder models (#41627)
- [Ernie 4.5 Moe] Fix Moe and offloading (#41385)
- [Masks] Fix mask handling in eager for vision models (#41625)
- [Attn] Allow dynamic causality in SDPA via Kwargs (#41692)
- [Onnx docs] Remove some traces (#41791)
- [Clip] Fix masking and enable flash attention on all model types (#41750)
- [Attn Masks] Non-vmap default for attention masks (#41852)
- [T5Gemma] Fix cross attention cache (#41890)
- [Pop2Piano] Fix cache usage (#42170)
- [PEFT] Fix prefix tuning (#41696)
- [PEFT] Fix the general test for prefix tuning (#42185)
- [Pop2Piano] Fix tied weights (#42193)
- [BLT] Fix cache usage (#42188)
- [CI] Skip EfficientLoFTR test (#42327)
- [Attn Masks] Lift bidirectional mask restriction on eager (#42325)
- [Attn Masks] Add skip option for non-packed sequences (#42367)
- [Mistral Tokenizers] Fix tokenizer detection (#42389)
- [FA] Cleanup loading logic (#41427)
- [CI] Add to run slow (#42459)
- test_longcat_generation_cpu (#41368)
- JetMoeIntegrationTest (#41377)
- pickle - BloomTokenizerFast (#41466)
- HunYuanMoEV1IntegrationTest:test_model_generation (#41373)
- utils/check_bad_commit.py (#41658)
- utils/check_bad_commit.py (#41658) (#41690)
- utils/check_bad_commit.py (#41658) (#41815)
- detectron2 installation in docker files (#41975)
- autoawq[kernels] installation in quantization docker file (#41978)
- torchcodec version in quantization docker file (#41988)
- run slow v2: empty report when there is only one model (#42002)
- torch+deepspeed docker file (#41985)
- deeepspeed in AMD docker file (#42025)
- pr_slow_ci_suggestion.yml after #42023 (#42049)
- Argument list too long in pr_slow_ci_suggestion.yml (#42061)
- pytest<9 for now (#42162)
- test_dynamic_cache_exportability_multiple_run (failing on torch 2.10 nightly) (#42212)
- batch_size (#42213)
- batch_size" (#42258)
- cwm (#42261)
- prune_heads (#41417)
- utils/tf_ops/ (#41402)
- generate delegates default cache initialization to the model (#41505)
- use_auth_token parameter (#41666)
- huggingface_hub constants + cleanup in PushToHubMixin (#42391)
- logits_to_keep to many older CausalLM models (#41335)

There was a hidden bug when loading models with `local_files_only=True` and a typo related to the recent patch.
The main fix is: https://github.com/huggingface/transformers/commit/b6055550a15a8fab367cf983b743ff68cc58d81a.
We are really sorry that this slipped through, our CIs just did not catch it.
As it affects a lot of users, we are going to yank the previous release.
This patch most notably fixes an issue on some Mistral tokenizers. It contains the following commits:
- `@staticmethod` from module-level `get_device_and_memory_breakdown` (#41747)

This patch most notably fixes an issue with an optional dependency (optax), which resulted in parsing errors with poetry. It contains the following fixes: