
New models: MiniMax-M3-VL, PP-OCRv6, Parakeet-RNNT

v5.12.0

This release: 3 features (new capabilities), 7 enhancements (improvements to existing features), 13 fixes (bug fixes). AI-tallied from the release notes.

New Model additions

MiniMax-M3-VL

<img width="886" height="583" alt="image" src="https://github.com/user-attachments/assets/ae9dd96f-6877-4531-a06b-a756686f24e5" />

MiniMax-M3-VL is the vision-language member of the MiniMax-M3 family. It pairs a CLIP-style vision tower, which uses 3D rotary position embeddings, with the MiniMax-M3 text backbone. The decoder is a mixed dense/sparse Mixture-of-Experts with SwiGLU-OAI gated experts and a lightning indexer for block-sparse attention. Images are processed through a Conv3d patch embedding, and the model includes specialized components for efficient multimodal understanding and generation.
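As a rough illustration of the sparse expert routing mentioned above, the sketch below shows generic top-k routing over SwiGLU experts in plain Python. It is not the MiniMax-M3-VL implementation: the function names, the `top_k` value, the softmax over the selected router logits, and the plain SiLU gate (standing in for the SwiGLU-OAI variant) are all assumptions made for illustration.

```python
import math

def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def matvec(w, x):
    # Multiply a row-major matrix (list of rows) by a vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_expert(x, w_gate, w_up, w_down):
    # Generic SwiGLU feed-forward: down(silu(gate(x)) * up(x)).
    # Assumption: plain SwiGLU, not the SwiGLU-OAI variant named in the notes.
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_w, experts, top_k=2):
    # Hypothetical sparse MoE layer: score every expert, run only the top_k.
    logits = matvec(router_w, x)
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Assumption: routing weights are a softmax over the selected logits only.
    weights = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for weight, idx in zip(weights, chosen):
        y = swiglu_expert(x, *experts[idx])
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen
```

Only the selected experts are evaluated, which is what makes the layer sparse: compute scales with `top_k` rather than with the total number of experts.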

Links: Documentation

PP-OCRv6: update documentation and slow tests (#46576)

<img width="3840" height="1494" alt="image" src="https://github.com/user-attachments/assets/e62284ec-78bf-49cb-8aa2-deccc665372f" />

The official weights for PP-OCRv6 are out. PP-OCRv6 is a lightweight OCR system that combines architectural innovation with data-centric optimization. It redesigns the backbone, detection neck, and recognition neck around a unified MetaFormer-style building block with structural reparameterization. Three model tiers (medium, small, tiny) share the same block primitives, covering deployment scenarios from server to edge.

  • PP-OCRv6: update documentation and slow tests (#46576) by @zhang-prog
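For readers unfamiliar with structural reparameterization, the following is a minimal sketch of the general idea, not PP-OCRv6's actual block. It assumes a hypothetical training-time block made of an identity path plus parallel linear branches, each followed by inference-mode batch norm, and shows how that block collapses into a single linear layer with identical outputs. All names and the branch layout are illustrative assumptions.

```python
import math

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def branch_forward(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    # One training-time branch: linear layer followed by inference-mode batch norm.
    z = [zi + bi for zi, bi in zip(matvec(weight, x), bias)]
    return [g * (zi - m) / math.sqrt(v + eps) + bt
            for zi, g, bt, m, v in zip(z, gamma, beta, mean, var)]

def multi_branch_forward(x, branches):
    # Hypothetical multi-branch block: identity path plus every branch, summed.
    out = list(x)
    for params in branches:
        out = [o + y for o, y in zip(out, branch_forward(x, *params))]
    return out

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    # Fold the per-channel batch-norm affine transform into the linear layer.
    new_w, new_b = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        new_w.append([scale * w for w in row])
        new_b.append(scale * (b - m) + bt)
    return new_w, new_b

def reparameterize(branches, dim):
    # Start from the identity path, then add each BN-folded branch.
    merged_w = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    merged_b = [0.0] * dim
    for params in branches:
        w, b = fold_bn(*params)
        for i in range(dim):
            merged_b[i] += b[i]
            for j in range(dim):
                merged_w[i][j] += w[i][j]
    return merged_w, merged_b

def fused_forward(x, weight, bias):
    # Inference-time block: a single linear layer.
    return [z + b for z, b in zip(matvec(weight, x), bias)]
```

Because every branch is linear at inference time, the sum of branches is itself linear, so the fused layer reproduces the multi-branch output while costing a single matrix multiply.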

Add Parakeet-RNNT (#46331)

ParakeetForRNNT: a Fast Conformer Encoder + an RNN-T (RNN Transducer) decoder

  • RNN-T Decoder: standard neural transducer:
    • LSTM prediction network maintains language context across token predictions.
    • Joint network combines encoder and decoder outputs.
    • Greedy transducer decoding for inference: a blank emission advances the encoder frame by one; a non-blank emission stays on the same frame.
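The greedy decoding rule described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the `ParakeetForRNNT` code: `predict` and `joint` are hypothetical callables standing in for the LSTM prediction network and the joint network, and `max_symbols_per_frame` is an assumed safeguard against a frame that never emits blank.

```python
def greedy_rnnt_decode(encoder_frames, predict, joint,
                       blank_id=0, max_symbols_per_frame=10):
    # predict(token, state) -> (pred_out, state): hypothetical prediction network.
    # joint(frame, pred_out) -> list of logits over the vocabulary (blank included).
    tokens = []
    pred_out, state = predict(None, None)  # start-of-sequence step
    for frame in encoder_frames:
        # Stay on this frame while the model keeps emitting non-blank tokens.
        for _ in range(max_symbols_per_frame):
            logits = joint(frame, pred_out)
            token = max(range(len(logits)), key=logits.__getitem__)
            if token == blank_id:
                break  # blank: advance to the next encoder frame
            tokens.append(token)
            # Non-blank: update the prediction network, re-score the same frame.
            pred_out, state = predict(token, state)
    return tokens


# Toy stand-ins (assumptions for illustration only).
def toy_predict(token, state):
    # The "language context" is just the last emitted token.
    return token, state

def toy_joint(frame, pred_out, vocab_size=4, blank_id=0):
    # Emit the frame's label once, then blank until the label changes.
    target = blank_id if pred_out == frame else frame
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]

print(greedy_rnnt_decode([1, 1, 2, 3], toy_predict, toy_joint))  # prints [1, 2, 3]
```

Note that the prediction network is only stepped after a non-blank emission, so blank frames leave the language context untouched.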

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library over the last release:

Fetched June 12, 2026