Hugging Face/Tokenizers/python-v0.6.0

Python v0.6.0

Changes:

  • Major speed improvements for BPE, in both training and tokenization (#165)

Fixes:

  • Some default tokens were missing from BertWordPieceTokenizer (cf #160)
  • The ByteLevel PreTokenizer produced wrong offsets when a character was split into multiple bytes (cf #156)
  • The longest_first truncation strategy had a bug (#174)
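
For context on the ByteLevel offsets fix, here is a minimal sketch of why multi-byte characters make offset tracking delicate. It is not the library's implementation: `byte_to_char_offsets` is a hypothetical helper, written in plain Python, that maps each UTF-8 byte back to the character it came from.

```python
def byte_to_char_offsets(text):
    """Map each UTF-8 byte index of `text` to the index of its source character.

    Hypothetical helper for illustration only; not part of the tokenizers API.
    """
    mapping = []
    for char_index, char in enumerate(text):
        # One character can encode to 1-4 bytes; each byte points back to it.
        mapping.extend([char_index] * len(char.encode("utf-8")))
    return mapping


text = "h\u00e9llo"  # "héllo": the "é" takes two bytes in UTF-8
print(len(text), len(text.encode("utf-8")))  # 5 6
print(byte_to_char_offsets(text))            # [0, 1, 1, 2, 3, 4]
```

Byte-level pieces that come from the same character should all report that character's position in the original string. Counting bytes as if they were characters shifts every offset after the first multi-byte character.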

Fetched April 7, 2026