Python v0.5.0
- BertWordPieceTokenizer now cleans up some tokenization artifacts while decoding (cf #145)
- ByteLevelBPETokenizer now has dropout (thanks @colinclement with #149)
- Added a Strip normalizer
- do_lowercase has been renamed to lowercase for consistency between the different tokenizers (especially ByteLevelBPETokenizer and CharBPETokenizer)
- Encoding now supports __len__ (cf #139)
- … BertWordPieceTokenizer.
- … BPETokenizer
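Callers that still pass do_lowercase will need updating after the rename noted above. The following is a minimal sketch of one way to adapt existing keyword arguments. The helper name migrate_tokenizer_kwargs is hypothetical and not part of the library. It only rewrites a plain dict and does not import or call the tokenizers package.

```python
def migrate_tokenizer_kwargs(kwargs):
    """Return a copy of kwargs with do_lowercase renamed to lowercase.

    Hypothetical helper for illustration only; it is not part of the library.
    """
    migrated = dict(kwargs)  # copy so the caller's dict is left untouched
    if "do_lowercase" in migrated:
        old_value = migrated.pop("do_lowercase")
        # If the caller passed both names, keep the explicit new-style value.
        migrated.setdefault("lowercase", old_value)
    return migrated


# Example: keyword arguments written against the pre-0.5.0 naming.
old_style = {"do_lowercase": True, "dropout": 0.1}
new_style = migrate_tokenizer_kwargs(old_style)
print(new_style)
```

The resulting dict could then be unpacked into a tokenizer constructor such as ByteLevelBPETokenizer, assuming that constructor accepts lowercase and dropout as keyword arguments, as these notes suggest.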