Python v0.3.0
CharDelimiterSplit: a new PreTokenizer that allows splitting sequences on the given delimiter (works like .split(delimiter)).
WordLevel: a new model that simply maps tokens to their ids.
[...] Encoding that are ready to be processed by a language model, just as the main Encoding.
output = tokenizer.encode(...)
print(output.original_str.offsets(output.offsets[3]))
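To make the two additions above concrete, here is a minimal pure-Python sketch of the behavior the notes describe: splitting on a delimiter while tracking offsets, then mapping each token to an id. It does not use the library. The helper names char_delimiter_split and word_level_ids, the "[UNK]" fallback token, and the sample vocabulary are assumptions made for illustration, not part of the released API.

```python
from typing import Dict, List, Tuple


def char_delimiter_split(text: str, delimiter: str) -> List[Tuple[str, Tuple[int, int]]]:
    """Split `text` like text.split(delimiter), keeping (start, end) offsets.

    Hypothetical helper: it mirrors the described behavior, not the
    library's actual implementation.
    """
    pieces = []
    start = 0
    for part in text.split(delimiter):
        end = start + len(part)
        pieces.append((part, (start, end)))
        # Skip over the delimiter to reach the start of the next piece.
        start = end + len(delimiter)
    return pieces


def word_level_ids(tokens: List[str], vocab: Dict[str, int], unk_token: str = "[UNK]") -> List[int]:
    """Map each token to its id; unknown tokens fall back to `unk_token`.

    The unknown-token fallback is an assumption of this sketch.
    """
    unk_id = vocab[unk_token]
    return [vocab.get(token, unk_id) for token in tokens]


# Sample vocabulary, invented for this example.
vocab = {"[UNK]": 0, "hello": 1, "world": 2}

pieces = char_delimiter_split("hello|world|again", "|")
tokens = [piece for piece, _ in pieces]
ids = word_level_ids(tokens, vocab)

print(pieces)  # [('hello', (0, 5)), ('world', (6, 11)), ('again', (12, 17))]
print(ids)     # [1, 2, 0]
```

Each offset pair indexes back into the input string, so text[start:end] recovers the piece. That is the kind of mapping the offsets call in the snippet above relies on.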