decode and decode_batch work on borrowed content. by @mfuntowicz in https://github.com/huggingface/tokenizers/pull/1251
expect() for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
safetensors. + Rewritten node bindings. by @Narsil in https://github.com/huggingface/tokenizers/pull/1331
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.3...v0.14.1
expect() for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
safetensors. + Rewritten node bindings. by @Narsil in https://github.com/huggingface/tokenizers/pull/1331
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc2...v0.14.1rc1
⚠️ Reworks the release pipeline. Other breaking changes ⚠️:
is_special_token renamed to special for consistency
OFF by default, and depends on hf-hub instead of cached_path (updated cache directory, better sync implementation)
decode and decode_batch work on borrowed content. by @mfuntowicz in https://github.com/huggingface/tokenizers/pull/1251
expect() for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
safetensors. + Rewritten node bindings. by @Narsil in https://github.com/huggingface/tokenizers/pull/1331
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.3...v0.14.0
Reworks the release pipeline. Other breaking changes are mostly related to https://github.com/huggingface/tokenizers/pull/1335, where AddedToken is reworked.
expect() for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
safetensors. + Rewritten node bindings. by @Narsil in https://github.com/huggingface/tokenizers/pull/1331
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc2...v0.14.0.rc1
Mostly checking the new release scripts actually work.
expect() for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc2...v0.13.4.rc3
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc1...v0.13.4.rc2
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.4-rc2...v0.13.4.rc1
Tokenizer clone. by @Narsil in https://github.com/huggingface/tokenizers/pull/1152
from_pretrained on invalid ids (better error message). by @Narsil in https://github.com/huggingface/tokenizers/pull/1153
tokenizers. by @Narsil in https://github.com/huggingface/tokenizers/pull/1183
datasets train example by @lhoestq in https://github.com/huggingface/tokenizers/pull/1192
Replace to decoder (to undo the Replace Normalizer for Metaspace split). by @Narsil in https://github.com/huggingface/tokenizers/pull/1195
normalizers.Prepend (To be used instead of Metaspace). by @Narsil in https://github.com/huggingface/tokenizers/pull/1194
content to Strip decoder to allow decoding mid tokens. by @Narsil in https://github.com/huggingface/tokenizers/pull/1199
Full Changelog: https://github.com/huggingface/tokenizers/compare/node-v0.13.2...python-v0.13.3rc1
Tokenizer clone. by @Narsil in https://github.com/huggingface/tokenizers/pull/1152
from_pretrained on invalid ids (better error message). by @Narsil in https://github.com/huggingface/tokenizers/pull/1153
tokenizers. by @Narsil in https://github.com/huggingface/tokenizers/pull/1183
datasets train example by @lhoestq in https://github.com/huggingface/tokenizers/pull/1192
Replace to decoder (to undo the Replace Normalizer for Metaspace split). by @Narsil in https://github.com/huggingface/tokenizers/pull/1195
normalizers.Prepend (To be used instead of Metaspace). by @Narsil in https://github.com/huggingface/tokenizers/pull/1194
content to Strip decoder to allow decoding mid tokens. by @Narsil in https://github.com/huggingface/tokenizers/pull/1199
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.2...v0.13.3
Python 3.11 support (Python only modification)
Decoder is now a composable trait, without breaking backward compatibility.
Processor is now a composable trait, without breaking backward compatibility.
Both trait changes warrant a "major" version bump: despite our best efforts not to break backward compatibility, the code is different enough that we cannot be entirely sure.
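Here is a minimal sketch of what a "composable" decoder can look like. It assumes a design where each decoder maps a list of token strings to a new list, and only the final step joins them into text, so decoders can be chained in sequence. All class and method names below (Decoder, decode_chain, ReplaceDecoder, StripFirstDecoder, SequenceDecoder) are hypothetical and written in Python for illustration. They are not the tokenizers API.

```python
from typing import Iterable, List


class Decoder:
    """Hypothetical base: a decoder maps token strings to token strings."""

    def decode_chain(self, tokens: List[str]) -> List[str]:
        raise NotImplementedError

    def decode(self, tokens: List[str]) -> str:
        # Only the outermost call joins the pieces into a single string.
        return "".join(self.decode_chain(tokens))


class ReplaceDecoder(Decoder):
    """Replaces `pattern` with `content` inside every token."""

    def __init__(self, pattern: str, content: str) -> None:
        self.pattern = pattern
        self.content = content

    def decode_chain(self, tokens: List[str]) -> List[str]:
        return [t.replace(self.pattern, self.content) for t in tokens]


class StripFirstDecoder(Decoder):
    """Removes a leading `prefix` from the first token only."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def decode_chain(self, tokens: List[str]) -> List[str]:
        if tokens and tokens[0].startswith(self.prefix):
            return [tokens[0][len(self.prefix):]] + list(tokens[1:])
        return list(tokens)


class SequenceDecoder(Decoder):
    """Feeds the output list of each decoder into the next one."""

    def __init__(self, decoders: Iterable[Decoder]) -> None:
        self.decoders = list(decoders)

    def decode_chain(self, tokens: List[str]) -> List[str]:
        for decoder in self.decoders:
            tokens = decoder.decode_chain(tokens)
        return tokens


if __name__ == "__main__":
    pieces = ["▁Hello", "▁world", "!"]
    decoder = SequenceDecoder([ReplaceDecoder("▁", " "), StripFirstDecoder(" ")])
    print(decoder.decode(pieces))  # → Hello world!
```

The key choice in this sketch is that decode_chain returns a list rather than a string. That keeps every decoder's output usable as the next decoder's input, and a single-decoder pipeline still produces the same string through decode.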