{"id":"src_9fkD1v4bXCzE8DETuFtcD","slug":"tokenizers","name":"Tokenizers","type":"github","url":"https://github.com/huggingface/tokenizers","orgId":"org_GDdYeYynEgCEBNBwy-m6s","org":{"slug":"hugging-face","name":"Hugging Face"},"isPrimary":false,"metadata":"{\"evaluatedMethod\":\"github\",\"evaluatedAt\":\"2026-04-07T17:19:14.575Z\",\"changelogDetectedAt\":\"2026-04-07T17:28:05.248Z\"}","releaseCount":100,"releasesLast30Days":0,"avgReleasesPerWeek":0,"latestVersion":"v0.22.2","latestDate":"2025-12-02T13:01:19.000Z","changelogUrl":null,"hasChangelogFile":false,"lastFetchedAt":"2026-04-19T03:02:00.201Z","trackingSince":"2019-12-03T22:44:21.000Z","releases":[{"id":"rel_IdgCR4JfoZ4rXw3ZK8Ar2","version":"v0.22.2","title":"Release v0.22.2 ","summary":"## What's Changed\r\n\r\nOkay mostly doing the release for these PR: \r\n* Update deserialize of added tokens by @ArthurZucker in https://github.com/hugging...","content":"## What's Changed\r\n\r\nOkay mostly doing the release for these PR: \r\n* Update deserialize of added tokens by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1891\r\n* update stub for typing by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1896\r\n* bump PyO3 to 0.26 by @davidhewitt in https://github.com/huggingface/tokenizers/pull/1901\r\n\r\n<img width=\"2400\" height=\"1200\" alt=\"image\" src=\"https://github.com/user-attachments/assets/0b974453-1fc6-4393-84ea-da99269e2b34\" />\r\n\r\n\r\nBasically good typing with at least `ty`, and a lot fast (from 4 to 8x faster) loading vocab with a lot of added tokens and GIL free !? \r\n\r\n\r\n* ci: add support for building Win-ARM64 wheels by @MugundanMCW in https://github.com/huggingface/tokenizers/pull/1869\r\n* Add cargo-semver-checks to Rust CI workflow by @haixuanTao in https://github.com/huggingface/tokenizers/pull/1875\r\n* Update indicatif dependency by @gordonmessmer in https://github.com/huggingface/tokenizers/pull/1867\r\n* Bump node-forge from 1.3.1 to 1.3.2 in /tokenizers/examples/unstable_wasm/www by @dependabot[bot] in https://github.com/huggingface/tokenizers/pull/1889\r\n* Bump js-yaml from 3.14.1 to 3.14.2 in /bindings/node by @dependabot[bot] in https://github.com/huggingface/tokenizers/pull/1892\r\n\r\n* fix: used normalize_str in BaseTokenizer.normalize by @ishitab02 in https://github.com/huggingface/tokenizers/pull/1884\r\n* [MINOR:TYPO] Update mod.rs by @cakiki in https://github.com/huggingface/tokenizers/pull/1883\r\n* Remove runtime stderr warning from Python bindings by @Copilot in https://github.com/huggingface/tokenizers/pull/1898\r\n* Mark immutable pyclasses as frozen by @ngoldbaum in https://github.com/huggingface/tokenizers/pull/1861\r\n* DOCS: add `add_prefix_space` to `processors.ByteLevel`  by @CloseChoice in https://github.com/huggingface/tokenizers/pull/1878\r\n\r\n* Bump express from 4.21.2 to 4.22.1 in /tokenizers/examples/unstable_wasm/www by @dependabot[bot] in https://github.com/huggingface/tokenizers/pull/1903\r\n\r\n## New Contributors\r\n* @MugundanMCW made their first contribution in https://github.com/huggingface/tokenizers/pull/1869\r\n* @haixuanTao made their first contribution in https://github.com/huggingface/tokenizers/pull/1875\r\n* @gordonmessmer made their first contribution in https://github.com/huggingface/tokenizers/pull/1867\r\n* @ishitab02 made their first contribution in https://github.com/huggingface/tokenizers/pull/1884\r\n* @Copilot made their first contribution in https://github.com/huggingface/tokenizers/pull/1898\r\n* @ngoldbaum made their first contribution in https://github.com/huggingface/tokenizers/pull/1861\r\n* @CloseChoice made their first contribution in https://github.com/huggingface/tokenizers/pull/1878\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.22.1...v0.22.2","publishedAt":"2025-12-02T13:01:19.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.22.2","media":[]},{"id":"rel_nT9XHEJHcaluP4apUbL0g","version":"v0.22.1","title":"v0.22.1","summary":"# Release v0.22.1\r\n\r\nMain change:\r\n- Bump huggingface_hub upper version (#1866) from @Wauplin \r\n- chore(trainer): add and improve trainer signature (#...","content":"# Release v0.22.1\r\n\r\nMain change:\r\n- Bump huggingface_hub upper version (#1866) from @Wauplin \r\n- chore(trainer): add and improve trainer signature (#1838) from @shenxiangzhuang \r\n- Some doc updates: c91d76ae558ca2dc1aa725959e65dc21bf1fed7e, 7b0217894c1e2baed7354ab41503841b47af7cf9, 57eb8d7d9564621221784f7949b9efdeb7a49ac1 \r\n\r\n","publishedAt":"2025-09-19T09:52:24.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.22.1","media":[]},{"id":"rel_-z7crM27ELNq_v9fXzdX9","version":"v0.22.0","title":"v0.22.0","summary":"## What's Changed\r\n* Bump on-headers and compression in /tokenizers/examples/unstable_wasm/www by @dependabot[bot] in https://github.com/huggingface/t...","content":"## What's Changed\r\n* Bump on-headers and compression in /tokenizers/examples/unstable_wasm/www by @dependabot[bot] in https://github.com/huggingface/tokenizers/pull/1827\r\n* Implement `from_bytes` and `read_bytes` Methods in WordPiece Tokenizer for WebAssembly Compatibility by @sondalex in https://github.com/huggingface/tokenizers/pull/1758\r\n* fix: use AHashMap to fix compile error by @b00f in https://github.com/huggingface/tokenizers/pull/1840\r\n* New stream by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1856\r\n* [docs] Add more decoders by @pcuenca in https://github.com/huggingface/tokenizers/pull/1849\r\n* Fix missing parenthesis in `EncodingVisualizer.calculate_label_colors` by @Liam-DeVoe in https://github.com/huggingface/tokenizers/pull/1853\r\n* Update quicktour.mdx re: Issue #1625 by @WilliamPLaCroix in https://github.com/huggingface/tokenizers/pull/1846\r\n* remove stray comment by @sanderland in https://github.com/huggingface/tokenizers/pull/1831\r\n* Fix typo in README by @aisk in https://github.com/huggingface/tokenizers/pull/1808\r\n* RUSTSEC-2024-0436 - replace paste with pastey by @nystromjd in https://github.com/huggingface/tokenizers/pull/1834\r\n* Tokenizer: Add native async bindings, via py03-async-runtimes. by @michaelfeil in https://github.com/huggingface/tokenizers/pull/1843\r\n\r\n## New Contributors\r\n* @b00f made their first contribution in https://github.com/huggingface/tokenizers/pull/1840\r\n* @pcuenca made their first contribution in https://github.com/huggingface/tokenizers/pull/1849\r\n* @Liam-DeVoe made their first contribution in https://github.com/huggingface/tokenizers/pull/1853\r\n* @WilliamPLaCroix made their first contribution in https://github.com/huggingface/tokenizers/pull/1846\r\n* @sanderland made their first contribution in https://github.com/huggingface/tokenizers/pull/1831\r\n* @aisk made their first contribution in https://github.com/huggingface/tokenizers/pull/1808\r\n* @nystromjd made their first contribution in https://github.com/huggingface/tokenizers/pull/1834\r\n* @michaelfeil made their first contribution in https://github.com/huggingface/tokenizers/pull/1843\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.21.3...v0.22.0rc0","publishedAt":"2025-08-29T10:25:50.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.22.0","media":[]},{"id":"rel_p7AyVgutAtVk_9UjAQYf6","version":"v0.21.4","title":"v0.21.4","summary":"**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.21.3...v0.21.4\r\n\r\n\r\nNo change, the 0.21.3 release failed, this is just a re-r...","content":"**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.21.3...v0.21.4\r\n\r\n\r\nNo change, the 0.21.3 release failed, this is just a re-release.\r\n\r\nhttps://github.com/huggingface/tokenizers/releases/tag/v0.21.3","publishedAt":"2025-07-28T13:18:55.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.21.4","media":[]},{"id":"rel_BodGRBkFB8jMKZ_3wdRDh","version":"v0.21.3","title":"v0.21.3","summary":"## What's Changed\r\n* Clippy fixes. by @Narsil in https://github.com/huggingface/tokenizers/pull/1818\r\n* Fixed an introduced backward breaking change i...","content":"## What's Changed\r\n* Clippy fixes. by @Narsil in https://github.com/huggingface/tokenizers/pull/1818\r\n* Fixed an introduced backward breaking change in our Rust APIs.\r\n\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.21.2...v0.21.3","publishedAt":"2025-07-04T11:58:09.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.21.3","media":[]},{"id":"rel_kwjXMnC1msg_KI70OCPHX","version":"v0.21.2","title":"v0.21.2","summary":"## What's Changed\r\n\r\nThis release if focused around some performance optimization, enabling broader python no gil support, and fixing some onig issues...","content":"## What's Changed\r\n\r\nThis release if focused around some performance optimization, enabling broader python no gil support, and fixing some onig issues! \r\n\r\n\r\n* Update the release builds following 0.21.1. by @Narsil in https://github.com/huggingface/tokenizers/pull/1746\r\n* replace lazy_static with stabilized std::sync::LazyLock in 1.80 by @sftse in https://github.com/huggingface/tokenizers/pull/1739\r\n* Fix no-onig no-wasm builds by @414owen in https://github.com/huggingface/tokenizers/pull/1772\r\n* Fix typos in strings and comments by @co63oc in https://github.com/huggingface/tokenizers/pull/1770\r\n* Fix type notation of merges in BPE Python binding by @Coqueue in https://github.com/huggingface/tokenizers/pull/1766\r\n* Bump http-proxy-middleware from 2.0.6 to 2.0.9 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1762\r\n* Fix data path in test_continuing_prefix_trainer_mismatch by @GaetanLepage in https://github.com/huggingface/tokenizers/pull/1747\r\n* clippy by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1781\r\n* Update pyo3 and rust-numpy depends for no-gil/free-threading compat by @Qubitium in https://github.com/huggingface/tokenizers/pull/1774\r\n* Use ApiBuilder::from_env() in from_pretrained function by @BenLocal in https://github.com/huggingface/tokenizers/pull/1737\r\n* Upgrade onig, to get it compiling with GCC 15 by @414owen in https://github.com/huggingface/tokenizers/pull/1771\r\n* Itertools upgrade by @sftse in https://github.com/huggingface/tokenizers/pull/1756\r\n* Bump webpack-dev-server from 4.10.0 to 5.2.1 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1792\r\n* Bump brace-expansion from 1.1.11 to 1.1.12 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1796\r\n* Fix features blending into a paragraph by @bionicles in https://github.com/huggingface/tokenizers/pull/1798\r\n* Adding throughput to benches to have a more consistent measure across by @Narsil in https://github.com/huggingface/tokenizers/pull/1800\r\n* Upgrading dependencies. by @Narsil in https://github.com/huggingface/tokenizers/pull/1801\r\n* [docs] Whitespace by @stevhliu in https://github.com/huggingface/tokenizers/pull/1785\r\n* Hotfixing the stub. by @Narsil in https://github.com/huggingface/tokenizers/pull/1802\r\n* Bpe clones by @sftse in https://github.com/huggingface/tokenizers/pull/1707\r\n* Fixed Length Pre-Tokenizer by @jonvet in https://github.com/huggingface/tokenizers/pull/1713\r\n* Consolidated optimization ahash dary compact str by @Narsil in https://github.com/huggingface/tokenizers/pull/1799\r\n* 🚨 breaking: Fix training with special tokens by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1617\r\n\r\n## New Contributors\r\n* @414owen made their first contribution in https://github.com/huggingface/tokenizers/pull/1772\r\n* @co63oc made their first contribution in https://github.com/huggingface/tokenizers/pull/1770\r\n* @Coqueue made their first contribution in https://github.com/huggingface/tokenizers/pull/1766\r\n* @GaetanLepage made their first contribution in https://github.com/huggingface/tokenizers/pull/1747\r\n* @Qubitium made their first contribution in https://github.com/huggingface/tokenizers/pull/1774\r\n* @BenLocal made their first contribution in https://github.com/huggingface/tokenizers/pull/1737\r\n* @bionicles made their first contribution in https://github.com/huggingface/tokenizers/pull/1798\r\n* @stevhliu made their first contribution in https://github.com/huggingface/tokenizers/pull/1785\r\n* @jonvet made their first contribution in https://github.com/huggingface/tokenizers/pull/1713\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.21.1...v0.21.2rc0","publishedAt":"2025-06-24T10:26:00.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.21.2","media":[]},{"id":"rel_u0351zDmYaSgQ5-BAUQ_A","version":"v0.21.1","title":"v0.21.1","summary":"## What's Changed\r\n* Update dev version and pyproject.toml by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1693\r\n* Add feature flag...","content":"## What's Changed\r\n* Update dev version and pyproject.toml by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1693\r\n* Add feature flag hint to README.md, fixes #1633 by @sftse in https://github.com/huggingface/tokenizers/pull/1709\r\n* Upgrade to PyO3 0.23 by @Narsil in https://github.com/huggingface/tokenizers/pull/1708\r\n* Fixing the README. by @Narsil in https://github.com/huggingface/tokenizers/pull/1714\r\n* Fix typo in Split docstrings by @Dylan-Harden3 in https://github.com/huggingface/tokenizers/pull/1701\r\n* Fix typos by @tinyboxvk in https://github.com/huggingface/tokenizers/pull/1715\r\n* Update documentation of Rust feature by @sondalex in https://github.com/huggingface/tokenizers/pull/1711\r\n* Fix panic in DecodeStream::step due to incorrect index usage by @n0gu-furiosa in https://github.com/huggingface/tokenizers/pull/1699\r\n* Fixing the stream by removing the read_index altogether. by @Narsil in https://github.com/huggingface/tokenizers/pull/1716\r\n* Fixing NormalizedString append when normalized is empty. by @Narsil in https://github.com/huggingface/tokenizers/pull/1717\r\n* 🚨 Support updating template processors by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1652.  Removed in this release to keep backware compatibility temporarily.\r\n* Update metadata as Python3.7 and Python3.8 support was dropped by @earlytobed in https://github.com/huggingface/tokenizers/pull/1724\r\n* Add rustls-tls feature by @torymur in https://github.com/huggingface/tokenizers/pull/1732\r\n\r\n## New Contributors\r\n* @Dylan-Harden3 made their first contribution in https://github.com/huggingface/tokenizers/pull/1701\r\n* @sondalex made their first contribution in https://github.com/huggingface/tokenizers/pull/1711\r\n* @n0gu-furiosa made their first contribution in https://github.com/huggingface/tokenizers/pull/1699\r\n* @earlytobed made their first contribution in https://github.com/huggingface/tokenizers/pull/1724\r\n* @torymur made their first contribution in https://github.com/huggingface/tokenizers/pull/1732\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.21.0...v0.21.1","publishedAt":"2025-03-13T10:44:52.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.21.1","media":[]},{"id":"rel_2WoZbBLArPIZY2P_2C-u0","version":"v0.21.1rc0","title":"v0.21.1rc0","summary":"## What's Changed\r\n* Update dev version and pyproject.toml by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1693\r\n* Add feature flag...","content":"## What's Changed\r\n* Update dev version and pyproject.toml by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1693\r\n* Add feature flag hint to README.md, fixes #1633 by @sftse in https://github.com/huggingface/tokenizers/pull/1709\r\n* Upgrade to PyO3 0.23 by @Narsil in https://github.com/huggingface/tokenizers/pull/1708\r\n* Fixing the README. by @Narsil in https://github.com/huggingface/tokenizers/pull/1714\r\n* Fix typo in Split docstrings by @Dylan-Harden3 in https://github.com/huggingface/tokenizers/pull/1701\r\n* Fix typos by @tinyboxvk in https://github.com/huggingface/tokenizers/pull/1715\r\n* Update documentation of Rust feature by @sondalex in https://github.com/huggingface/tokenizers/pull/1711\r\n* Fix panic in DecodeStream::step due to incorrect index usage by @n0gu-furiosa in https://github.com/huggingface/tokenizers/pull/1699\r\n* Fixing the stream by removing the read_index altogether. by @Narsil in https://github.com/huggingface/tokenizers/pull/1716\r\n* Fixing NormalizedString append when normalized is empty. by @Narsil in https://github.com/huggingface/tokenizers/pull/1717\r\n* 🚨 Support updating template processors by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1652\r\n* Update metadata as Python3.7 and Python3.8 support was dropped by @earlytobed in https://github.com/huggingface/tokenizers/pull/1724\r\n* Add rustls-tls feature by @torymur in https://github.com/huggingface/tokenizers/pull/1732\r\n\r\n## New Contributors\r\n* @Dylan-Harden3 made their first contribution in https://github.com/huggingface/tokenizers/pull/1701\r\n* @sondalex made their first contribution in https://github.com/huggingface/tokenizers/pull/1711\r\n* @n0gu-furiosa made their first contribution in https://github.com/huggingface/tokenizers/pull/1699\r\n* @earlytobed made their first contribution in https://github.com/huggingface/tokenizers/pull/1724\r\n* @torymur made their first contribution in https://github.com/huggingface/tokenizers/pull/1732\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.21.0...v0.21.1rc0","publishedAt":"2025-03-12T09:47:07.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.21.1rc0","media":[]},{"id":"rel_2dJu9UTh9ULjDadr7RcL8","version":"v0.21.0","title":"Release v0.21.0 ","summary":"## Release ~v0.20.4~ v0.21.0 \r\n* More cache options. by @Narsil in https://github.com/huggingface/tokenizers/pull/1675\r\n* Disable caching for long str...","content":"## Release ~v0.20.4~ v0.21.0 \r\n* More cache options. by @Narsil in https://github.com/huggingface/tokenizers/pull/1675\r\n* Disable caching for long strings. by @Narsil in https://github.com/huggingface/tokenizers/pull/1676\r\n* Testing ABI3 wheels to reduce number of wheels by @Narsil in https://github.com/huggingface/tokenizers/pull/1674\r\n* Adding an API for decode streaming. by @Narsil in https://github.com/huggingface/tokenizers/pull/1677\r\n* Decode stream python by @Narsil in https://github.com/huggingface/tokenizers/pull/1678\r\n* Fix encode_batch and encode_batch_fast to accept ndarrays again by @diliop in  https://github.com/huggingface/tokenizers/pull/1679 \r\n\r\nWe also no longer support python 3.7 or 3.8 (similar to transformers) as they are deprecated.\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.20.3...v0.21.0","publishedAt":"2024-11-15T11:12:00.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.21.0","media":[]},{"id":"rel_yXIiIa7QUOF38kQdyBXA7","version":"v0.20.3","title":"v0.20.3","summary":"## What's Changed\r\nThere was a breaking change in `0.20.3` for tuple inputs of `encode_batch`! \r\n* fix pylist by @ArthurZucker in https://github.com/h...","content":"## What's Changed\r\nThere was a breaking change in `0.20.3` for tuple inputs of `encode_batch`! \r\n* fix pylist by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1673\r\n* [MINOR:TYPO] Fix docstrings by @cakiki in https://github.com/huggingface/tokenizers/pull/1653\r\n\r\n## New Contributors\r\n* @cakiki made their first contribution in https://github.com/huggingface/tokenizers/pull/1653\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.20.2...v0.20.3","publishedAt":"2024-11-05T17:20:01.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.20.3","media":[]},{"id":"rel_tBTJCzh0pUtFxBHdgiLCL","version":"v0.20.2","title":"v0.20.2","summary":"# Release v0.20.2\r\n\r\nThanks a MILE to @diliop we now have support for python 3.13! 🥳 \r\n\r\n## What's Changed\r\n* Bump cookie and express in /tokenizers/...","content":"# Release v0.20.2\r\n\r\nThanks a MILE to @diliop we now have support for python 3.13! 🥳 \r\n\r\n## What's Changed\r\n* Bump cookie and express in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1648\r\n* Fix off-by-one error in tokenizer::normalizer::Range::len by @rlanday in https://github.com/huggingface/tokenizers/pull/1638\r\n* Arg name correction: auth_token -> token by @rravenel in https://github.com/huggingface/tokenizers/pull/1621\r\n* Unsound call of `set_var` by @sftse in https://github.com/huggingface/tokenizers/pull/1664\r\n* Add safety comments by @Manishearth in https://github.com/huggingface/tokenizers/pull/1651\r\n* Bump actions/checkout to v4 by @tinyboxvk in https://github.com/huggingface/tokenizers/pull/1667\r\n* PyO3 0.22 by @diliop in https://github.com/huggingface/tokenizers/pull/1665\r\n* Bump actions versions by @tinyboxvk in https://github.com/huggingface/tokenizers/pull/1669\r\n\r\n## New Contributors\r\n* @rlanday made their first contribution in https://github.com/huggingface/tokenizers/pull/1638\r\n* @rravenel made their first contribution in https://github.com/huggingface/tokenizers/pull/1621\r\n* @sftse made their first contribution in https://github.com/huggingface/tokenizers/pull/1664\r\n* @Manishearth made their first contribution in https://github.com/huggingface/tokenizers/pull/1651\r\n* @tinyboxvk made their first contribution in https://github.com/huggingface/tokenizers/pull/1667\r\n* @diliop made their first contribution in https://github.com/huggingface/tokenizers/pull/1665\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.20.1...v0.20.2","publishedAt":"2024-11-04T17:25:24.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.20.2","media":[]},{"id":"rel_t-wqWMsxJXN18xAeqLcsu","version":"v0.20.1","title":"Release v0.20.1","summary":"## What's Changed\r\nThe most awaited `offset` issue with `Llama` is fixed 🥳 \r\n\r\n* Update README.md by @ArthurZucker in https://github.com/huggingface/...","content":"## What's Changed\r\nThe most awaited `offset` issue with `Llama` is fixed 🥳 \r\n\r\n* Update README.md by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1608\r\n* fix benchmark file link by @152334H in https://github.com/huggingface/tokenizers/pull/1610\r\n* Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows by @dependabot in https://github.com/huggingface/tokenizers/pull/1626\r\n* [`ignore_merges`] Fix offsets by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1640\r\n* Bump body-parser and express in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1629\r\n* Bump serve-static and express in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1630\r\n* Bump send and express in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1631\r\n* Bump webpack from 5.76.0 to 5.95.0 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1641\r\n* Fix documentation build by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1642\r\n* style: simplify string formatting for readability by @hamirmahal in https://github.com/huggingface/tokenizers/pull/1632\r\n\r\n## New Contributors\r\n* @152334H made their first contribution in https://github.com/huggingface/tokenizers/pull/1610\r\n* @hamirmahal made their first contribution in https://github.com/huggingface/tokenizers/pull/1632\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.20.0...v0.20.1","publishedAt":"2024-10-10T09:56:39.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.20.1","media":[]},{"id":"rel_SDpNB0PKRQzlPPpvkpk3S","version":"v0.20.0","title":"Release v0.20.0: faster encode, better python support","summary":"# Release v0.20.0\r\n\r\nThis release is focused on **performances** and **user experience**. \r\n\r\n## Performances:\r\nFirst off, we did a bit of benchmarkin...","content":"# Release v0.20.0\r\n\r\nThis release is focused on **performances** and **user experience**. \r\n\r\n## Performances:\r\nFirst off, we did a bit of benchmarking, and found some place for improvement for us!\r\nWith a few minor changes (mostly #1587) here is what we get on `Llama3` running on a g6 instances on AWS `https://github.com/huggingface/tokenizers/blob/main/bindings/python/benches/test_tiktoken.py` : \r\n![image](https://github.com/user-attachments/assets/e6838866-ec76-44ce-a7b6-532e56971234)\r\n\r\n## Python API\r\nWe shipped better deserialization errors in general, and support for `__str__` and `__repr__` for all the object. This allows for a lot easier debugging see this:\r\n```python3\r\n>>> from tokenizers import Tokenizer;\r\n>>> tokenizer = Tokenizer.from_pretrained(\"bert-base-uncased\");\r\n>>> print(tokenizer)\r\nTokenizer(version=\"1.0\", truncation=None, padding=None, added_tokens=[{\"id\":0, \"content\":\"[PAD]\", \"single_word\":False, \"lstrip\":False, \"rstrip\":False, ...}, {\"id\":100, \"content\":\"[UNK]\", \"single_word\":False, \"lstrip\":False, \"rstrip\":False, ...}, {\"id\":101, \"content\":\"[CLS]\", \"single_word\":False, \"lstrip\":False, \"rstrip\":False, ...}, {\"id\":102, \"content\":\"[SEP]\", \"single_word\":False, \"lstrip\":False, \"rstrip\":False, ...}, {\"id\":103, \"content\":\"[MASK]\", \"single_word\":False, \"lstrip\":False, \"rstrip\":False, ...}], normalizer=BertNormalizer(clean_text=True, handle_chinese_chars=True, strip_accents=None, lowercase=True), pre_tokenizer=BertPreTokenizer(), post_processor=TemplateProcessing(single=[SpecialToken(id=\"[CLS]\", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id=\"[SEP]\", type_id=0)], pair=[SpecialToken(id=\"[CLS]\", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id=\"[SEP]\", type_id=0), Sequence(id=B, type_id=1), SpecialToken(id=\"[SEP]\", type_id=1)], special_tokens={\"[CLS]\":SpecialToken(id=\"[CLS]\", ids=[101], tokens=[\"[CLS]\"]), \"[SEP]\":SpecialToken(id=\"[SEP]\", ids=[102], tokens=[\"[SEP]\"])}), decoder=WordPiece(prefix=\"##\", cleanup=True), model=WordPiece(unk_token=\"[UNK]\", continuing_subword_prefix=\"##\", max_input_chars_per_word=100, vocab={\"[PAD]\":0, \"[unused0]\":1, \"[unused1]\":2, \"[unused2]\":3, \"[unused3]\":4, ...}))\r\n\r\n>>> tokenizer\r\nTokenizer(version=\"1.0\", truncation=None, padding=None, added_tokens=[{\"id\":0, \"content\":\"[PAD]\", \"single_word\":False, \"lstrip\":False, \"rstrip\":False, \"normalized\":False, \"special\":True}, {\"id\":100, \"content\":\"[UNK]\", \"single_word\":False, \"lstrip\":False, \"rstrip\":False, \"normalized\":False, \"special\":True}, {\"id\":101, \"content\":\"[CLS]\", \"single_word\":False, \"lstrip\":False, \"rstrip\":False, \"normalized\":False, \"special\":True}, {\"id\":102, \"content\":\"[SEP]\", \"single_word\":False, \"lstrip\":False, \"rstrip\":False, \"normalized\":False, \"special\":True}, {\"id\":103, \"content\":\"[MASK]\", \"single_word\":False, \"lstrip\":False, \"rstrip\":False, \"normalized\":False, \"special\":True}], normalizer=BertNormalizer(clean_text=True, handle_chinese_chars=True, strip_accents=None, lowercase=True), pre_tokenizer=BertPreTokenizer(), post_processor=TemplateProcessing(single=[SpecialToken(id=\"[CLS]\", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id=\"[SEP]\", type_id=0)], pair=[SpecialToken(id=\"[CLS]\", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id=\"[SEP]\", type_id=0), Sequence(id=B, type_id=1), SpecialToken(id=\"[SEP]\", type_id=1)], special_tokens={\"[CLS]\":SpecialToken(id=\"[CLS]\", ids=[101], tokens=[\"[CLS]\"]), \"[SEP]\":SpecialToken(id=\"[SEP]\", ids=[102], tokens=[\"[SEP]\"])}), decoder=WordPiece(prefix=\"##\", cleanup=True), model=WordPiece(unk_token=\"[UNK]\", continuing_subword_prefix=\"##\", max_input_chars_per_word=100, vocab={\"[PAD]\":0, \"[unused0]\":1, \"[unused1]\":2, ...}))\r\n```\r\n\r\nThe `pre_tokenizer.Sequence` and `normalizer.Sequence` are also more accessible now:\r\n```python \r\nfrom tokenizers import normalizers\r\nnorm = normalizers.Sequence([normalizers.Strip(), normalizers.BertNormalizer()])\r\nnorm[0]\r\nnorm[1].lowercase=False\r\n```\r\n\r\n## What's Changed\r\n* remove enforcement of non special when adding tokens by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1521\r\n* [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder by @Narsil in https://github.com/huggingface/tokenizers/pull/1513\r\n* Make `USED_PARALLELISM` atomic by @nathaniel-daniel in https://github.com/huggingface/tokenizers/pull/1532\r\n* Fixing for clippy 1.78 by @Narsil in https://github.com/huggingface/tokenizers/pull/1548\r\n* feat(ci): add trufflehog secrets detection by @McPatate in https://github.com/huggingface/tokenizers/pull/1551\r\n* Switch from `cached_download` to `hf_hub_download` in tests by @Wauplin in https://github.com/huggingface/tokenizers/pull/1547\r\n* Fix \"dictionnary\" typo by @nprisbrey in https://github.com/huggingface/tokenizers/pull/1511\r\n* make sure we don't warn on empty tokens by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1554\r\n* Enable `dropout = 0.0` as an equivalent to `none` in BPE by @mcognetta in https://github.com/huggingface/tokenizers/pull/1550\r\n* Revert \"[BREAKING CHANGE] Ignore added_tokens (both special and not) … by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1569\r\n* Add bytelevel normalizer to fix decode when adding tokens to BPE by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1555\r\n* Fix clippy + feature test management. by @Narsil in https://github.com/huggingface/tokenizers/pull/1580\r\n* Bump spm_precompiled to 0.1.3 by @MikeIvanichev in https://github.com/huggingface/tokenizers/pull/1571\r\n* Add benchmark vs tiktoken by @Narsil in https://github.com/huggingface/tokenizers/pull/1582\r\n* Fixing the benchmark. by @Narsil in https://github.com/huggingface/tokenizers/pull/1583\r\n* Tiny improvement by @Narsil in https://github.com/huggingface/tokenizers/pull/1585\r\n* Enable fancy regex by @Narsil in https://github.com/huggingface/tokenizers/pull/1586\r\n* Fixing release CI strict (taken from safetensors). by @Narsil in https://github.com/huggingface/tokenizers/pull/1593\r\n* Adding some serialization testing around the wrapper. by @Narsil in https://github.com/huggingface/tokenizers/pull/1594\r\n* Add-legacy-tests by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1597\r\n* Adding a few tests for decoder deserialization. by @Narsil in https://github.com/huggingface/tokenizers/pull/1598\r\n* Better serialization error by @Narsil in https://github.com/huggingface/tokenizers/pull/1595\r\n* Add test normalizers by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1600\r\n* Improve decoder deserialization by @Narsil in https://github.com/huggingface/tokenizers/pull/1599\r\n* Using serde (serde_pyo3) to get __str__ and __repr__ easily. by @Narsil in https://github.com/huggingface/tokenizers/pull/1588\r\n* Merges cannot handle tokens containing spaces. by @Narsil in https://github.com/huggingface/tokenizers/pull/909\r\n* Fix doc about split by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1591\r\n* Support `None` to reset pre_tokenizers and normalizers, and index sequences by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1590\r\n* Fix strip python type by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1602\r\n* Tests + Deserialization improvement for normalizers. by @Narsil in https://github.com/huggingface/tokenizers/pull/1604\r\n* add deserialize for pre tokenizers by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1603\r\n* Perf improvement 16% by removing offsets. by @Narsil in https://github.com/huggingface/tokenizers/pull/1587\r\n\r\n## New Contributors\r\n* @nathaniel-daniel made their first contribution in https://github.com/huggingface/tokenizers/pull/1532\r\n* @nprisbrey made their first contribution in https://github.com/huggingface/tokenizers/pull/1511\r\n* @mcognetta made their first contribution in https://github.com/huggingface/tokenizers/pull/1550\r\n* @MikeIvanichev made their first contribution in https://github.com/huggingface/tokenizers/pull/1571\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.19.1...v0.20.0rc1","publishedAt":"2024-08-08T16:56:21.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.20.0","media":[]},{"id":"rel_vrGbQrXZRSfsuzllX66NF","version":"v0.19.1","title":"v0.19.1","summary":"## What's Changed\r\n* add serialization for `ignore_merges` by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1504\r\n\r\n\r\n**Full Changel...","content":"## What's Changed\r\n* add serialization for `ignore_merges` by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1504\r\n\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.19.0...v0.19.1","publishedAt":"2024-04-17T21:37:50.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.19.1","media":[]},{"id":"rel_rKFmcoN7JNxYISHRxK64m","version":"v0.19.0","title":"v0.19.0","summary":"## What's Changed\r\n* chore: Remove CLI - this was originally intended for local development by @bryantbiggs in https://github.com/huggingface/tokenize...","content":"## What's Changed\r\n* chore: Remove CLI - this was originally intended for local development by @bryantbiggs in https://github.com/huggingface/tokenizers/pull/1442\r\n* [`remove black`] And use ruff by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1436\r\n* Bump ip from 2.0.0 to 2.0.1 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1456\r\n* Added ability to inspect a 'Sequence' decoder and the `AddedVocabulary`. by @eaplatanios in https://github.com/huggingface/tokenizers/pull/1443\r\n* 🚨🚨 BREAKING CHANGE 🚨🚨: (add_prefix_space dropped everything is using prepend_scheme enum instead) Refactor metaspace by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1476\r\n* Add more support for tiktoken based tokenizers by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1493\r\n* PyO3 0.21. by @Narsil in https://github.com/huggingface/tokenizers/pull/1494\r\n* Remove 3.13 (potential undefined behavior.) by @Narsil in https://github.com/huggingface/tokenizers/pull/1497\r\n* Bumping all versions 3 times (ty transformers :) ) by @Narsil in https://github.com/huggingface/tokenizers/pull/1498\r\n* Fixing doc. by @Narsil in https://github.com/huggingface/tokenizers/pull/1499\r\n\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.15.2...v0.19.0","publishedAt":"2024-04-17T08:51:36.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.19.0","media":[]},{"id":"rel_QDjzMr1WDUQwfK83fUtbh","version":"v0.19.0rc0","title":"v0.19.0rc0","summary":"Bumping 3 versions because of this: https://github.com/huggingface/transformers/blob/60dea593edd0b94ee15dc3917900b26e3acfbbee/setup.py#L177\r\n\r\n## What...","content":"Bumping 3 versions because of this: https://github.com/huggingface/transformers/blob/60dea593edd0b94ee15dc3917900b26e3acfbbee/setup.py#L177\r\n\r\n## What's Changed\r\n* chore: Remove CLI - this was originally intended for local development by @bryantbiggs in https://github.com/huggingface/tokenizers/pull/1442\r\n* [`remove black`] And use ruff by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1436\r\n* Bump ip from 2.0.0 to 2.0.1 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1456\r\n* Added ability to inspect a 'Sequence' decoder and the `AddedVocabulary`. by @eaplatanios in https://github.com/huggingface/tokenizers/pull/1443\r\n* 🚨🚨 BREAKING CHANGE 🚨🚨:  (add_prefix_space dropped everything is using prepend_scheme enum instead) Refactor metaspace by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1476\r\n* Add more support for tiktoken based tokenizers by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1493\r\n* PyO3 0.21. by @Narsil in https://github.com/huggingface/tokenizers/pull/1494\r\n* Remove 3.13 (potential undefined behavior.) by @Narsil in https://github.com/huggingface/tokenizers/pull/1497\r\n* Bumping all versions 3 times (ty transformers :) ) by @Narsil in https://github.com/huggingface/tokenizers/pull/1498\r\n\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.15.2...v0.19.0rc0","publishedAt":"2024-04-16T14:06:36.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.19.0rc0","media":[]},{"id":"rel_aawdLdeRfkHx1JAJOHhsx","version":"v0.15.2","title":"v0.15.2","summary":"## What's Changed\r\nBig shoutout to @rlrs for [the fast replace normalizers](https://github.com/huggingface/tokenizers/pull/1413) PR. This boosts the p...","content":"## What's Changed\r\nBig shoutout to @rlrs for [the fast replace normalizers](https://github.com/huggingface/tokenizers/pull/1413) PR. This boosts the performances of the tokenizers: \r\n![image](https://github.com/huggingface/tokenizers/assets/48595927/d8ee81b1-6d92-43d4-b74c-8775727763e3)\r\n\r\n* chore: Update dependencies to latest supported versions by @bryantbiggs in https://github.com/huggingface/tokenizers/pull/1441\r\n* Convert word counts to u64 by @stephenroller in https://github.com/huggingface/tokenizers/pull/1433\r\n* Efficient Replace normalizer by @rlrs in https://github.com/huggingface/tokenizers/pull/1413\r\n\r\n## New Contributors\r\n* @bryantbiggs made their first contribution in https://github.com/huggingface/tokenizers/pull/1441\r\n* @stephenroller made their first contribution in https://github.com/huggingface/tokenizers/pull/1433\r\n* @rlrs made their first contribution in https://github.com/huggingface/tokenizers/pull/1413\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.15.1...v0.15.2rc1","publishedAt":"2024-02-12T02:35:06.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.15.2","media":[]},{"id":"rel_euKDR8sqPu8OMch2d7FFa","version":"v0.15.1","title":"v0.15.1","summary":"## What's Changed\r\n* udpate to version = \"0.15.1-dev0\" by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1390\r\n* Derive `Clone` on `T...","content":"## What's Changed\r\n* udpate to version = \"0.15.1-dev0\" by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1390\r\n* Derive `Clone` on `Tokenizer`, add `Encoding.into_tokens()` method by @epwalsh in https://github.com/huggingface/tokenizers/pull/1381\r\n* Stale bot. by @Narsil in https://github.com/huggingface/tokenizers/pull/1404\r\n* Fix doc links in readme by @Pierrci in https://github.com/huggingface/tokenizers/pull/1367\r\n* Faster HF dataset iteration in docs by @mariosasko in https://github.com/huggingface/tokenizers/pull/1414\r\n* Add quick doc to byte_level.rs by @steventrouble in https://github.com/huggingface/tokenizers/pull/1420\r\n* Fix make bench. by @Narsil in https://github.com/huggingface/tokenizers/pull/1428\r\n* Bump follow-redirects from 1.15.1 to 1.15.4 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1430\r\n* pyo3: update to 0.20 by @mikelui in https://github.com/huggingface/tokenizers/pull/1386\r\n* Encode special tokens by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1437\r\n* Update release for python3.12 windows by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1438\r\n\r\n## New Contributors\r\n* @steventrouble made their first contribution in https://github.com/huggingface/tokenizers/pull/1420\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.15.0...v0.15.1","publishedAt":"2024-01-22T16:49:29.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.15.1","media":[]},{"id":"rel_2lxgdqvQs-e5Ubl0EPshK","version":"v0.15.1.rc0","title":"v0.15.1.rc0","summary":"## What's Changed\r\n* pyo3: update to 0.19 by @mikelui in https://github.com/huggingface/tokenizers/pull/1322\r\n* Add `expect()` for disabling truncatio...","content":"## What's Changed\r\n* pyo3: update to 0.19 by @mikelui in https://github.com/huggingface/tokenizers/pull/1322\r\n* Add `expect()` for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316\r\n* Re-using scritpts from safetensors. by @Narsil in https://github.com/huggingface/tokenizers/pull/1328\r\n* Reduce number of different revisions by 1 by @Narsil in https://github.com/huggingface/tokenizers/pull/1329\r\n* Python 38 arm by @Narsil in https://github.com/huggingface/tokenizers/pull/1330\r\n* Move to maturing mimicking move for `safetensors`. + Rewritten node bindings. by @Narsil in https://github.com/huggingface/tokenizers/pull/1331\r\n* Updating the docs with the new command. by @Narsil in https://github.com/huggingface/tokenizers/pull/1333\r\n* Update added tokens by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1335\r\n* update package version for dev by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1339\r\n* Added ability to inspect a 'Sequence' pre-tokenizer. by @eaplatanios in https://github.com/huggingface/tokenizers/pull/1341\r\n* Let's allow hf_hub < 1.0 by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1344\r\n* Fixing the progressbar. by @Narsil in https://github.com/huggingface/tokenizers/pull/1353\r\n* Preparing release. by @Narsil in https://github.com/huggingface/tokenizers/pull/1355\r\n* fix a clerical error  in the comment by @tiandiweizun in https://github.com/huggingface/tokenizers/pull/1356\r\n* fix: remove useless token by @rtrompier in https://github.com/huggingface/tokenizers/pull/1371\r\n* Bump @babel/traverse from 7.22.11 to 7.23.2 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1370\r\n* Allow hf_hub 0.18 by @mariosasko in https://github.com/huggingface/tokenizers/pull/1383\r\n* Allow `huggingface_hub<1.0` by @Wauplin in https://github.com/huggingface/tokenizers/pull/1385\r\n* [`pre_tokenizers`] Fix sentencepiece based Metaspace by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1357\r\n* udpate to version = \"0.15.1-dev0\" by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1390\r\n* Derive `Clone` on `Tokenizer`, add `Encoding.into_tokens()` method by @epwalsh in https://github.com/huggingface/tokenizers/pull/1381\r\n* Stale bot. by @Narsil in https://github.com/huggingface/tokenizers/pull/1404\r\n* Fix doc links in readme by @Pierrci in https://github.com/huggingface/tokenizers/pull/1367\r\n* Faster HF dataset iteration in docs by @mariosasko in https://github.com/huggingface/tokenizers/pull/1414\r\n* Add quick doc to byte_level.rs by @steventrouble in https://github.com/huggingface/tokenizers/pull/1420\r\n* Fix make bench. by @Narsil in https://github.com/huggingface/tokenizers/pull/1428\r\n* Bump follow-redirects from 1.15.1 to 1.15.4 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1430\r\n* pyo3: update to 0.20 by @mikelui in https://github.com/huggingface/tokenizers/pull/1386\r\n\r\n## New Contributors\r\n* @mikelui made their first contribution in https://github.com/huggingface/tokenizers/pull/1322\r\n* @eaplatanios made their first contribution in https://github.com/huggingface/tokenizers/pull/1341\r\n* @tiandiweizun made their first contribution in https://github.com/huggingface/tokenizers/pull/1356\r\n* @rtrompier made their first contribution in https://github.com/huggingface/tokenizers/pull/1371\r\n* @mariosasko made their first contribution in https://github.com/huggingface/tokenizers/pull/1383\r\n* @Wauplin made their first contribution in https://github.com/huggingface/tokenizers/pull/1385\r\n* @steventrouble made their first contribution in https://github.com/huggingface/tokenizers/pull/1420\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc2...v0.15.1.rc0","publishedAt":"2024-01-18T16:34:03.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.15.1.rc0","media":[]},{"id":"rel_abtUVMQ-qO6hcM2aqUEtv","version":"v0.15.0","title":"v0.15.0","summary":"## What's Changed\r\n* fix a clerical error  in the comment by @tiandiweizun in https://github.com/huggingface/tokenizers/pull/1356\r\n* fix: remove usele...","content":"## What's Changed\r\n* fix a clerical error  in the comment by @tiandiweizun in https://github.com/huggingface/tokenizers/pull/1356\r\n* fix: remove useless token by @rtrompier in https://github.com/huggingface/tokenizers/pull/1371\r\n* Bump @babel/traverse from 7.22.11 to 7.23.2 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1370\r\n* Allow hf_hub 0.18 by @mariosasko in https://github.com/huggingface/tokenizers/pull/1383\r\n* Allow `huggingface_hub<1.0` by @Wauplin in https://github.com/huggingface/tokenizers/pull/1385\r\n* [`pre_tokenizers`] Fix sentencepiece based Metaspace by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1357\r\n\r\n## New Contributors\r\n* @tiandiweizun made their first contribution in https://github.com/huggingface/tokenizers/pull/1356\r\n* @rtrompier made their first contribution in https://github.com/huggingface/tokenizers/pull/1371\r\n* @mariosasko made their first contribution in https://github.com/huggingface/tokenizers/pull/1383\r\n* @Wauplin made their first contribution in https://github.com/huggingface/tokenizers/pull/1385\r\n\r\n**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.14.1...v0.15.0","publishedAt":"2023-11-14T19:06:30.000Z","url":"https://github.com/huggingface/tokenizers/releases/tag/v0.15.0","media":[]}],"pagination":{"page":1,"pageSize":20,"totalPages":5,"totalItems":100},"summaries":{"rolling":null,"monthly":[]}}