{"id":"src_YM5oUOL-MSWOWf517en37","slug":"text-embeddings-inference","name":"Text Embeddings Inference","type":"github","url":"https://github.com/huggingface/text-embeddings-inference","orgId":"org_GDdYeYynEgCEBNBwy-m6s","org":{"slug":"hugging-face","name":"Hugging Face"},"isPrimary":false,"metadata":"{\"evaluatedMethod\":\"github\",\"evaluatedAt\":\"2026-04-07T17:19:19.545Z\",\"changelogDetectedAt\":\"2026-04-07T17:28:46.634Z\"}","releaseCount":33,"releasesLast30Days":1,"avgReleasesPerWeek":0.3,"latestVersion":"v1.9.3","latestDate":"2026-03-23T11:57:19.000Z","changelogUrl":null,"hasChangelogFile":false,"lastFetchedAt":"2026-04-18T14:05:02.409Z","trackingSince":"2023-10-13T13:46:09.000Z","releases":[{"id":"rel_ZjYO-ZfPKrOPwOZSmU-m7","version":"v1.9.3","title":"v1.9.3","summary":"## What's Changed\r\n* Use `rust-toolchain.toml` before `rustup` on `Dockerfile-{cuda,cuda-all}` by @alvarobartt in https://github.com/huggingface/text-...","content":"## What's Changed\r\n* Use `rust-toolchain.toml` before `rustup` on `Dockerfile-{cuda,cuda-all}` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/842\r\n* fix(backend): replace bare except with Exception in device check by @llukito in https://github.com/huggingface/text-embeddings-inference/pull/821\r\n* Set `version` 1.9.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/849\r\n\r\n## New Contributors\r\n* @llukito made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/821\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.9.2...v1.9.3","publishedAt":"2026-03-23T11:57:19.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.9.3","media":[]},{"id":"rel_u5i8Y5-E1Ty1fTgVeHnUB","version":"v1.9.2","title":"v1.9.2","summary":"## What's Changed\r\n\r\n* Fix auto-truncate false setting by @vrdn-23 in 
https://github.com/huggingface/text-embeddings-inference/pull/836\r\n* Set `pad_to...","content":"## What's Changed\r\n\r\n* Fix auto-truncate false setting by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull/836\r\n* Set `pad_token_id` as nullable & add support for `rope_parameters` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/832\r\n* docs: add Homebrew installation to README by @Peredery in https://github.com/huggingface/text-embeddings-inference/pull/834\r\n* feat: support pplx-embed-v1 by @mkrimmel-pplx in https://github.com/huggingface/text-embeddings-inference/pull/824\r\n\r\n## New Contributors\r\n* @Peredery made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/834\r\n* @mkrimmel-pplx made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/824\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.9.1...v1.9.2","publishedAt":"2026-02-25T11:17:59.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.9.2","media":[]},{"id":"rel_jBoNJNG0GwO7Qj50CRnpr","version":"v1.9.1","title":"v1.9.1","summary":"## What's Changed\r\n\r\n### 🚨 Fix\r\n\r\n* Fix support for containers w/ CUDA 13.0+ by @alvarobartt in https://github.com/huggingface/text-embeddings-infere...","content":"## What's Changed\r\n\r\n### 🚨 Fix\r\n\r\n* Fix support for containers w/ CUDA 13.0+ by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/831\r\n> When releasing ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 with CUDA 12.9 and `cuda-compat-12-9` there was an issue when running that same container on instances with CUDA 13.0+, as the `cuda-compat-12-9` set in `LD_LIBRARY_PATH` was leading to a `CUDA_ERROR_SYSTEM_DRIVER_MISMATCH = 803`, which is now solved with a custom entrypoint that dynamically includes the `cuda-compat` on the `LD_LIBRARY_PATH` 
depending on the instance CUDA version.\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.9.0...v1.9.1","publishedAt":"2026-02-17T20:59:31.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.9.1","media":[]},{"id":"rel_Y8D6Bpmw3UbyRwtU5a0H4","version":"v1.9.0","title":"v1.9.0","summary":"<img width=\"1800\" height=\"972\" alt=\"text-embeddings-inference-v1 9 0\" src=\"https://github.com/user-attachments/assets/fe3751d1-1a3a-4b1f-8cf5-5c2326c1...","content":"<img width=\"1800\" height=\"972\" alt=\"text-embeddings-inference-v1 9 0\" src=\"https://github.com/user-attachments/assets/fe3751d1-1a3a-4b1f-8cf5-5c2326c14a62\" />\r\n\r\n## What's changed?\r\n\r\n### 🚨 Breaking changes\r\n\r\n* Default `HiddenAct::Gelu` to GeLU + tanh in favour of GeLU erf  by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull/753\r\n\r\n> Default GeLU implementation is now GeLU + tanh approximation instead of exact GeLU (aka. GeLU erf) to make sure that the CPU and CUDA embeddings are the same (as cuBLASlt only supports GeLU + tanh), which represents a slight misalignment from how Transformers handles it, as when `hidden_act=\"gelu\"` is set in `config.json`, GeLU erf should be used. 
The numerical differences between GeLU + tanh and GeLU erf should have negligible impact on inference quality.\r\n\r\n* Set `--auto-truncate` to `true` by default by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/829\r\n\r\n> `--auto-truncate` now defaults to true, meaning that the sequences will be truncated to the lower value between the `--max-batch-tokens` or the maximum model length, to prevent the `--max-batch-tokens` from being lower than the actual maximum supported length.\r\n\r\n### 🎉 Additions\r\n\r\n* Add `--served-model-name` for OpenAI requests via HTTP by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull/685\r\n* Extend `download_onnx` to download sharded ONNX by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/817\r\n* Add support for llama 2 by @michaelfeil in https://github.com/huggingface/text-embeddings-inference/pull/802\r\n* Add support for blackwell architecture (sm100, sm120) by @danielealbano in https://github.com/huggingface/text-embeddings-inference/pull/735\r\n* Mf/add-support-for-llama-3-and-nemotron by @michaelfeil in https://github.com/huggingface/text-embeddings-inference/pull/805\r\n* Add support for DebertaV2 by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull/746\r\n* Add bidirectional attention and projection layer support for Qwen3-based models by @williambarberjr in https://github.com/huggingface/text-embeddings-inference/pull/808\r\n\r\n### 🐛 Fixes\r\n\r\n* Fix reading non-standard config for `past_key_values` in ONNX by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/751\r\n* Fix `TruncationDirection` to deserialize from lowercase and capitalized by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/755\r\n* Fix `sagemaker-entrypoint*` & remove SageMaker and Vertex from `Dockerfile*` by @alvarobartt in 
https://github.com/huggingface/text-embeddings-inference/pull/699\r\n* Bug: Critical accuracy bugs for model_type=qwen2: no causal attention and wrong tokenizer by @michaelfeil in https://github.com/huggingface/text-embeddings-inference/pull/762\r\n* Fix `config.json` reading w/ aliases for ORT by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/786\r\n* Fix HTTP error code for validation by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull/818\r\n* Fix to acquire the permit in a blocking way by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/726\r\n* Read Hugging Face Hub token from cache if not provided by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/814\r\n* Align the `normalize` param between the gRPC and HTTP /embed interfaces by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/810\r\n\r\n### ⚡ Improvements\r\n\r\n* Serialization in tokio thread instead of blocking thread, 50% reduction in latency for small models by @michaelfeil in https://github.com/huggingface/text-embeddings-inference/pull/767\r\n* Remove default `--model-id` argument by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/679\r\n* feat: better Tokenization # workers heuristic by @michaelfeil in https://github.com/huggingface/text-embeddings-inference/pull/766\r\n* add faster index select kernel by @michaelfeil in https://github.com/huggingface/text-embeddings-inference/pull/773\r\n* feat: speedup Parallel safetensors download by @michaelfeil in https://github.com/huggingface/text-embeddings-inference/pull/765\r\n* feat: startup time: add cloned tokenizer fix, saves ~1-20s cold start time by @michaelfeil in https://github.com/huggingface/text-embeddings-inference/pull/772\r\n* Adjust the warmup phase for CPU by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/792\r\n\r\n### 📄 Other\r\n\r\n* Skip Gemma3 
tests when `HF_TOKEN` not set by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/812\r\n* Bump Rust 1.92, CUDA 12.6, Ubuntu 24.04 and add `Dockerfile-cuda-blackwell-all` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/823\r\n* Update `rustc` version to 1.92.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/826\r\n* Add `use_flash_attn` for better FA + FA2 feature gating by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/825\r\n* Update CUDA to 12.9 w/ `cuda-compat-12-9` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/828\r\n* Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in https://github.com/huggingface/text-embeddings-inference/pull/782\r\n* Lint: cargo fmt and clippy fix warnings by @michaelfeil in https://github.com/huggingface/text-embeddings-inference/pull/776\r\n* Fix `rustfmt` on `backend/candle/tests/*.rs` files by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/800\r\n* Upgrade GitHub Actions to latest versions by @salmanmkc in https://github.com/huggingface/text-embeddings-inference/pull/783\r\n* Update `version` to 1.9.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/830\r\n\r\n## 🆕 New Contributors\r\n* @salmanmkc made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/782\r\n* @danielealbano made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/735\r\n* @williambarberjr made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/808\r\n\r\n**Full Changelog**: 
https://github.com/huggingface/text-embeddings-inference/compare/v1.8.3...v1.9.0","publishedAt":"2026-02-17T13:42:14.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.9.0","media":[]},{"id":"rel_Nw3le7FfL8wAm4wA6OGOY","version":"v1.8.3","title":"v1.8.3","summary":"## What's Changed\r\n\r\n### Bug Fixes\r\n\r\n* Fix error code for empty requests by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull...","content":"## What's Changed\r\n\r\n### Bug Fixes\r\n\r\n* Fix error code for empty requests by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull/727\r\n* Fix the infinite loop when `max_input_length` is bigger than `max-batch-tokens` by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/725\r\n* Fix reading `modules.json` for `Dense` modules in local models by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/738\r\n\r\n### Tests, Documentation & Release\r\n\r\n* Add `test_gemma3.rs` for EmbeddingGemma by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/718\r\n* Fix OpenAI client usage example for embeddings by @ZahraDehghani99 in https://github.com/huggingface/text-embeddings-inference/pull/720\r\n* Handle `HF_TOKEN` in `ApiBuilder` for `candle/tests` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/724\r\n* Fix `cargo install` commands for `candle` with CUDA by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/719\r\n* Update `version` to 1.8.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/745\r\n\r\n## New Contributors\r\n* @ZahraDehghani99 made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/720\r\n* @vrdn-23 made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/727\r\n\r\n**Full Changelog**: 
https://github.com/huggingface/text-embeddings-inference/compare/v1.8.2...v1.8.3","publishedAt":"2025-10-30T09:08:18.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.8.3","media":[]},{"id":"rel_DtdRMbQKjMithLTXCaRWm","version":"v1.8.2","title":"v1.8.2","summary":"## 🔧 Fixed Intel MKL Support\r\n\r\nSince Text Embeddings Inference (TEI) v1.7.0, Intel MKL support had been broken due to changes in the `candle` depend...","content":"## 🔧 Fixed Intel MKL Support\r\n\r\nSince Text Embeddings Inference (TEI) v1.7.0, Intel MKL support had been broken due to changes in the `candle` dependency. Neither `static-linking` nor `dynamic-linking` worked correctly, which caused models using Intel MKL on CPU to fail with errors such as:  \"Intel oneMKL ERROR: Parameter 13 was incorrect on entry to SGEMM\".\r\n\r\nStarting with v1.8.2, this issue has been resolved by fixing how the `intel-mkl-src` dependency is defined. Both features, `static-linking` and `dynamic-linking` (the default), now work correctly, ensuring that Intel MKL libraries are properly linked.\r\n\r\nThis issue occurred in the following scenarios:\r\n- Users installing `text-embeddings-router` via `cargo` with the `--feature mkl` flag. Although `dynamic-linking` should have been used, it was not working as intended.\r\n- Users relying on the CPU `Dockerfile` when running models without ONNX weights. 
In these cases, Safetensors weights were used with `candle` as backend (with MKL optimizations), instead of `ort`.\r\n\r\nThe following table shows the affected versions and containers:\r\n\r\n| Version | Image |\r\n|---------|-------|\r\n| 1.7.0   | `ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.0` |\r\n| 1.7.1   | `ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.1` |\r\n| 1.7.2   | `ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2` |\r\n| 1.7.3   | `ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.3` |\r\n| 1.7.4   | `ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.4` |\r\n| 1.8.0   | `ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.0` |\r\n| 1.8.1   | `ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1` |\r\n\r\nMore details: [PR #715](https://github.com/huggingface/text-embeddings-inference/pull/715)\r\n\r\n**Full Changelog**: [v1.8.1...v1.8.2](https://github.com/huggingface/text-embeddings-inference/compare/v1.8.1...v1.8.2)","publishedAt":"2025-09-09T14:45:29.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.8.2","media":[]},{"id":"rel_BwxGl_zBfEgFELWANCF50","version":"v1.8.1","title":"v1.8.1","summary":"<img width=\"1200\" height=\"648\" alt=\"text-embeddings-inference-v1 8 1-embedding-gemma(1)\" src=\"https://github.com/user-attachments/assets/8ad8fb64-cee4...","content":"<img width=\"1200\" height=\"648\" alt=\"text-embeddings-inference-v1 8 1-embedding-gemma(1)\" src=\"https://github.com/user-attachments/assets/8ad8fb64-cee4-409f-8488-1d10f5ffe995\" />\r\n\r\nToday, Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model perfect for on-device use cases. Designed for speed and efficiency, the model features a compact size of 308M parameters and a 2K context window, unlocking new possibilities for mobile RAG pipelines, agents, and more. 
EmbeddingGemma is trained to support over 100 languages and is the highest-ranking text-only multilingual embedding model under 500M on the Massive Text Embedding Benchmark (MTEB) at the time of writing.\r\n\r\n- CPU:\r\n\r\n```bash\r\ndocker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \\\r\n    --model-id google/embeddinggemma-300m --dtype float32\r\n```\r\n\r\n- CPU with ONNX Runtime:\r\n\r\n```bash\r\ndocker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \\\r\n    --model-id onnx-community/embeddinggemma-300m-ONNX --dtype float32 --pooling mean\r\n```\r\n\r\n- NVIDIA CUDA:\r\n\r\n```bash\r\ndocker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.1 \\\r\n    --model-id google/embeddinggemma-300m --dtype float32\r\n```\r\n\r\n## Notable Changes\r\n\r\n* Add support for Gemma3 (text-only) architecture\r\n* Intel updates to Synapse 1.21.3 and IPEX 2.8\r\n* Extend ONNX Runtime support in `OrtRuntime`\r\n    * Support `position_ids` and `past_key_values` as inputs\r\n    * Handle `padding_side` and `pad_token_id`\r\n\r\n## What's Changed\r\n\r\n* Adjust HPU warmup: use dummy inputs with shape more close to real scenario  by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/689\r\n* Add `extra_args` to `trufflehog` to exclude unverified results by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/696\r\n* Update GitHub templates & fix mentions to Text Embeddings Inference by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/697\r\n* Disable Flash Attention with `USE_FLASH_ATTENTION` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/692\r\n* Add support for `position_ids` and `past_key_values` in `OrtBackend` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/700\r\n* HPU upgrade to Synapse 1.21.3 by @kaixuanliu in 
https://github.com/huggingface/text-embeddings-inference/pull/703\r\n* Upgrade to IPEX 2.8 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/702\r\n* Parse `modules.json` to identify default `Dense` modules by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/701\r\n* Add `padding_side` and `pad_token_id` in `OrtBackend` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/705\r\n* Update `docs/openapi.json` for v1.8.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/708\r\n* Add Gemma3 architecture (text-only) by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/711\r\n* Update `version` to 1.8.1 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/712\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.8.0...v1.8.1","publishedAt":"2025-09-04T15:22:14.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.8.1","media":[]},{"id":"rel_hK3Gxqm8pCSBYATJ1Wdsa","version":"v1.8.0","title":"v1.8.0","summary":"<img width=\"3600\" height=\"1944\" alt=\"text-embeddings-inference-v1 8 0(2)\" src=\"https://github.com/user-attachments/assets/50df05b6-3821-4e2a-8de0-3e5c...","content":"<img width=\"3600\" height=\"1944\" alt=\"text-embeddings-inference-v1 8 0(2)\" src=\"https://github.com/user-attachments/assets/50df05b6-3821-4e2a-8de0-3e5c911b2a27\" />\r\n\r\n## Notable Changes\r\n\r\n- Qwen3 support for 0.6B, 4B and 8B on CPU, MPS, and FlashQwen3 on CUDA and Intel HPUs\r\n- NomicBert MoE support\r\n- JinaAI Re-Rankers V1 support\r\n- Matryoshka Representation Learning (MRL)\r\n- Dense layer module support (after pooling)\r\n\r\n> [!NOTE]\r\n> Some of the aforementioned changes were released within the patch versions on top of v1.7.0, whilst both Matryoshka Representation Learning (MRL) and Dense layer module support have been 
recently included and were not released yet.\r\n\r\n## What's Changed\r\n\r\n* [Docs] Update quick tour by @NielsRogge in https://github.com/huggingface/text-embeddings-inference/pull/574\r\n* Update `README.md` and `supported_models.md` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/572\r\n* Back with linting. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/577\r\n* [Docs] Add cloud run example by @NielsRogge in https://github.com/huggingface/text-embeddings-inference/pull/573\r\n* Fixup by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/578\r\n* Fixing the tokenization routes token (offsets are in bytes, not in by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/576\r\n* Removing requirements file. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/585\r\n* Removing candle-extensions to live on crates.io by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/583\r\n* Bump `sccache` to 0.10.0 and `sccache-action` to 0.0.9 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/586\r\n* optimize the performance of FlashBert Path for HPU by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/575\r\n* Revert \"Removing requirements file. (#585)\" by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/588\r\n* Get opentelemetry trace id from request headers by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/425\r\n* Add argument for configuring Prometheus port by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/589\r\n* Adding missing `head.` prefix in the weight name in `ModernBertClassificationHead` by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/591\r\n* Fixing the CI (grpc path). 
by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/593\r\n* fix xpu env issue that cannot find right libur_loader.so.0 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/595\r\n* enable flash mistral model for HPU device by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/594\r\n* remove optimum-habana dependency by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/599\r\n* Support NomicBert MoE by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/596\r\n* Remove duplicate short option '-p' to fix router executable by @cebtenzzre in https://github.com/huggingface/text-embeddings-inference/pull/602\r\n* Update `text-embeddings-router --help` output by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/603\r\n* Warmup padded models too. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/592\r\n* Add support for JinaAI Re-Rankers V1 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/582\r\n* Gte diffs by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/604\r\n* Fix the weight name in GTEClassificationHead by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/606\r\n* upgrade pytorch and ipex to 2.7 version by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/607\r\n* upgrade HPU FW to 1.21; upgrade transformers to 4.51.3 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/608\r\n* Patch DistilBERT variants with different weight keys by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/614\r\n* add offline modeling for model `jinaai/jina-embeddings-v2-base-code` to avoid `auto_map` to other repository by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/612\r\n* Add mean pooling strategy for Modernbert 
classifier by @kwnath in https://github.com/huggingface/text-embeddings-inference/pull/616\r\n* Using serde for pool validation. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/620\r\n* Preparing the update to 1.7.1 by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/623\r\n* Adding suggestions to fixing missing ONNX files. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/624\r\n* Add `Qwen3Model` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/627\r\n* Add `HiddenAct::Silu` (remove `serde` alias) by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/631\r\n* Add CPU support for Qwen3-Embedding models by @randomm in https://github.com/huggingface/text-embeddings-inference/pull/632\r\n* refactor the code and add wrap_in_hpu_graph to corner case by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/625\r\n* Support Qwen3 w/ fp32 on GPU by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/634\r\n* Preparing the release. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/639\r\n* Default to Qwen3 in `README.md` and `docs/` examples by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/641\r\n* Fix Qwen3 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/646\r\n* Add integration tests for Gaudi by @baptistecolle in https://github.com/huggingface/text-embeddings-inference/pull/598\r\n* Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in https://github.com/huggingface/text-embeddings-inference/pull/648\r\n* Fix FlashQwen3 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/650\r\n* Make flake work on metal by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/654\r\n* Fixing metal backend. 
by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/655\r\n* Qwen3 hpu support by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/656\r\n* change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/659\r\n* Update `version` to 1.7.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/666\r\n* Add last token pooling support for ORT. by @tpendragon in https://github.com/huggingface/text-embeddings-inference/pull/664\r\n* Fix Qwen3 Embedding Float16 DType by @tpendragon in https://github.com/huggingface/text-embeddings-inference/pull/663\r\n* Fix `fmt` by re-running `pre-commit` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/671\r\n* Update `version` to 1.7.4 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/677\r\n* Support MRL (Matryoshka Representation Learning) by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/676\r\n* Add `Dense` layer for `2_Dense/` modules by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/660\r\n* Update `version` to 1.8.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/686\r\n\r\n## New Contributors\r\n* @NielsRogge made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/574\r\n* @cebtenzzre made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/602\r\n* @kwnath made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/616\r\n* @randomm made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/632\r\n* @lance-miles made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/648\r\n* @tpendragon made their first contribution in 
https://github.com/huggingface/text-embeddings-inference/pull/664\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.0...v1.8.0","publishedAt":"2025-08-05T08:31:22.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.8.0","media":[]},{"id":"rel_luwUHLwo6gD4f51Qq-akB","version":"v1.7.4","title":"v1.7.4","summary":"## Notable Changes\r\n\r\nQwen3 was not working correctly on CPU / MPS when sending batched requests on FP16 precision, due to the FP32 minimum value downca...","content":"## Notable Changes\r\n\r\nQwen3 was not working correctly on CPU / MPS when sending batched requests on FP16 precision, due to the FP32 minimum value downcast (now manually set to FP16 minimum value instead) leading to `null` values, as well as a missing `to_dtype` call on the `attention_bias` when working with batches.\r\n\r\n## What's Changed\r\n\r\n* Fix Qwen3 Embedding Float16 DType by @tpendragon in https://github.com/huggingface/text-embeddings-inference/pull/663\r\n* Fix `fmt` by re-running `pre-commit` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/671\r\n* Update `version` to 1.7.4 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/677\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.3...v1.7.4","publishedAt":"2025-07-07T12:33:34.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.7.4","media":[]},{"id":"rel_lSE1FYm8XcqJNJ65WK6UW","version":"v1.7.3","title":"v1.7.3","summary":"## Notable Changes\r\n\r\nQwen3 support included for Intel HPU, and fixed for CPU / Metal / CUDA.\r\n\r\n## What's Changed\r\n\r\n* Default to Qwen3 in `README...","content":"## Notable Changes\r\n\r\nQwen3 support included for Intel HPU, and fixed for CPU / Metal / CUDA.\r\n\r\n## What's Changed\r\n\r\n* Default to Qwen3 in `README.md` and `docs/` examples by @alvarobartt in 
https://github.com/huggingface/text-embeddings-inference/pull/641\r\n* Fix Qwen3 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/646\r\n* Add integration tests for Gaudi by @baptistecolle in https://github.com/huggingface/text-embeddings-inference/pull/598\r\n* Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in https://github.com/huggingface/text-embeddings-inference/pull/648\r\n* Fix FlashQwen3 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/650\r\n* Make flake work on metal by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/654\r\n* Fixing metal backend. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/655\r\n* Qwen3 hpu support by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/656\r\n* change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/659\r\n* Update `version` to 1.7.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/666\r\n* Add last token pooling support for ORT. by @tpendragon in https://github.com/huggingface/text-embeddings-inference/pull/664\r\n\r\n## New Contributors\r\n\r\n* @lance-miles made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/648\r\n* @tpendragon made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/664\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.2...v1.7.3","publishedAt":"2025-06-30T10:54:30.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.7.3","media":[]},{"id":"rel_5d_RnBqUCE_TD7jQgQ8eK","version":"v1.7.2","title":"v1.7.2","summary":"## Notable change\r\n\r\n* Added support for Qwen3 embeddings\r\n\r\n## What's Changed\r\n* Adding suggestions to fixing missing ONNX files. 
by @Narsil in https...","content":"## Notable change\r\n\r\n* Added support for Qwen3 embeddings\r\n\r\n## What's Changed\r\n* Adding suggestions to fixing missing ONNX files. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/624\r\n* Add `Qwen3Model` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/627\r\n* Add `HiddenAct::Silu` (remove `serde` alias) by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/631\r\n* Add CPU support for Qwen3-Embedding models by @randomm in https://github.com/huggingface/text-embeddings-inference/pull/632\r\n* refactor the code and add wrap_in_hpu_graph to corner case by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/625\r\n* Support Qwen3 w/ fp32 on GPU by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/634\r\n* Preparing the release. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/639\r\n\r\n## New Contributors\r\n* @randomm made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/632\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.1...v1.7.2","publishedAt":"2025-06-16T06:44:57.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.7.2","media":[]},{"id":"rel_C0_iE5ArfCPAYk8sIdB-Z","version":"v1.7.1","title":"v1.7.1","summary":"## What's Changed\r\n* [Docs] Update quick tour by @NielsRogge in https://github.com/huggingface/text-embeddings-inference/pull/574\r\n* Update `README.md...","content":"## What's Changed\r\n* [Docs] Update quick tour by @NielsRogge in https://github.com/huggingface/text-embeddings-inference/pull/574\r\n* Update `README.md` and `supported_models.md` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/572\r\n* Back with linting. 
by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/577\r\n* [Docs] Add cloud run example by @NielsRogge in https://github.com/huggingface/text-embeddings-inference/pull/573\r\n* Fixup by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/578\r\n* Fixing the tokenization routes token (offsets are in bytes, not in by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/576\r\n* Removing requirements file. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/585\r\n* Removing candle-extensions to live on crates.io by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/583\r\n* Bump `sccache` to 0.10.0 and `sccache-action` to 0.0.9 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/586\r\n* optimize the performance of FlashBert Path for HPU by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/575\r\n* Revert \"Removing requirements file. (#585)\" by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/588\r\n* Get opentelemetry trace id from request headers by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/425\r\n* Add argument for configuring Prometheus port by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/589\r\n* Adding missing `head.` prefix in the weight name in `ModernBertClassificationHead` by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/591\r\n* Fixing the CI (grpc path). 
by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/593\r\n* fix xpu env issue that cannot find right libur_loader.so.0 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/595\r\n* enable flash mistral model for HPU device by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/594\r\n* remove optimum-habana dependency by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/599\r\n* Support NomicBert MoE by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/596\r\n* Remove duplicate short option '-p' to fix router executable by @cebtenzzre in https://github.com/huggingface/text-embeddings-inference/pull/602\r\n* Update `text-embeddings-router --help` output by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/603\r\n* Warmup padded models too. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/592\r\n* Add support for JinaAI Re-Rankers V1 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/582\r\n* Gte diffs by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/604\r\n* Fix the weight name in GTEClassificationHead by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/606\r\n* upgrade pytorch and ipex to 2.7 version by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/607\r\n* upgrade HPU FW to 1.21; upgrade transformers to 4.51.3 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/608\r\n* Patch DistilBERT variants with different weight keys by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/614\r\n* add offline modeling for model `jinaai/jina-embeddings-v2-base-code` to avoid `auto_map` to other repository by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/612\r\n* Add mean pooling strategy for Modernbert 
classifier by @kwnath in https://github.com/huggingface/text-embeddings-inference/pull/616\r\n* Using serde for pool validation. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/620\r\n* Preparing the update to 1.7.1 by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/623\r\n\r\n## New Contributors\r\n* @NielsRogge made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/574\r\n* @cebtenzzre made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/602\r\n* @kwnath made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/616\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.0...v1.7.1","publishedAt":"2025-06-03T13:38:50.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.7.1","media":[]},{"id":"rel_JTPHHEXKWGom-jM0Du36S","version":"v1.7.0","title":"v1.7.0","summary":"## Notable changes\r\n\r\n- Upgrade dependencies heavily (candle 0.5 -> 0.8 and related)\r\n- Added ModernBert support by @kozistr  !\r\n\r\n## What's Changed\r\n...","content":"## Notable changes\r\n\r\n- Upgrade dependencies heavily (candle 0.5 -> 0.8 and related)\r\n- Added ModernBert support by @kozistr  !\r\n\r\n## What's Changed\r\n* Moving cublaslt into TEI extension for easier upgrade of candle globally by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/542\r\n* Upgrade candle2 by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/543\r\n* Upgrade candle3 by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/545\r\n* Fixing the static-linking. 
by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/547\r\n* Fix linking bis by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/549\r\n* Make `sliding_window` for `Qwen2` optional by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/546\r\n* Optimize the performance of FlashBert on HPU by using fast mode softmax by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/555\r\n* Fixing cudarc to the latest unified bindings. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/558\r\n* Fix typos / formatting in CLI args in Markdown files by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/552\r\n* Use custom `serde` deserializer for JinaBERT models by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/559\r\n* Implement the `ModernBert` model by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/459\r\n* Fixing FlashAttention ModernBert. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/560\r\n* Enable ModernBert on metal by @ivarflakstad in https://github.com/huggingface/text-embeddings-inference/pull/562\r\n* Fix `{Bert,DistilBert}SpladeHead` when loading from Safetensors by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/564\r\n* add related docs for intel cpu/xpu/hpu container by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/550\r\n* Update the doc for submodule. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/567\r\n* Update `docs/source/en/custom_container.md` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/568\r\n* Preparing for release 1.7.0 (candle update + modernbert). 
by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/570\r\n\r\n## New Contributors\r\n* @ivarflakstad made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/562\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.6.1...v1.7.0","publishedAt":"2025-04-08T11:54:09.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.7.0","media":[]},{"id":"rel_RQG5phCy-kuGunIc-SBFB","version":"v1.6.1","title":"v1.6.1","summary":"## What's Changed\r\n* Enable intel devices CPU/XPU/HPU for python backend by @yuanwu2017 in https://github.com/huggingface/text-embeddings-inference/pu...","content":"## What's Changed\r\n* Enable intel devices CPU/XPU/HPU for python backend by @yuanwu2017 in https://github.com/huggingface/text-embeddings-inference/pull/245\r\n* add reranker model support for python backend by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/386\r\n* (FIX): CI Security Fix - branchname injection by @glegendre01 in https://github.com/huggingface/text-embeddings-inference/pull/479\r\n* Upgrade TEI. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/501\r\n* Pin `cargo-chef` installation to 0.1.62 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/469\r\n* add `TRUST_REMOTE_CODE` param to python backend. 
by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/485\r\n* Enable splade embeddings for Python backend by @pi314ever in https://github.com/huggingface/text-embeddings-inference/pull/493\r\n* Hpu bucketing by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/489\r\n* Optimize flash bert path for hpu device by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/509\r\n* upgrade ipex to 2.6 version for cpu/xpu  by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/510\r\n* fix bug for `MaskedLanguageModel` class by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/513\r\n* Fix double incrementing `te_request_count` metric by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/486\r\n* Add intel based images to the CI by @baptistecolle in https://github.com/huggingface/text-embeddings-inference/pull/518\r\n* Fix typo on intel docker image by @baptistecolle in https://github.com/huggingface/text-embeddings-inference/pull/529\r\n* chore: Upgrade to tokenizers 0.21.0 by @lightsofapollo in https://github.com/huggingface/text-embeddings-inference/pull/512\r\n* feat: add support for \"model_type\": \"gte\" by @anton-pt in https://github.com/huggingface/text-embeddings-inference/pull/519\r\n* Update `README.md` to include ONNX by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/507\r\n* Fusing both Gte Configs. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/530\r\n* Add `HF_HUB_USER_AGENT_ORIGIN` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/534\r\n* Use `--hf-token` instead of `--hf-api-token` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/535\r\n* Fixing the tests. 
by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/531\r\n* Support classification head for DistilBERT by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/487\r\n* add CLI flag `disable-spans` to toggle span trace logging by @obloomfield in https://github.com/huggingface/text-embeddings-inference/pull/481\r\n* feat: support HF_ENDPOINT environment when downloading model by @StrayDragon in https://github.com/huggingface/text-embeddings-inference/pull/505\r\n* Small fixup. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/537\r\n* Fix `VarBuilder` handling in GTE e.g. `gte-multilingual-reranker-base` by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/538\r\n* make a WA in case Bert model do not have `safetensor` file by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/515\r\n* Add missing `match` on `onnx/model.onnx` download by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/472\r\n* Fixing the impure flake devShell to be able to run python code. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/539\r\n* Prepare for release. 
by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/540\r\n\r\n## New Contributors\r\n* @yuanwu2017 made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/245\r\n* @kaixuanliu made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/386\r\n* @Narsil made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/501\r\n* @pi314ever made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/493\r\n* @baptistecolle made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/518\r\n* @lightsofapollo made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/512\r\n* @anton-pt made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/519\r\n* @obloomfield made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/481\r\n* @StrayDragon made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/505\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.6.0...v1.6.1","publishedAt":"2025-03-28T08:47:18.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.6.1","media":[]},{"id":"rel_9GIgbGbLN_KOWbm-wcAoq","version":"v1.6.0","title":"v1.6.0","summary":"## What's Changed\r\n* feat: support multiple backends at the same time by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/p...","content":"## What's Changed\r\n* feat: support multiple backends at the same time by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/440\r\n* feat: GTE classification head by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/441\r\n* feat: Implement GTE model to support the non-flash-attn version by 
@kozistr in https://github.com/huggingface/text-embeddings-inference/pull/446\r\n* feat: Implement MPNet model (#363) by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/447\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.5.1...v1.6.0","publishedAt":"2024-12-13T15:52:59.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.6.0","media":[]},{"id":"rel_zAPg2X6k4XJdULvyL9TuZ","version":"v1.5.1","title":"v1.5.1","summary":"## What's Changed\r\n* Download `model.onnx_data` by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/343\r\n* Rename 'Sentence T...","content":"## What's Changed\r\n* Download `model.onnx_data` by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/343\r\n* Rename 'Sentence Transformers' to 'sentence-transformers' in docstrings by @Wauplin in https://github.com/huggingface/text-embeddings-inference/pull/342\r\n* fix: add serde default for truncation direction by @drbh in https://github.com/huggingface/text-embeddings-inference/pull/399\r\n* fix: metrics unbounded memory by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/409\r\n* Fix to allow health check w/o auth by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/360\r\n* Update `ort` crate version to `2.0.0-rc.4` to support onnx IR version 10 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/361\r\n* adds curl to fix healthcheck by @WissamAntoun in https://github.com/huggingface/text-embeddings-inference/pull/376\r\n* fix: use num_cpus::get to check as get_physical does not check cgroups by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/410\r\n* fix: use status code 400 when batch is empty by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/413\r\n* fix: add cls pooling as default for BERT variants by 
@OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/426\r\n* feat: auto limit string if truncate is set by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/428\r\n\r\n## New Contributors\r\n* @Wauplin made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/342\r\n* @XciD made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/345\r\n* @WissamAntoun made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/376\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.5.0...v1.5.1","publishedAt":"2024-11-05T15:17:01.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.5.1","media":[]},{"id":"rel_YVNhMQx9AxupyJS47p91Q","version":"v1.5.0","title":"v1.5.0","summary":"## Notable Changes\r\n\r\n- ONNX runtime for CPU deployments: greatly improve CPU deployment throughput\r\n- Add `/similarity` route\r\n\r\n## What's Changed\r\n*...","content":"## Notable Changes\r\n\r\n- ONNX runtime for CPU deployments: greatly improve CPU deployment throughput\r\n- Add `/similarity` route\r\n\r\n## What's Changed\r\n* tokenizer max limit on input size by @ErikKaum in https://github.com/huggingface/text-embeddings-inference/pull/324\r\n* docs: air-gapped deployments by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/326\r\n* feat(onnx): add onnx runtime for better CPU perf by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/328\r\n* feat: add `/similarity` route by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/331\r\n* fix(ort): fix mean pooling by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/332\r\n* chore(candle): update flash attn by @OlivierDehaene in 
https://github.com/huggingface/text-embeddings-inference/pull/335\r\n* v1.5.0 by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/336\r\n\r\n## New Contributors\r\n* @ErikKaum made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/324\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.4.0...v1.5.0","publishedAt":"2024-07-10T15:34:40.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.5.0","media":[]},{"id":"rel_4m0GKVH1UOGTOqAEcWj_O","version":"v1.4.0","title":"v1.4.0","summary":"## Notable Changes\r\n\r\n- Cuda support for the Qwen2 model architecture\r\n\r\n## What's Changed\r\n* feat(candle): support Qwen2 on Cuda by @OlivierDehaene i...","content":"## Notable Changes\r\n\r\n- Cuda support for the Qwen2 model architecture\r\n\r\n## What's Changed\r\n* feat(candle): support Qwen2 on Cuda by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/316\r\n* fix(candle): fix last token pooling\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.3.0...v1.4.0","publishedAt":"2024-07-02T15:17:26.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.4.0","media":[]},{"id":"rel__wUHbQcDTkf-SGyr0Ro5o","version":"v1.3.0","title":"v1.3.0","summary":"## Notable changes\r\n\r\n- New truncation direction parameter\r\n- Cuda support for [JinaCode](https://huggingface.co/jinaai/jina-embeddings-v2-base-code) ...","content":"## Notable changes\r\n\r\n- New truncation direction parameter\r\n- Cuda support for [JinaCode](https://huggingface.co/jinaai/jina-embeddings-v2-base-code) model architecture\r\n- Cuda support for [Mistral](https://huggingface.co/Salesforce/SFR-Embedding-2_R) model architecture\r\n- Cuda support for [Alibaba GTE](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) model architecture\r\n- New prompt name parameter: 
you can now add a prompt name to the body of your request to add a pre-prompt to your input, based on the Sentence Transformers configuration. You can also set a default prompt / prompt name to always add a pre-prompt to your requests.\r\n\r\n## What's Changed\r\n* Ci migration to K8s by @glegendre01 in https://github.com/huggingface/text-embeddings-inference/pull/269\r\n* chore: map compute_cap from GPU name by @haixiw in https://github.com/huggingface/text-embeddings-inference/pull/276\r\n* chore: cover Nvidia T4/L4 GPU by @haixiw in https://github.com/huggingface/text-embeddings-inference/pull/284\r\n* feat(ci): add trufflehog secrets detection by @McPatate in https://github.com/huggingface/text-embeddings-inference/pull/286\r\n* Community contribution code of conduct by @LysandreJik in https://github.com/huggingface/text-embeddings-inference/pull/291\r\n* Update README.md by @michaelfeil in https://github.com/huggingface/text-embeddings-inference/pull/277\r\n* Upgrade tokenizers to 0.19.1 to deal with breaking change in tokenizers by @scriptator in https://github.com/huggingface/text-embeddings-inference/pull/266\r\n* Add env for OTLP service name by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/285\r\n* Fix CI build timeout by @fxmarty in https://github.com/huggingface/text-embeddings-inference/pull/296\r\n* fix(router): payload limit was not correctly applied by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/298\r\n* feat(candle): better cuda error by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/300\r\n* feat(router): add truncation direction parameter by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/299\r\n* Support for Jina Code model by @patricebechard in https://github.com/huggingface/text-embeddings-inference/pull/292\r\n* feat(router): add base64 encoding_format for OpenAI API by @OlivierDehaene in 
https://github.com/huggingface/text-embeddings-inference/pull/301\r\n* fix(candle): fix FlashJinaCodeModel by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/302\r\n* fix: use malloc_trim to cleanup pages by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/307\r\n* feat(candle): add FlashMistral by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/308\r\n* feat(candle): add flash gte by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/310\r\n* feat: add default prompts by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/312\r\n* Add optional CORS allow any option value in http server cli by @kir-gadjello in https://github.com/huggingface/text-embeddings-inference/pull/260\r\n* Update `HUGGING_FACE_HUB_TOKEN` to `HF_API_TOKEN` in README  by @kevinhu in https://github.com/huggingface/text-embeddings-inference/pull/263\r\n* v1.3.0 by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/313\r\n\r\n## New Contributors\r\n* @haixiw made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/276\r\n* @McPatate made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/286\r\n* @LysandreJik made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/291\r\n* @michaelfeil made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/277\r\n* @scriptator made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/266\r\n* @fxmarty made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/296\r\n* @patricebechard made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/292\r\n* @kir-gadjello made their first contribution in 
https://github.com/huggingface/text-embeddings-inference/pull/260\r\n* @kevinhu made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/263\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.2.3...v1.3.0","publishedAt":"2024-06-28T11:37:18.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.3.0","media":[]},{"id":"rel_yH7bkIzbaOccI19oBwBrf","version":"v1.2.3","title":"v1.2.3","summary":"## What's Changed\r\n\r\n* fix limit peak memory to build cuda-all docker image by @OlivierDehaene in https://github.com/huggingface/text-embeddings-infer...","content":"## What's Changed\r\n\r\n* fix limit peak memory to build cuda-all docker image by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/246\r\n\r\n**Full Changelog**: https://github.com/huggingface/text-embeddings-inference/compare/v1.2.2...v1.2.3","publishedAt":"2024-04-25T08:48:17.000Z","url":"https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.2.3","media":[]}],"pagination":{"page":1,"pageSize":20,"totalPages":2,"totalItems":33},"summaries":{"rolling":{"windowDays":90,"summary":"Text Embeddings Inference shifted toward better container compatibility and model coverage. v1.9.0 made breaking changes to unify CPU and CUDA GeLU behavior by defaulting to the tanh approximation instead of exact GeLU, and flipped `--auto-truncate` to true by default. Subsequent releases fixed CUDA version mismatch errors in container deployments and added rope parameter support alongside backing for Perplexity's embedding models, while refinements addressed edge cases like nullable pad tokens and truncation setting persistence.","releaseCount":4,"generatedAt":"2026-04-07T17:28:49.245Z"},"monthly":[{"year":2026,"month":3,"summary":"Fixed Docker build toolchain resolution and improved error handling in device detection. 
The release standardized Rust toolchain specification in CUDA Dockerfiles and replaced bare exception catching with explicit Exception handling.","releaseCount":1,"generatedAt":"2026-04-07T17:28:51.670Z"}]}}