v1.8.0
<img width="3600" height="1944" alt="text-embeddings-inference-v1 8 0(2)" src="https://github.com/user-attachments/assets/50df05b6-3821-4e2a-8de0-3e5c911b2a27" />
Notable Changes
- Qwen3 support for 0.6B, 4B and 8B on CPU, MPS, and FlashQwen3 on CUDA and Intel HPUs
- NomicBert MoE support
- JinaAI Re-Rankers V1 support
- Matryoshka Representation Learning (MRL)
- Dense layer module support (after pooling)
Note
Some of the changes above were already released in patch versions on top of v1.7.0, whereas Matryoshka Representation Learning (MRL) and Dense layer module support were added recently and had not been released before v1.8.0.
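As a rough illustration of the two newest items, the sketch below shows what a Dense module applied after pooling and an MRL-style truncation do to a single embedding. It is not the code from this release: the function names `dense` and `truncate_mrl`, the sample values, and the choice to L2-normalize after truncation are all assumptions made for the example, and the optional activation function a Dense module may carry is omitted.

```python
import math
from typing import List, Sequence


def dense(
    pooled: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Hypothetical helper: linear projection y = W x + b of a pooled embedding."""
    return [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weight, bias)
    ]


def truncate_mrl(embedding: Sequence[float], dim: int) -> List[float]:
    """Hypothetical helper: keep the first `dim` components, then L2-normalize.

    Re-normalizing after truncation is an assumption of this sketch.
    """
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


# Made-up pooled embedding and an identity projection, for illustration only.
pooled = [3.0, 4.0, 12.0, 0.5]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

projected = dense(pooled, identity, [0.0] * 4)
short = truncate_mrl(projected, 2)
print(short)
```

Truncation only yields useful vectors for models trained with a Matryoshka objective, where the leading components are optimized to carry most of the information.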
What's Changed
- [Docs] Update quick tour by @NielsRogge in https://github.com/huggingface/text-embeddings-inference/pull/574
- Update `README.md` and `supported_models.md` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/572
- Back with linting. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/577
- [Docs] Add cloud run example by @NielsRogge in https://github.com/huggingface/text-embeddings-inference/pull/573
- Fixup by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/578
- Fixing the tokenization routes token (offsets are in bytes, not in by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/576
- Removing requirements file. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/585
- Removing candle-extensions to live on crates.io by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/583
- Bump `sccache` to 0.10.0 and `sccache-action` to 0.0.9 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/586
- optimize the performance of FlashBert Path for HPU by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/575
- Revert "Removing requirements file. (#585)" by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/588
- Get opentelemetry trace id from request headers by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/425
- Add argument for configuring Prometheus port by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/589
- Adding missing `head.` prefix in the weight name in `ModernBertClassificationHead` by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/591
- Fixing the CI (grpc path). by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/593
- fix xpu env issue that cannot find right libur_loader.so.0 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/595
- enable flash mistral model for HPU device by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/594
- remove optimum-habana dependency by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/599
- Support NomicBert MoE by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/596
- Remove duplicate short option '-p' to fix router executable by @cebtenzzre in https://github.com/huggingface/text-embeddings-inference/pull/602
- Update `text-embeddings-router --help` output by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/603
- Warmup padded models too. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/592
- Add support for JinaAI Re-Rankers V1 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/582
- Gte diffs by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/604
- Fix the weight name in GTEClassificationHead by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/606
- upgrade pytorch and ipex to 2.7 version by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/607
- upgrade HPU FW to 1.21; upgrade transformers to 4.51.3 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/608
- Patch DistilBERT variants with different weight keys by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/614
- add offline modeling for model `jinaai/jina-embeddings-v2-base-code` to avoid `auto_map` to other repository by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/612
- Add mean pooling strategy for Modernbert classifier by @kwnath in https://github.com/huggingface/text-embeddings-inference/pull/616
- Using serde for pool validation. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/620
- Preparing the update to 1.7.1 by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/623
- Adding suggestions to fixing missing ONNX files. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/624
- Add `Qwen3Model` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/627
- Add `HiddenAct::Silu` (remove `serde` alias) by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/631
- Add CPU support for Qwen3-Embedding models by @randomm in https://github.com/huggingface/text-embeddings-inference/pull/632
- refactor the code and add wrap_in_hpu_graph to corner case by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/625
- Support Qwen3 w/ fp32 on GPU by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/634
- Preparing the release. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/639
- Default to Qwen3 in `README.md` and `docs/` examples by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/641
- Fix Qwen3 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/646
- Add integration tests for Gaudi by @baptistecolle in https://github.com/huggingface/text-embeddings-inference/pull/598
- Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in https://github.com/huggingface/text-embeddings-inference/pull/648
- Fix FlashQwen3 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/650
- Make flake work on metal by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/654
- Fixing metal backend. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/655
- Qwen3 hpu support by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/656
- change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/659
- Update `version` to 1.7.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/666
- Add last token pooling support for ORT. by @tpendragon in https://github.com/huggingface/text-embeddings-inference/pull/664
- Fix Qwen3 Embedding Float16 DType by @tpendragon in https://github.com/huggingface/text-embeddings-inference/pull/663
- Fix `fmt` by re-running `pre-commit` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/671
- Update `version` to 1.7.4 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/677
- Support MRL (Matryoshka Representation Learning) by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/676
- Add `Dense` layer for `2_Dense/` modules by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/660
- Update `version` to 1.8.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/686
New Contributors
- @NielsRogge made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/574
- @cebtenzzre made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/602
- @kwnath made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/616
- @randomm made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/632
- @lance-miles made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/648
- @tpendragon made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/664
Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.0...v1.8.0
