- rust-toolchain.toml before rustup on Dockerfile-{cuda,cuda-all} by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/842
- version 1.9.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/849

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.9.2...v1.9.3
- pad_token_id as nullable & add support for rope_parameters by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/832

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.9.1...v1.9.2
When releasing ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 with CUDA 12.9 and `cuda-compat-12-9`, there was an issue when running that same container on instances with CUDA 13.0+: the `cuda-compat-12-9` path set in `LD_LIBRARY_PATH` was leading to a `CUDA_ERROR_SYSTEM_DRIVER_MISMATCH = 803` error. This is now solved with a custom entrypoint that dynamically includes `cuda-compat` on the `LD_LIBRARY_PATH` depending on the instance's CUDA version.
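The entrypoint's decision can be sketched as follows. This is a minimal Python sketch of the idea, not the actual entrypoint code: the paths, the helper names, and the exact version comparison are assumptions. The compat libraries are only needed when the host driver supports an *older* CUDA than the toolkit the image was built against; on newer drivers (e.g. CUDA 13.0+) they must be left out.

```python
def needs_cuda_compat(driver_cuda: tuple, toolkit_cuda: tuple = (12, 9)) -> bool:
    """True when the driver is older than the image's CUDA toolkit,
    i.e. when the forward-compatibility (cuda-compat) libraries are needed."""
    return driver_cuda < toolkit_cuda


def build_ld_library_path(
    driver_cuda: tuple,
    base_path: str = "/usr/local/cuda/lib64",
    compat_path: str = "/usr/local/cuda/compat",
) -> str:
    # Prepend the compat libraries only when actually needed; on CUDA 13.0+
    # drivers they would trigger CUDA_ERROR_SYSTEM_DRIVER_MISMATCH (803).
    if needs_cuda_compat(driver_cuda):
        return f"{compat_path}:{base_path}"
    return base_path
```

For example, `build_ld_library_path((12, 4))` would prepend the compat path, while `build_ld_library_path((13, 0))` would leave it out.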
Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.9.0...v1.9.1
- HiddenAct::Gelu to GeLU + tanh in favour of GeLU erf by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull/753

The default GeLU implementation is now the GeLU + tanh approximation instead of exact GeLU (a.k.a. GeLU erf), to make sure that the CPU and CUDA embeddings match (as cuBLASLt only supports GeLU + tanh). This is a slight misalignment with how Transformers handles it: when `hidden_act="gelu"` is set in `config.json`, GeLU erf should be used. The numerical differences between GeLU + tanh and GeLU erf should have negligible impact on inference quality.
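The difference between the two variants can be seen in a short sketch. The formulas below are the standard exact GeLU and the standard tanh approximation (not TEI's actual kernels, which live in candle/cuBLASLt):

```python
import math


def gelu_erf(x: float) -> float:
    # Exact GeLU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation (the variant cuBLASLt supports).
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```

Evaluating both over a typical activation range shows absolute differences on the order of 1e-3 or less, which is why the impact on embedding quality is expected to be negligible.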
- --auto-truncate to true by default by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/829

`--auto-truncate` now defaults to true, meaning that sequences will be truncated to the lower of `--max-batch-tokens` and the maximum model length, so that requests do not fail when `--max-batch-tokens` is lower than the actual maximum supported length.
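The resulting truncation limit can be sketched as a one-liner (a hypothetical helper for illustration, not TEI's actual code):

```python
def truncation_limit(max_batch_tokens: int, model_max_length: int) -> int:
    # Sequences are truncated to the lower of the two values, so a single
    # sequence can never exceed the batch token budget or the model's
    # supported context length.
    return min(max_batch_tokens, model_max_length)
```

For example, with `--max-batch-tokens 512` and a model maximum of 8192 tokens, sequences would be truncated to 512 tokens instead of erroring out.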
- --served-model-name for OpenAI requests via HTTP by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull/685
- download_onnx to download sharded ONNX by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/817
- past_key_values in ONNX by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/751
- TruncationDirection to deserialize from lowercase and capitalized by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/755
- sagemaker-entrypoint* & remove SageMaker and Vertex from Dockerfile* by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/699
- config.json reading w/ aliases for ORT by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/786
- normalize param between the gRPC and HTTP /embed interfaces by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/810
- --model-id argument by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/679
- HF_TOKEN not set by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/812
- Dockerfile-cuda-blackwell-all by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/823
- rustc version to 1.92.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/826
- use_flash_attn for better FA + FA2 feature gating by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/825
- cuda-compat-12-9 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/828
- rustfmt on backend/candle/tests/*.rs files by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/800
- version to 1.9.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/830

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.8.3...v1.9.0
- max_input_length is bigger than max-batch-tokens by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/725
- modules.json for Dense modules in local models by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/738
- test_gemma3.rs for EmbeddingGemma by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/718
- HF_TOKEN in ApiBuilder for candle/tests by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/724
- cargo install commands for candle with CUDA by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/719
- version to 1.8.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/745

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.8.2...v1.8.3
Since Text Embeddings Inference (TEI) v1.7.0, Intel MKL support had been broken due to changes in the candle dependency. Neither static-linking nor dynamic-linking worked correctly, which caused models using Intel MKL on CPU to fail with errors such as: "Intel oneMKL ERROR: Parameter 13 was incorrect on entry to SGEMM".
Starting with v1.8.2, this issue has been resolved by fixing how the intel-mkl-src dependency is defined. Both features, static-linking and dynamic-linking (the default), now work correctly, ensuring that Intel MKL libraries are properly linked.
This issue occurred in the following scenarios:
- Installing `text-embeddings-router` via cargo with the `--features mkl` flag. Although dynamic-linking should have been used, it was not working as intended.
- Using the Dockerfile when running models without ONNX weights. In these cases, Safetensors weights were used with candle as the backend (with MKL optimizations), instead of ort.

The following table shows the affected versions and containers:
| Version | Image |
|---|---|
| 1.7.0 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.0 |
| 1.7.1 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.1 |
| 1.7.2 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2 |
| 1.7.3 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.3 |
| 1.7.4 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.4 |
| 1.8.0 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.0 |
| 1.8.1 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 |
More details: PR #715
Full Changelog: v1.8.1...v1.8.2
Today, Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model perfect for on-device use cases. Designed for speed and efficiency, the model features a compact size of 308M parameters and a 2K context window, unlocking new possibilities for mobile RAG pipelines, agents, and more. EmbeddingGemma is trained to support over 100 languages and is the highest-ranking text-only multilingual embedding model under 500M on the Massive Text Embedding Benchmark (MTEB) at the time of writing.
```shell
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
    --model-id google/embeddinggemma-300m --dtype float32
```

```shell
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
    --model-id onnx-community/embeddinggemma-300m-ONNX --dtype float32 --pooling mean
```

```shell
docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.1 \
    --model-id google/embeddinggemma-300m --dtype float32
```
OrtRuntime:

- position_ids and past_key_values as inputs
- padding_side and pad_token_id

What's changed:

- extra_args to trufflehog to exclude unverified results by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/696
- USE_FLASH_ATTENTION by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/692
- position_ids and past_key_values in OrtBackend by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/700
- modules.json to identify default Dense modules by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/701
- padding_side and pad_token_id in OrtBackend by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/705
- docs/openapi.json for v1.8.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/708
- version to 1.8.1 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/712

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.8.0...v1.8.1
> [!NOTE]
> Some of the aforementioned changes were released within the patch versions on top of v1.7.0, whilst both Matryoshka Representation Learning (MRL) and Dense layer module support were only recently included and had not been released yet.
- README.md and supported_models.md by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/572
- sccache to 0.10.0 and sccache-action to 0.0.9 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/586
- head. prefix in the weight name in ModernBertClassificationHead by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/591
- text-embeddings-router --help output by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/603
- jinaai/jina-embeddings-v2-base-code to avoid auto_map to other repository by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/612
- Qwen3Model by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/627
- HiddenAct::Silu (remove serde alias) by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/631
- README.md and docs/ examples by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/641
- version to 1.7.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/666
- fmt by re-running pre-commit by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/671
- version to 1.7.4 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/677
- Dense layer for 2_Dense/ modules by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/660
- version to 1.8.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/686

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.0...v1.8.0
Qwen3 was not working correctly on CPU / MPS when sending batched requests with FP16 precision, due to the FP32 minimum value being downcast to FP16 (now manually set to the FP16 minimum value instead), leading to null values, as well as a missing `to_dtype` call on the `attention_bias` when working with batches.
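The downcast issue can be reproduced outside TEI. A minimal NumPy sketch (for illustration only, not the actual candle code) of why the FP32 minimum cannot be used as an FP16 mask fill value:

```python
import numpy as np

# Downcasting the FP32 minimum to FP16 overflows to -inf, which then
# produces NaN/null values once it flows through softmax-style arithmetic.
fp32_min = np.finfo(np.float32).min
downcast = np.float32(fp32_min).astype(np.float16)
print(downcast)  # -inf

# Using the FP16 minimum directly stays finite, which mirrors the fix.
fp16_min = np.float16(np.finfo(np.float16).min)
print(fp16_min)  # -65500.0, representable in FP16
```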
- fmt by re-running pre-commit by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/671
- version to 1.7.4 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/677

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.3...v1.7.4
Qwen3 support included for Intel HPU, and fixed for CPU / Metal / CUDA.
- README.md and docs/ examples by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/641
- version to 1.7.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/666

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.2...v1.7.3
- Qwen3Model by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/627
- HiddenAct::Silu (remove serde alias) by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/631

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.1...v1.7.2
- README.md and supported_models.md by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/572
- sccache to 0.10.0 and sccache-action to 0.0.9 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/586
- head. prefix in the weight name in ModernBertClassificationHead by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/591
- text-embeddings-router --help output by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/603
- jinaai/jina-embeddings-v2-base-code to avoid auto_map to other repository by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/612

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.0...v1.7.1
- sliding_window for Qwen2 optional by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/546
- serde deserializer for JinaBERT models by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/559
- ModernBert model by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/459
- {Bert,DistilBert}SpladeHead when loading from Safetensors by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/564
- docs/source/en/custom_container.md by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/568

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.6.1...v1.7.0
- cargo-chef installation to 0.1.62 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/469
- TRUST_REMOTE_CODE param to python backend by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/485
- MaskedLanguageModel class by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/513
- te_request_count metric by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/486
- README.md to include ONNX by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/507
- HF_HUB_USER_AGENT_ORIGIN by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/534
- --hf-token instead of --hf-api-token by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/535
- disable-spans to toggle span trace logging by @obloomfield in https://github.com/huggingface/text-embeddings-inference/pull/481
- VarBuilder handling in GTE e.g. gte-multilingual-reranker-base by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/538
- safetensor file by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/515
- match on onnx/model.onnx download by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/472

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.6.0...v1.6.1
Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.5.1...v1.6.0
- model.onnx_data by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/343
- ort crate version to 2.0.0-rc.4 to support onnx IR version 10 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/361

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.5.0...v1.5.1
- /similarity route by @OlivierDehaene in https://github.com/huggingface/text-embeddings-inference/pull/331

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.4.0...v1.5.0
Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.3.0...v1.4.0
- HUGGING_FACE_HUB_TOKEN to HF_API_TOKEN in README by @kevinhu in https://github.com/huggingface/text-embeddings-inference/pull/263

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.2.3...v1.3.0
Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.2.2...v1.2.3