Inference

Optimized inference servers for text and embeddings

Releases: 3 · Avg interval: 5d · Avg cadence: 6/mo

Mar 23, 2026

Text Embeddings Inference v1.9.3

What's Changed

Use rust-toolchain.toml before rustup on Dockerfile-{cuda,cuda-all} by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/842
fix(backend): replace bare except with Exception in device check by @llukito in…

Feb 25, 2026

Text Embeddings Inference v1.9.2

What's Changed

Fix auto-truncate false setting by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull/836
Set pad_token_id as nullable & add support for rope_parameters by @alvarobartt in…

Feb 17, 2026

Dec 19, 2025

Text Generation Inference v3.3.7

What's Changed

misc(gha): expose action cache url and runtime as secrets by @mfuntowicz in https://github.com/huggingface/text-generation-inference/pull/2964
feat: support max_image_fetch_size to limit by @drbh in…

Oct 30, 2025

Text Embeddings Inference v1.8.3

What's Changed

Bug Fixes

Fix error code for empty requests by @vrdn-23 in https://github.com/huggingface/text-embeddings-inference/pull/727
Fix the infinite loop when max_input_length is bigger than max-batch-tokens by @kozistr in…

Sep 17, 2025

Text Generation Inference v3.3.6

What's Changed

Add missing backslash by @philsupertramp in https://github.com/huggingface/text-generation-inference/pull/3311
Revert "feat: bump flake including transformers and huggingface_hub versions" by @drbh in…

Sep 9, 2025

Text Embeddings Inference v1.8.2

🔧 Fixed Intel MKL Support

Since Text Embeddings Inference (TEI) v1.7.0, Intel MKL support had been broken due to changes in the candle dependency. Neither static nor dynamic linking worked correctly, which caused models using Intel MKL on CPU to fail with…

Sep 4, 2025

Text Embeddings Inference v1.8.1

Today, Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model perfect for…

Sep 2, 2025

Text Generation Inference v3.3.5

What's Changed

[gaudi] Refine rope memory, do not need to keep sin/cos cache per layer by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3274
Gaudi: add CI by @baptistecolle in…

Aug 5, 2025

Text Embeddings Inference v1.8.0

Notable Changes

Qwen3 support for 0.6B, 4B and 8B on CPU, MPS, and FlashQwen3 on CUDA and Intel HPUs -…

Jul 7, 2025

Text Embeddings Inference v1.7.4

Noticeable Changes

Qwen3 did not work correctly on CPU / MPS when sending batched requests in FP16 precision. The causes were the downcast of the FP32 minimum value (now manually set to the FP16 minimum value instead), which led to null values, as well as a missing to_dtype call on the…
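The FP16 downcast issue can be illustrated with a small NumPy sketch. It is not taken from the TEI codebase: it assumes, for illustration only, that the minimum value is used as an attention-mask fill value before a softmax, and the function `masked_softmax` is hypothetical. The FP32 minimum (about -3.4e38) lies outside the FP16 range, so casting it to FP16 overflows to -inf; a fully masked row (for example, a padding row in a batch) then computes `-inf - (-inf)`, which is NaN. The FP16 minimum (-65504) stays finite, so the same row produces a valid distribution.

```python
import numpy as np


def masked_softmax(scores: np.ndarray, mask: np.ndarray, fill_value) -> np.ndarray:
    """Hypothetical helper: FP16 softmax over the last axis, masked positions set to fill_value."""
    # Silence the overflow (FP32 min -> FP16 -inf) and invalid (-inf - -inf) warnings
    # so the NaN result is visible without noise.
    with np.errstate(over="ignore", invalid="ignore"):
        x = np.where(mask, scores, fill_value).astype(np.float16)
        # Subtract the row max for numerical stability, as softmax implementations usually do.
        x = x - x.max(axis=-1, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)


scores = np.zeros((2, 3))
# Row 0 is partially masked; row 1 is fully masked (e.g. padding in a batched request).
mask = np.array([[True, True, False], [False, False, False]])

bad = masked_softmax(scores, mask, np.finfo(np.float32).min)   # FP32 min downcast to -inf
good = masked_softmax(scores, mask, np.finfo(np.float16).min)  # FP16 min stays finite

print(bad)
print(good)
```

With the FP32 fill value, the fully masked row comes out as all NaN; with the FP16 fill value, it comes out as a uniform distribution and contains no NaN.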

Jun 30, 2025

Text Embeddings Inference v1.7.3

Noticeable Changes

Qwen3 support included for Intel HPU, and fixed for CPU / Metal / CUDA.

What's Changed

Default to Qwen3 in README.md and docs/ examples by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/641
Fix Qwen3 by…

Jun 19, 2025

Text Generation Inference v3.3.4

Fix for Neuron models exported with batch_size 1.

What's Changed

[gaudi] gemma3 text and vlm model intial support. need to add sliding window … by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3270
Neuron backend fix by @dacorvo in…

Jun 18, 2025

Text Generation Inference v3.3.3

Neuron backend update.

What's Changed

Remove useless packages by @yuanwu2017 in https://github.com/huggingface/text-generation-inference/pull/3253
Bump neuron SDK version by @dacorvo in https://github.com/huggingface/text-generation-inference/pull/3260
Perf opt by…

Jun 16, 2025

Text Embeddings Inference v1.7.2

Notable change

Added support for Qwen3 embeddings

What's Changed

Adding suggestions to fixing missing ONNX files. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/624
Add Qwen3Model by @alvarobartt in…

Jun 3, 2025

Text Embeddings Inference v1.7.1

What's Changed

[Docs] Update quick tour by @NielsRogge in https://github.com/huggingface/text-embeddings-inference/pull/574
Update README.md and supported_models.md by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/572
Back with…

May 30, 2025

Text Generation Inference v3.3.2

Gaudi improvements.

What's Changed

upgrade to new vllm extension ops(fix issue in exponential bucketing) by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3239
Nix: switch to hf-nix by @danieldk in…

May 22, 2025

Text Generation Inference v3.3.1

This release updates TGI to Torch 2.7 and CUDA 12.8.

What's Changed

change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in https://github.com/huggingface/text-generation-inference/pull/3217
adjust the round_up_seq logic to align…

May 9, 2025

Text Generation Inference v3.3.0

Notable changes

Prefill chunking for VLMs.

What's Changed

Fixing Qwen 2.5 VL (32B). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3157
Fixing tokenization like https://github.com/huggingface/text-embeddin… by @Narsil in…
