v1.7.3

June 30, 2025 · Text Embeddings Inference

Noticeable Changes

Qwen3 support added for Intel HPU, and fixed for CPU, Metal, and CUDA.

What's Changed

Default to Qwen3 in README.md and docs/ examples by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/641
Fix Qwen3 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/646
Add integration tests for Gaudi by @baptistecolle in https://github.com/huggingface/text-embeddings-inference/pull/598
Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in https://github.com/huggingface/text-embeddings-inference/pull/648
Fix FlashQwen3 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/650
Make flake work on metal by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/654
Fixing metal backend. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/655
Qwen3 hpu support by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/656
change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/659
Update version to 1.7.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/666
Add last token pooling support for ORT. by @tpendragon in https://github.com/huggingface/text-embeddings-inference/pull/664
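
For readers unfamiliar with the term used in the last entry above, here is a minimal sketch of what last-token pooling computes in general: the embedding of a sequence is taken from the hidden state of its final non-padding token. This is an illustration only, not the code from the pull request or from the ORT backend; the function name `last_token_pool`, the nested-list layout, and the toy values are all assumptions made for the example.

```python
from typing import List, Sequence

Vector = List[float]


def last_token_pool(
    hidden_states: Sequence[Sequence[Sequence[float]]],
    attention_mask: Sequence[Sequence[int]],
) -> List[Vector]:
    """Hypothetical helper: pick the hidden state of the last real token.

    hidden_states: [batch][seq_len][hidden_dim]
    attention_mask: [batch][seq_len], 1 for real tokens and 0 for padding.
    Handles both left- and right-padded sequences.
    """
    pooled: List[Vector] = []
    for states, mask in zip(hidden_states, attention_mask):
        # Positions that hold real (non-padding) tokens.
        real_positions = [i for i, m in enumerate(mask) if m == 1]
        if not real_positions:
            raise ValueError("sequence has no non-padding tokens")
        # The last real position supplies the pooled embedding.
        pooled.append(list(states[real_positions[-1]]))
    return pooled


# Toy batch (made-up values): two sequences right-padded to length 3,
# hidden size 2.
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]],  # 2 real tokens + 1 pad
    [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]],  # 3 real tokens
]
mask = [[1, 1, 0], [1, 1, 1]]

pooled = last_token_pool(hidden, mask)
print(pooled)  # [[0.3, 0.4], [1.4, 1.5]]
```

Selecting the position from the attention mask, rather than always taking index `-1`, is what keeps the pooled vector for a sequence the same whether it is embedded alone or padded inside a batch.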

New Contributors

@lance-miles made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/648
@tpendragon made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/664

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.2...v1.7.3
