v1.7.4
Noticeable Changes
Qwen3 did not work correctly on CPU / MPS when sending batched requests in FP16 precision. Two issues caused this: the FP32 minimum value was downcast to FP16, which led to null values (the value is now set manually to the FP16 minimum instead), and a `to_dtype` call was missing on the `attention_bias` when working with batches.
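As a rough illustration of why downcasting the FP32 minimum is a problem, the sketch below emulates an FP16 cast using only the Python standard library. It is not the code from the fix: `to_fp16` and `masked_softmax` are hypothetical helpers, and the assumption that the invalid values arise in a softmax over a fully masked row is ours, not something the release notes state.

```python
import math
import struct

FP32_MIN = -3.4028234663852886e38  # most negative finite FP32 value
FP16_MIN = -65504.0                # most negative finite FP16 value


def to_fp16(x: float) -> float:
    """Round x to the nearest FP16 value (hypothetical helper).

    Out-of-range values become +/-inf, mimicking a typical dtype cast.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def masked_softmax(scores, bias):
    """Softmax over scores after adding the same bias to every position.

    Models a fully masked (all-padding) row, with FP16 rounding applied
    to the biased scores and to the final probabilities.
    """
    biased = [to_fp16(s + bias) for s in scores]
    top = max(biased)
    exps = [math.exp(b - top) for b in biased]  # -inf - (-inf) is NaN
    total = sum(exps)
    return [to_fp16(e / total) for e in exps]


scores = [1.5, 0.25]

# Downcasting the FP32 minimum overflows FP16 and yields -inf.
overflowed = to_fp16(FP32_MIN)
bad = masked_softmax(scores, overflowed)

# Using the FP16 minimum directly keeps every intermediate value finite.
good = masked_softmax(scores, FP16_MIN)

print("FP32 min cast to FP16:", overflowed)
print("softmax with downcast bias:", bad)
print("softmax with FP16 min bias:", good)
```

The point of the sketch is only that the FP32 minimum is not representable in FP16, so a bias built from it stops being a large finite number once it is cast, while the FP16 minimum survives the cast unchanged.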
What's Changed
- Fix Qwen3 Embedding Float16 DType by @tpendragon in https://github.com/huggingface/text-embeddings-inference/pull/663
- Fix `fmt` by re-running `pre-commit` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/671
- Update `version` to 1.7.4 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/677
Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.3...v1.7.4
