What's Changed
- Use rust-toolchain.toml before rustup on Dockerfile-{cuda,cuda-all} by @alvarobartt in https://github.com/huggingface/text-...
Optimized inference servers for text and embeddings
Since Text Embeddings Inference (TEI) v1.7.0, Intel MKL support had been broken due to changes in the candle depend...
[Image: text-embeddings-inference-v1 8 1-embedding-gemma(1)]
[Image: text-embeddings-inference-v1 8 0(2)]
Qwen3 did not work correctly on CPU / MPS when sending batched requests in FP16 precision, due to the FP32 minimum value downca...
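
The note above is truncated, so the exact failure path is not stated here. As a minimal sketch of one well-known way an FP32 minimum value can misbehave in FP16 (an assumption, not the confirmed TEI bug): the most negative finite FP32 value lies far outside the FP16 range, so a downcast turns it into negative infinity, and a softmax over a row that is entirely negative infinity yields NaN. The helper names `to_fp16` and `softmax` below are hypothetical and use only the Python standard library.

```python
import math
import struct

# Most negative finite IEEE 754 single-precision (FP32) value.
FP32_MIN = -3.4028234663852886e38


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value.

    struct raises OverflowError for values outside the FP16 range;
    IEEE 754 round-to-nearest produces +/-inf there, so this helper
    returns the correspondingly signed infinity instead.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def softmax(scores):
    """Numerically stabilised softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # -inf - (-inf) is NaN
    total = sum(exps)
    return [e / total for e in exps]


# A fully masked attention row (assumed scenario: a padded slot in a batch).
row_fp32 = [FP32_MIN, FP32_MIN]
row_fp16 = [to_fp16(v) for v in row_fp32]

print(softmax(row_fp32))  # [0.5, 0.5]
print(row_fp16)           # [-inf, -inf]
print(softmax(row_fp16))  # [nan, nan]
```

In this sketch the FP32 row stays finite and normalises cleanly, while the downcast row becomes NaN. A common remedy is to use the minimum of the target dtype as the mask value rather than downcasting the FP32 one.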
Qwen3 support added for Intel HPU, and fixed for CPU / Metal / CUDA.
Fix for Neuron models exported with batch_size 1.
Neuron backend update.
Gaudi improvements.
This release updates TGI to Torch 2.7 and CUDA 12.8.