
v1.9.0

<img width="1800" height="972" alt="text-embeddings-inference-v1 9 0" src="https://github.com/user-attachments/assets/fe3751d1-1a3a-4b1f-8cf5-5c2326c14a62" />

What's changed?

🚨 Breaking changes

The default GeLU implementation is now the GeLU + tanh approximation instead of exact GeLU (a.k.a. GeLU erf), ensuring that CPU and CUDA embeddings match (cuBLASLt only supports GeLU + tanh). This is a slight divergence from how Transformers handles it: when `hidden_act="gelu"` is set in `config.json`, Transformers uses GeLU erf. The numerical differences between GeLU + tanh and GeLU erf should have negligible impact on inference quality.
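For reference, the difference between the two variants can be sketched numerically (a minimal illustration using the standard formulas, not the actual TEI kernels):

```python
import math

def gelu_erf(x: float) -> float:
    # Exact GeLU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# The two variants differ only slightly across typical activation ranges.
max_diff = max(abs(gelu_erf(x / 100.0) - gelu_tanh(x / 100.0)) for x in range(-500, 501))
print(f"max |erf - tanh| on [-5, 5]: {max_diff:.2e}")
```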

--auto-truncate now defaults to true, meaning that sequences are truncated to the lower of --max-batch-tokens and the maximum model length. This prevents failures when --max-batch-tokens is set lower than the model's actual maximum supported length.
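The new default behavior can be sketched as follows (a hypothetical helper illustrating the bound, not the actual TEI code):

```python
def effective_max_length(max_batch_tokens: int, model_max_length: int,
                         auto_truncate: bool = True) -> int:
    # With --auto-truncate (now the default), sequences are cut to the
    # smaller of --max-batch-tokens and the model's maximum length.
    if not auto_truncate:
        return model_max_length
    return min(max_batch_tokens, model_max_length)

# --max-batch-tokens larger than the model limit: the model limit wins.
print(effective_max_length(max_batch_tokens=8192, model_max_length=512))  # 512
# --max-batch-tokens smaller than the model limit: sequences are truncated to it.
print(effective_max_length(max_batch_tokens=256, model_max_length=512))   # 256
```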

🎉 Additions

🐛 Fixes

⚡ Improvements

📄 Other

🆕 New Contributors

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.8.3...v1.9.0
