Text Embeddings Inference shifted toward better container compatibility and model coverage. v1.9.0 made breaking changes to unify CPU and CUDA GeLU behavior by defaulting to the tanh approximation instead of exact GeLU, and flipped --auto-truncate to true by default. Subsequent releases fixed CUDA version mismatch errors in container deployments and added rope parameter support alongside backing for Perplexity's embedding models, while refinements addressed edge cases like nullable pad tokens and truncation setting persistence.
Fixed Docker build toolchain resolution and improved error handling in device detection. The release standardized Rust toolchain specification in CUDA Dockerfiles and replaced bare exception catching with explicit Exception handling.