releases.shpreview

v2.3.0

$npx -y @buildinternet/releases show rel_C900M4rOiWdq3aao1p9NL

Important changes

  • Renamed HUGGINGFACE_HUB_CACHE to use HF_HOME. This is done to harmonize environment variables across HF ecosystem. So locations of data moved from /data/models-.... to /data/hub/models-.... on the Docker.

  • Prefix caching by default ! To help with long running queries TGI will use prefix caching a reuse pre-existing queries in the kv-cache in order to speed up TTFT. This should be totally transparent for most users, however this has required a instense rewrite of internals and therefore bugs can potentially exist. Also we changed kernels from paged_attention to flashinfer (and flashdecoding as a fallback for some specific models that aren't supported by flashinfer).

  • Lots of performance improvements with Marlin and quantization.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.2.0...v2.3.0

Fetched April 7, 2026