v3.3.1
This release updates TGI to Torch 2.7 and CUDA 12.8.
What's Changed
- change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in https://github.com/huggingface/text-generation-inference/pull/3217
- adjust the round_up_seq logic to align with prefill warmup phase on… by @kaixuanliu in https://github.com/huggingface/text-generation-inference/pull/3224
- Update to Torch 2.7.0 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3221
- Enable Llama4 for gaudi backend by @yuanwu2017 in https://github.com/huggingface/text-generation-inference/pull/3223
- fix: count gpu uuids if NVIDIA_VISIBLE_DEVICES env set to all by @drbh in https://github.com/huggingface/text-generation-inference/pull/3230
- Deepseek r1 by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3211
- Refine warmup and upgrade to synapse AI 1.21.0 by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3234
- fix the crash in default ATTENTION path by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3235
- Switch to punica-sgmv kernel from the Hub by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3236
- move input_ids to hpu and remove disposal of adapter_meta by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3237
- Prepare for 3.3.1 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3238
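
The two HPU warmup entries above (exponentially growing sequence lengths, and rounding a sequence length up to match the prefill warmup phase) can be pictured with a small sketch. This is not the code from those PRs: the function names `warmup_seq_lengths` and `round_up_to_bucket`, the doubling factor, and the example limits are all assumptions chosen for illustration.

```python
from typing import List


def warmup_seq_lengths(min_len: int, max_len: int) -> List[int]:
    """Hypothetical helper: lengths that double from min_len, capped at max_len."""
    if min_len <= 0 or max_len < min_len:
        raise ValueError("expected 0 < min_len <= max_len")
    lengths = []
    current = min_len
    while current < max_len:
        lengths.append(current)
        current *= 2  # assumed growth factor of 2
    lengths.append(max_len)  # always include the upper bound itself
    return lengths


def round_up_to_bucket(seq_len: int, buckets: List[int]) -> int:
    """Hypothetical helper: smallest warmed-up length that fits seq_len."""
    for bucket in sorted(buckets):
        if seq_len <= bucket:
            return bucket
    raise ValueError(f"seq_len {seq_len} exceeds the largest bucket")


buckets = warmup_seq_lengths(128, 1024)
padded = round_up_to_bucket(300, buckets)
```

Under these assumptions, a request is padded to a length that was already seen during warmup, so the number of distinct shapes grows logarithmically with the maximum length rather than linearly.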
New Contributors
- @kaixuanliu made their first contribution in https://github.com/huggingface/text-generation-inference/pull/3217
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.0...v3.3.1
