v3.3.1

May 22, 2025 · Text Generation Inference

This release updates TGI to Torch 2.7 and CUDA 12.8.

What's Changed

change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in https://github.com/huggingface/text-generation-inference/pull/3217
adjust the round_up_seq logic to align with prefill warmup phase on… by @kaixuanliu in https://github.com/huggingface/text-generation-inference/pull/3224
Update to Torch 2.7.0 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3221
Enable Llama4 for gaudi backend by @yuanwu2017 in https://github.com/huggingface/text-generation-inference/pull/3223
fix: count gpu uuids if NVIDIA_VISIBLE_DEVICES env set to all by @drbh in https://github.com/huggingface/text-generation-inference/pull/3230
Deepseek r1 by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3211
Refine warmup and upgrade to synapse AI 1.21.0 by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3234
fix the crash in default ATTENTION path by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3235
Switch to punica-sgmv kernel from the Hub by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3236
move input_ids to hpu and remove disposal of adapter_meta by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3237
Prepare for 3.3.1 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3238
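
To illustrate the idea behind the NVIDIA_VISIBLE_DEVICES fix (#3230), here is a minimal Python sketch. It is not TGI's actual implementation: the function name `count_visible_gpus`, its parameters, and the handling of `none`, `void`, and empty values are assumptions made for illustration. The point it shows is that the value `all` names no devices explicitly, so the GPU count has to come from the detected GPU UUIDs rather than from splitting the variable.

```python
from typing import Mapping, Optional, Sequence


def count_visible_gpus(
    env: Mapping[str, str],
    detected_uuids: Sequence[str],
) -> Optional[int]:
    """Hypothetical helper: number of GPUs visible to the process, or None if unknown.

    `detected_uuids` is assumed to be supplied by the caller, for example
    gathered from the driver; how it is gathered is out of scope here.
    """
    value = env.get("NVIDIA_VISIBLE_DEVICES")
    if value is None:
        # Variable unset: the caller should fall back to another detection method.
        return None

    value = value.strip()
    if value == "all":
        # "all" lists no devices, so count the detected GPU UUIDs instead.
        return len(detected_uuids)
    if value in ("", "none", "void"):
        # Assumption: these values are treated as exposing no GPUs.
        return 0

    # Otherwise treat the value as a comma-separated list of indices or UUIDs.
    return len([item for item in value.split(",") if item.strip()])
```

In a real process the first argument would typically be `os.environ`; passing a plain mapping keeps the sketch easy to test.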

New Contributors

@kaixuanliu made their first contribution in https://github.com/huggingface/text-generation-inference/pull/3217

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.0...v3.3.1
