v3.3.0

May 9, 2025 · Text Generation Inference

Notable changes

Prefill chunking for VLMs.
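
For context, prefill chunking generally means feeding a long prompt through the model in several bounded pieces instead of one large forward pass. The Python sketch below shows only that splitting step. The names `chunk_prefill` and `chunk_size` are hypothetical and are not TGI's API, and the sketch leaves out whatever VLM-specific handling (for example, of image inputs) the actual change involves.

```python
from typing import Iterator, List


def chunk_prefill(token_ids: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of the prompt, each at most chunk_size tokens.

    Hypothetical helper for illustration only; not part of TGI.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(token_ids), chunk_size):
        yield token_ids[start:start + chunk_size]


# A 10-token prompt processed in chunks of at most 4 tokens.
prompt = list(range(10))
chunks = list(chunk_prefill(prompt, 4))
```

In a real server each chunk would extend the KV cache built by the previous ones, so the final result matches a single-pass prefill while peak memory and per-step latency stay bounded.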

What's Changed

Fixing Qwen 2.5 VL (32B). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3157
Fixing tokenization like https://github.com/huggingface/text-embeddin… by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3156
Gaudi: clean cuda/rocm code in hpu backend, enable flat_hpu by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3113
L4 fixes by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/3161
setuptools <= 70.0 is vulnerable: CVE-2024-6345 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3171
transformers flash llm/vlm enabling in ipex by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3152
Upgrading the dependencies in Gaudi backend. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3170
Hotfixing gaudi deps. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3174
Hotfix gaudi2 with newer transformers. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3176
Support flashinfer for Gemma3 prefill by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3167
Get opentelemetry trace id from request headers instead of creating a new trace by @kozistr in https://github.com/huggingface/text-generation-inference/pull/2648
Bump sccache to 0.10.0 by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/3179
Fixing CI by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3184
Add option to configure prometheus port by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/3187
Warmup gaudi backend by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3172
Put more wiggle room. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3189
Fixing the router + template for Qwen3. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3200
Skip {% generation %} and {% endgeneration %} template handling by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/3204
doc typo by @julien-c in https://github.com/huggingface/text-generation-inference/pull/3206
Pr 2982 ci branch by @drbh in https://github.com/huggingface/text-generation-inference/pull/3046
fix: bump snaps for mllama by @drbh in https://github.com/huggingface/text-generation-inference/pull/3202
Update client SDK snippets by @julien-c in https://github.com/huggingface/text-generation-inference/pull/3207
Fix HF_HUB_OFFLINE=1 for Gaudi backend by @regisss in https://github.com/huggingface/text-generation-inference/pull/3193
IPEX support FP8 kvcache/softcap/slidingwindow by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3144
forward and tokenize chooser use the same shape by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3196
Chunked Prefill VLM by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/3188
Prepare for 3.3.0 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3220
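
One entry above concerns the `{% generation %}` and `{% endgeneration %}` markers that some chat templates contain. A simple way to picture "skipping" them is to drop the markers before the template is rendered. The Python sketch below works under that assumption; `strip_generation_tags` is a hypothetical helper, and TGI's actual handling lives in its router and may differ.

```python
import re

# Matches {% generation %} and {% endgeneration %}, including the
# whitespace-control variants such as {%- generation -%}.
_GENERATION_TAG = re.compile(r"\{%-?\s*(?:end)?generation\s*-?%\}")


def strip_generation_tags(template: str) -> str:
    """Remove generation markers from a chat template string.

    Hypothetical helper for illustration only; not part of TGI.
    """
    return _GENERATION_TAG.sub("", template)


template = "{% generation %}{{ message.content }}{% endgeneration %}"
cleaned = strip_generation_tags(template)
```

Only the markers themselves are removed, so the content between them still renders as before.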

New Contributors

@kozistr made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2648
@julien-c made their first contribution in https://github.com/huggingface/text-generation-inference/pull/3206

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.2.3...v3.3.0
