Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.6...v3.3.7
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.5...v3.3.6
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.4...v3.3.5
Fix for Neuron models exported with batch_size 1.
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.3...v3.3.4
Neuron backend update.
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.2...v3.3.3
Gaudi improvements.
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.1...v3.3.2
This release updates TGI to Torch 2.7 and CUDA 12.8.
* round_up_seq logic to align with prefill warmup phase on… by @kaixuanliu in https://github.com/huggingface/text-generation-inference/pull/3224

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.0...v3.3.1
* sccache to 0.10.0 by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/3179
* {% generation %} and {% endgeneration %} template handling by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/3204
* HF_HUB_OFFLINE=1 for Gaudi backend by @regisss in https://github.com/huggingface/text-generation-inference/pull/3193

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.2.3...v3.3.0
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.2.2...v3.2.3
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.2.1...v3.2.2
* kernels 0.2.1 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3084
* gemma3-text model type by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3107

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.2.0...v3.2.1
BREAKING CHANGE: Many modifications around tool calling. Tool calling now fully follows the OpenAI return format (the arguments field is returned as a JSON-encoded string instead of a real JSON object). Numerous tool-calling improvements and side-effect fixes.
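Because arguments now arrives as a JSON-encoded string, matching OpenAI's schema, clients have to decode it themselves before dispatching the tool. A minimal sketch, assuming a response shaped like an OpenAI-style chat-completions payload (the tool name and fields here are illustrative):

```python
import json

# A tool call as it might appear in a /v1/chat/completions response after
# this change: "arguments" is a JSON-encoded string, not a nested object.
tool_call = {
    "id": "0",
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool name
        "arguments": "{\"city\": \"Paris\", \"unit\": \"celsius\"}",
    },
}

# Decode the string into a real dict before calling the tool.
args = json.loads(tool_call["function"]["arguments"])
print(args["city"])  # -> Paris
```

Clients that previously indexed into arguments as an object will break until they add this decoding step.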
Added Gemma 3 support.
* tool_calls a vector. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3075
* openai to impure shell for integration tests by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3081
* --max-batch-total-tokens description by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/3083
* /v1/chat/completions endpoint by @aW3st in https://github.com/huggingface/text-generation-inference/pull/3000

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.1.1...v3.2.0
* strftime_now callable function for minijinja chat templates by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2983
* loop_controls feature to minijinja to handle {% break %} by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2998
* rotary kernel from the Hub by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3041
* RadixTrie::find by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3067
* RadixAllocator by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3068
* monitoring.md tutorial by @sadra-barikbin in https://github.com/huggingface/text-generation-inference/pull/3056

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.1.0...v3.1.1
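The strftime_now callable lets a chat template stamp the current date into a prompt. As a rough Python analogue of what the template-side callable does (the template line in the comment is illustrative, not taken from any model's actual chat template):

```python
from datetime import datetime

def strftime_now(fmt: str) -> str:
    """Sketch of the strftime_now callable exposed to chat templates:
    formats the current local time with an strftime pattern."""
    return datetime.now().strftime(fmt)

# A chat template could use it roughly like:
#   {{ "Today is " + strftime_now("%Y-%m-%d") }}
today = strftime_now("%Y-%m-%d")
print(today)
```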
DeepSeek R1 is fully supported on both AMD and NVIDIA!
```shell
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.1.0 --model-id deepseek-ai/DeepSeek-R1
```
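The command above exposes TGI's OpenAI-compatible chat endpoint on port 8080. A minimal sketch of building a request payload for it; actually sending it requires the running container, so the HTTP call is left as a comment:

```python
import json

# Payload for TGI's OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "max_tokens": 128,
    "stream": False,
}
body = json.dumps(payload)
print(body)

# With the container running, this payload could be POSTed with stdlib urllib:
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8080/v1/chat/completions",
#       data=body.encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read())
```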
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.0.2...v3.1.0
TL;DR
New transformers backend supporting FlashAttention, at roughly the same performance as native TGI, for all models that are not officially supported by TGI directly. Congrats @Cyrilvallez!
New models unlocked: Cohere2, OLMo, OLMo2, Helium.
* docker run in README.md by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2861
* uv instead of poetry. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2919
* pre-commit run --all-files to fix CI by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2933
* alias for max_completion_tokens in ChatRequest by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2932

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.0.1...v3.0.2
Patch release to handle a few older models and corner cases.
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.0.0...v3.0.1
Big new release
Details: https://huggingface.co/docs/text-generation-inference/conceptual/chunking
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.4.1...v3.0.0
* main by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2718

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.3.0...v2.4.1
Experimental prefill chunking (PREFILL_CHUNKING=1).

* _server.nix file by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2538
* RUN in Dockerfile by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2547
* DenseMoELayer and wire it up in Mixtral/Deepseek V2 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2537
* --features google by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2566
* main by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2586
* desc_act for groupsize != -1 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2590
* ChatRequest for Vertex AI Chat instead of VertexChat by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2651
* e4m3fn KV cache by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2655
* attention function by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2609
* desc_act=true by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2622
* impureWithCuda dev shell by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2677

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.3.0...v2.4
notify_error with the content error; it will instead output regular generation.

* _server.nix file by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2538
* RUN in Dockerfile by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2547
* DenseMoELayer and wire it up in Mixtral/Deepseek V2 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2537
* --features google by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2566
* main by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2586
* desc_act for groupsize != -1 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2590

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.3.0...v2.3.1