TL;DR
A new `transformers` backend supports FlashAttention at roughly the same performance as native TGI, for all models that are not officially supported directly in TGI. Congrats @Cyrilvallez!
New models unlocked: Cohere2, OLMo, OLMo2, Helium.
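One of the newly unlocked models can be served with the standard TGI Docker invocation; this is a sketch, and the model id (`allenai/OLMo-2-1124-7B-Instruct`) and local volume path are illustrative choices, not taken from the release notes:

```shell
# Launch TGI 3.0.2 serving an OLMo2 model via the transformers backend.
# $volume caches downloaded weights between container restarts.
volume=$PWD/data
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.2 \
    --model-id allenai/OLMo-2-1124-7B-Instruct
```

Once the server is up, requests go to `http://localhost:8080` as with any other TGI deployment.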
What's Changed
- `docker run` in README.md by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2861
- `uv` instead of poetry by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2919
- `pre-commit run --all-files` to fix CI by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2933
- Alias for `max_completion_tokens` in ChatRequest by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2932

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.0.1...v3.0.2
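The `max_completion_tokens` alias means OpenAI-style clients that send that field instead of `max_tokens` now work against TGI's chat endpoint. A minimal sketch of such a request body (the model id is a placeholder, not from the release notes):

```python
import json

# OpenAI-style chat request body. TGI now accepts `max_completion_tokens`
# as an alias for `max_tokens` in ChatRequest.
payload = {
    "model": "tgi",  # placeholder model id
    "messages": [{"role": "user", "content": "Hello!"}],
    # Newer OpenAI clients send this field instead of `max_tokens`;
    # with this release the two are treated as equivalent.
    "max_completion_tokens": 32,
}
body = json.dumps(payload)
print(body)
```

POST this body to `/v1/chat/completions` on a running TGI server; no client-side changes are needed beyond upgrading the server.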