
Text Generation Inference

v3.3.7 (Dec 19, 2025)

What's Changed

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.6...v3.3.7

v3.3.6 (Sep 17, 2025)

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.5...v3.3.6

v3.3.5 (Sep 2, 2025)

What's Changed

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.4...v3.3.5

v3.3.4 (Jun 19, 2025)

Fix for Neuron models exported with batch_size 1.

What's Changed

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.3...v3.3.4

v3.3.3 (Jun 18, 2025)

Neuron backend update.

What's Changed

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.2...v3.3.3

v3.3.2 (May 30, 2025)

Gaudi improvements.

What's Changed

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.1...v3.3.2

v3.3.1 (May 22, 2025)

This release updates TGI to Torch 2.7 and CUDA 12.8.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.0...v3.3.1

v3.3.0 (May 9, 2025)

Notable changes

  • Prefill chunking for VLMs.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.2.3...v3.3.0

v3.2.2 (Apr 6, 2025)

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.2.1...v3.2.2

v3.2.1 (Mar 18, 2025)

What's Changed

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.2.0...v3.2.1

v3.2.0 (Mar 12, 2025)

Important changes

  • BREAKING CHANGE: Major rework of tool calling. Tool-call responses now fully follow the OpenAI format (the arguments field is returned as a JSON-encoded string rather than a JSON object), along with many tool-calling improvements and side-effect fixes.

  • Added Gemma 3 support.
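Because the arguments field is now a JSON-encoded string rather than an object, clients must decode it themselves. A minimal sketch of the difference (the tool name and argument values are illustrative, not from this release):

```python
import json

# Illustrative tool-call fragment in the OpenAI-compatible shape used
# from v3.2.0 on: `arguments` is a JSON-encoded string, not an object.
tool_call = {
    "function": {
        "name": "get_weather",
        "arguments": "{\"city\": \"Paris\", \"unit\": \"celsius\"}",
    }
}

# The string must be decoded explicitly before the arguments can be used.
args = json.loads(tool_call["function"]["arguments"])
print(args["city"])  # Paris
```

Code written against the pre-3.2.0 behavior, which indexed into `arguments` directly as a dict, needs this extra `json.loads` step after upgrading.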

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.1.1...v3.2.0

v3.1.1 (Mar 4, 2025)

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.1.0...v3.1.1

v3.1.0 (Jan 31, 2025)

Important changes

DeepSeek R1 is fully supported on both AMD and NVIDIA!

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.1.0 --model-id deepseek-ai/DeepSeek-R1
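Once the container is serving, requests go to the OpenAI-compatible chat endpoint on the port mapped above. A sketch of the request body (the prompt and generation parameters are illustrative):

```python
import json

# Request body for TGI's OpenAI-compatible /v1/chat/completions endpoint
# (prompt and max_tokens are illustrative choices):
payload = {
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "max_tokens": 256,
    "stream": False,
}
body = json.dumps(payload)
# POST `body` to http://localhost:8080/v1/chat/completions with
# Content-Type: application/json, e.g. via curl or an OpenAI SDK client.
print(json.loads(body)["model"])
```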

What's Changed

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.0.2...v3.1.0

v3.0.2 (Jan 24, 2025)

Tl;dr

New transformers backend with FlashAttention support at roughly the same performance as native TGI, letting models without an official TGI implementation run directly in TGI. Congrats @Cyrilvallez

New models unlocked: Cohere2, OLMo, OLMo2, Helium.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.0.1...v3.0.2

v3.0.1 (Dec 11, 2024)

Summary

Patch release to handle a few older models and corner cases.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.0.0...v3.0.1

v3.0.0 (Dec 9, 2024)

TL;DR

Big new release introducing chunked prefill.

Details: https://huggingface.co/docs/text-generation-inference/conceptual/chunking
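The chunking doc linked above describes splitting a long prompt's prefill into fixed-size pieces so prefill work can be scheduled alongside decode steps. A simplified sketch of the splitting itself (chunk size and token ids are illustrative, not TGI's actual scheduler):

```python
def chunk_prefill(token_ids, chunk_size=256):
    """Split a prompt's token ids into fixed-size prefill chunks."""
    return [token_ids[i:i + chunk_size]
            for i in range(0, len(token_ids), chunk_size)]

prompt = list(range(1000))  # stand-in for 1,000 prompt token ids
chunks = chunk_prefill(prompt)
print(len(chunks), len(chunks[-1]))  # 4 chunks; the last holds 232 tokens
```

The scheduler can then interleave these chunks with ongoing decode work instead of blocking a batch on one very long prefill.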

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.4.1...v3.0.0

v2.4.1 (Nov 22, 2024)

Notable changes

  • Choose input/total tokens automatically based on available VRAM
  • Support Qwen2 VL
  • Decrease latency of very large batches (> 128)

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.3.0...v2.4.1

v2.4.0 (Oct 25, 2024)

Notable changes

  • Experimental prefill chunking (PREFILL_CHUNKING=1)
  • Experimental FP8 KV cache support
  • Greatly decrease latency for large batches (> 128 requests)
  • Faster MoE kernels and support for GPTQ-quantized MoE
  • Faster implementation of MLLama

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.3.0...v2.4

v2.3.1 (Oct 3, 2024)

Important changes

  • Added support for Mllama (Llama 3.2 vision models), with FlashAttention and unpadded inputs.
  • FP8 performance improvements
  • MoE performance improvements
  • BREAKING CHANGE: When using tools, models could previously answer with a notify_error tool call carrying the error content; they now output regular generation instead.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.3.0...v2.3.1
