releases.shpreview

Hugging Face/Inference

v3.2.0

March 12, 2025Text Generation InferenceView original ↗

Important changes

BREAKING CHANGE: Lots of modifications around tool calling. Tool calling now respects fully OpenAI return results (arguments return type is a string instead of a real JSON object). Lots of improvements around the tool calling and side effects fixed.
Added Gemma 3 support.

What's Changed

fix(neuron): explicitly install toolchain by @dacorvo in https://github.com/huggingface/text-generation-inference/pull/3072
Only add token when it is defined. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3073
Making sure Olmo (transformers backend) works. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3074
Making tool_calls a vector. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3075
Nix: add openai to impure shell for integration tests by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3081
Update --max-batch-total-tokens description by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/3083
Fix tool call2 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3076
Nix: the launcher needs a Python env with Torch for GPU detection by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3085
Add request parameters to OTel span for /v1/chat/completions endpoint by @aW3st in https://github.com/huggingface/text-generation-inference/pull/3000
Add qwen2 multi lora layers support by @EachSheep in https://github.com/huggingface/text-generation-inference/pull/3089
Add modules_to_not_convert in quantized model by @jiqing-feng in https://github.com/huggingface/text-generation-inference/pull/3053
Small test and typing fixes by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3078
hotfix: qwen2 formatting by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3093
Pr 3003 ci branch by @drbh in https://github.com/huggingface/text-generation-inference/pull/3007
Update the llamacpp backend by @angt in https://github.com/huggingface/text-generation-inference/pull/3022
Fix qwen vl by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3096
Update README.md by @celsowm in https://github.com/huggingface/text-generation-inference/pull/3095
Fix tool call3 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3086
Add gemma3 model by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/3099
Fix tool call4 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3094
Update neuron backend by @dacorvo in https://github.com/huggingface/text-generation-inference/pull/3098
Preparing relase 3.2.0 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3100
Try to fix on main CI color. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3101

New Contributors

@EachSheep made their first contribution in https://github.com/huggingface/text-generation-inference/pull/3089
@jiqing-feng made their first contribution in https://github.com/huggingface/text-generation-inference/pull/3053

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.1.1...v3.2.0

Fetched April 7, 2026