v3.2.0
Important changes
-
BREAKING CHANGE: Lots of modifications around tool calling. Tool calling now respects fully OpenAI return results (arguments return type is a string instead of a real JSON object). Lots of improvements around the tool calling and side effects fixed.
-
Added Gemma 3 support.
What's Changed
- fix(neuron): explicitly install toolchain by @dacorvo in https://github.com/huggingface/text-generation-inference/pull/3072
- Only add token when it is defined. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3073
- Making sure Olmo (transformers backend) works. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3074
- Making
tool_callsa vector. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3075 - Nix: add
openaito impure shell for integration tests by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3081 - Update
--max-batch-total-tokensdescription by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/3083 - Fix tool call2 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3076
- Nix: the launcher needs a Python env with Torch for GPU detection by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3085
- Add request parameters to OTel span for
/v1/chat/completionsendpoint by @aW3st in https://github.com/huggingface/text-generation-inference/pull/3000 - Add qwen2 multi lora layers support by @EachSheep in https://github.com/huggingface/text-generation-inference/pull/3089
- Add modules_to_not_convert in quantized model by @jiqing-feng in https://github.com/huggingface/text-generation-inference/pull/3053
- Small test and typing fixes by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3078
- hotfix: qwen2 formatting by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3093
- Pr 3003 ci branch by @drbh in https://github.com/huggingface/text-generation-inference/pull/3007
- Update the llamacpp backend by @angt in https://github.com/huggingface/text-generation-inference/pull/3022
- Fix qwen vl by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3096
- Update README.md by @celsowm in https://github.com/huggingface/text-generation-inference/pull/3095
- Fix tool call3 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3086
- Add gemma3 model by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/3099
- Fix tool call4 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3094
- Update neuron backend by @dacorvo in https://github.com/huggingface/text-generation-inference/pull/3098
- Preparing relase 3.2.0 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3100
- Try to fix on main CI color. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3101
New Contributors
- @EachSheep made their first contribution in https://github.com/huggingface/text-generation-inference/pull/3089
- @jiqing-feng made their first contribution in https://github.com/huggingface/text-generation-inference/pull/3053
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.1.1...v3.2.0
Fetched April 7, 2026

