v3.1.1
What's Changed
- Back on nix main. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2979
- hotfix: fix trtllm CI build on release by @Hugoch in https://github.com/huggingface/text-generation-inference/pull/2981
- Add `strftime_now` callable function for `minijinja` chat templates by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2983
- impureWithCuda: fix gcc version by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2990
- Improve qwen vl impl by @drbh in https://github.com/huggingface/text-generation-inference/pull/2943
- Using the "lockfile". by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2992
- Triton fix by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2995
- [Backend] Bump TRTLLM to v.0.17.0 by @mfuntowicz in https://github.com/huggingface/text-generation-inference/pull/2991
- Updating mllama after strftime. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2993
- Use kernels from the kernel hub by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2988
- fix Qwen VL break in intel platform by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3002
- Update the flaky mllama test. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3015
- Preventing single user hugging the server to death by asking by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3016
- Putting back the NCCL forced upgrade. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2999
- Support sigmoid scoring function in GPTQ-MoE by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3017
- [Backend] Add Llamacpp backend by @angt in https://github.com/huggingface/text-generation-inference/pull/2975
- Use eetq kernel from the hub by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3029
- Update README.md by @celsowm in https://github.com/huggingface/text-generation-inference/pull/3024
- Add `loop_controls` feature to `minijinja` to handle `{% break %}` by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2998
- Pinning trufflehog. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3032
- It's find in some machine. using hf_hub::api::sync::Api to download c… by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3030
- Improve Transformers support by @Cyrilvallez in https://github.com/huggingface/text-generation-inference/pull/2970
- feat: add initial qwen2.5-vl model and test by @drbh in https://github.com/huggingface/text-generation-inference/pull/2971
- Using public external registry (to use external runners for CI). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3031
- Having less logs in case of failure for checking CI more easily. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3037
- feat: Add the parsing of HF_HUB_USER_AGENT_ORIGIN environment variable for telemetry by @Hugoch in https://github.com/huggingface/text-generation-inference/pull/3027
- update ipex and torch to 2.6 for cpu by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3039
- flashinfer 0.2.0.post1 -> post2 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3040
- fix qwen2 vl crash in continous batching by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3004
- Simplify logs2. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3045
- Update Gradio ChatInterface configuration in consuming_tgi.md by @angt in https://github.com/huggingface/text-generation-inference/pull/3042
- Improve tool call message processing by @drbh in https://github.com/huggingface/text-generation-inference/pull/3036
- Use `rotary` kernel from the Hub by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3041
- Add Neuron backend by @dacorvo in https://github.com/huggingface/text-generation-inference/pull/3033
- You need to seek apparently. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3049
- some minor fix by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3048
- fix: run linters and fix formatting by @drbh in https://github.com/huggingface/text-generation-inference/pull/3057
- Avoid running neuron integration tests twice by @dacorvo in https://github.com/huggingface/text-generation-inference/pull/3054
- Add Gaudi Backend by @baptistecolle in https://github.com/huggingface/text-generation-inference/pull/3055
- Fix two edge cases in `RadixTrie::find` by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3067
- Add property-based testing for `RadixAllocator` by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3068
- feat: add support for HF_HUB_USER_AGENT_ORIGIN to add user-agent Origin field in Hub requests. by @Hugoch in https://github.com/huggingface/text-generation-inference/pull/3061
- Preparing for release. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3060
- Fix a tiny typo in `monitoring.md` tutorial by @sadra-barikbin in https://github.com/huggingface/text-generation-inference/pull/3056
- Patch rust release. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/3069
New Contributors
- @angt made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2975
- @celsowm made their first contribution in https://github.com/huggingface/text-generation-inference/pull/3024
- @dacorvo made their first contribution in https://github.com/huggingface/text-generation-inference/pull/3033
- @sadra-barikbin made their first contribution in https://github.com/huggingface/text-generation-inference/pull/3056
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.1.0...v3.1.1
