v3.1.0
Important changes
DeepSeek R1 is fully supported on both AMD and NVIDIA GPUs!
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
ghcr.io/huggingface/text-generation-inference:3.1.0 --model-id deepseek-ai/DeepSeek-R1
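The command above uses `$volume` without defining it and does not show how to query the server once it is up. A minimal sketch of both follows. The `$PWD/data` directory, the `TGI_URL` and `payload` variables, the `tgi_generate` helper, the prompt, and the token limit are all illustrative assumptions, not part of the release. The `/generate` route and its `inputs` / `parameters.max_new_tokens` fields are the standard TGI request shape.

```shell
# Hypothetical host directory for cached model weights; the release note
# leaves $volume undefined, so pick any writable path before docker run.
volume="$PWD/data"

# Port 8080 matches the -p 8080:80 mapping in the docker command.
TGI_URL="http://localhost:8080"

# Request body for the /generate route; prompt and token limit are examples.
payload='{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}}'

# Send one generation request. Call this only after the container has
# finished downloading and loading the model.
tgi_generate() {
  curl -s "$TGI_URL/generate" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$payload"
}
```

Running `tgi_generate` should return a JSON object containing the generated text, assuming the server is reachable at the mapped port.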
What's Changed
- Attempt to remove AWS S3 flaky cache for sccache by @mfuntowicz in https://github.com/huggingface/text-generation-inference/pull/2953
- Update to attention-kernels 0.2.0 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2950
- fix: Telemetry by @Hugoch in https://github.com/huggingface/text-generation-inference/pull/2957
- Fixing the oom maybe with 2.5.1 change. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2958
- Add backend name to telemetry by @Hugoch in https://github.com/huggingface/text-generation-inference/pull/2962
- Add fp8 support moe models by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/2928
- Update to moe-kernels 0.8.0 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2966
- Hotfixing intel-cpu (not sure how it was working before). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2967
- Add deepseekv3 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2968
- doc: Update TRTLLM deployment doc. by @Hugoch in https://github.com/huggingface/text-generation-inference/pull/2960
- Update moe-kernel to 0.8.2 for rocm by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/2977
- Prepare for release 3.1.0 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2972
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.0.2...v3.1.0
Fetched April 7, 2026

