v3.1.0

Important changes

DeepSeek R1 is fully supported on both AMD and NVIDIA GPUs!

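# Example host directory for cached weights (an assumption; any writable path works)
volume=$PWD/data
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.1.0 --model-id deepseek-ai/DeepSeek-R1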
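
Once the container is up, the server can be queried over HTTP. The following is a minimal sketch, not part of the release itself. It assumes the container runs on the local machine with port 8080 published as in the command above, and it uses only the Python standard library. The helper names `build_generate_request` and `generate` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Assumption: the container from the command above is running locally and
# its port 80 is published on host port 8080.
BASE_URL = "http://localhost:8080"


def build_generate_request(prompt, max_new_tokens=64, base_url=BASE_URL):
    """Build a POST request for the /generate endpoint (nothing is sent)."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the `generated_text` field of the reply."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]
```

Calling `generate("What is 2 + 2?", max_new_tokens=32)` should return the model's completion as a string, provided the server has finished loading the weights. Building the request separately from sending it makes the payload easy to inspect before any network call is made.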

What's Changed

Attempt to remove AWS S3 flaky cache for sccache by @mfuntowicz in https://github.com/huggingface/text-generation-inference/pull/2953
Update to attention-kernels 0.2.0 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2950
fix: Telemetry by @Hugoch in https://github.com/huggingface/text-generation-inference/pull/2957
Fixing the oom maybe with 2.5.1 change. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2958
Add backend name to telemetry by @Hugoch in https://github.com/huggingface/text-generation-inference/pull/2962
Add fp8 support moe models by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/2928
Update to moe-kernels 0.8.0 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2966
Hotfixing intel-cpu (not sure how it was working before). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2967
Add deepseekv3 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2968
doc: Update TRTLLM deployment doc. by @Hugoch in https://github.com/huggingface/text-generation-inference/pull/2960
Update moe-kernel to 0.8.2 for rocm by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/2977
Prepare for release 3.1.0 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2972

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.0.2...v3.1.0
