- PREFILL_CHUNKING=1)
- _server.nix file by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2538
- RUN in Dockerfile by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2547
- DenseMoELayer and wire it up in Mixtral/Deepseek V2 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2537
- --features google by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2566
- main by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2586
- desc_act for groupsize != -1 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2590
- ChatRequest for Vertex AI Chat instead of VertexChat by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2651
- e4m3fn KV cache by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2655
- attention function by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2609
- desc_act=true by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2622
- impureWithCuda dev shell by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2677

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.3.0...v2.4
Fetched April 7, 2026