v2.4.0

$ npx -y @buildinternet/releases show rel_3V4rTkMvZccspqXWYQlS-

Notable changes

  • Experimental prefill chunking (PREFILL_CHUNKING=1)
  • Experimental FP8 KV cache support
  • Greatly decreased latency for large batches (> 128 requests)
  • Faster MoE kernels and support for GPTQ-quantized MoE
  • Faster implementation of MLLama
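
As a rough illustration of the first item, the sketch below shows one way the experimental flag might be passed to the server process. It assumes the server is started through the `text-generation-launcher` binary with `--model-id`, and that `PREFILL_CHUNKING=1` is read from the process environment as the note suggests. The helper name `build_launch` and the model id are hypothetical placeholders, not part of the release.

```python
import os
import shlex
from typing import Dict, List, Optional, Tuple


def build_launch(
    model_id: str, base_env: Optional[Dict[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for starting the launcher with prefill chunking enabled.

    Hypothetical helper for illustration only; it builds the command and
    environment but does not start any process.
    """
    # Copy the environment so the caller's mapping is not modified.
    env = dict(os.environ if base_env is None else base_env)
    # Variable named in the release notes; marked experimental in v2.4.0.
    env["PREFILL_CHUNKING"] = "1"
    argv = ["text-generation-launcher", "--model-id", model_id]
    return argv, env


# "my-org/my-model" is a placeholder model id, not taken from the release notes.
argv, env = build_launch("my-org/my-model", base_env={})
print("PREFILL_CHUNKING=" + env["PREFILL_CHUNKING"], shlex.join(argv))
```

To actually start the server, the pair could be handed to `subprocess.run(argv, env=env)`. Because the feature is experimental, it is probably worth comparing behaviour with and without the variable before relying on it.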

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.3.0...v2.4

Fetched April 7, 2026