v3.3.5
What's Changed
- [gaudi] Refine rope memory, do not need to keep sin/cos cache per layer by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3274
- Gaudi: add CI by @baptistecolle in https://github.com/huggingface/text-generation-inference/pull/3160
- [gaudi] Gemma3 sliding window support by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3280
- xpu lora support by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3232
- Optimum neuron 0.2.2 by @dacorvo in https://github.com/huggingface/text-generation-inference/pull/3281
- [gaudi] Remove unnecessary reinitialize to HeterogeneousNextTokenChooser to m… by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3284
- [gaudi] Deepseek v2 mla and add ep to unquantized moe by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3287
- [gaudi] Fix the CI test errors by @yuanwu2017 in https://github.com/huggingface/text-generation-inference/pull/3286
- Hpu gptq gidx support by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3297
- Migrate to V2 Pydantic interface by @emmanuel-ferdman in https://github.com/huggingface/text-generation-inference/pull/3262
- Xccl by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3252
- Multi modality fix by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3283
- Some GPTQ cases could not be handled by IPEX but could be handled by t… by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3298
- fix outline import issue by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3282
- HuggingFaceM4/Idefics3-8B-Llama3 crash fix by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/3267
- Optimum neuron 0.3.0 by @tengomucho in https://github.com/huggingface/text-generation-inference/pull/3308
- Disable Cachix pushes by @danieldk in https://github.com/huggingface/text-generation-inference/pull/3312
- chore: prepare version 3.3.5 by @tengomucho in https://github.com/huggingface/text-generation-inference/pull/3314
- feat: bump flake including transformers and huggingface_hub versions by @drbh in https://github.com/huggingface/text-generation-inference/pull/3313
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v3.3.4...v3.3.5
Fetched April 7, 2026
