v1.4.0: `torchao` FP8, TP & DataLoader support, memory leak fix
## torchao FP8

This release introduces a new FP8 API and brings in a new backend: torchao. To use it, pass `AORecipeKwargs` to the `Accelerator` while setting `mixed_precision="fp8"`. This is initial support; as it matures, we will incorporate more into it (such as `accelerate config`/yaml support) in future releases. See our benchmark examples here.
## Tensor Parallel

We have initial support for an in-house solution for tensor parallelism when working with accelerate `DataLoader`s. Check out the PR here.
## What's Changed

* [memory leak] Replace GradientState -> DataLoader reference with weakrefs by @tomaarsen in https://github.com/huggingface/accelerate/pull/3391
* tests/test_quantization.py on XPU by @faaany in https://github.com/huggingface/accelerate/pull/3349
* require_non_xpu test markers by @faaany in https://github.com/huggingface/accelerate/pull/3301
* get_quantized_model_device_map by @faaany in https://github.com/huggingface/accelerate/pull/3397

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v1.3.0...v1.4.0
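The memory-leak fix in PR #3391 replaces `GradientState`'s strong reference to the `DataLoader` with a weak one, so a dropped dataloader can actually be garbage-collected. A minimal sketch of that pattern, using hypothetical stand-in classes rather than the library's real ones:

```python
import gc
import weakref

class DataLoaderLike:
    """Stand-in for a DataLoader (hypothetical, for illustration)."""

class GradientStateLike:
    """Stand-in for GradientState holding only weak references."""

    def __init__(self):
        self._loader_refs = []  # weakrefs, not strong references

    def register(self, loader):
        # Holding only a weakref lets the dataloader be collected
        # once user code drops its last strong reference.
        self._loader_refs.append(weakref.ref(loader))

    def active_loaders(self):
        # Dereference each weakref; dead ones return None and are skipped.
        return [l for ref in self._loader_refs if (l := ref()) is not None]

state = GradientStateLike()
loader = DataLoaderLike()
state.register(loader)
assert len(state.active_loaders()) == 1

del loader          # drop the only strong reference
gc.collect()
assert len(state.active_loaders()) == 0  # leak avoided
```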