v1.4.0: `torchao` FP8, TP & DataLoader support, memory leak fix
## torchao FP8

This release introduces a new FP8 API and brings in a new backend: torchao. To use it, pass `AORecipeKwargs` to the `Accelerator` while setting `mixed_precision="fp8"`. This is initial support; as it matures, we will incorporate more into it (such as `accelerate config`/yaml support) in future releases. See our benchmark examples here.
## Tensor Parallel

We have initial support for an in-house solution for tensor parallelism when working with accelerate `DataLoader`s. Check out the PR here.
## What's Changed

* [memory leak] Replace GradientState -> DataLoader reference with weakrefs by @tomaarsen in https://github.com/huggingface/accelerate/pull/3391
* tests/test_quantization.py on XPU by @faaany in https://github.com/huggingface/accelerate/pull/3349
* require_non_xpu test markers by @faaany in https://github.com/huggingface/accelerate/pull/3301
* get_quantized_model_device_map by @faaany in https://github.com/huggingface/accelerate/pull/3397

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v1.3.0...v1.4.0
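The memory-leak fix in PR #3391 replaces `GradientState`'s strong reference to the `DataLoader` with a weak one, so a dropped dataloader can actually be garbage-collected. A minimal sketch of that pattern, using hypothetical stand-in classes rather than the library's real ones:

```python
import gc
import weakref

class DataLoaderLike:
    """Stand-in for a DataLoader (hypothetical, for illustration)."""

class GradientStateLike:
    """Stand-in for GradientState holding only weak references."""

    def __init__(self):
        self._loader_refs = []  # weakrefs, not strong references

    def register(self, loader):
        # Holding only a weakref lets the dataloader be collected
        # once user code drops its last strong reference.
        self._loader_refs.append(weakref.ref(loader))

    def active_loaders(self):
        # Dereference each weakref; dead ones return None and are skipped.
        return [l for ref in self._loader_refs if (l := ref()) is not None]

state = GradientStateLike()
loader = DataLoaderLike()
state.register(loader)
assert len(state.active_loaders()) == 1

del loader          # drop the only strong reference
gc.collect()
assert len(state.active_loaders()) == 0  # leak avoided
```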