v1.8.0: FSDPv2 + FP8, Regional Compilation for DeepSpeed, Faster Distributed Training on Intel CPUs, ipex.optimize deprecation
We've simplified how to prepare FSDPv2 models, as there were too many ways to compose FSDP2 with other features (FP8, torch.compile, activation checkpointing, and so on). Although the setup is now more restrictive, it leads to fewer errors and a more performant user experience. We've also added support for FP8. You can read about the results here. Thanks to @S1ro1 for this contribution!
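As a rough sketch, assuming the FullyShardedDataParallelPlugin options keep their current names (the fsdp_version field and the "fp8" mixed-precision mode are the assumed knobs here), preparing an FSDP2 model with FP8 might look like this:

```python
import torch
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Sketch only: fsdp_version=2 selects FSDP2; exact option names may differ per version.
fsdp_plugin = FullyShardedDataParallelPlugin(fsdp_version=2)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin, mixed_precision="fp8")

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.Linear(1024, 1024))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# prepare() applies the FSDP2 sharding and FP8 setup in the order Accelerate expects,
# so you no longer compose the pieces by hand.
model, optimizer = accelerator.prepare(model, optimizer)
```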
We updated the CCL_WORKER_COUNT variable and added KMP parameters for Intel CPU users. This significantly improves distributed training performance (e.g., with Tensor Parallelism), with up to a 40% speed-up on 4th Gen Intel Xeon when training transformer models with TP.
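These knobs are plain environment variables, so you can also set them yourself before launching. The values below are illustrative only, not the tuned values shipped in this release; the launcher normally sets them for you:

```python
import os

# Illustrative values only: the optimal settings depend on core count and topology.
os.environ.setdefault("CCL_WORKER_COUNT", "1")  # oneCCL communication worker threads per rank
os.environ.setdefault("KMP_AFFINITY", "granularity=fine,compact,1,0")  # pin OpenMP threads to cores
os.environ.setdefault("KMP_BLOCKTIME", "1")  # let idle OpenMP threads sleep quickly
```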
We added support for regional compilation with the DeepSpeed engine. DeepSpeed’s .compile() modifies models in-place using torch.nn.Module.compile(...), rather than the out-of-place torch.compile(...), so we had to account for that. Thanks @IlyasMoutawwakil for this feature!
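To illustrate the distinction the DeepSpeed path has to handle, here is a minimal sketch using plain PyTorch (not DeepSpeed itself), assuming a recent PyTorch where torch.nn.Module.compile is available:

```python
import torch

model = torch.nn.Linear(8, 8)

# Out-of-place: torch.compile() returns a new OptimizedModule wrapping the original.
compiled_model = torch.compile(model)
assert compiled_model is not model

# In-place: nn.Module.compile() (the call DeepSpeed's .compile() uses) modifies the
# module itself, so there is no new wrapper object to swap in when compiling regions.
model.compile()
```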
ipex.optimize is being deprecated. Most of its optimizations have been upstreamed to PyTorch, and future improvements will land there directly. For users on PyTorch versions earlier than 2.8, we'll continue to rely on IPEX for now.
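In practice the conditional looks roughly like this (a sketch; the 2.8 cutoff comes from the paragraph above, and the dtype choice is just an example):

```python
import torch
from packaging.version import Version

model = torch.nn.Linear(16, 16).eval()

# On PyTorch >= 2.8 the relevant CPU optimizations are upstreamed, so no extra call is needed.
# On older versions, ipex.optimize() remains the fallback path.
if Version(torch.__version__.split("+")[0]) < Version("2.8"):
    import intel_extension_for_pytorch as ipex  # requires the IPEX package
    model = ipex.optimize(model, dtype=torch.bfloat16)
```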
We've greatly expanded and stabilized support for Intel XPUs.
We've added support for SwanLab as an experiment tracking backend. Huge thanks to @ShaohonChen for this contribution! We also deferred all tracker initializations to prevent premature setup of distributed environments.
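Assuming SwanLab plugs into the existing tracking API the same way as the other backends (and that the swanlab package is installed), usage would look like:

```python
from accelerate import Accelerator

# Sketch: "swanlab" is passed to log_with like any other tracker backend.
accelerator = Accelerator(log_with="swanlab")

# Per the note above, the tracker backend is set up here rather than eagerly,
# so it no longer runs before the distributed environment is ready.
accelerator.init_trackers(project_name="my-project", config={"learning_rate": 3e-4})

accelerator.log({"train_loss": 0.42}, step=1)
accelerator.end_training()
```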
- accelerator().load_state() by @luiz0992 in https://github.com/huggingface/accelerate/pull/3540
- dtype_byte_size by @SunMarc in https://github.com/huggingface/accelerate/pull/3625

Full Changelog: https://github.com/huggingface/accelerate/compare/v1.7.0...v1.8.0