TE MXFP8 support
We've added support for MXFP8 in our TransformerEngine integration. To use it, set `use_mxfp8_block_scaling` in `fp8_config`. See the NVIDIA docs [here](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#MXFP8-and-block-scaling).
- Add support for TE MXFP8 recipe in accelerate by @pstjohn in https://github.com/huggingface/accelerate/pull/3688
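As a rough illustration of the setting named above, the sketch below builds the relevant config section as a plain Python dictionary. Only `use_mxfp8_block_scaling` and `fp8_config` come from these notes. The `mixed_precision: fp8` and `backend: TE` entries and the helper name `build_fp8_section` are assumptions made for illustration, so check the Accelerate FP8 documentation for the exact layout.

```python
import json


def build_fp8_section(use_mxfp8: bool = True) -> dict:
    """Return a config fragment that turns on the TE MXFP8 recipe.

    Hypothetical helper: it only assembles a dictionary and does not
    call into accelerate or TransformerEngine.
    """
    return {
        "mixed_precision": "fp8",  # assumption: fp8 is selected at the top level
        "fp8_config": {
            "backend": "TE",  # assumption: TransformerEngine backend key
            "use_mxfp8_block_scaling": use_mxfp8,  # flag named in these notes
        },
    }


if __name__ == "__main__":
    # Print the fragment so it can be compared against an accelerate config file.
    print(json.dumps(build_fp8_section(), indent=2))
```

The same keys would be written as YAML in an accelerate config file. MXFP8 also needs hardware and a TransformerEngine build that support it, which this sketch does not check.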
FP16/BF16 Training for MPS devices
BF16 and FP16 support for MPS devices is finally here. You can now pass `mixed_precision="fp16"` or `mixed_precision="bf16"` when training on a Mac (fp16 requires torch 2.8 or later and bf16 requires torch 2.6 or later).
- Add bf16/fp16 support for amp with mps device by @SunMarc in https://github.com/huggingface/accelerate/pull/3373
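The version requirements above can be checked before choosing a precision. The following is a minimal sketch: `supported_mps_precisions` and `make_accelerator` are hypothetical helper names, and the version thresholds are read as "this version or later", which is an assumption about how the check is applied.

```python
import re

# Minimum torch (major, minor) per precision on MPS, as stated in these notes.
MPS_MIN_TORCH = {"bf16": (2, 6), "fp16": (2, 8)}


def supported_mps_precisions(torch_version: str) -> list[str]:
    """Return the mixed-precision modes usable on MPS for a torch version string."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"Unrecognized torch version: {torch_version!r}")
    current = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison handles two-digit minors such as 2.10 correctly.
    return [name for name, minimum in MPS_MIN_TORCH.items() if current >= minimum]


def make_accelerator(precision: str):
    """Create an Accelerator with the requested mixed precision.

    accelerate is imported lazily so the helper above stays usable
    on machines where it is not installed.
    """
    from accelerate import Accelerator

    return Accelerator(mixed_precision=precision)
```

On a Mac, one might call `supported_mps_precisions(torch.__version__)` and pass the preferred entry to `make_accelerator`.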
FSDP updates
The following PRs add support for `ignored_params` and `no_sync()`, respectively, in FSDPv2:
- feat: add ignored_params support for fsdp2 by @kmehant in https://github.com/huggingface/accelerate/pull/3731
- fix: model.set_requires_gradient_sync(False) should be called to turn off gradient synchronization in FSDP2 by @EquationWalker in https://github.com/huggingface/accelerate/pull/3762
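To show where `no_sync()` fits, here is a minimal sketch of manual gradient accumulation that skips gradient synchronization on every micro-batch except the last. The function name `accumulate_and_step` and the `loss_fn(model, batch)` callback are hypothetical. The only accelerate calls assumed are `accelerator.no_sync(model)` and `accelerator.backward(loss)`.

```python
from contextlib import nullcontext


def accumulate_and_step(accelerator, model, optimizer, micro_batches, loss_fn):
    """Run one optimizer step over several micro-batches.

    Gradients are synchronized only on the final micro-batch; earlier
    ones run inside accelerator.no_sync(model).
    """
    last = len(micro_batches) - 1
    for i, batch in enumerate(micro_batches):
        # Skip gradient synchronization for all but the final micro-batch.
        ctx = accelerator.no_sync(model) if i < last else nullcontext()
        with ctx:
            # Average the loss so the accumulated gradient matches one large batch.
            loss = loss_fn(model, batch) / len(micro_batches)
            accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```

With the fix listed above, the `no_sync()` path on FSDP2 is expected to turn synchronization off through `model.set_requires_gradient_sync(False)`; the loop itself does not need to change.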
The mixed precision policy can now be passed as a dtype string, either through the accelerate CLI flag or through `fsdp_config` in the accelerate config file:
- feat: allow mixed precision policy as dtype by @kmehant in https://github.com/huggingface/accelerate/pull/3751
Nd-parallel updates
Some minor updates concerning nd-parallelism.
- Context Parallelism docs typos fixed by @sergiopaniego in https://github.com/huggingface/accelerate/pull/3761
- Feat: add to_json by @S1ro1 in https://github.com/huggingface/accelerate/pull/3743
- make torch_native_parallelism examples device agnostic by @yao-matrix in https://github.com/huggingface/accelerate/pull/3759
- [ND Parallel] Update examples, cleanup by @S1ro1 in https://github.com/huggingface/accelerate/pull/3737
Bump to Python 3.10
We've dropped support for Python 3.9, as it reached end of life in October 2025.
- Bump to python3.10 + update linter by @SunMarc in https://github.com/huggingface/accelerate/pull/3809
Lots of minor fixes:
- fix: CPU RAM efficient loading for nd or HSDP parallelisms by @kmehant in https://github.com/huggingface/accelerate/pull/3740
- xpu INT64 all_gather issue fixed in 2.9 by @yao-matrix in https://github.com/huggingface/accelerate/pull/3756
- Specify device_ids in torch.distributed.barrier for PartialState by @qgallouedec in https://github.com/huggingface/accelerate/pull/3744
- fix: specify device for process_tensor in example usage by @qgallouedec in https://github.com/huggingface/accelerate/pull/3755
- Lower complexity of get_balanced_memory by adding a set by @SamuelBarryCS in https://github.com/huggingface/accelerate/pull/3776
- Fix (skip) cuda cache flush when origin device is `cpu` and offloaded to `meta` by @Qubitium in https://github.com/huggingface/accelerate/pull/3796
- Fix convert LayerNorm without bias to fp8 by @mjun0812 in https://github.com/huggingface/accelerate/pull/3725
- Add optional typing by @cyyever in https://github.com/huggingface/accelerate/pull/3769
- refactor: Use `with Accelerator.autocast()` instead of `__enter__()` and `__exit__()` for more elegant style. by @EquationWalker in https://github.com/huggingface/accelerate/pull/3767
- switch XPU ccl backend to torch-builtin xccl in test_zero3_integration by @yao-matrix in https://github.com/huggingface/accelerate/pull/3773
- fix FSDP2 test case failure on XPU by @yao-matrix in https://github.com/huggingface/accelerate/pull/3771
- Fix tests by @SunMarc in https://github.com/huggingface/accelerate/pull/3722
- Protect import for device_mesh by @SunMarc in https://github.com/huggingface/accelerate/pull/3742
- Fix `SWANLAB_MODE` by @SunMarc in https://github.com/huggingface/accelerate/pull/3808
- Fix tracking swanlab by @SunMarc in https://github.com/huggingface/accelerate/pull/3810
- refactor: nit change for get_parameters_from_modules (code debt) by @kmehant in https://github.com/huggingface/accelerate/pull/3815
- Remove deprecated FindTiedParametersResult by @cyyever in https://github.com/huggingface/accelerate/pull/3786
- remove mlflow from testing by @SunMarc in https://github.com/huggingface/accelerate/pull/3783
- enable 2 model hook ut cases on XPU by @yao-matrix in https://github.com/huggingface/accelerate/pull/3774
- Added Tip for better rendering by @sergiopaniego in https://github.com/huggingface/accelerate/pull/3781
- Fix typos by @cyyever in https://github.com/huggingface/accelerate/pull/3753
- fix: torch_npu import error in some envs by @yanyongyu in https://github.com/huggingface/accelerate/pull/3764
- Fix: typo makes tests fail by @S1ro1 in https://github.com/huggingface/accelerate/pull/3765
- fix Multi node CUDA error: invalid device ordinal #3775 by @RicardoDominguez in https://github.com/huggingface/accelerate/pull/3779
- use reset_peak_memory_stats on xpu by @yao-matrix in https://github.com/huggingface/accelerate/pull/3772
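The autocast refactor in the list above replaces manual `__enter__()` and `__exit__()` calls with a `with` block. The sketch below shows, with a stand-in context manager, one practical reason to prefer `with` beyond style: cleanup still runs when the body raises. `FakeAutocast`, `manual_style`, and `with_style` are hypothetical names and do not reflect accelerate internals.

```python
class FakeAutocast:
    """Stand-in for a context manager such as the one Accelerator.autocast() returns."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append("exit")
        return False  # do not swallow exceptions


def manual_style(log, fail=False):
    ctx = FakeAutocast(log)
    ctx.__enter__()
    if fail:
        raise RuntimeError("boom")  # __exit__ below is never reached
    ctx.__exit__(None, None, None)


def with_style(log, fail=False):
    with FakeAutocast(log):
        if fail:
            raise RuntimeError("boom")  # __exit__ still runs before the error propagates
```

When the body fails, `manual_style` leaves the context entered but never exited, while `with_style` records both the enter and the exit.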
New Contributors
- @mjun0812 made their first contribution in https://github.com/huggingface/accelerate/pull/3725
- @sergiopaniego made their first contribution in https://github.com/huggingface/accelerate/pull/3761
- @EquationWalker made their first contribution in https://github.com/huggingface/accelerate/pull/3762
- @yanyongyu made their first contribution in https://github.com/huggingface/accelerate/pull/3764
- @RicardoDominguez made their first contribution in https://github.com/huggingface/accelerate/pull/3779
- @SamuelBarryCS made their first contribution in https://github.com/huggingface/accelerate/pull/3776
- @Qubitium made their first contribution in https://github.com/huggingface/accelerate/pull/3796
Full Changelog: https://github.com/huggingface/accelerate/compare/v1.10.1...v1.11.0
