v0.26.0 - MS-AMP Support, Critical Regression Fixes, and More
This release adds support for MS-AMP (the Microsoft Automatic Mixed Precision library) to Accelerate as an alternative backend for FP8 training on appropriate hardware, and it is now the default backend of choice. Read more in the docs here. Introduced in https://github.com/huggingface/accelerate/pull/2232 by @muellerzr
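As a rough sketch of how FP8 training with the new backend might be configured (the `FP8RecipeKwargs` handler and its `backend`/`opt_level` arguments follow Accelerate's kwargs-handler pattern but are not spelled out in these notes, so treat them as assumptions and check the linked docs):

```python
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

# Request FP8 mixed precision and (assumed) select the MS-AMP backend with
# an MS-AMP optimization level; this requires FP8-capable hardware and the
# msamp package to be installed, so it is a configuration sketch only.
fp8_kwargs = FP8RecipeKwargs(backend="msamp", opt_level="O2")
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[fp8_kwargs])
```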
The prior release introduced a new sampler for the DataLoader that, while showing no statistical differences in results across seeds, could produce a different final accuracy when repeating the same seed, which alarmed some users. We have now disabled this behavior by default, as it required some additional setup, and brought back the original implementation. To use the new sampling technique (which can provide more accurate repeated results), pass use_seedable_sampler=True to the Accelerator. We will be propagating this up to the Trainer soon.
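The idea behind a seedable sampler can be illustrated with a minimal stdlib-only sketch (a hypothetical class, not Accelerate's actual implementation): the shuffle order is derived entirely from a fixed seed plus the epoch number, so rerunning with the same seed reproduces the exact same order.

```python
import random

class SeedableSampler:
    """Hypothetical sketch: yields dataset indices in a shuffle order
    that is fully determined by (seed, epoch)."""

    def __init__(self, data_len, seed=42):
        self.data_len = data_len
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Called once per epoch so each epoch gets a fresh but reproducible order.
        self.epoch = epoch

    def __iter__(self):
        # A deterministic RNG per (seed, epoch) makes the shuffle repeatable.
        rng = random.Random(self.seed + self.epoch)
        indices = list(range(self.data_len))
        rng.shuffle(indices)
        return iter(indices)

sampler = SeedableSampler(8, seed=0)
first = list(sampler)
again = list(sampler)  # same seed and epoch -> identical order on every run
```

Because the order depends only on `(seed, epoch)`, repeating a run with the same seed visits the samples identically, which is what makes repeated end-accuracies comparable.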
- `device_map`: it's now possible to not return grouped key results if desired, in https://github.com/huggingface/accelerate/pull/2233
- Support for `device_map="cuda"` and similar values, thanks to @younesbelkada in https://github.com/huggingface/accelerate/pull/2254
- Many improvements to the docs, thanks to @stas00
- It's now easier to adjust the sharding strategy and other config values, thanks to @pacman100 in https://github.com/huggingface/accelerate/pull/2288
A regression introduced in Accelerate 0.23.0 made learning much slower on multi-GPU setups than on a single GPU. This has now been fixed by @pacman100 in https://github.com/huggingface/accelerate/pull/2304
The DeepSpeed integration now also handles `auto` values better when building a configuration, in https://github.com/huggingface/accelerate/pull/2313
- `Params4bit` added to bnb classes in `set_module_tensor_to_device()` by @poedator in https://github.com/huggingface/accelerate/pull/2315
- For developers: it's now much easier to run the tests on different devices with no change to the code, thanks to @statelesshz in https://github.com/huggingface/accelerate/pull/2123 and https://github.com/huggingface/accelerate/pull/2235
What's Changed:
- `offload_state_dict=True` and dtype is specified by @fxmarty in https://github.com/huggingface/accelerate/pull/2116
- `auto` values for comm buffers by @stas00 in https://github.com/huggingface/accelerate/pull/2295
- [Big-Modeling] Harmonize device check to handle corner cases by @younesbelkada in https://github.com/huggingface/accelerate/pull/2254
- `log_images` for aim tracker by @Justin900429 in https://github.com/huggingface/accelerate/pull/2257
- `check_tied_parameters_on_same_device` by @SunMarc in https://github.com/huggingface/accelerate/pull/2218
- `prepare_data_loader` by @izhx in https://github.com/huggingface/accelerate/pull/2310
- `Params4bit` added to bnb classes in `set_module_tensor_to_device()` by @poedator in https://github.com/huggingface/accelerate/pull/2315

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.25.0...v0.26.0