Accelerate 1.0.0 is here!
With accelerate 1.0, we are officially stating that the core parts of the API are now "stable" and ready for the future of what the world of distributed training and PyTorch has to handle. With these release notes, we will focus first on the major breaking changes to get your code fixed, followed by what is new specifically between 0.34.0 and 1.0.
To read more, check out our official blog here
dispatch_batches, split_batches, even_batches, and use_seedable_sampler to the Accelerator() should now be handled by creating an accelerate.utils.DataLoaderConfiguration() and passing this to the Accelerator() instead (Accelerator(dataloader_config=DataLoaderConfiguration(...)))Accelerator().use_fp16 and AcceleratorState().use_fp16 have been removed; this should be replaced by checking accelerator.mixed_precision == "fp16"Accelerator().autocast() no longer accepts a cache_enabled argument. Instead, an AutocastKwargs() instance should be used which handles this flag (among others) passing it to the Accelerator (Accelerator(kwargs_handlers=[AutocastKwargs(cache_enabled=True)]))accelerate.utils.is_tpu_available should be replaced with accelerate.utils.is_torch_xla_availableaccelerate.utils.modeling.shard_checkpoint should be replaced with split_torch_state_dict_into_shards from the huggingface_hub libraryaccelerate.tqdm.tqdm() no longer accepts True/False as the first argument, and instead, main_process_only should be passed in as a named argumentAfter long request, we finally have multiple model DeepSpeed support in Accelerate! (though it is quite early still). Read the full tutorial here, however essentially:
When using multiple models, a DeepSpeed plugin should be created for each model (and as a result, a separate config). a few examples are below:
(Where we train only one model, zero3, and another is used for inference, zero2)
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin
zero2_plugin = DeepSpeedPlugin(hf_ds_config="zero2_config.json")
zero3_plugin = DeepSpeedPlugin(hf_ds_config="zero3_config.json")
deepspeed_plugins = {"student": zero2_plugin, "teacher": zero3_plugin}
accelerator = Accelerator(deepspeed_plugins=deepspeed_plugins)
To then select which plugin to be used at a certain time (aka when calling prepare), we call `accelerator.state.select_deepspeed_plugin("name"), where the first plugin is active by default:
accelerator.state.select_deepspeed_plugin("student")
student_model, optimizer, scheduler = ...
student_model, optimizer, scheduler, train_dataloader = accelerator.prepare(student_model, optimizer, scheduler, train_dataloader)
accelerator.state.select_deepspeed_plugin("teacher") # This will automatically enable zero init
teacher_model = AutoModel.from_pretrained(...)
teacher_model = accelerator.prepare(teacher_model)
For disjoint models, separate accelerators should be used for each model, and their own .backward() should be called later:
for batch in dl:
outputs1 = first_model(**batch)
first_accelerator.backward(outputs1.loss)
first_optimizer.step()
first_scheduler.step()
first_optimizer.zero_grad()
outputs2 = model2(**batch)
second_accelerator.backward(outputs2.loss)
second_optimizer.step()
second_scheduler.step()
second_optimizer.zero_grad()
We've enabled MS-AMP support up to FSDP. At this time we are not going forward with implementing FSDP support with MS-AMP, due to design issues between both libraries that don't make them inter-op easily.
__reduce__ by @byi8220 in https://github.com/huggingface/accelerate/pull/3074_get_named_modules by @faaany in https://github.com/huggingface/accelerate/pull/3052skip_keys usage in forward hooks by @152334H in https://github.com/huggingface/accelerate/pull/3088torch.cuda.amp.GradScaler FutureWarning for pytorch 2.4+ by @Mon-ius in https://github.com/huggingface/accelerate/pull/3132Full Changelog: https://github.com/huggingface/accelerate/compare/v0.34.2...v1.0.0
Fetched April 7, 2026