v0.26.0 - MS-AMP Support, Critical Regression Fixes, and More
This release adds support for MS-AMP (the Microsoft Automatic Mixed Precision library) to Accelerate as an alternative backend for FP8 training on appropriate hardware, and it is now the default backend of choice. Read more in the docs here. Introduced in https://github.com/huggingface/accelerate/pull/2232 by @muellerzr
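As a rough sketch of how FP8 training with the new backend might be configured (the `FP8RecipeKwargs` handler and its `backend`/`opt_level` arguments follow Accelerate's kwargs-handler pattern but are not spelled out in these notes, so treat them as assumptions and check the linked docs):

```python
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

# Request FP8 mixed precision and (assumed) select the MS-AMP backend with
# an MS-AMP optimization level; this requires FP8-capable hardware and the
# msamp package to be installed, so it is a configuration sketch only.
fp8_kwargs = FP8RecipeKwargs(backend="msamp", opt_level="O2")
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[fp8_kwargs])
```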
The prior release introduced a new sampler for the DataLoader that, while showing no statistical differences in results across seeds, could produce a different final accuracy when repeating the same seed, which alarmed some users. We have now disabled this behavior by default, as it required some additional setup, and brought back the original implementation. To use the new sampling technique (which can provide more accurate repeated results), pass use_seedable_sampler=True to the Accelerator. We will be propagating this up to the Trainer soon.
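The idea behind a seedable sampler can be illustrated with a minimal stdlib-only sketch (a hypothetical class, not Accelerate's actual implementation): the shuffle order is derived entirely from a fixed seed plus the epoch number, so rerunning with the same seed reproduces the exact same order.

```python
import random

class SeedableSampler:
    """Hypothetical sketch: yields dataset indices in a shuffle order
    that is fully determined by (seed, epoch)."""

    def __init__(self, data_len, seed=42):
        self.data_len = data_len
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Called once per epoch so each epoch gets a fresh but reproducible order.
        self.epoch = epoch

    def __iter__(self):
        # A deterministic RNG per (seed, epoch) makes the shuffle repeatable.
        rng = random.Random(self.seed + self.epoch)
        indices = list(range(self.data_len))
        rng.shuffle(indices)
        return iter(indices)

sampler = SeedableSampler(8, seed=0)
first = list(sampler)
again = list(sampler)  # same seed and epoch -> identical order on every run
```

Because the order depends only on `(seed, epoch)`, repeating a run with the same seed visits the samples identically, which is what makes repeated end-accuracies comparable.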
- `device_map`: it's now possible to not return grouped key results if desired, in https://github.com/huggingface/accelerate/pull/2233
- Support for `device_map="cuda"` and similar values, thanks to @younesbelkada in https://github.com/huggingface/accelerate/pull/2254
- Many improvements to the docs, thanks to @stas00
- It's now easier to adjust the sharding strategy and other config values, thanks to @pacman100 in https://github.com/huggingface/accelerate/pull/2288
A regression introduced in Accelerate 0.23.0 made learning much slower on multi-GPU setups than on a single GPU. This has now been fixed by @pacman100 in https://github.com/huggingface/accelerate/pull/2304
The DeepSpeed integration now also handles `auto` values better when building a configuration, in https://github.com/huggingface/accelerate/pull/2313
- `Params4bit` added to bnb classes in `set_module_tensor_to_device()` by @poedator in https://github.com/huggingface/accelerate/pull/2315
- For developers: it's now much easier to run the tests on different devices with no change to the code, thanks to @statelesshz in https://github.com/huggingface/accelerate/pull/2123 and https://github.com/huggingface/accelerate/pull/2235
What's Changed:
- `offload_state_dict=True` and dtype is specified by @fxmarty in https://github.com/huggingface/accelerate/pull/2116
- `auto` values for comm buffers by @stas00 in https://github.com/huggingface/accelerate/pull/2295
- [Big-Modeling] Harmonize device check to handle corner cases by @younesbelkada in https://github.com/huggingface/accelerate/pull/2254
- `log_images` for aim tracker by @Justin900429 in https://github.com/huggingface/accelerate/pull/2257
- `check_tied_parameters_on_same_device` by @SunMarc in https://github.com/huggingface/accelerate/pull/2218
- `prepare_data_loader` by @izhx in https://github.com/huggingface/accelerate/pull/2310
- `Params4bit` added to bnb classes in `set_module_tensor_to_device()` by @poedator in https://github.com/huggingface/accelerate/pull/2315

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.25.0...v0.26.0