Hugging Face / Accelerate release notes
Jun 8, 2023
v0.20.3: Patch release
  • Reset dataloader end_of_dataloader at each iter in #1562 by @sgugger
v0.20.2: Patch release
  • fix the typo when setting the "_accelerator_prepared" attribute in #1560 by @Yura52
  • [core] Fix possibility to pass NoneType objects in prepare in #1561 by @younesbelkada
Jun 7, 2023
v0.20.1: Patch release
  • Avoid double wrapping of all accelerate.prepare objects by @muellerzr in #1555
  • Fix load_state_dict when there is one device and disk by @sgugger in #1557
v0.20.0: MPS and fp4 support on Big Model Inference, 4-bit QLoRA, Intel GPU, Distributed Inference, and much more!

Big model inference

Support has been added to run device_map="auto" on the MPS device. Big model inference also works with models loaded in 4-bit in Transformers.

  • Add mps support to big inference modeling by @SunMarc in #1545
  • Adds fp4 support for model dispatching by @younesbelkada in #1505

4-bit QLoRA Support

  • 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #1458

Distributed Inference Utilities

This version introduces a new Accelerator.split_between_processes utility to help with performing distributed inference with non-tensorized or non-dataloader workflows. Read more here

Introduce XPU support for Intel GPU

  • Intel GPU support initialization by @abhilash1910 in #1118

Add support for the new PyTorch XLA TPU runtime

  • Accelerate now supports the latest TPU runtimes #1393, #1385

A new optimizer method: LocalSGD

  • This is a new wrapper around SGD that enables efficient multi-GPU training when no fast interconnect is available, by @searchivarius in #1378

Papers with 🤗 Accelerate

  • We now have an entire section of the docs dedicated to official paper implementations and citations using the framework #1399, see it live here

Breaking changes

logging_dir has been fully deprecated; please use project_dir or a ProjectConfiguration instead

What's new?

  • use existing mlflow experiment if exists by @Rusteam in #1403
  • changes required for DS integration by @pacman100 in #1406
  • fix deepspeed failing tests by @pacman100 in #1411
  • Make mlflow logging dir optional by @mattplo-decath in #1413
  • Fix bug on ipex for diffusers by @abhilash1910 in #1426
  • Improve Slack Updater by @muellerzr in #1433
  • Let quality yell at the user if it's a version difference by @muellerzr in #1438
  • Ensure that it gets installed by @muellerzr in #1439
  • [core] Introducing CustomDtype enum for custom dtypes by @younesbelkada in #1434
  • Fix XPU by @muellerzr in #1440
  • Make sure torch compiled model can also be unwrapped by @patrickvonplaten in #1437
  • fixed: ZeroDivisionError: division by zero by @sreio in #1436
  • fix potential OOM when resuming with multi-GPU training by @exhyy in #1444
  • Fixes in infer_auto_device_map by @sgugger in #1441
  • Raise error when logging improperly by @muellerzr in #1446
  • Fix ci by @muellerzr in #1447
  • Distributed prompting/inference utility by @muellerzr in #1410
  • Add to by @muellerzr in #1448
  • split_between_processes by @stevhliu in #1449
  • [docs] Replace state.rank -> process_index by @pcuenca in #1450
  • Auto multigpu logic by @muellerzr in #1452
  • Update with cli instructions by @muellerzr in #1453
  • Adds in_order argument that defaults to False, to log in order. by @JulesGM in #1262
  • fix error for CPU DDP using trainer api. by @sywangyi in #1455
  • Refactor and simplify xpu device in state by @abhilash1910 in #1456
  • Document how to use commands with python module instead of argparse by @muellerzr in #1457
  • 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #1458
  • Fix skip first batch being permanent by @muellerzr in #1466
  • update conversion of layers to retain original data type. by @avisinghal6 in #1467
  • Check for xpu specifically by @muellerzr in #1472
  • update register_empty_buffer to match torch args by @NouamaneTazi in #1465
  • Update gradient accumulation docs, and remove redundant example by @iantbutler01 in #1461
  • Improve sagemaker by @muellerzr in #1470
  • Split tensors as part of split_between_processes by @muellerzr in #1477
  • Move to device by @muellerzr in #1478
  • Fix gradient state bugs in multiple dataloader by @Ethan-yt in #1483
  • Add rdzv-backend by @muellerzr in #1490
  • Only use IPEX if available by @muellerzr in #1495
  • Update README.md by @lyhue1991 in #1493
  • Let gather_for_metrics always run by @muellerzr in #1496
  • Use empty like when we only need to create buffers by @thomasw21 in #1497
  • Allow key skipping in big model inference by @sgugger in #1491
  • fix crash when ipex is installed and torch has no xpu by @sywangyi in #1502
  • [bnb] Add fp4 support for dispatch by @younesbelkada in #1505
  • Fix 4bit model on multiple devices by @SunMarc in #1506
  • adjust overriding of model's forward function by @prathikr in #1492
  • Add assertion when call prepare with deepspeed config. by @tensimiku in #1468
  • NVME path support for deepspeed by @abhilash1910 in #1484
  • should set correct dtype to ipex optimize and use amp logic in native… by @sywangyi in #1511
  • Swap env vars for XPU and IPEX + CLI by @muellerzr in #1513
  • Fix a bug when parameters tied belong to the same module by @sgugger in #1514
  • Fixup deepspeed/cli tests by @muellerzr in #1526
  • Refactor mp into its own wrapper by @muellerzr in #1527
  • Check tied parameters by @SunMarc in #1529
  • Raise ValueError on iterable dataset if we've hit the end and attempting to go beyond it by @muellerzr in #1531
  • Officially support naive PP for quantized models + PEFT by @younesbelkada in #1523
  • remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu by @sywangyi in #1503
  • Prevent using extra VRAM for static device_map by @LSerranoPEReN in #1536
  • Update deepspeed.mdx by @LiamSwayne in #1541
  • Update performance.mdx by @LiamSwayne in #1543
  • Update deferring_execution.mdx by @LiamSwayne in #1544
  • Apply deprecations by @muellerzr in #1537
  • Add mps support to big inference modeling by @SunMarc in #1545
  • [documentation] grammar fixes in gradient_synchronization.mdx by @LiamSwayne in #1547
  • Eval mode by @muellerzr in #1540
  • Update migration.mdx by @LiamSwayne in #1549

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @will-cromar
    • Support TPU v4 with new PyTorch/XLA TPU runtime (#1393)
    • Support TPU v2 and v3 on new PyTorch/XLA TPU runtime (#1385)
  • @searchivarius
    • Adding support for local SGD. (#1378)
  • @abhilash1910
    • Intel GPU support initialization (#1118)
    • Fix bug on ipex for diffusers (#1426)
    • Refactor and simplify xpu device in state (#1456)
    • NVME path support for deepspeed (#1484)
  • @sywangyi
    • fix error for CPU DDP using trainer api. (#1455)
    • fix crash when ipex is installed and torch has no xpu (#1502)
    • should set correct dtype to ipex optimize and use amp logic in native… (#1511)
    • remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu (#1503)
  • @Ethan-yt
    • Fix gradient state bugs in multiple dataloader (#1483)
May 8, 2023
v0.19.0: IPEX Support, Foundations for Transformers Integration, FP8 for Ada Lovelace GPUs, and Squashed Bugs

What's New

  • Support for Intel IPEX has been added; check out the how-to guide now!
  • Various modifications have been added to begin work on having 🤗 Accelerate be the foundation for the Trainer; keep an eye on the repos to see how our progress is coming along!
  • FP8 training is now supported on Ada Lovelace GPUs
  • The wandb integration now supports logging of images and tables through tracker.log_images and tracker.log_tables respectively
  • Many, many squashed bugs! (see the full detailed report for just what they were)
  • 17 new contributors to the framework, congratulations to all who took their first step! 🚀

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.18.0...v0.19.0

Mar 24, 2023
v0.18.0: GradientState enhancements and Big Model Inference Fixes

What's Changed

  • A new GradientAccumulationPlugin has been added to handle more configurations with the GradientState. Specifically, you can optionally disable having Accelerate automatically adjust the length of the scheduler relative to gradient accumulation steps. Otherwise, Accelerate will now automatically ensure that schedulers built without gradient accumulation in mind still work correctly during gradient accumulation.
  • Some fixes related to the launch configuration and TPU launches were adjusted, and the dynamo_backend warning has been silenced.
  • Big model inference saw a number of fixes related to linear layers, drop_last on linear layers, tied weight loading, and handling of multiple tied parameters
  • A new integration example with RunhouseML has been added, read more here: https://github.com/huggingface/accelerate/tree/main/examples#simple-multi-gpu-hardware-launcher

Breaking Changes

  • find_tied_parameters now deals with groups of tied parameters (instead of only pairs of them). As a result, it now returns a list of lists of strings instead of a dictionary.

What's New?

New Contributors

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.17.1...v0.18.0

Mar 13, 2023
v0.17.1: Patch release
  • Fix CPU error always being raised by @muellerzr in #1175
  • fixed typo in launch.py tpu_pod_launcher by @hackpert in #1180
  • Support special mapping of dtypes when preparing device map by @sgugger in #1179
Mar 9, 2023
v0.17.0: PyTorch 2.0 support, Process Control Enhancements, TPU pod support and FP8 mixed precision training

PyTorch 2.0 support

This release fully supports the upcoming PyTorch 2.0 release. You can choose whether or not to use torch.compile, and customize its options via accelerate config or a TorchDynamoPlugin.

  • update support for torch dynamo compile by @pacman100 in #1150

Process Control Enhancements

This release adds a new PartialState, which contains most of the capabilities of the AcceleratorState but is designed to be used directly by the user for any process-control mechanisms. With it, users also no longer need an if accelerator.state.is_main_process guard when utilizing classes such as the Tracking API, as these now automatically restrict their work to the main process by default.

  • Refactor process executors to be in AcceleratorState by @muellerzr in #1039

TPU Pod Support (Experimental)

Launching from TPU pods is now supported, please see this issue for more information

  • Introduce TPU Pod launching to accelerate launch by @muellerzr in #1049

FP8 mixed precision training (Experimental)

This release adds experimental support for FP8 mixed precision training, which requires the transformer-engine library as well as a Hopper GPU (or higher).

  • Fp8 integration by @sgugger in #1086

What's new?

  • v0.17.0.dev0 by @sgugger (direct commit on main)
  • Deepspeed param check by @dhar174 in #1015
  • enabling mps device by default and removing related config by @pacman100 in #1030
  • fix: links to gradient synchronization by @prassanna-ravishankar in #1035
  • do not scale gradient in bf16 mode by @kashif in #1036
  • Pass keywords arguments of backward function deeper to DeepSpeed by @DistinctVision in #1037
  • Add daily slack notifier for nightlies by @muellerzr in #1042
  • Make sure direct parameters are properly set on device by @sgugger in #1043
  • Add cpu_offload_with_hook by @sgugger in #1045
  • Update quality tools to 2023 by @sgugger in #1046
  • Load tensors directly on device by @sgugger in #1028
  • Fix cpu_offload_with_hook code snippet by @pcuenca in #1047
  • Use create_task by @muellerzr in #1052
  • Fix args by adding in the defaults by @muellerzr in #1053
  • deepspeed hidden_size auto value default fixes by @pacman100 in #1060
  • Introduce PartialState by @muellerzr in #1055
  • Flag for deprecation by @muellerzr in #1061
  • Try with this by @muellerzr in #1062
  • Update integrations by @muellerzr in #1063
  • Swap utils over to use PartialState by @muellerzr in #1065
  • update fsdp docs and removing deepspeed version pinning by @pacman100 in #1059
  • Fix/implement process-execution decorators on the Accelerator by @muellerzr in #1070
  • Refactor state and make PartialState first class citizen by @muellerzr in #1071
  • Add error if passed --config_file does not exist by @muellerzr in #1074
  • SageMaker image_uri is now optional by @<NOT FOUND> in #1077
  • Allow custom SageMaker Estimator arguments by @<NOT FOUND> in #1080
  • Fix tpu_cluster arg by @muellerzr in #1081
  • Update complete_cv_example.py by @fcossio in #1082
  • Added SageMaker local mode config section by @<NOT FOUND> in #1084
  • Fix config by @muellerzr in #1090
  • adds missing "lfs" in pull by @CSchoel in #1091
  • add multi_cpu support to reduce by @alex-hh in #1094
  • Update README.md by @BM-K in #1100
  • Tracker rewrite and lazy process checker by @muellerzr in #1079
  • Update performance.mdx by @fcossio in #1107
  • Attempt to unwrap tracker. by @pcuenca in #1109
  • TensorBoardTracker: wrong arg def by @stas00 in #1111
  • Actually raise if exception by @muellerzr in #1124
  • Add test for ops and fix reduce by @muellerzr in #1122
  • Deep merge SageMaker additional_args, allowing more flexible configuration and env variable support by @dbpprt in #1113
  • Move dynamo.optimize to the end of model preparation by @ymwangg in #1128
  • Refactor launch for greater extensibility by @Yard1 in #1123
  • [Big model loading] Correct GPU only loading by @patrickvonplaten in #1121
  • Add tee and role to launch by @muellerzr in #1132
  • Expand warning and grab all GPUs available by default by @muellerzr in #1134
  • Fix multinode with GPU ids when each node has 1 by @muellerzr in #1127
  • deepspeed dataloader prepare fix by @pacman100 in #1126
  • fix ds dist init kwargs issue by @pacman100 in #1138
  • fix lr scheduler issue by @pacman100 in #1140
  • fsdp bf16 enable autocast by @pacman100 in #1125
  • Fix notebook_launcher by @muellerzr in #1141
  • fix partial state by @pacman100 in #1144
  • FSDP enhancements and fixes by @pacman100 in #1145
  • Fixed typos in notebook by @SamuelLarkin in #1146
  • Include a note in the gradient synchronization docs on "what can go wrong" and show the timings by @muellerzr in #1153
  • [Safetensors] Relax missing metadata constraint by @patrickvonplaten in #1151
  • Solve arrow keys being environment dependant for accelerate config by @p1atdev (direct commit on main)
  • Load custom state to cpu by @Guangxuan-Xiao in #1156
  • :memo: add a couple more trackers to the docs by @nateraw in #1158
  • Let GradientState know active dataloaders and reset the remainder by @muellerzr in #1162
  • Attempt to fix import error when PyTorch is build without torch.distributed module by @mfuntowicz in #1108
  • [Accelerator] Fix issue with 8bit models by @younesbelkada in #1155
  • Document skip_first_batches in the checkpoint usage guides by @muellerzr in #1164
  • Fix what files get deleted through total_limit by @muellerzr in #1165
  • Remove outdated command directions and use in tests by @muellerzr in #1166

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @Yard1
    • Refactor launch for greater extensibility (#1123)
Jan 31, 2023
v0.16.0: Improved and Interactive Documentation, DataLoader Improvements

New code exploration doc tool

A new interactive tool has been introduced to the documentation to help users quickly learn how to utilize features of the framework before diving into further detail.

Not only does it provide a code diff, but it also includes an explanation and links to more resources the user should check out to learn more.

Try it out today in the docs

  • Add in code exploration tool to docs by @muellerzr in #1014
  • Light vs dark theme based on pick by @muellerzr in #1023

Skip batches in dataloaders

When resuming training, you can more efficiently skip batches in your dataloader with the new skip_first_batches function (also available as a method on your Accelerator).

  • Efficiently skip batches in a dataloader by @sgugger in #1002

DeepSpeed integration enhancements:

A new ZeRO-3 init context manager has been added to give users granular control in situations involving nested/multiple models. DeepSpeed config file support has been refactored to remove ambiguity between it and the Accelerate config.

Support has been added for auto entries in the DeepSpeed config file, filled in via the accelerate launch command. Try it out today by referring to the section Things to note when using DeepSpeed Config File

  • ds zero-3 init context manager by @pacman100 in #932
  • raise error for duplicate accelerate config values when using deepspeed_config_file by @pacman100 in #941
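For the auto entries, a hypothetical fragment of a DeepSpeed config (shown as a Python dict; the key names are standard DeepSpeed ones). The "auto" placeholders are resolved by accelerate launch from the rest of your training setup:

```python
# Fragment of a DeepSpeed config; "auto" values are filled in at launch time.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {"stage": 3},
}
```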

What's new?

  • Flag to silence subprocess.CalledProcessError in launch by @Cyberes in #902
  • Add usage examples by @muellerzr in #904
  • Expand sanity checks by @muellerzr in #905
  • Fix conditional by @muellerzr in #907
  • fix issue that amp bf16 does not work for cpu in env with cuda. by @sywangyi in #906
  • fsdp enhancements by @pacman100 in #911
  • Fix typos accelerate -> accelerator by @pcuenca in #915
  • 🚨🚨🚨 Act on deprecations 🚨🚨🚨 by @muellerzr in #917
  • fix accelerate test failure with cpu config by @sywangyi in #909
  • Introduce project_dir and limit the number of saved checkpoints by @muellerzr in #916
  • Specify inference by @muellerzr in #921
  • Support init_on_device by @thomasw21 in #926
  • ds-z3-init and prepending ds env variables with ACCELERATE_ by @pacman100 in #928
  • Honor model dtype in load_checkpoint by @sgugger in #920
  • ds zero-3 init context manager by @pacman100 in #932
  • Fix silly typo by @tornikeo in #939
  • add mixed_precision_type property to AcceleratorState by @pacman100 in #935
  • fix batch size in prepare_dataloader for iterable datasets by @sanderland in #937
  • fix mp related test fails by @pacman100 in #943
  • Fix tracker by @muellerzr in #942
  • Fix offload when weights are on the GPU by @sgugger in #945
  • raise error for duplicate accelerate config values when using deepspeed_config_file by @pacman100 in #941
  • Add is_initialized method and refactor by @muellerzr in #949
  • Fix DeepSpeed tests by @muellerzr in #950
  • Don't automatically offload buffers when loading checkpoints by @sgugger in #951
  • Typo fix in src/accelerate/utils/modeling.py by @ryderwishart in #955
  • support master port when using ds multi-node launcher by @pacman100 in #959
  • Allowing encoded configuration for DeepSpeed by @cli99 in #895
  • Update README.md by @Don9wanKim in #968
  • Raise minimum version for distrib launch by @muellerzr in #978
  • Fix tied parameters test in big model inference by @sgugger in #979
  • Fix type error on line 36 by @dhar174 in #981
  • Ensure that last batch doesn't get dropped if perfectly even in gather_for_metrics by @muellerzr in #982
  • Skip wandb test for now by @muellerzr in #984
  • Fix test for converting tensor to proper dtype by @sgugger in #983
  • in sync with trfs, removing style_doc utils and using doc-builder instead by @pacman100 in #988
  • Add new release_memory util by @muellerzr in #990
  • adding support for kwargs in load_state by @pacman100 in #989
  • Fix scheduler incorrect steps when gradient accumulation enabled by @markovalexander in #999
  • Fix parameters tying in dispatch_model by @sgugger in #1000
  • improve deepspeed notes by @stas00 in #1003
  • Update toctree by @muellerzr in #1008
  • Add styleguide by @muellerzr in #1007
  • Maintain accumulation steps by @muellerzr in #1011
  • Saving and loading state hooks by @patrickvonplaten in #991
  • Fix test introduced in PR and introduce AcceleratorTestCase by @muellerzr in #1016
  • Allow the torch device to be set with an env var by @Yard1 in #1009
  • Fix import of LrScheduler by @sgugger in #1017
  • Don't force mixed precision as no in examples by @sgugger in #1018
  • Include steppage in performance docs by @muellerzr in #1013
  • Fix env var by @muellerzr in #1024
  • Change default for keep_fp32_wrapper by @muellerzr in #1025
  • Fix slow test by keeping tied weights on the same GPU by @sgugger in #1026
  • Start of adding examples by @muellerzr in #1001
  • More improvements to docstrings + examples by @muellerzr in #1010
  • With example by @muellerzr in #1027
  • sagemaker launcher fixes by @pacman100 in #1031
Dec 2, 2022
v0.15.0: Pytorch 2.0 stack support

PyTorch 2.0 stack support

We are very excited by the newly announced PyTorch 2.0 stack and you can try it using Accelerate on any model by using the dynamo_backend argument of the Accelerator, or when filling your config with accelerate config.

Note that to get the best performance, we recommend:

  • using an Ampere GPU (or more recent)
  • sticking to fixed shapes for now
  • Add support for torch dynamo by @sgugger in #829

New CLI commands

  • Added two new commands, accelerate config update and accelerate config default. The first will update a config file to have the latest keys added from later releases of Accelerate, and the second will create a default configuration file automatically, mimicking write_default_config() introduced in #851 and #853 by @muellerzr
  • Also introduced a filterable help for accelerate launch, which shows only the options relevant to the flags passed; for example, accelerate launch --multi_gpu will show the launch parameters relevant to multi-GPU training.

What's new?

  • fix 🐛 by @pacman100 in #836
  • Deepspeed example should use gather_for_metrics by @HammadB in #821
  • Highlight selection with pretty colors by @muellerzr in #839
  • Add join_uneven_inputs context manager to Accelerator by @Chris-hughes10 in #820
  • Introduce default-config command by @muellerzr in #840
  • Fix log error and add log level to get_logger by @muellerzr in #842
  • Fix if/else by @muellerzr in #849
  • Fix complete_cv example by @muellerzr in #848
  • Refactor Accelerate config and introduce a multi-argument CLI interface by @muellerzr in #851
  • Clean up, add update command by @muellerzr in #853
  • Revert "Update pr docs actions by @mishig25 in #827"
  • Switch default log to warn by @muellerzr in #859
  • Remove mixed precision hook as part of the unwrap_model by @muellerzr in #860
  • update deepspeed error message wrt batch_size by @pacman100 in #861
  • fix failing deepspeed test by @pacman100 in #868
  • Even more log level refined, leave alone if not explicitly set by @muellerzr in #871
  • Solve pickling issues by @muellerzr in #872
  • Spring cleaning by @muellerzr in #865
  • fixing lr_scheduler prepare issue when using pytorch nightly by @pacman100 in #878
  • fix fsdp state_dict_config because of PyTorch changes by @pacman100 in #877
  • Update deprecated logging warn by @SHi-ON in #881
  • fix a bug by @xiaohu2015 in #887
  • Allow safetensors offload by @sgugger in #873
  • fixing lr scheduler for pytorch nightly by @pacman100 in #884
  • Prefix all accelerate env vars with ACCELERATE by @muellerzr in #890
  • fix prefix issues in tests by @pacman100 in #891
  • Fix windows cli selector by @muellerzr in #893
  • Better description for improper kwargs by @muellerzr in #894
  • Support bfloat16 in load_offloaded_weight by @sgugger in #892

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @Chris-hughes10
    • Add join_uneven_inputs context manager to Accelerator (#820)
Nov 8, 2022
v0.14.0: Megatron-LM integration and support for PyTorch 1.13

Megatron LM integration

Accelerate now supports Megatron-LM for the three model classes (BERT, GPT-2 and T5). You can learn more in the documentation.

  • Megatron-LM integration by @pacman100 in #667
  • ensure megatron is 2.2.0+ by @jeffra in #755
  • updating docs to use fork of megatron-lm and minor example/docs fix by @pacman100 in #766
  • adding support to return logits and generate for Megatron-LM GPT models by @pacman100 in #819

PyTorch 1.13 support

Fixes a bug that returned SIGKILL errors on Windows.

  • Isolate distrib_run by @muellerzr in #828

Kaggle support with the notebook_launcher

With Kaggle now giving instances with two T4 GPUs, Accelerate can leverage this to do multi-GPU training from the notebook.

  • Work in kaggle! by @muellerzr in #783

What's new?

  • Add non_blocking kwarg to send_to_device() by @NouamaneTazi in #607
  • [ds launcher] un-hijack PYTHONPATH by @stas00 in #741
  • Fix num_processes is not defined by @muellerzr in #746
  • [Device map] nn.Parameter don't have children by @patrickvonplaten in #747
  • Use HTML relative paths for tiles by @lewtun in #749
  • Add gpu_ids to SageMakerConfig though it should never be set by @muellerzr in #751
  • Change num_cpu_threads_per_process default by @muellerzr in #753
  • Return unclipped gradient from grad_clip_norm_ by @samuelstevens in #756
  • refactor by @pacman100 in #758
  • update docs by @pacman100 in #759
  • Only wrap modules in DDP if they require grad by @samuelstevens in #761
  • Move io_same_device hook to before attach_align_device hook on cpu_offload and disk_offload. by @piEsposito in #768
  • Regression cli tests by @muellerzr in #772
  • Fix number of devices in get_balanced_memory by @sgugger in #774
  • Fix all github actions issues + depreciations by @muellerzr in #773
  • Fix flakey wandb test by @muellerzr in #775
  • Add defaults for launchers by @muellerzr in #778
  • Allow BatchSamplerShard to not even out batches by @sgugger in #776
  • Make rich toggleable and separate out a new environment utility file by @muellerzr in #779
  • Add same_network + docs by @muellerzr in #780
  • fix transformers tests by @ArthurZucker in #777
  • Add Dev Container configuration by @Chris-hughes10 in #782
  • separate dataloader generator from sampler generator by @pacman100 in #789
  • Consider top-level buffers when computing infer_auto_device_map by @younesbelkada in #792
  • Add even_batches keyword to Accelerator by @Chris-hughes10 in #781
  • Fix device_map="auto" on CPU-only envs by @sgugger in #797
  • Fix extraction of state dict in offload by @sgugger in #795
  • fix: add pdsh as default launcher by @zanussbaum in #800
  • Deal with optimizer.differentiable in PyTorch 1.13.0 by @comaniac in #803
  • Introduce a pod-config command by @muellerzr in #802
  • Refactor CLI to improve readability by @muellerzr in #810
  • adding support to pickle and unpickle AcceleratedOptimizer by @pacman100 in #811
  • add recurse argument in remove_hook_from_module by @younesbelkada in #812
  • Act on deprecations by @muellerzr in #813
  • Mlflow-tracker-v2 🔥 by @nbroad1881 in #794
  • Update CLI docs and use mps rather than mps_device by @muellerzr in #814
  • Rename pod-config to tpu-config + docs by @muellerzr in #818
  • Update docs by @muellerzr in #823
  • rename sklearn to proper dep by @muellerzr in #825
  • Rename by @muellerzr in #824
  • Update pr docs actions by @mishig25 in #827

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @Chris-hughes10
    • Add Dev Container configuration (#782)
    • Add even_batches keyword to Accelerator (#781)
Oct 17, 2022
v0.13.2 Patch release
  • [Device map] nn.Parameter don't have children in #747 by @patrickvonplaten
Oct 7, 2022
v0.13.1 Patch release
  • Fix num_processes is not defined #746 by @muellerzr
Oct 5, 2022
v0.13.0 Launcher update (multinode and GPU selection) and multiple bug fixes

Better multinode support in the launcher

The accelerate launch command did not work well for distributed training using several machines. This is fixed in this version.

  • Use torchrun for multinode by @muellerzr in #631
  • Fix multi-node issues from launch by @muellerzr in #672

Launch training on specific GPUs only

Instead of prefixing your launch command with CUDA_VISIBLE_DEVICES=xxx you can now specify the GPUs you want to use in your Accelerate config.

  • Allow for GPU-ID specification on CLI by @muellerzr in #732

Better tracebacks and rich support

The tracebacks are now cleaned up to avoid printing the same error several times, and rich is integrated as an optional dependency.

  • Integrate Rich into Accelerate by @muellerzr in #613
  • Make rich an optional dep by @muellerzr in #673

What's new?

  • Fix typo in docs/index.mdx by @mishig25 in #610
  • Fix DeepSpeed CI by @muellerzr in #612
  • Added GANs example to examples by @EyalMichaeli in #619
  • Fix example by @muellerzr in #620
  • Update README.md by @ezhang7423 in #622
  • Fully remove subprocess from the multi-gpu launcher by @muellerzr in #623
  • M1 mps fixes by @pacman100 in #625
  • Fix multi-node issues and simplify param logic by @muellerzr in #627
  • update MPS support docs by @pacman100 in #629
  • minor tracker fixes for complete* examples by @pacman100 in #630
  • Put back in place the guard by @muellerzr in #634
  • make init_trackers to launch on main process by @Gladiator07 in #642
  • remove check for main process for trackers initialization by @Gladiator07 in #643
  • fix link by @philschmid in #645
  • Add static_graph arg to DistributedDataParallelKwargs. by @rom1504 in #637
  • Small nits to grad accum docs by @muellerzr in #656
  • Saving hyperparams in yaml file for Tensorboard for #521 by @Shreyz-max in #657
  • Use debug for loggers by @muellerzr in #655
  • Improve docstrings more by @muellerzr in #666
  • accelerate bibtex by @pacman100 in #660
  • Cache torch_tpu check by @muellerzr in #670
  • Manim animation of big model inference by @muellerzr in #671
  • Add aim tracker for accelerate by @muellerzr in #649
  • Specify local network on multinode by @muellerzr in #674
  • Test for min torch version + fix all issues by @muellerzr in #638
  • deepspeed enhancements and fixes by @pacman100 in #676
  • DeepSpeed launcher related changes by @pacman100 in #626
  • adding torchrun elastic params by @pacman100 in #680
  • :bug: fix by @pacman100 in #683
  • Fix skip in dispatch dataloaders by @sgugger in #682
  • Clean up DispatchDataloader a bit more by @sgugger in #686
  • rng state sync for FSDP by @pacman100 in #688
  • Fix DataLoader with samplers that are batch samplers by @sgugger in #687
  • fixing support for Apple Silicon GPU in notebook_launcher by @pacman100 in #695
  • fixing rng sync when using custom sampler and batch_sampler by @pacman100 in #696
  • Improve init_empty_weights to override tensor constructor by @thomasw21 in #699
  • override DeepSpeed grad_acc_steps from accelerator obj by @pacman100 in #698
  • [doc] Fix 404'd link in memory usage guides by @tomaarsen in #702
  • Add in report generation for test failures and make fail-fast false by @muellerzr in #703
  • Update runners with report structure, adjust env variable by @muellerzr in #704
  • docs: examples readability improvements by @ryanrussell in #709
  • docs: utils readability fixups by @ryanrussell in #711
  • refactor(test_tracking): key_occurrence readability fixup by @ryanrussell in #710
  • docs: hooks readability improvements by @ryanrussell in #712
  • sagemaker fixes and improvements by @pacman100 in #708
  • refactor(accelerate): readability improvements by @ryanrussell in #713
  • More docstring nits by @muellerzr in #715
  • Allow custom device placements for different objects by @sgugger in #716
  • Specify gradients in model preparation by @muellerzr in #722
  • Fix regression issue by @muellerzr in #724
  • Fix default for num processes by @sgugger in #726
  • Build and Release docker images on a release by @muellerzr in #725
  • Make running tests more efficient by @muellerzr in #611
  • Fix old naming by @muellerzr in #727
  • Fix issue with one-cycle logic by @muellerzr in #728
  • Remove auto-bug label in issue template by @sgugger in #735
  • Add a tutorial on proper benchmarking by @muellerzr in #734
  • Add an example zoo to the documentation by @muellerzr in #737
  • trlx by @muellerzr in #738
  • Fix memory leak by @muellerzr in #739
  • Include examples for CI by @muellerzr in #740
  • Auto grad accum example by @muellerzr in #742
Aug 4, 2022
v0.12.0 New doc, gather_for_metrics, balanced device map and M1 support

New documentation

The whole documentation has been revamped, just go look at it here!

  • Complete revamp of the docs by @muellerzr in #495

New gather_for_metrics method

When doing distributed evaluation, the dataloader loops back to the beginning of the dataset so that the total number of samples is a round multiple of the number of processes. This makes the predictions slightly longer than the dataset, which used to require some manual truncating. All of this is now done behind the scenes if you replace the gather you did in evaluation with gather_for_metrics.

  • Reenable Gather for Metrics by @muellerzr in #590
  • Fix gather_for_metrics by @muellerzr in #578
  • Add a gather_for_metrics capability by @muellerzr in #540

Balanced device maps

When loading big models for inference, device_map="auto" used to fill the GPUs sequentially, making it hard to use a batch size greater than 1. It now balances the weights evenly across the GPUs, so if you have more GPU space than the model size, you can run predictions with a bigger batch size!

M1 GPU support

Accelerate now supports M1 GPUs. To learn more about how to set up your environment, see the documentation.

  • M1 GPU mps device integration by @pacman100 in #596

What's new?

  • Small fixed for balanced device maps by @sgugger in #583
  • Add balanced option for auto device map creation by @sgugger in #534
  • fixing deepspeed slow tests issue by @pacman100 in #604
  • add more conditions on casting by @younesbelkada in #606
  • Remove redundant .run in WandBTracker. by @zh-plus in #605
  • Fix some typos + wordings by @muellerzr in #603
  • reorg of test scripts and minor changes to tests by @pacman100 in #602
  • Move warning by @muellerzr in #598
  • Shorthand way to grab a tracker by @muellerzr in #594
  • Pin deepspeed by @muellerzr in #595
  • Improve docstring by @muellerzr in #591
  • TESTS! by @muellerzr in #589
  • Fix DispatchDataloader by @sgugger in #588
  • Use main_process_first in the examples by @muellerzr in #581
  • Skip and raise NotImplementedError for gather_for_metrics for now by @muellerzr in #580
  • minor FSDP launcher fix by @pacman100 in #579
  • Refine test in set_module_tensor_to_device by @sgugger in #577
  • Fix set_module_tensor_to_device by @sgugger in #576
  • Add 8 bit support - chapter II by @younesbelkada in #539
  • Fix tests, add wandb to gitignore by @muellerzr in #573
  • Fix step by @muellerzr in #572
  • Speed up main CI by @muellerzr in #571
  • ccl version check and import different module according to version by @sywangyi in #567
  • set default num_cpu_threads_per_process to improve oob performance by @sywangyi in #562
  • Add a tqdm helper by @muellerzr in #564
  • Rename actions to be a bit more accurate by @muellerzr in #568
  • Fix clean by @muellerzr in #569
  • enhancements and fixes for FSDP and DeepSpeed by @pacman100 in #532
  • fix: saving model weights by @csarron in #556
  • add on_main_process decorators by @ZhiyuanChen in #488
  • Update imports.py by @KimBioInfoStudio in #554
  • unpin datasets by @lhoestq in #563
  • Create good defaults in accelerate launch by @muellerzr in #553
  • Fix a few minor issues with example code in docs by @BenjaminBossan in #551
  • deepspeed version 0.6.7 fix by @pacman100 in #544
  • Rename test extras to testing by @muellerzr in #545
  • Add production testing + fix failing CI by @muellerzr in #547
  • Add a gather_for_metrics capability by @muellerzr in #540
  • Allow for kwargs to be passed to trackers by @muellerzr in #542
  • Add support for downcasting bf16 on TPUs by @muellerzr in #523
  • Add more documentation for device maps computations by @sgugger in #530
  • Restyle prepare one by @muellerzr in #531
  • Pick a better default for offload_state_dict by @sgugger in #529
  • fix some parameter setting does not work for CPU DDP and bf16 fail in… by @sywangyi in #527
  • Fix accelerate tests command by @sgugger in #528

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @sywangyi
    • ccl version check and import different module according to version (#567)
    • set default num_cpu_threads_per_process to improve oob performance (#562)
    • fix some parameter setting does not work for CPU DDP and bf16 fail in… (#527)
  • @ZhiyuanChen
    • add on_main_process decorators (#488)
Jul 18, 2022
v0.11.0 Gradient Accumulation and SageMaker Data Parallelism

Gradient Accumulation

Accelerate now handles gradient accumulation if you want: just pass gradient_accumulation_steps=xxx when instantiating the Accelerator and put your training step under a with accelerator.accumulate(model): block. Accelerate will then handle the loss re-scaling and gradient accumulation for you, avoiding slowdowns in distributed training since gradients only need to be synced when you actually want to step. More details in the documentation.

  • Add gradient accumulation doc by @muellerzr in #511
  • Make gradient accumulation work with dispatched dataloaders by @muellerzr in #510
  • Introduce automatic gradient accumulation wrapper + fix a few test issues by @muellerzr in #484

Support for SageMaker Data parallelism

Accelerate now supports SageMaker's specific brand of data parallelism.

  • SageMaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging by @pacman100 in #504
  • SageMaker DP Support by @pacman100 in #494

What's new?

  • Fix accelerate tests command by @sgugger in #528
  • FSDP integration enhancements and fixes by @pacman100 in #522
  • Warn user if no trackers are installed by @muellerzr in #524
  • Fixup all example CI tests and properly fail by @muellerzr in #517
  • fixing deepspeed multi-node launcher by @pacman100 in #514
  • Add special Parameters modules support by @younesbelkada in #519
  • Don't unwrap in save_state() by @cccntu in #489
  • Fix a bug when reduce a tensor. by @wwhio in #513
  • Add benchmarks by @sgugger in #506
  • Fix DispatchDataLoader length when split_batches=True by @sgugger in #509
  • Fix scheduler in gradient accumulation example by @muellerzr in #500
  • update dataloader wrappers to have total_batch_size attribute by @pacman100 in #493
  • Introduce automatic gradient accumulation wrapper + fix a few test issues by @muellerzr in #484
  • add use_distributed property by @ZhiyuanChen in #487
  • fixing fsdp autowrap functionality by @pacman100 in #475
  • Use datasets 2.2.0 for now by @muellerzr in #481
  • Rm gradient accumulation on TPU by @muellerzr in #479
  • Revert "Pin datasets for now" by @muellerzr in #477
  • Pin datasets for now by @muellerzr in #477
  • Some typos and cosmetic fixes by @douwekiela in #472
  • Fix when TPU device check is ran by @muellerzr in #469
  • Refactor Utility Documentation by @muellerzr in #467
  • Add docbuilder to quality by @muellerzr in #468
  • Expose some is_*_available utils in docs by @muellerzr in #466
  • Cleanup CI Warnings by @muellerzr in #465
  • Link CI slow runners to the commit by @muellerzr in #464
  • Fix subtle bug in BF16 by @muellerzr in #463
  • Include bf16 support for TPUs and CPUs, and a better check for if a CUDA device supports BF16 by @muellerzr in #462
  • Handle bfloat16 weights in disk offload without adding memory overhead by @noamwies in #460
  • Handle bfloat16 weights in disk offload by @sgugger in #460
  • Raise a clear warning if a user tries to modify the AcceleratorState by @muellerzr in #458
  • Right step point by @muellerzr in #459
  • Better checks for if a TPU device exists by @muellerzr in #456
  • Offload and modules with unused submodules by @sgugger in #442
Jun 15, 2022
v0.10.0 DeepSpeed integration revamp and TPU speedup

This release adds two major new features: the DeepSpeed integration has been revamped to match the one in Transformers Trainer, with multiple new options unlocked, and the TPU integration has been sped up.

This version also officially stops supporting Python 3.6 and requires Python 3.7+.

DeepSpeed integration revamp

Users can now specify a DeepSpeed config file when they want to use DeepSpeed, which unlocks many new options. More details in the new documentation.

  • Migrate HFDeepSpeedConfig from trfrs to accelerate by @pacman100 in #432
  • DeepSpeed Revamp by @pacman100 in #405
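A minimal config file might look like the sketch below (the field names follow DeepSpeed's JSON schema; "auto" values are placeholders that Accelerate fills in from the training script, and the exact fields you need depend on your setup):

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "fp16": { "enabled": "auto" }
}
```

You then point Accelerate at this file when running accelerate config, and launch your script with accelerate launch as usual.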

TPU speedup

If you're using TPUs we have sped up the dataloaders and models quite a bit, on top of a few bug fixes.

  • Revamp TPU internals to be more efficient + enable mixed precision types by @muellerzr in #441

What's new?

  • Fix docstring by @muellerzr in #447
  • Add psutil as dependency by @sgugger in #445
  • fix fsdp torch version dependency by @pacman100 in #437
  • Create Gradient Accumulation Example by @muellerzr in #431
  • init by @muellerzr in #429
  • Introduce no_sync context wrapper + clean up some more warnings for DDP by @muellerzr in #428
  • updating tests to resolve runner failures wrt deepspeed revamp by @pacman100 in #427
  • Fix secrets in Docker workflow by @muellerzr in #426
  • Introduce a Dependency Checker to trigger new Docker Builds on main by @muellerzr in #424
  • Enable slow tests nightly by @muellerzr in #421
  • Push out python 3.6 + fix all tests related to the upgrade by @muellerzr in #420
  • Speedup main CI by @muellerzr in #419
  • Switch to evaluate for metrics by @sgugger in #417
  • Create an issue template for Accelerate by @muellerzr in #415
  • Introduce post-merge runners by @muellerzr in #416
  • Fix debug_launcher issues by @muellerzr in #413
  • Use main egg by @muellerzr in #414
  • Introduce nightly runners by @muellerzr in #410
  • Update requirements to pin tensorboard and include psutil by @muellerzr in #408
  • Fix CUDA examples tests by @muellerzr in #407
  • Move datasets and transformers to under func by @muellerzr in #411
  • Fix CUDA Dockerfile by @muellerzr in #409
  • Hotfix all failing GPU tests by @muellerzr in #401
  • improve metrics logged in examples by @pacman100 in #399
  • Refactor offload_state_dict and fix in offload_weight by @sgugger in #398
  • Refactor version checking into a utility by @muellerzr in #395
  • Include fastai in frameworks by @muellerzr in #396
  • Add packaging to requirements by @muellerzr in #394
  • Better dispatch for submodules by @sgugger in #392
  • Build Docker Images nightly by @muellerzr in #391
  • Small bugfix for the stalebot workflow by @muellerzr in #390
  • Introduce stalebot by @muellerzr in #387
  • Create Dockerfiles for Accelerate by @muellerzr in #377
  • Mix precision -> Mixed precision by @muellerzr in #388
  • Fix OneCycle step length when in multiprocess by @muellerzr in #385
May 20, 2022
v0.9.0: Refactor utils to use in Transformers

This release offers no significant new API; it is just needed to have access to some utils in Transformers.

  • Handle deprication errors in launch by @muellerzr in #360
  • Update launchers.py by @tmabraham in #363
  • fix tracking by @pacman100 in #361
  • Remove tensor call by @muellerzr in #365
  • Add a utility for writing a barebones config file by @muellerzr in #371
  • fix deepspeed model saving by @pacman100 in #370
  • deepspeed save model temp fix by @pacman100 in #374
  • Refactor tests to use accelerate launch by @muellerzr in #373
  • fix zero stage-1 by @pacman100 in #378
  • fix shuffling for ShufflerIterDataPipe instances by @loubnabnl in #376
  • Better check for deepspeed availability by @sgugger in #379
  • Refactor some parts in utils by @sgugger in #380
May 12, 2022
v0.8.0: Big model inference

Big model inference

To handle very large models, new functionality has been added in Accelerate:

  • a context manager to initialize empty models
  • a function to load a sharded checkpoint directly on the right devices
  • a set of custom hooks that allow execution of a model split on different devices, as well as CPU or disk offload
  • a magic method that auto-determines a device map for a given model, maximizing the available GPU space and RAM before using disk offload as a last resort
  • a function that wraps the last three blocks in one simple call (load_checkpoint_and_dispatch)

See more in the documentation

  • Big model inference by @sgugger in #345

What's new

  • Create peak_memory_uasge_tracker.py by @pacman100 in #336
  • Fixed a typo to enable running accelerate correctly by @Idodox in #339
  • Introduce multiprocess logger by @muellerzr in #337
  • Refactor utils into its own module by @muellerzr in #340
  • Improve num_processes question in CLI by @muellerzr in #343
  • Handle Manual Wrapping in FSDP. Minor fix of fsdp example. by @pacman100 in #342
  • Better prompt for number of training devices by @muellerzr in #344
  • Fix prompt for num_processes by @pacman100 in #347
  • Fix sample calculation in examples by @muellerzr in #352
  • Fixing metric eval in distributed setup by @pacman100 in #355
  • DeepSpeed and FSDP plugin support through script by @pacman100 in #356
Apr 29, 2022
v0.7.1 Patch release
  • Fix fsdp config in cluster #331
  • Add guards for batch size finder #334
  • Patchfix infinite loop #335