v0.23.0: Model Memory Estimation tool, Breakpoint API, Multi-Node Notebook Launcher Support, and more!
A new model memory estimation tool has been added to help calculate how much memory is needed for inference. It does not download the pretrained weights, and uses `init_empty_weights` to stay memory-efficient during the calculation.
Usage directions:
accelerate estimate-memory {model_name} --library {library_name} --dtypes fp16 int8
Or:
from accelerate.commands.estimate import estimate_command_parser, estimate_command, gather_data
parser = estimate_command_parser()
args = parser.parse_args(["bert-base-cased", "--dtypes", "float32"])
output = gather_data(args)
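As a rough mental model of what such an estimate computes, weight memory is parameter count times bytes per element for the chosen dtype. The sketch below is a hypothetical back-of-envelope helper illustrating that arithmetic, not the tool's actual implementation (which also accounts for the largest layer, buffers, etc.):

```python
# Hypothetical back-of-envelope estimator: NOT accelerate's code, just the
# core arithmetic such a tool performs (params x bytes-per-element).
BYTES_PER_DTYPE = {"float32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def estimate_weight_memory(n_params, dtype):
    """Approximate bytes needed to hold the model weights in `dtype`."""
    return n_params * BYTES_PER_DTYPE[dtype]

def human_readable(n_bytes):
    """Render a byte count with binary (1024-based) units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1024:
            return f"{n_bytes:.2f} {unit}"
        n_bytes /= 1024
    return f"{n_bytes:.2f} PB"

# A ~110M-parameter model in fp16 needs roughly 210 MB for weights alone.
print(human_readable(estimate_weight_memory(110_000_000, "fp16")))
```

Note this covers weights only; activations and KV caches at inference time add on top of it.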
We've made the huggingface_hub library a first-class citizen of the framework! While this is mainly for the model estimation tool, it opens the door for further integrations should they be wanted.
Accelerator Enhancements:

- `gather_for_metrics` will now also de-dupe for non-tensor objects. See #1937
- `mixed_precision="bf16"` support on NPU devices. See #1949
- New breakpoint API to help when trying to break out of a loop based on a condition met on a single process. See #1940
- `torch.compile` support was fixed. See #1919
- Set `gradient_accumulation_steps` to "auto" in your DeepSpeed config, and Accelerate will use the value passed to `Accelerator` instead (#1901)

Other changes:

- `accelerate config` on npu by @statelesshz in https://github.com/huggingface/accelerate/pull/1895
- [Tests] Finish all todos by @younesbelkada in https://github.com/huggingface/accelerate/pull/1957
- `force_hooks` added to `dispatch_model` by @austinapatel in https://github.com/huggingface/accelerate/pull/1969

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.22.0...v0.23.0
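To see why a breakpoint API is needed: in distributed training, if only one rank takes a `break`, the others keep calling collectives and the job deadlocks, so the break decision must be agreed on by all ranks. The sketch below simulates that agreement with plain Python (a logical-OR reduce over per-rank flags); it is a conceptual illustration, not accelerate's actual implementation, which performs this across real processes:

```python
# Conceptual sketch: a single rank hitting a stop condition (e.g. NaN loss)
# must cause ALL ranks to break, or the ranks still in the loop deadlock
# waiting on collectives. A logical-OR reduce over everyone's flag gives
# every rank the same decision. (Simulation only, not accelerate code.)

def global_should_break(local_flags):
    """OR-reduce: break everywhere if ANY rank raised its flag."""
    return any(local_flags)

# Rank 2 detected a bad loss; every rank sees the same True and breaks together.
flags = [False, False, True, False]
decisions = [global_should_break(flags) for _rank in range(len(flags))]
print(decisions)
```

The key property is that every rank evaluates the *reduced* flag, never just its own local one.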
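On the `gather_for_metrics` de-duping: distributed dataloaders pad the final batch so every process receives equal work, which means gathered results can contain duplicated tail samples that would skew metrics. A minimal sketch of the clean-up idea, using a hypothetical helper (not accelerate's code), which now applies to non-tensor objects as well:

```python
# Conceptual sketch of the padding problem gather_for_metrics handles:
# with 4 samples across 2 processes of batch size 3, the last batch is
# padded by repeating samples, so the gathered list is longer than the
# dataset and the tail duplicates must be dropped before computing metrics.
# (Hypothetical helper, not accelerate's implementation.)

def drop_padding(gathered, dataset_len):
    """Truncate gathered results back to the true dataset length."""
    return gathered[:dataset_len]

gathered_preds = ["a", "b", "c", "d", "d", "d"]  # "d" repeated by padding
print(drop_padding(gathered_preds, 4))
```

The v0.23.0 change is that this truncation now works when the gathered items are arbitrary Python objects, not only tensors.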