v0.22.0: Distributed operation framework, Gradient Accumulation enhancements, FSDP enhancements, and more!

Experimental distributed operations checking framework

A new framework has been introduced that can catch hangs and timeout errors caused by mismatched distributed operations before they occur. Since it adds a small amount of overhead, it is opt-in: simply run your code with ACCELERATE_DEBUG_MODE="1" to enable it. Read more in the docs; introduced via https://github.com/huggingface/accelerate/pull/1756
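Enabling the check is a matter of setting the environment variable before launching; a minimal sketch (the script name `train.py` is illustrative):

```shell
# Opt in to the distributed-operations debug checks for a single run
ACCELERATE_DEBUG_MODE="1" accelerate launch train.py
```

The variable can also be exported in the shell or set in the launch environment if you want the checks on for every run.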

Accelerator.load_state can now load the most recent checkpoint automatically

If a ProjectConfiguration is in use, calling accelerator.load_state() with no arguments will now automatically find and load the latest checkpoint. Introduced via https://github.com/huggingface/accelerate/pull/1741

Multiple enhancements to gradient accumulation

This release adds several enhancements to distributed gradient accumulation, including support for wrapping multiple models in a single Accelerator.accumulate() call (#1708) and better control over DDP's no_sync behavior (#1726).

FSDP Changes

DataLoader Changes

What's New?

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @yuxinyuan
    • Support wrapping multiple models in Accelerator.accumulate() (#1708)
    • Fix errors when optimizer is not a Pytorch optimizer. (#1733)
    • Get rid of calling get_scale() by patching the step method of optimizer. (#1720)
  • @NouamaneTazi
    • Better control over DDP's no_sync (#1726)
  • @abhilash1910
    • Add FSDP for XPU (#1803)
    • Ipex bug fix for device properties in modelling (#1834)
  • @statelesshz
    • Add FSDP for NPU (#1806)
    • fix failing test on 8GPU (#1724)
    • fix the bug in npu (#1728)
  • @thevasudevgupta
    • support custom slice function in DataLoaderDispatcher (#1846)

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.21.0...v0.22.0
