v0.13.0 — Accelerate

Better multinode support in the launcher

The accelerate launch command did not work well for distributed training across several machines. This is fixed in this version.

Use torchrun for multinode by @muellerzr in #631
Fix multi-node issues from launch by @muellerzr in #672
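
For readers unfamiliar with torchrun, here is a rough sketch of what a multinode invocation looks like when assembled from Python. It is only an illustration and not the code Accelerate uses internally. The helper name `torchrun_command` and its parameters are hypothetical; the flags themselves are standard torchrun options.

```python
from typing import List


def torchrun_command(
    script: str,
    num_machines: int,
    machine_rank: int,
    procs_per_machine: int,
    main_ip: str,
    main_port: int,
) -> List[str]:
    """Build a torchrun argument list for one machine of a multinode job.

    Hypothetical helper for illustration; every machine runs the same
    command, differing only in its machine_rank.
    """
    return [
        "torchrun",
        f"--nnodes={num_machines}",               # total number of machines
        f"--node_rank={machine_rank}",            # index of this machine
        f"--nproc_per_node={procs_per_machine}",  # processes (GPUs) per machine
        f"--master_addr={main_ip}",               # address of the rank-0 machine
        f"--master_port={main_port}",             # port used for the rendezvous
        script,
    ]


# Example: command for the second of two machines, 8 processes each.
cmd = torchrun_command("train.py", 2, 1, 8, "10.0.0.1", 29500)
```

The resulting list can be passed to `subprocess.run(cmd)` on each machine; the address and port shown are placeholders.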

Launch training on specific GPUs only

Instead of prefixing your launch command with CUDA_VISIBLE_DEVICES=xxx, you can now specify the GPUs you want to use in your Accelerate config.

Allow for GPU-ID specification on CLI by @muellerzr in #732
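
As a point of comparison, the older approach of restricting a process to certain GPUs through the environment can be sketched in Python as below. The helper `env_with_visible_gpus` is a hypothetical name used only for this illustration; it shows the environment variable that the new setting is meant to replace, not how Accelerate implements the feature.

```python
import os
from typing import Dict, Iterable, Optional


def env_with_visible_gpus(
    gpu_ids: Iterable[int],
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set.

    Hypothetical helper: the returned dict is meant to be passed as
    `env=` to subprocess.run when starting a training script.
    """
    env = dict(os.environ if base_env is None else base_env)
    # CUDA reads a comma-separated list of device indices.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


# Example: expose only GPUs 0 and 2 to a child process.
env = env_with_visible_gpus([0, 2], base_env={})
```

The variable must be set before CUDA is initialized in the target process, which is why it is usually applied to a child process rather than changed mid-run.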

Better tracebacks and rich support

The tracebacks are now cleaned up to avoid printing the same error several times, and rich is integrated as an optional dependency.

Integrate Rich into Accelerate by @muellerzr in #613
Make rich an optional dep by @muellerzr in #673
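
The general pattern for treating rich as an optional dependency can be sketched as follows. This is a minimal illustration under the assumption that rich may or may not be installed; the function names `rich_available` and `enable_pretty_tracebacks` are hypothetical and are not Accelerate's own API.

```python
import importlib.util


def rich_available() -> bool:
    """Check whether the rich package can be imported, without importing it."""
    return importlib.util.find_spec("rich") is not None


def enable_pretty_tracebacks() -> bool:
    """Install rich's traceback handler when rich is present.

    Returns True if the handler was installed, False if rich is missing,
    in which case Python's default tracebacks stay in place.
    """
    if not rich_available():
        return False
    # Imported lazily so the module still loads when rich is absent.
    from rich.traceback import install

    install(show_locals=False)
    return True


enabled = enable_pretty_tracebacks()
```

Keeping the import inside the function means code that never asks for rich output pays no import cost and never fails on a missing package.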

What's new?

Fix typo in docs/index.mdx by @mishig25 in #610
Fix DeepSpeed CI by @muellerzr in #612
Added GANs example to examples by @EyalMichaeli in #619
Fix example by @muellerzr in #620
Update README.md by @ezhang7423 in #622
Fully remove subprocess from the multi-gpu launcher by @muellerzr in #623
M1 mps fixes by @pacman100 in #625
Fix multi-node issues and simplify param logic by @muellerzr in #627
update MPS support docs by @pacman100 in #629
minor tracker fixes for complete* examples by @pacman100 in #630
Put back in place the guard by @muellerzr in #634
make init_trackers to launch on main process by @Gladiator07 in #642
remove check for main process for trackers initialization by @Gladiator07 in #643
fix link by @philschmid in #645
Add static_graph arg to DistributedDataParallelKwargs. by @rom1504 in #637
Small nits to grad accum docs by @muellerzr in #656
Saving hyperparams in yaml file for Tensorboard for #521 by @Shreyz-max in #657
Use debug for loggers by @muellerzr in #655
Improve docstrings more by @muellerzr in #666
accelerate bibtex by @pacman100 in #660
Cache torch_tpu check by @muellerzr in #670
Manim animation of big model inference by @muellerzr in #671
Add aim tracker for accelerate by @muellerzr in #649
Specify local network on multinode by @muellerzr in #674
Test for min torch version + fix all issues by @muellerzr in #638
deepspeed enhancements and fixes by @pacman100 in #676
DeepSpeed launcher related changes by @pacman100 in #626
adding torchrun elastic params by @pacman100 in #680
:bug: fix by @pacman100 in #683
Fix skip in dispatch dataloaders by @sgugger in #682
Clean up DispatchDataloader a bit more by @sgugger in #686
rng state sync for FSDP by @pacman100 in #688
Fix DataLoader with samplers that are batch samplers by @sgugger in #687
fixing support for Apple Silicon GPU in notebook_launcher by @pacman100 in #695
fixing rng sync when using custom sampler and batch_sampler by @pacman100 in #696
Improve init_empty_weights to override tensor constructor by @thomasw21 in #699
override DeepSpeed grad_acc_steps from accelerator obj by @pacman100 in #698
[doc] Fix 404'd link in memory usage guides by @tomaarsen in #702
Add in report generation for test failures and make fail-fast false by @muellerzr in #703
Update runners with report structure, adjust env variable by @muellerzr in #704
docs: examples readability improvements by @ryanrussell in #709
docs: utils readability fixups by @ryanrussell in #711
refactor(test_tracking): key_occurrence readability fixup by @ryanrussell in #710
docs: hooks readability improvements by @ryanrussell in #712
sagemaker fixes and improvements by @pacman100 in #708
refactor(accelerate): readability improvements by @ryanrussell in #713
More docstring nits by @muellerzr in #715
Allow custom device placements for different objects by @sgugger in #716
Specify gradients in model preparation by @muellerzr in #722
Fix regression issue by @muellerzr in #724
Fix default for num processes by @sgugger in #726
Build and Release docker images on a release by @muellerzr in #725
Make running tests more efficient by @muellerzr in #611
Fix old naming by @muellerzr in #727
Fix issue with one-cycle logic by @muellerzr in #728
Remove auto-bug label in issue template by @sgugger in #735
Add a tutorial on proper benchmarking by @muellerzr in #734
Add an example zoo to the documentation by @muellerzr in #737
trlx by @muellerzr in #738
Fix memory leak by @muellerzr in #739
Include examples for CI by @muellerzr in #740
Auto grad accum example by @muellerzr in #742