
v0.20.0: MPS and fp4 support on Big Model Inference, 4-bit QLoRA, Intel GPU, Distributed Inference, and much more!


Big model inference

Support has been added for running device_map="auto" on the MPS device. Big model inference also works with models loaded in 4-bit in Transformers.

  • Add mps support to big inference modeling by @SunMarc in #1545
  • Adds fp4 support for model dispatching by @younesbelkada in #1505
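As a hedged sketch of what this enables (the model id and flags are illustrative, not taken from the release notes), loading a model with device_map="auto" so weights are dispatched automatically, including onto Apple Silicon's MPS backend or in 4-bit, might look like:

```python
def load_dispatched_model(model_id="gpt2", load_in_4bit=False):
    """Load a model and let big model inference place it.

    `device_map="auto"` asks Accelerate to dispatch weights across the
    available devices (CUDA, MPS on Apple Silicon, or CPU/disk offload).
    Heavy imports are deferred so the sketch can be read without the
    libraries installed.
    """
    from transformers import AutoModelForCausalLM  # requires transformers

    kwargs = {"device_map": "auto"}
    if load_in_4bit:
        # 4-bit loading via bitsandbytes now composes with dispatching
        kwargs["load_in_4bit"] = True
    return AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
```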

4-bit QLoRA Support

  • 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #1458
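A minimal QLoRA sketch (the model id, rank, and hyperparameters below are illustrative assumptions): quantize the frozen base model to 4-bit NF4 with bitsandbytes, then attach small trainable LoRA adapters via PEFT:

```python
def build_qlora_model(model_id="facebook/opt-350m"):
    """QLoRA sketch: 4-bit frozen base model + trainable LoRA adapters."""
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",        # NormalFloat4 from the QLoRA paper
        bnb_4bit_use_double_quant=True,   # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    base = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
    return get_peft_model(base, lora)  # only the LoRA weights remain trainable
```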

Distributed Inference Utilities

This version introduces a new Accelerator.split_between_processes utility to help perform distributed inference with non-tensorized or non-DataLoader workflows. Read more in the documentation.
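The real API is a context manager on Accelerator; the contiguous chunking it performs can be approximated in pure Python (a sketch of the semantics, not Accelerate's implementation):

```python
import math

def split_for_process(inputs, num_processes, process_index):
    """Approximate the slicing behind Accelerator.split_between_processes:
    each process receives one contiguous chunk, and earlier processes may
    get one extra item when the list does not divide evenly."""
    chunk = math.ceil(len(inputs) / num_processes)
    return inputs[process_index * chunk : (process_index + 1) * chunk]

# Real usage (sketch) on each process:
#   accelerator = Accelerator()
#   with accelerator.split_between_processes(["cat", "dog", "bird"]) as prompts:
#       results = [model(p) for p in prompts]
```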

Introduce XPU support for Intel GPU

  • Intel GPU support initialization by @abhilash1910 in #1118
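With XPU support, Accelerator() picks the Intel GPU automatically when one is available; the selection order can be sketched roughly as below (a simplification for illustration, not Accelerate's actual logic, which also honors environment variables such as ACCELERATE_USE_XPU):

```python
def pick_device():
    """Return a best-available device string, roughly mirroring how
    Accelerate probes backends (simplified sketch)."""
    import torch

    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch, "xpu") and torch.xpu.is_available():  # Intel GPU
        return "xpu"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():  # Apple Silicon
        return "mps"
    return "cpu"
```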

Add support for the new PyTorch XLA TPU runtime

  • Accelerate now supports the latest PyTorch/XLA TPU runtime (#1393, #1385)

A new optimizer method: LocalSGD

  • This is a new wrapper around SGD which enables efficient multi-GPU training in cases where no fast interconnect is available, by @searchivarius in #1378
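A training-loop sketch (model, optimizer, and dataloader setup are elided, and local_sgd_steps=8 is an illustrative value): each worker takes several local optimizer steps, and parameters are synchronized across GPUs only every local_sgd_steps steps, cutting communication on slow interconnects:

```python
def train_with_local_sgd(model, optimizer, dataloader, local_sgd_steps=8):
    """Sketch of LocalSGD usage: synchronize across workers only every
    `local_sgd_steps` steps instead of after every batch."""
    from accelerate import Accelerator
    from accelerate.local_sgd import LocalSGD

    accelerator = Accelerator()
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    with LocalSGD(accelerator=accelerator, model=model,
                  local_sgd_steps=local_sgd_steps, enabled=True) as local_sgd:
        for batch in dataloader:
            loss = model(**batch).loss
            accelerator.backward(loss)
            optimizer.step()
            optimizer.zero_grad()
            local_sgd.step()  # averages parameters every `local_sgd_steps` calls
```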

Papers with 🤗 Accelerate

  • We now have an entire section of the docs dedicated to official paper implementations and citations using the framework (#1399); see it live in the documentation

Breaking changes

logging_dir has been fully deprecated; please use project_dir or a ProjectConfiguration instead.
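A migration sketch (the directory names are illustrative): pass project_dir directly to Accelerator, or bundle related paths in a ProjectConfiguration:

```python
def make_accelerator(project_dir="runs", logging_dir=None):
    """Replace the deprecated `logging_dir` argument: either pass
    `project_dir` directly, or group paths in a ProjectConfiguration."""
    from accelerate import Accelerator
    from accelerate.utils import ProjectConfiguration

    config = ProjectConfiguration(
        project_dir=project_dir,
        logging_dir=logging_dir or f"{project_dir}/logs",
    )
    return Accelerator(project_config=config)
```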

What's new?

  • use existing mlflow experiment if exists by @Rusteam in #1403
  • changes required for DS integration by @pacman100 in #1406
  • fix deepspeed failing tests by @pacman100 in #1411
  • Make mlflow logging dir optional by @mattplo-decath in #1413
  • Fix bug on ipex for diffusers by @abhilash1910 in #1426
  • Improve Slack Updater by @muellerzr in #1433
  • Let quality yell at the user if it's a version difference by @muellerzr in #1438
  • Ensure that it gets installed by @muellerzr in #1439
  • [core] Introducing CustomDtype enum for custom dtypes by @younesbelkada in #1434
  • Fix XPU by @muellerzr in #1440
  • Make sure torch compiled model can also be unwrapped by @patrickvonplaten in #1437
  • fixed: ZeroDivisionError: division by zero by @sreio in #1436
  • fix potential OOM when resuming with multi-GPU training by @exhyy in #1444
  • Fixes in infer_auto_device_map by @sgugger in #1441
  • Raise error when logging improperly by @muellerzr in #1446
  • Fix ci by @muellerzr in #1447
  • Distributed prompting/inference utility by @muellerzr in #1410
  • Add to by @muellerzr in #1448
  • split_between_processes by @stevhliu in #1449
  • [docs] Replace state.rank -> process_index by @pcuenca in #1450
  • Auto multigpu logic by @muellerzr in #1452
  • Update with cli instructions by @muellerzr in #1453
  • Adds in_order argument that defaults to False, to log in order. by @JulesGM in #1262
  • fix error for CPU DDP using trainer api. by @sywangyi in #1455
  • Refactor and simplify xpu device in state by @abhilash1910 in #1456
  • Document how to use commands with python module instead of argparse by @muellerzr in #1457
  • 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #1458
  • Fix skip first batch being perminant by @muellerzr in #1466
  • update conversion of layers to retain original data type. by @avisinghal6 in #1467
  • Check for xpu specifically by @muellerzr in #1472
  • update register_empty_buffer to match torch args by @NouamaneTazi in #1465
  • Update gradient accumulation docs, and remove redundant example by @iantbutler01 in #1461
  • Imrpove sagemaker by @muellerzr in #1470
  • Split tensors as part of split_between_processes by @muellerzr in #1477
  • Move to device by @muellerzr in #1478
  • Fix gradient state bugs in multiple dataloader by @Ethan-yt in #1483
  • Add rdzv-backend by @muellerzr in #1490
  • Only use IPEX if available by @muellerzr in #1495
  • Update README.md by @lyhue1991 in #1493
  • Let gather_for_metrics always run by @muellerzr in #1496
  • Use empty like when we only need to create buffers by @thomasw21 in #1497
  • Allow key skipping in big model inference by @sgugger in #1491
  • fix crash when ipex is installed and torch has no xpu by @sywangyi in #1502
  • [bnb] Add fp4 support for dispatch by @younesbelkada in #1505
  • Fix 4bit model on multiple devices by @SunMarc in #1506
  • adjust overriding of model's forward function by @prathikr in #1492
  • Add assertion when call prepare with deepspeed config. by @tensimiku in #1468
  • NVME path support for deepspeed by @abhilash1910 in #1484
  • should set correct dtype to ipex optimize and use amp logic in native… by @sywangyi in #1511
  • Swap env vars for XPU and IPEX + CLI by @muellerzr in #1513
  • Fix a bug when parameters tied belong to the same module by @sgugger in #1514
  • Fixup deepspeed/cli tests by @muellerzr in #1526
  • Refactor mp into its own wrapper by @muellerzr in #1527
  • Check tied parameters by @SunMarc in #1529
  • Raise ValueError on iterable dataset if we've hit the end and attempting to go beyond it by @muellerzr in #1531
  • Officially support naive PP for quantized models + PEFT by @younesbelkada in #1523
  • remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu by @sywangyi in #1503
  • Prevent using extra VRAM for static device_map by @LSerranoPEReN in #1536
  • Update deepspeed.mdx by @LiamSwayne in #1541
  • Update performance.mdx by @LiamSwayne in #1543
  • Update deferring_execution.mdx by @LiamSwayne in #1544
  • Apply deprecations by @muellerzr in #1537
  • Add mps support to big inference modeling by @SunMarc in #1545
  • [documentation] grammar fixes in gradient_synchronization.mdx by @LiamSwayne in #1547
  • Eval mode by @muellerzr in #1540
  • Update migration.mdx by @LiamSwayne in #1549

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @will-cromar
    • Support TPU v4 with new PyTorch/XLA TPU runtime (#1393)
    • Support TPU v2 and v3 on new PyTorch/XLA TPU runtime (#1385)
  • @searchivarius
    • Adding support for local SGD. (#1378)
  • @abhilash1910
    • Intel GPU support initialization (#1118)
    • Fix bug on ipex for diffusers (#1426)
    • Refactor and simplify xpu device in state (#1456)
    • NVME path support for deepspeed (#1484)
  • @sywangyi
    • fix error for CPU DDP using trainer api. (#1455)
    • fix crash when ipex is installed and torch has no xpu (#1502)
    • should set correct dtype to ipex optimize and use amp logic in native… (#1511)
    • remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu (#1503)
  • @Ethan-yt
    • Fix gradient state bugs in multiple dataloader (#1483)

Fetched April 7, 2026
