v0.21.0: Model quantization and NPUs
You can now quantize any model (not just Transformer models) using Accelerate. This is mainly useful for models with many linear layers. See the documentation for more information!
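The heavy lifting is done by the bitsandbytes integration; as a rough, library-free illustration of the absmax int8 scheme that such 8-bit quantization is based on (a minimal sketch only — `absmax_quantize` and `dequantize` are hypothetical helpers, not Accelerate's actual API):

```python
# Per-tensor absmax int8 quantization: map the largest-magnitude weight to +/-127.
def absmax_quantize(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the int8 values and the scale.
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.88, -0.33]
q, scale = absmax_quantize(weights)
restored = dequantize(q, scale)
# Each restored weight is within half a quantization step of the original.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

This per-tensor variant loses precision when a few outlier weights dominate the scale, which is why production schemes quantize per row or per block.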
Accelerate now supports Ascend NPUs.
Accelerate now requires Python 3.8+ and PyTorch 1.10+:
🚨🚨🚨 Spring cleaning: Python 3.8 🚨🚨🚨 by @muellerzr in #1661
🚨🚨🚨 Spring cleaning: PyTorch 1.10 🚨🚨🚨 by @muellerzr in #1662
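Version detection now goes through `importlib` rather than `pkg_resources` (see #1585). A minimal sketch of checking the new minimums that way, using only the standard library (`parse_version` and `meets_minimums` are illustrative helpers, not Accelerate's actual functions):

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def parse_version(v):
    """Turn a version string like '1.10.2+cu113' into a comparable tuple (1, 10, 2)."""
    parts = []
    for piece in v.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def meets_minimums(python_min=(3, 8), torch_min=(1, 10)):
    """Check the running interpreter and the installed torch against the minimums."""
    if sys.version_info[:2] < python_min:
        return False
    try:
        return parse_version(version("torch")) >= torch_min
    except PackageNotFoundError:
        return False
```

Tuple comparison handles the `1.9` vs `1.10` case correctly, which naive string comparison gets wrong (`"1.9" > "1.10"` lexicographically).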
[doc build] Use secrets by @mishig25 in #1551
Update launch.mdx by @LiamSwayne in #1553
Avoid double wrapping of all accelerate.prepare objects by @muellerzr in #1555
Update README.md by @LiamSwayne in #1556
Fix load_state_dict when there is one device and disk by @sgugger in #1557
Fix tests not being run on multi-GPU nightly by @muellerzr in #1558
fix the typo when setting the "_accelerator_prepared" attribute by @Yura52 in #1560
[core] Fix possibility to pass NoneType objects in prepare by @younesbelkada in #1561
Reset dataloader end_of_dataloader at each iter by @sgugger in #1562
Update big_modeling.mdx by @LiamSwayne in #1564
[bnb] Fix failing int8 tests by @younesbelkada in #1567
Update gradient sync docs to reflect importance of optimizer.step() by @dleve123 in #1565
Update mixed precision integrations in README by @sgugger in #1569
Raise error instead of warn by @muellerzr in #1568
Introduce listify, fix tensorboard silently failing by @muellerzr in #1570
Check for bak and expand docs on directory structure by @muellerzr in #1571
Permanent solution by @muellerzr in #1577
fix the bug in xpu by @mingxiaoh in #1508
Make sure that we only set is_accelerator_prepared on items accelerate actually prepares by @muellerzr in #1578
Expand prepare() doc by @muellerzr in #1580
Get Torch version using importlib instead of pkg_resources by @catwell in #1585
improve oob performance when using mpirun to start DDP finetune without accelerate launch by @sywangyi in #1575
Update training_tpu.mdx by @LiamSwayne in #1582
Return false if CUDA available by @muellerzr in #1581
fix logger level by @caopulan in #1579
Fix test by @muellerzr in #1586
Update checkpoint.mdx by @LiamSwayne in #1587
FSDP updates by @pacman100 in #1576
Update modeling.py by @ain-soph in #1595
Integration tests by @muellerzr in #1593
Add triggers for CI workflow by @muellerzr in #1597
Remove asking xpu plugin for non xpu devices by @abhilash1910 in #1594
Remove GPU safetensors env variable by @sgugger in #1603
reset end_of_dataloader for dataloader_dispatcher by @megavaz in #1609
fix for arc gpus by @abhilash1910 in #1615
Ignore low_zero option when only device is available by @sgugger in #1617
Fix failing multinode tests by @muellerzr in #1616
Doc to md by @sgugger in #1618
Fix tb issue by @muellerzr in #1623
Fix workflow by @muellerzr in #1625
Fix transformers sync bug with accumulate by @muellerzr in #1624
fixes offload dtype by @SunMarc in #1631
fix: Megatron is not installed. please build it from source. by @yuanwu2017 in #1636
deepspeed z2/z1 state_dict bloating fix by @pacman100 in #1638
Swap disable rich by @muellerzr in #1640
fix autocasting bug by @pacman100 in #1637
fix modeling low zero by @abhilash1910 in #1634
Add skorch to runners by @muellerzr in #1646
add save model by @SunMarc in #1641
Change dispatch_model when we have only one device by @SunMarc in #1648
Doc save model by @SunMarc in #1650
Fix device_map by @SunMarc in #1651
Check for port usage before launch by @muellerzr in #1656
[BigModeling] Add missing check for quantized models by @younesbelkada in #1652
Bump integration by @muellerzr in #1658
TIL by @muellerzr in #1657
docker cpu py version by @muellerzr in #1659
[BigModeling] Final fix for dispatch int8 and fp4 models by @younesbelkada in #1660
remove safetensor dep on shard_checkpoint by @SunMarc in #1664
change the import place to avoid import error by @pacman100 in #1653
Update broken Runhouse link in examples/README.md by @dongreenberg in #1668
Bnb quantization by @SunMarc in #1626
replace save funct in doc by @SunMarc in #1672
Doc big model inference by @SunMarc in #1670
Add docs for saving Transformers models by @deppen8 in #1671
fix bnb tests by @SunMarc in #1679
Fix workflow CI by @muellerzr in #1690
remove duplicate class by @SunMarc in #1691
update readme in examples by @statelesshz in #1678
Fix nightly tests by @muellerzr in #1696
Fixup docs by @muellerzr in #1697
Improve quality errors by @muellerzr in #1698
Move mixed precision wrapping ahead of DDP/FSDP wrapping by @ChenWu98 in #1682
Add offload for 8-bit model by @SunMarc in #1699
Deepcopy on Accelerator to return self by @muellerzr in #1694
Update tracking.md by @stevhliu in #1702
Skip tests when bnb isn't available by @muellerzr in #1706
Fix launcher validation by @abhilash1910 in #1705
Fixes for issue #1683: failed to run accelerate config in colab by @Erickrus in #1692
Fix the bug where DataLoaderDispatcher gets stuck in an infinite wait when the dataset is an IterDataPipe during multi-process training. by @yuxinyuan in #1709
add multi_gpu decorator by @SunMarc in #1712
Modify loading checkpoint behavior by @SunMarc in #1715
fix version by @SunMarc in #1701
Keep old behavior by @muellerzr in #1716
Optimize get_scale to reduce async calls by @muellerzr in #1718
Remove duplicate code by @muellerzr in #1717
New tactic by @muellerzr in #1719
add Comfy-UI by @pacman100 in #1723
add compatibility with peft by @SunMarc in #1725
The following contributors have made significant changes to the library over the last release: