
v0.21.0: Model quantization and NPUs

Model quantization with bitsandbytes

You can now quantize any model (not just Transformers models) using Accelerate. This is mainly useful for models with many linear layers. See the documentation for more information!

  • Bnb quantization by @SunMarc in #1626
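As a minimal sketch of the new API (names from PR #1626; this requires `accelerate`, `bitsandbytes`, and a CUDA GPU, so the heavy call is guarded for environments that lack them):

```python
# Hedged sketch: quantize an arbitrary PyTorch model (not just a
# Transformers one) to 8-bit with Accelerate's bitsandbytes integration.
try:
    import torch
    from accelerate.utils import BnbQuantizationConfig, load_and_quantize_model
    HAVE_DEPS = True
except ImportError:  # accelerate/bitsandbytes/torch not installed here
    HAVE_DEPS = False

if HAVE_DEPS and torch.cuda.is_available():
    # Any nn.Module works; the benefit comes from its Linear layers.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
        torch.nn.Linear(1024, 2),
    )
    config = BnbQuantizationConfig(load_in_8bit=True)
    model = load_and_quantize_model(model, bnb_quantization_config=config)
```

Consult the Accelerate quantization documentation for the full set of options (4-bit, offloading, skipped modules).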

Support for Ascend NPUs

Accelerate now supports Ascend NPUs.

  • Add Ascend NPU accelerator support by @statelesshz in #1676
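A hedged sketch of what this means in practice: with the `torch_npu` extension installed, `Accelerator()` should pick up the Ascend NPU as the default device, so existing scripts run unchanged (guarded here so the snippet is harmless where `accelerate` is not installed):

```python
# Hedged sketch (per PR #1676): on an Ascend host with torch_npu,
# Accelerator() selects the NPU automatically; elsewhere it falls back
# to CUDA or CPU as before.
try:
    from accelerate import Accelerator

    accelerator = Accelerator()
    device = accelerator.device  # e.g. npu:0 on Ascend, else cuda/cpu
except ImportError:  # accelerate not installed in this environment
    accelerator = None
    device = None
```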

What's new?

Accelerate now requires Python 3.8+ and PyTorch 1.10+:

  • 🚨🚨🚨 Spring cleaning: Python 3.8 🚨🚨🚨 by @muellerzr in #1661

  • 🚨🚨🚨 Spring cleaning: PyTorch 1.10 🚨🚨🚨 by @muellerzr in #1662

  • [doc build] Use secrets by @mishig25 in #1551

  • Update launch.mdx by @LiamSwayne in #1553

  • Avoid double wrapping of all accelerate.prepare objects by @muellerzr in #1555

  • Update README.md by @LiamSwayne in #1556

  • Fix load_state_dict when there is one device and disk by @sgugger in #1557

  • Fix tests not being ran on multi-GPU nightly by @muellerzr in #1558

  • fix the typo when setting the "_accelerator_prepared" attribute by @Yura52 in #1560

  • [core] Fix possibility to pass NoneType objects in prepare by @younesbelkada in #1561

  • Reset dataloader end_of_dataloader at each iter by @sgugger in #1562

  • Update big_modeling.mdx by @LiamSwayne in #1564

  • [bnb] Fix failing int8 tests by @younesbelkada in #1567

  • Update gradient sync docs to reflect importance of optimizer.step() by @dleve123 in #1565

  • Update mixed precision integrations in README by @sgugger in #1569

  • Raise error instead of warn by @muellerzr in #1568

  • Introduce listify, fix tensorboard silently failing by @muellerzr in #1570

  • Check for bak and expand docs on directory structure by @muellerzr in #1571

  • Permanent solution by @muellerzr in #1577

  • fix the bug in xpu by @mingxiaoh in #1508

  • Make sure that we only set is_accelerator_prepared on items accelerate actually prepares by @muellerzr in #1578

  • Expand prepare() doc by @muellerzr in #1580

  • Get Torch version using importlib instead of pkg_resources by @catwell in #1585

  • improve oob performance when use mpirun to start DDP finetune without accelerate launch by @sywangyi in #1575

  • Update training_tpu.mdx by @LiamSwayne in #1582

  • Return false if CUDA available by @muellerzr in #1581

  • fix logger level by @caopulan in #1579

  • Fix test by @muellerzr in #1586

  • Update checkpoint.mdx by @LiamSwayne in #1587

  • FSDP updates by @pacman100 in #1576

  • Update modeling.py by @ain-soph in #1595

  • Integration tests by @muellerzr in #1593

  • Add triggers for CI workflow by @muellerzr in #1597

  • Remove asking xpu plugin for non xpu devices by @abhilash1910 in #1594

  • Remove GPU safetensors env variable by @sgugger in #1603

  • reset end_of_dataloader for dataloader_dispatcher by @megavaz in #1609

  • fix for arc gpus by @abhilash1910 in #1615

  • Ignore low_zero option when only device is available by @sgugger in #1617

  • Fix failing multinode tests by @muellerzr in #1616

  • Doc to md by @sgugger in #1618

  • Fix tb issue by @muellerzr in #1623

  • Fix workflow by @muellerzr in #1625

  • Fix transformers sync bug with accumulate by @muellerzr in #1624

  • fixes offload dtype by @SunMarc in #1631

  • fix: Megatron is not installed. please build it from source. by @yuanwu2017 in #1636

  • deepspeed z2/z1 state_dict bloating fix by @pacman100 in #1638

  • Swap disable rich by @muellerzr in #1640

  • fix autocasting bug by @pacman100 in #1637

  • fix modeling low zero by @abhilash1910 in #1634

  • Add skorch to runners by @muellerzr in #1646

  • add save model by @SunMarc in #1641

  • Change dispatch_model when we have only one device by @SunMarc in #1648

  • Doc save model by @SunMarc in #1650

  • Fix device_map by @SunMarc in #1651

  • Check for port usage before launch by @muellerzr in #1656

  • [BigModeling] Add missing check for quantized models by @younesbelkada in #1652

  • Bump integration by @muellerzr in #1658

  • TIL by @muellerzr in #1657

  • docker cpu py version by @muellerzr in #1659

  • [BigModeling] Final fix for dispatch int8 and fp4 models by @younesbelkada in #1660

  • remove safetensor dep on shard_checkpoint by @SunMarc in #1664

  • change the import place to avoid import error by @pacman100 in #1653

  • Update broken Runhouse link in examples/README.md by @dongreenberg in #1668

  • Bnb quantization by @SunMarc in #1626

  • replace save funct in doc by @SunMarc in #1672

  • Doc big model inference by @SunMarc in #1670

  • Add docs for saving Transformers models by @deppen8 in #1671

  • fix bnb tests by @SunMarc in #1679

  • Fix workflow CI by @muellerzr in #1690

  • remove duplicate class by @SunMarc in #1691

  • update readme in examples by @statelesshz in #1678

  • Fix nightly tests by @muellerzr in #1696

  • Fixup docs by @muellerzr in #1697

  • Improve quality errors by @muellerzr in #1698

  • Move mixed precision wrapping ahead of DDP/FSDP wrapping by @ChenWu98 in #1682

  • Add offload for 8-bit model by @SunMarc in #1699

  • Deepcopy on Accelerator to return self by @muellerzr in #1694

  • Update tracking.md by @stevhliu in #1702

  • Skip tests when bnb isn't available by @muellerzr in #1706

  • Fix launcher validation by @abhilash1910 in #1705

  • Fixes for issue #1683: failed to run accelerate config in colab by @Erickrus in #1692

  • Fix the bug where DataLoaderDispatcher gets stuck in an infinite wait when the dataset is an IterDataPipe during multi-process training. by @yuxinyuan in #1709

  • add multi_gpu decorator by @SunMarc in #1712

  • Modify loading checkpoint behavior by @SunMarc in #1715

  • fix version by @SunMarc in #1701

  • Keep old behavior by @muellerzr in #1716

  • Optimize get_scale to reduce async calls by @muellerzr in #1718

  • Remove duplicate code by @muellerzr in #1717

  • New tactic by @muellerzr in #1719

  • add Comfy-UI by @pacman100 in #1723

  • add compatibility with peft by @SunMarc in #1725
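Among the fixes above, #1656 makes launching fail fast when the main-process port is already taken instead of hanging at rendezvous. A minimal standard-library sketch of that kind of pre-flight check (an illustration of the technique, not Accelerate's actual implementation):

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on a successful connection, an errno otherwise.
        return sock.connect_ex((host, port)) == 0
```

Running such a check before spawning workers turns a silent multi-process hang into an immediate, actionable error message.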

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @LiamSwayne
    • Update launch.mdx (#1553)
    • Update README.md (#1556)
    • Update big_modeling.mdx (#1564)
    • Update training_tpu.mdx (#1582)
    • Update checkpoint.mdx (#1587)
  • @mingxiaoh
    • fix the bug in xpu (#1508)
  • @statelesshz
    • update readme in examples (#1678)
    • Add Ascend NPU accelerator support (#1676)
  • @ChenWu98
    • Move mixed precision wrapping ahead of DDP/FSDP wrapping (#1682)

Fetched April 7, 2026