
v0.20.0: MPS and fp4 support on Big Model Inference, 4-bit QLoRA, Intel GPU, Distributed Inference, and much more!


Big model inference

Support has been added for running device_map="auto" on the MPS device. Big model inference also works with models loaded in 4-bit in Transformers.

  • Add mps support to big inference modeling by @SunMarc in #1545
  • Adds fp4 support for model dispatching by @younesbelkada in #1505
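As a hedged sketch of what this enables (the model id and flags are illustrative, not taken from the release notes), loading a model with device_map="auto" so weights are dispatched automatically, including onto Apple Silicon's MPS backend or in 4-bit, might look like:

```python
def load_dispatched_model(model_id="gpt2", load_in_4bit=False):
    """Load a model and let big model inference place it.

    `device_map="auto"` asks Accelerate to dispatch weights across the
    available devices (CUDA, MPS on Apple Silicon, or CPU/disk offload).
    Heavy imports are deferred so the sketch can be read without the
    libraries installed.
    """
    from transformers import AutoModelForCausalLM  # requires transformers

    kwargs = {"device_map": "auto"}
    if load_in_4bit:
        # 4-bit loading via bitsandbytes now composes with dispatching
        kwargs["load_in_4bit"] = True
    return AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
```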

4-bit QLoRA Support

  • 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #1458
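A minimal QLoRA sketch (the model id, rank, and hyperparameters below are illustrative assumptions): quantize the frozen base model to 4-bit NF4 with bitsandbytes, then attach small trainable LoRA adapters via PEFT:

```python
def build_qlora_model(model_id="facebook/opt-350m"):
    """QLoRA sketch: 4-bit frozen base model + trainable LoRA adapters."""
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",        # NormalFloat4 from the QLoRA paper
        bnb_4bit_use_double_quant=True,   # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    base = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
    return get_peft_model(base, lora)  # only the LoRA weights remain trainable
```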

Distributed Inference Utilities

This version introduces a new Accelerator.split_between_processes utility to help perform distributed inference with non-tensorized or non-DataLoader workflows. Read more in the documentation.
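The real API is a context manager on Accelerator; the contiguous chunking it performs can be approximated in pure Python (a sketch of the semantics, not Accelerate's implementation):

```python
import math

def split_for_process(inputs, num_processes, process_index):
    """Approximate the slicing behind Accelerator.split_between_processes:
    each process receives one contiguous chunk, and earlier processes may
    get one extra item when the list does not divide evenly."""
    chunk = math.ceil(len(inputs) / num_processes)
    return inputs[process_index * chunk : (process_index + 1) * chunk]

# Real usage (sketch) on each process:
#   accelerator = Accelerator()
#   with accelerator.split_between_processes(["cat", "dog", "bird"]) as prompts:
#       results = [model(p) for p in prompts]
```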

Introduce XPU support for Intel GPU

  • Intel GPU support initialization by @abhilash1910 in #1118
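With XPU support, Accelerator() picks the Intel GPU automatically when one is available; the selection order can be sketched roughly as below (a simplification for illustration, not Accelerate's actual logic, which also honors environment variables such as ACCELERATE_USE_XPU):

```python
def pick_device():
    """Return a best-available device string, roughly mirroring how
    Accelerate probes backends (simplified sketch)."""
    import torch

    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch, "xpu") and torch.xpu.is_available():  # Intel GPU
        return "xpu"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():  # Apple Silicon
        return "mps"
    return "cpu"
```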

Add support for the new PyTorch XLA TPU runtime

  • Accelerate now supports the latest PyTorch/XLA TPU runtime (#1393, #1385)

A new optimizer method: LocalSGD

  • This is a new wrapper around SGD which enables efficient multi-GPU training in cases where no fast interconnect is available, by @searchivarius in #1378
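A training-loop sketch (model, optimizer, and dataloader setup are elided, and local_sgd_steps=8 is an illustrative value): each worker takes several local optimizer steps, and parameters are synchronized across GPUs only every local_sgd_steps steps, cutting communication on slow interconnects:

```python
def train_with_local_sgd(model, optimizer, dataloader, local_sgd_steps=8):
    """Sketch of LocalSGD usage: synchronize across workers only every
    `local_sgd_steps` steps instead of after every batch."""
    from accelerate import Accelerator
    from accelerate.local_sgd import LocalSGD

    accelerator = Accelerator()
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    with LocalSGD(accelerator=accelerator, model=model,
                  local_sgd_steps=local_sgd_steps, enabled=True) as local_sgd:
        for batch in dataloader:
            loss = model(**batch).loss
            accelerator.backward(loss)
            optimizer.step()
            optimizer.zero_grad()
            local_sgd.step()  # averages parameters every `local_sgd_steps` calls
```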

Papers with 🤗 Accelerate

  • We now have an entire section of the docs dedicated to official paper implementations and citations using the framework (#1399); see it live in the documentation

Breaking changes

logging_dir has been fully deprecated; please use project_dir or a ProjectConfiguration instead.
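A migration sketch (the directory names are illustrative): pass project_dir directly to Accelerator, or bundle related paths in a ProjectConfiguration:

```python
def make_accelerator(project_dir="runs", logging_dir=None):
    """Replace the deprecated `logging_dir` argument: either pass
    `project_dir` directly, or group paths in a ProjectConfiguration."""
    from accelerate import Accelerator
    from accelerate.utils import ProjectConfiguration

    config = ProjectConfiguration(
        project_dir=project_dir,
        logging_dir=logging_dir or f"{project_dir}/logs",
    )
    return Accelerator(project_config=config)
```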

What's new?

  • use existing mlflow experiment if exists by @Rusteam in #1403
  • changes required for DS integration by @pacman100 in #1406
  • fix deepspeed failing tests by @pacman100 in #1411
  • Make mlflow logging dir optional by @mattplo-decath in #1413
  • Fix bug on ipex for diffusers by @abhilash1910 in #1426
  • Improve Slack Updater by @muellerzr in #1433
  • Let quality yell at the user if it's a version difference by @muellerzr in #1438
  • Ensure that it gets installed by @muellerzr in #1439
  • [core] Introducing CustomDtype enum for custom dtypes by @younesbelkada in #1434
  • Fix XPU by @muellerzr in #1440
  • Make sure torch compiled model can also be unwrapped by @patrickvonplaten in #1437
  • fixed: ZeroDivisionError: division by zero by @sreio in #1436
  • fix potential OOM when resuming with multi-GPU training by @exhyy in #1444
  • Fixes in infer_auto_device_map by @sgugger in #1441
  • Raise error when logging improperly by @muellerzr in #1446
  • Fix ci by @muellerzr in #1447
  • Distributed prompting/inference utility by @muellerzr in #1410
  • Add to by @muellerzr in #1448
  • split_between_processes by @stevhliu in #1449
  • [docs] Replace state.rank -> process_index by @pcuenca in #1450
  • Auto multigpu logic by @muellerzr in #1452
  • Update with cli instructions by @muellerzr in #1453
  • Adds in_order argument that defaults to False, to log in order. by @JulesGM in #1262
  • fix error for CPU DDP using trainer api. by @sywangyi in #1455
  • Refactor and simplify xpu device in state by @abhilash1910 in #1456
  • Document how to use commands with python module instead of argparse by @muellerzr in #1457
  • 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #1458
  • Fix skip first batch being perminant by @muellerzr in #1466
  • update conversion of layers to retain original data type. by @avisinghal6 in #1467
  • Check for xpu specifically by @muellerzr in #1472
  • update register_empty_buffer to match torch args by @NouamaneTazi in #1465
  • Update gradient accumulation docs, and remove redundant example by @iantbutler01 in #1461
  • Imrpove sagemaker by @muellerzr in #1470
  • Split tensors as part of split_between_processes by @muellerzr in #1477
  • Move to device by @muellerzr in #1478
  • Fix gradient state bugs in multiple dataloader by @Ethan-yt in #1483
  • Add rdzv-backend by @muellerzr in #1490
  • Only use IPEX if available by @muellerzr in #1495
  • Update README.md by @lyhue1991 in #1493
  • Let gather_for_metrics always run by @muellerzr in #1496
  • Use empty like when we only need to create buffers by @thomasw21 in #1497
  • Allow key skipping in big model inference by @sgugger in #1491
  • fix crash when ipex is installed and torch has no xpu by @sywangyi in #1502
  • [bnb] Add fp4 support for dispatch by @younesbelkada in #1505
  • Fix 4bit model on multiple devices by @SunMarc in #1506
  • adjust overriding of model's forward function by @prathikr in #1492
  • Add assertion when call prepare with deepspeed config. by @tensimiku in #1468
  • NVME path support for deepspeed by @abhilash1910 in #1484
  • should set correct dtype to ipex optimize and use amp logic in native… by @sywangyi in #1511
  • Swap env vars for XPU and IPEX + CLI by @muellerzr in #1513
  • Fix a bug when parameters tied belong to the same module by @sgugger in #1514
  • Fixup deepspeed/cli tests by @muellerzr in #1526
  • Refactor mp into its own wrapper by @muellerzr in #1527
  • Check tied parameters by @SunMarc in #1529
  • Raise ValueError on iterable dataset if we've hit the end and attempting to go beyond it by @muellerzr in #1531
  • Officially support naive PP for quantized models + PEFT by @younesbelkada in #1523
  • remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu by @sywangyi in #1503
  • Prevent using extra VRAM for static device_map by @LSerranoPEReN in #1536
  • Update deepspeed.mdx by @LiamSwayne in #1541
  • Update performance.mdx by @LiamSwayne in #1543
  • Update deferring_execution.mdx by @LiamSwayne in #1544
  • Apply deprecations by @muellerzr in #1537
  • Add mps support to big inference modeling by @SunMarc in #1545
  • [documentation] grammar fixes in gradient_synchronization.mdx by @LiamSwayne in #1547
  • Eval mode by @muellerzr in #1540
  • Update migration.mdx by @LiamSwayne in #1549

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @will-cromar
    • Support TPU v4 with new PyTorch/XLA TPU runtime (#1393)
    • Support TPU v2 and v3 on new PyTorch/XLA TPU runtime (#1385)
  • @searchivarius
    • Adding support for local SGD. (#1378)
  • @abhilash1910
    • Intel GPU support initialization (#1118)
    • Fix bug on ipex for diffusers (#1426)
    • Refactor and simplify xpu device in state (#1456)
    • NVME path support for deepspeed (#1484)
  • @sywangyi
    • fix error for CPU DDP using trainer api. (#1455)
    • fix crash when ipex is installed and torch has no xpu (#1502)
    • should set correct dtype to ipex optimize and use amp logic in native… (#1511)
    • remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu (#1503)
  • @Ethan-yt
    • Fix gradient state bugs in multiple dataloader (#1483)

Fetched April 7, 2026
