v0.11.0: Gradient Accumulation and SageMaker Data Parallelism

Gradient Accumulation

Accelerate can now handle gradient accumulation for you: pass gradient_accumulation_steps=xxx when instantiating the Accelerator and put each training loop step under a with accelerator.accumulate(model): block. Accelerate then handles the loss re-scaling and gradient accumulation, avoiding slowdowns in distributed training since gradients only need to be synced when you actually want to step. More details in the documentation.

  • Add gradient accumulation doc by @muellerzr in #511
  • Make gradient accumulation work with dispatched dataloaders by @muellerzr in #510
  • Introduce automatic gradient accumulation wrapper + fix a few test issues by @muellerzr in #484
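The bookkeeping behind the accumulate() wrapper can be sketched without any framework: re-scale each micro-batch loss by 1/gradient_accumulation_steps so the accumulated gradient matches a single big-batch gradient, and only step (and sync gradients) on the boundary micro-batch. The GradientAccumulator class below is an illustrative toy, not Accelerate's actual implementation.

```python
class GradientAccumulator:
    """Toy sketch of gradient-accumulation bookkeeping.

    Illustrative only -- Accelerate's accumulate() context manager does
    this (plus gradient-sync control) internally.
    """

    def __init__(self, gradient_accumulation_steps):
        self.steps = gradient_accumulation_steps
        self.count = 0

    def scale_loss(self, loss):
        # Average the loss over the accumulation window so that summing
        # N scaled gradients equals one gradient over the full batch.
        return loss / self.steps

    def should_step(self):
        # True only on the boundary micro-batch -- the one point where
        # distributed training needs to sync gradients across workers.
        self.count += 1
        return self.count % self.steps == 0


# Simulate 8 micro-batches with an accumulation window of 4:
acc = GradientAccumulator(gradient_accumulation_steps=4)
stepped = [acc.should_step() for _ in range(8)]
# The optimizer would step only on micro-batches 4 and 8.
```

In a real training loop the should_step() decision is what accelerator.accumulate(model) makes for you, which is why optimizer.step() can be called unconditionally inside the block.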

Support for SageMaker Data parallelism

Accelerate now supports SageMaker's own flavor of data parallelism.

  • SageMaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging by @pacman100 in #504
  • SageMaker DP Support by @pacman100 in #494

What's new?

  • Fix accelerate tests command by @sgugger in #528
  • FSDP integration enhancements and fixes by @pacman100 in #522
  • Warn user if no trackers are installed by @muellerzr in #524
  • Fixup all example CI tests and properly fail by @muellerzr in #517
  • fixing deepspeed multi-node launcher by @pacman100 in #514
  • Add special Parameters modules support by @younesbelkada in #519
  • Don't unwrap in save_state() by @cccntu in #489
  • Fix a bug when reduce a tensor. by @wwhio in #513
  • Add benchmarks by @sgugger in #506
  • Fix DispatchDataLoader length when split_batches=True by @sgugger in #509
  • Fix scheduler in gradient accumulation example by @muellerzr in #500
  • update dataloader wrappers to have total_batch_size attribute by @pacman100 in #493
  • add use_distributed property by @ZhiyuanChen in #487
  • fixing fsdp autowrap functionality by @pacman100 in #475
  • Use datasets 2.2.0 for now by @muellerzr in #481
  • Rm gradient accumulation on TPU by @muellerzr in #479
  • Revert "Pin datasets for now" (#477) by @muellerzr
  • Pin datasets for now by @muellerzr in #477
  • Some typos and cosmetic fixes by @douwekiela in #472
  • Fix when TPU device check is ran by @muellerzr in #469
  • Refactor Utility Documentation by @muellerzr in #467
  • Add docbuilder to quality by @muellerzr in #468
  • Expose some is_*_available utils in docs by @muellerzr in #466
  • Cleanup CI Warnings by @muellerzr in #465
  • Link CI slow runners to the commit by @muellerzr in #464
  • Fix subtle bug in BF16 by @muellerzr in #463
  • Include bf16 support for TPUs and CPUs, and a better check for if a CUDA device supports BF16 by @muellerzr in #462
  • Handle bfloat16 weights in disk offload without adding memory overhead by @noamwies in #460
  • Handle bfloat16 weights in disk offload by @sgugger in #460
  • Raise a clear warning if a user tries to modify the AcceleratorState by @muellerzr in #458
  • Right step point by @muellerzr in #459
  • Better checks for if a TPU device exists by @muellerzr in #456
  • Offload and modules with unused submodules by @sgugger in #442
