v0.27.0: PyTorch 2.2.0 Support, PyTorch-Native Pipeline Parallelism, DeepSpeed XPU Support, and Bug Fixes
With the latest release of PyTorch 2.2.0, we've ensured that Accelerate remains fully compatible with it and that there are no breaking changes.
With this release we are excited to announce support for pipeline-parallel inference by integrating PyTorch's PiPPy framework (so there is no need to use Megatron or DeepSpeed)! It supports automatically splitting model weights across devices, with an API similar to `device_map="auto"`. This is still under heavy development; however, the inference side is stable enough that we are ready for a release. Read more about it in our docs and check out the example zoo.
Requires PiPPy version 0.2.0 or later (`pip install torchpippy -U`).
Example usage (combined with `accelerate launch` or `torchrun`):
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from accelerate import PartialState, prepare_pippy

model = AutoModelForSequenceClassification.from_pretrained("gpt2")

# Build an example input first; `prepare_pippy` traces the model with it
tokenizer = AutoTokenizer.from_pretrained("gpt2")
input = tokenizer("Hello, world!", return_tensors="pt").input_ids
input = input.to("cuda:0")

model = prepare_pippy(model, split_points="auto", example_args=(input,))

with torch.no_grad():
    output = model(input)

# The outputs are only on the final process by default.
# You can pass `gather_outputs=True` to `prepare_pippy` to
# make them available on all processes.
if PartialState().is_last_process:
    output = torch.stack(tuple(output[0]))
    print(output.shape)
```
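The final `torch.stack(tuple(output[0]))` call in the example suggests the pipeline returns its result as a sequence of per-microbatch tensors that must be merged back into one tensor. A minimal single-process sketch of just that merging step, using two hypothetical microbatch logit tensors of shape `(1, 2)` as stand-ins:

```python
import torch

# Two hypothetical per-microbatch logit tensors, standing in for `output[0]`
microbatch_logits = (torch.zeros(1, 2), torch.ones(1, 2))

# Stack them along a new leading dimension, as the example above does
merged = torch.stack(tuple(microbatch_logits))
print(merged.shape)  # torch.Size([2, 1, 2])
```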
This release also provides support for utilizing DeepSpeed on XPU devices, thanks to @faaany.
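Since DeepSpeed in Accelerate is typically set up through `accelerate config`, using it on an XPU machine should follow the usual configuration flow. A hedged sketch of the DeepSpeed-related portion of a generated `default_config.yaml` (the field values shown are illustrative, not prescriptive):

```yaml
# Sketch of an accelerate default_config.yaml fragment for DeepSpeed;
# values are illustrative only.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2
  gradient_accumulation_steps: 1
mixed_precision: bf16
num_processes: 2
```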
What's Changed:

* `dispatch_model`, and in forward with offloading by @fxmarty in https://github.com/huggingface/accelerate/pull/2330
* `accelerate config` by @faaany in https://github.com/huggingface/accelerate/pull/2346
* `block_size` picking in megatron_lm_gpt_pretraining example by @nilq in https://github.com/huggingface/accelerate/pull/2342
* `FP8RecipeKwargs` by @sudhakarsingh27 in https://github.com/huggingface/accelerate/pull/2355
* `add_hook_to_module` and `remove_hook_from_module` compatibility with `fx.GraphModule` by @fxmarty in https://github.com/huggingface/accelerate/pull/2369
* `requires_grad` to kwargs when registering empty parameters by @BlackSamorez in https://github.com/huggingface/accelerate/pull/2376
* `adapter_only` option to `save_fsdp_model` and `load_fsdp_model` to only save/load PEFT weights by @AjayP13 in https://github.com/huggingface/accelerate/pull/2321
* `split_batches` by @izhx in https://github.com/huggingface/accelerate/pull/2344
* `nproc_per_node` in the multi gpu test by @faaany in https://github.com/huggingface/accelerate/pull/2422
* `Accelerator` to prepare models in eval mode for XPU & CPU by @faaany in https://github.com/huggingface/accelerate/pull/2426

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.26.1...v0.27.0