Use any of your favorite experiment trackers (TensorBoard, wandb, Comet ML...) inside your training scripts with just a few lines of code, thanks to Accelerate. All details are in the documentation.
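As a rough sketch of the shape of that flow (the class below is a plain-Python stand-in, not Accelerate's actual `Accelerator`; see the documentation for the real entry points):

```python
# Minimal stand-in illustrating the tracking flow: pick a backend when the
# accelerator is created, initialize a run, then log metrics during training.
class TrackingSketch:
    def __init__(self, log_with="tensorboard"):
        self.log_with = log_with      # which tracker backend to use
        self.project_name = None
        self.history = []             # stand-in for the backend's log store

    def init_trackers(self, project_name):
        self.project_name = project_name

    def log(self, values, step):
        self.history.append((step, values))

tracker = TrackingSketch(log_with="tensorboard")
tracker.init_trackers("my_project")
tracker.log({"train_loss": 0.71}, step=0)
print(tracker.history)  # [(0, {'train_loss': 0.71})]
```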
PyTorch recently released a new model wrapper for sharded DDP training called FSDP. This release adds support for it (note that it doesn't work with mixed precision yet). See all caveats in the documentation.
Say goodbye to CUDA OOM errors with the new `find_executable_batch_size` decorator. Just decorate your training function and pick a starting batch size, then let Accelerate do the rest.
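The idea behind the decorator can be sketched in plain Python (a simplified toy version, not the library's implementation, which also clears CUDA caches between attempts):

```python
import functools

def find_executable_batch_size_sketch(starting_batch_size=128):
    """Sketch: retry the decorated function, halving the batch size on OOM."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            batch_size = starting_batch_size
            while batch_size > 0:
                try:
                    return fn(batch_size, *args, **kwargs)
                except RuntimeError as e:
                    if "out of memory" in str(e).lower():
                        batch_size //= 2  # shrink and retry
                    else:
                        raise
            raise RuntimeError("No executable batch size found.")
        return wrapper
    return decorator

# Hypothetical training function that only "fits" at batch size 32 or below.
@find_executable_batch_size_sketch(starting_batch_size=128)
def train(batch_size):
    if batch_size > 32:
        raise RuntimeError("CUDA out of memory (simulated)")
    return batch_size

print(train())  # → 32 (128 and 64 "OOM", 32 succeeds)
```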
The Accelerate examples are now split in two: the base folder contains simple NLP and computer vision examples, as well as complete versions incorporating all features. You can also browse the examples in the by_feature subfolder, which shows exactly what code to add for each given feature (checkpointing, tracking, cross-validation, etc.).
- `mixed_precision` for launch command by @sgugger in https://github.com/huggingface/accelerate/pull/300
- `lr_scheduler` to Accelerator.prepare by @sgugger in https://github.com/huggingface/accelerate/pull/301

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.6.0...v0.7.0
The launcher was ignoring the mixed precision attribute of the config since v0.6.0. This patch fixes that.
Patches an issue with mixed precision (see #286)
This release adds support for bfloat16 mixed precision training (requires PyTorch >= 1.10) and a brand-new checkpoint utility to help with resuming interrupted trainings. We also get a completely revamped documentation frontend.
Save the current state of all your objects (models, optimizers, RNG states) with `accelerator.save_state(path_to_checkpoint)` and reload everything by calling `accelerator.load_state(path_to_checkpoint)`.
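The idea can be sketched in plain Python (a toy version that only handles picklable objects and Python's RNG, unlike the real utility, which also covers model, optimizer, and torch/CUDA RNG states):

```python
import os
import pickle
import random

def save_state_sketch(path, objects):
    # Serialize the tracked objects together with the RNG state so that
    # resuming reproduces the exact point where training was interrupted.
    os.makedirs(path, exist_ok=True)
    state = {"objects": objects, "python_rng": random.getstate()}
    with open(os.path.join(path, "state.pkl"), "wb") as f:
        pickle.dump(state, f)

def load_state_sketch(path):
    with open(os.path.join(path, "state.pkl"), "rb") as f:
        state = pickle.load(f)
    random.setstate(state["python_rng"])  # restore RNG for reproducibility
    return state["objects"]
```

Restoring the RNG state alongside the objects is what makes a resumed run follow the same data shuffling and augmentation as the interrupted one.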
Accelerate now supports bfloat16 mixed precision training. As a result, the old `--fp16` argument has been deprecated in favor of the more generic `--mixed_precision`.
You can now type accelerate env to have a copy-pastable summary of your environment and default configuration. Very convenient when opening a new issue!
The documentation has been switched to the new Hugging Face frontend, like Transformers and Datasets.
- `store_true` on argparse in nlp example by @monologg in https://github.com/huggingface/accelerate/pull/183
- `set_to_none` in Optimizer.zero_grad by @sgugger in https://github.com/huggingface/accelerate/pull/189
- `debug_launcher` by @sgugger in https://github.com/huggingface/accelerate/pull/259

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.5.1...v0.6.0
Fix the two following bugs:

- `convert_to_fp32` returned booleans instead of tensors #173
- `dispatch_batches=True` #175

This release introduces support for iterating through a DataLoader only on the main process, which then dispatches the batches to all processes.
The motivation behind this comes from dataset streaming, which introduces two difficulties.

This new feature is activated by default for all `IterableDataset`s.
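The dispatching idea can be sketched in plain Python, with lists standing in for tensors (a toy helper, not the library's implementation, which shards real batches across the distributed group):

```python
def dispatch_batches_sketch(dataloader, num_processes):
    # In Accelerate, only the main process would run this loop; each full
    # batch is then split into one slice per process and sent out.
    for batch in dataloader:
        per_process = len(batch) // num_processes
        yield [
            batch[rank * per_process:(rank + 1) * per_process]
            for rank in range(num_processes)
        ]

batches = [[1, 2, 3, 4], [5, 6, 7, 8]]
for slices in dispatch_batches_sketch(batches, num_processes=2):
    print(slices)  # [[1, 2], [3, 4]] then [[5, 6], [7, 8]]
```

Because only one process consumes the stream, a streamed dataset does not need to be readable from every worker.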
- `dispatch_batches` #168 (@sgugger)

This release adds support for DeepSpeed. While the basics are there to support ZeRO-2, ZeRO-3, as well as CPU and NVMe offload, the API might evolve a little bit as we polish it in the near future.
It also adds support for multi-node CPU training. In both cases, filling in the questionnaire output by `accelerate config` and then launching your script with `accelerate launch` is enough; there are no changes in the main API.
- `accelerate test` with no config file #79 (@cccntu)
- `optimizer` for consistency #81 (@kumapo)
- `unscale_gradients` method #88 (@sgugger)
- `OptimWrapper` init #127 (@sgugger)

After doing all the data preprocessing in your notebook, you can launch your training loop using the new `notebook_launcher` functionality. This is especially useful for Colab or Kaggle with TPUs! Here is an example on Colab (don't forget to select a TPU runtime).
This launcher also works if you have multiple GPUs on your machine. You just have to pass along `num_processes=your_number_of_gpus` in the call to `notebook_launcher`.
Our multi-node training test setup was flawed, and the previous releases of 🤗 Accelerate were not working for multi-node distributed training. This is all fixed now, and we have put more robust tests in place!
- `set_to_none` to AcceleratedOptimizer.zero_grad #43 (@sgugger)

Fix a bug preventing the load of a config with `accelerate launch`.
It's now possible to launch your training script on AWS instances using SageMaker via `accelerate launch`.
To customize how the different objects used for mixed precision or distributed training are instantiated, a new API called `KwargsHandler` has been added. It lets the user pass along the kwargs that will be forwarded to those objects if they are used (and it is ignored if they are not used in the current setup, so the script can still run on any kind of setup).
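The shape of such a handler can be sketched as a dataclass whose `to_kwargs()` returns only the fields that differ from their defaults (a simplified stand-in; the field names below mirror two real `torch.cuda.amp.GradScaler` arguments, but the classes here are illustrative):

```python
from dataclasses import dataclass, fields

@dataclass
class KwargsHandlerSketch:
    """Base class: to_kwargs() forwards only explicitly-set (non-default) fields."""
    def to_kwargs(self):
        default = self.__class__()
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) != getattr(default, f.name)
        }

@dataclass
class GradScalerKwargsSketch(KwargsHandlerSketch):
    # Illustrative subset of GradScaler's constructor arguments.
    init_scale: float = 65536.0
    growth_interval: int = 2000

handler = GradScalerKwargsSketch(growth_interval=100)
print(handler.to_kwargs())  # {'growth_interval': 100}
```

Forwarding only non-default fields is what makes the handler safe to pass on any setup: when the underlying object isn't created, nothing is consumed.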
Trying to gather tensors that are not of the same size across processes resulted in a process hang; a new method, `Accelerator.pad_across_processes`, has been added to help with that.
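The padding idea in miniature, with lists standing in for per-process tensors (a hypothetical helper, not the library's implementation, which pads along a chosen tensor dimension):

```python
def pad_for_gather(per_process_sequences, pad_value=0):
    # Pad every process's sequence to the longest one so a subsequent
    # gather sees uniform shapes instead of hanging on a size mismatch.
    max_len = max(len(seq) for seq in per_process_sequences)
    return [
        seq + [pad_value] * (max_len - len(seq))
        for seq in per_process_sequences
    ]

print(pad_for_gather([[1, 2, 3], [4, 5]]))  # [[1, 2, 3], [4, 5, 0]]
```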