Use any of your favorite experiment trackers (TensorBoard, wandb, Comet ML...) inside your training scripts with just a few lines of code, thanks to Accelerate. All details are in the documentation.
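As a rough sketch of the shape of that flow (the class below is a plain-Python stand-in, not Accelerate's actual `Accelerator`; see the documentation for the real entry points):

```python
# Minimal stand-in illustrating the tracking flow: pick a backend when the
# accelerator is created, initialize a run, then log metrics during training.
class TrackingSketch:
    def __init__(self, log_with="tensorboard"):
        self.log_with = log_with      # which tracker backend to use
        self.project_name = None
        self.history = []             # stand-in for the backend's log store

    def init_trackers(self, project_name):
        self.project_name = project_name

    def log(self, values, step):
        self.history.append((step, values))

tracker = TrackingSketch(log_with="tensorboard")
tracker.init_trackers("my_project")
tracker.log({"train_loss": 0.71}, step=0)
print(tracker.history)  # [(0, {'train_loss': 0.71})]
```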
PyTorch recently released a new model wrapper for sharded DDP training called FSDP. This release adds support for it (note that it doesn't work with mixed precision yet). See all caveats in the documentation.
Say goodbye to CUDA OOM errors with the new `find_executable_batch_size` decorator. Just decorate your training function and pick a starting batch size, then let Accelerate do the rest.
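The idea behind the decorator can be sketched in plain Python (a simplified toy version, not the library's implementation, which also clears CUDA caches between attempts):

```python
import functools

def find_executable_batch_size_sketch(starting_batch_size=128):
    """Sketch: retry the decorated function, halving the batch size on OOM."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            batch_size = starting_batch_size
            while batch_size > 0:
                try:
                    return fn(batch_size, *args, **kwargs)
                except RuntimeError as e:
                    if "out of memory" in str(e).lower():
                        batch_size //= 2  # shrink and retry
                    else:
                        raise
            raise RuntimeError("No executable batch size found.")
        return wrapper
    return decorator

# Hypothetical training function that only "fits" at batch size 32 or below.
@find_executable_batch_size_sketch(starting_batch_size=128)
def train(batch_size):
    if batch_size > 32:
        raise RuntimeError("CUDA out of memory (simulated)")
    return batch_size

print(train())  # → 32 (128 and 64 "OOM", 32 succeeds)
```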
The Accelerate examples are now split in two: the base folder contains simple NLP and computer vision examples, as well as complete versions incorporating all features. You can also browse the examples in the by_feature subfolder, which shows exactly what code to add for each given feature (checkpointing, tracking, cross-validation, etc.).
- `mixed_precision` for launch command by @sgugger in https://github.com/huggingface/accelerate/pull/300
- `lr_scheduler` to Accelerator.prepare by @sgugger in https://github.com/huggingface/accelerate/pull/301

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.6.0...v0.7.0
The launcher was ignoring the mixed precision attribute of the config since v0.6.0. This patch fixes that.
Patches an issue with mixed precision (see #286)
This release adds support for bfloat16 mixed precision training (requires PyTorch >= 1.10) and a brand-new checkpoint utility to help with resuming interrupted trainings. We also get a completely revamped documentation frontend.
Save the current state of all your objects (models, optimizers, RNG states) with `accelerator.save_state(path_to_checkpoint)` and reload everything by calling `accelerator.load_state(path_to_checkpoint)`.
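The idea can be sketched in plain Python (a toy version that only handles picklable objects and Python's RNG, unlike the real utility, which also covers model, optimizer, and torch/CUDA RNG states):

```python
import os
import pickle
import random

def save_state_sketch(path, objects):
    # Serialize the tracked objects together with the RNG state so that
    # resuming reproduces the exact point where training was interrupted.
    os.makedirs(path, exist_ok=True)
    state = {"objects": objects, "python_rng": random.getstate()}
    with open(os.path.join(path, "state.pkl"), "wb") as f:
        pickle.dump(state, f)

def load_state_sketch(path):
    with open(os.path.join(path, "state.pkl"), "rb") as f:
        state = pickle.load(f)
    random.setstate(state["python_rng"])  # restore RNG for reproducibility
    return state["objects"]
```

Restoring the RNG state alongside the objects is what makes a resumed run follow the same data shuffling and augmentation as the interrupted one.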
Accelerate now supports bfloat16 mixed precision training. As a result, the old `--fp16` argument has been deprecated in favor of the more generic `--mixed_precision`.
You can now type accelerate env to have a copy-pastable summary of your environment and default configuration. Very convenient when opening a new issue!
The documentation has been switched to the new Hugging Face frontend, like Transformers and Datasets.
- `store_true` on argparse in nlp example by @monologg in https://github.com/huggingface/accelerate/pull/183
- `set_to_none` in Optimizer.zero_grad by @sgugger in https://github.com/huggingface/accelerate/pull/189
- `debug_launcher` by @sgugger in https://github.com/huggingface/accelerate/pull/259

Full Changelog: https://github.com/huggingface/accelerate/compare/v0.5.1...v0.6.0
Fix the two following bugs:

- `convert_to_fp32` returned booleans instead of tensors #173
- `dispatch_batches=True` #175

This release introduces support for iterating through a DataLoader only on the main process, which then dispatches the batches to all processes.
The motivation behind this comes from dataset streaming, which introduces two difficulties.

This new feature is activated by default for all `IterableDataset`s.
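The dispatching idea can be sketched in plain Python, with lists standing in for tensors (a toy helper, not the library's implementation, which shards real batches across the distributed group):

```python
def dispatch_batches_sketch(dataloader, num_processes):
    # In Accelerate, only the main process would run this loop; each full
    # batch is then split into one slice per process and sent out.
    for batch in dataloader:
        per_process = len(batch) // num_processes
        yield [
            batch[rank * per_process:(rank + 1) * per_process]
            for rank in range(num_processes)
        ]

batches = [[1, 2, 3, 4], [5, 6, 7, 8]]
for slices in dispatch_batches_sketch(batches, num_processes=2):
    print(slices)  # [[1, 2], [3, 4]] then [[5, 6], [7, 8]]
```

Because only one process consumes the stream, a streamed dataset does not need to be readable from every worker.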
- `dispatch_batches` #168 (@sgugger)

This release adds support for DeepSpeed. While the basics are there to support ZeRO-2, ZeRO-3, as well as CPU and NVMe offload, the API might evolve a little bit as we polish it in the near future.
It also adds support for multi-node CPU training. In both cases, filling in the questionnaire output by `accelerate config` and then launching your script with `accelerate launch` is enough; there are no changes in the main API.
- `accelerate test` with no config file #79 (@cccntu)
- `optimizer` for consistency #81 (@kumapo)
- `unscale_gradients` method #88 (@sgugger)
- `OptimWrapper` init #127 (@sgugger)

After doing all the data preprocessing in your notebook, you can launch your training loop using the new `notebook_launcher` functionality. This is especially useful for Colab or Kaggle with TPUs! Here is an example on Colab (don't forget to select a TPU runtime).
This launcher also works if you have multiple GPUs on your machine. You just have to pass along `num_processes=your_number_of_gpus` in the call to `notebook_launcher`.
Our multi-node training test setup was flawed, and the previous releases of 🤗 Accelerate were not working for multi-node distributed training. This is all fixed now, and we have put more robust tests in place!
- `set_to_none` to AcceleratedOptimizer.zero_grad #43 (@sgugger)

Fix a bug preventing the load of a config with `accelerate launch`.
It's now possible to launch your training script on AWS instances using SageMaker via `accelerate launch`.
To customize how the different objects used for mixed precision or distributed training are instantiated, a new API called `KwargsHandler` has been added. It lets the user pass along the kwargs that will be forwarded to those objects if they are used (and it is ignored if they are not used in the current setup, so the script can still run on any kind of setup).
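The shape of such a handler can be sketched as a dataclass whose `to_kwargs()` returns only the fields that differ from their defaults (a simplified stand-in; the field names below mirror two real `torch.cuda.amp.GradScaler` arguments, but the classes here are illustrative):

```python
from dataclasses import dataclass, fields

@dataclass
class KwargsHandlerSketch:
    """Base class: to_kwargs() forwards only explicitly-set (non-default) fields."""
    def to_kwargs(self):
        default = self.__class__()
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) != getattr(default, f.name)
        }

@dataclass
class GradScalerKwargsSketch(KwargsHandlerSketch):
    # Illustrative subset of GradScaler's constructor arguments.
    init_scale: float = 65536.0
    growth_interval: int = 2000

handler = GradScalerKwargsSketch(growth_interval=100)
print(handler.to_kwargs())  # {'growth_interval': 100}
```

Forwarding only non-default fields is what makes the handler safe to pass on any setup: when the underlying object isn't created, nothing is consumed.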
Trying to gather tensors that are not of the same size across processes resulted in a process hang; a new method, `Accelerator.pad_across_processes`, has been added to help with that.
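The padding idea in miniature, with lists standing in for per-process tensors (a hypothetical helper, not the library's implementation, which pads along a chosen tensor dimension):

```python
def pad_for_gather(per_process_sequences, pad_value=0):
    # Pad every process's sequence to the longest one so a subsequent
    # gather sees uniform shapes instead of hanging on a size mismatch.
    max_len = max(len(seq) for seq in per_process_sequences)
    return [
        seq + [pad_value] * (max_len - len(seq))
        for seq in per_process_sequences
    ]

print(pad_for_gather([[1, 2, 3], [4, 5]]))  # [[1, 2, 3], [4, 5, 0]]
```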