# v0.34.0: StatefulDataLoader Support, FP8 Improvements, and PyTorch Updates!
## Dependencies

- The minimum required `safetensors` version is now 0.4.3.
- `numpy` 2.0.0 is now supported.

## Core

When training ends, the accelerate library will handle destroying the process group automatically with `accelerator.end_training()`, or you can do it manually using `PartialState().destroy_process_group()`.

Ascend NPU support now makes use of `torch_npu`'s `transfer_to_npu`, ensuring better performance and compatibility.

## Stateful DataLoader support

This release adds support for `StatefulDataLoader` from `torchdata`, allowing better handling of data loading states. Enable it by passing `use_stateful_dataloader=True` to the `DataLoaderConfiguration`; when calling `load_state()`, the `DataLoader` will automatically be resumed from its last step, with no more having to iterate through already-passed batches.

The `prepare_data_loader()` function is now independent of the `Accelerator`, giving you more flexibility towards which API levels you would like to use.

Checkpointing now also tracks `DataLoader` states, ensuring smoother training sessions.

On XLA, the `set_epoch` function is now supported for `MpDeviceLoaderWrapper`.

## FP8 improvements

We have improved `TransformerEngine` FP8 training, including better defaults for the quantized FP8 weights.

New benchmark scripts verify that the `TransformerEngine` integration works exactly as intended. These scripts run one half using 🤗 Accelerate's integration and the other with raw `TransformerEngine`, providing users with a nice example of what we do under the hood with accelerate, and a good sanity check to make sure nothing breaks down over time. Find them here.

A new Docker image ships with `TransformerEngine` and accelerate as well. Use `docker pull huggingface/accelerate@gpu-fp8-transformerengine` to quickly get an environment going.

## torchpippy no more, long live torch.distributed.pipelining

`torchpippy` is now fully integrated into torch core, and as a result we are exclusively supporting the PyTorch implementation from now on. Note the breaking changes that come with this shift:

- Output tensors now have shape `[1, n, n]` rather than `[2, n, n]` as before.
- `pipelining` no longer supports encoder/decoder models, so the t5 example has been removed.

If these changes are blocking for you, you can potentially fall back to `torchpippy` if needed.

## FSDP improvements

You can now create the `FullyShardedDataParallelPlugin` yourself manually, with no need for environment patching:

```python
from accelerate import FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(...)
```
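To illustrate the resume-from-step behavior that the new stateful dataloader support enables, here is a minimal pure-Python sketch of the idea. This is an illustration only, not `torchdata`'s `StatefulDataLoader`; the class name and state format are invented for the example.

```python
# Sketch of the "stateful dataloader" idea: an iterator that can checkpoint
# its position and resume without replaying earlier batches.
class ResumableLoader:
    def __init__(self, data, batch_size=2):
        self.data = list(data)
        self.batch_size = batch_size
        self._step = 0  # index of the next batch to yield

    def __iter__(self):
        while self._step * self.batch_size < len(self.data):
            start = self._step * self.batch_size
            self._step += 1
            yield self.data[start:start + self.batch_size]

    def state_dict(self):
        # Everything needed to resume: just the batch counter here.
        return {"step": self._step}

    def load_state_dict(self, state):
        self._step = state["step"]


loader = ResumableLoader(range(8), batch_size=2)
it = iter(loader)
first = next(it)            # [0, 1]
ckpt = loader.state_dict()  # {"step": 1}

# "Restart" and resume from the checkpoint: no skipping of batches by hand.
resumed = ResumableLoader(range(8), batch_size=2)
resumed.load_state_dict(ckpt)
remaining = list(resumed)   # [[2, 3], [4, 5], [6, 7]]
```

With accelerate's real integration, the same round trip happens transparently through `save_state()` / `load_state()` once `use_stateful_dataloader=True` is set.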
If you are not using `accelerate launch` and need to ensure the environment variables are set up properly for model loading:

```python
from accelerate.utils import enable_fsdp_ram_efficient_loading, disable_fsdp_ram_efficient_loading

enable_fsdp_ram_efficient_loading()
```
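Under the hood, helpers like these typically toggle an environment variable that `accelerate launch` would otherwise set for you. The sketch below shows that mechanism in standalone form; the variable name `FSDP_CPU_RAM_EFFICIENT_LOADING` matches what accelerate uses, but the helper functions here are illustrative, not the library's implementation.

```python
import os

# Illustrative stand-ins for accelerate's enable/disable helpers: they simply
# flip the env var that downstream model-loading code checks.
_ENV_VAR = "FSDP_CPU_RAM_EFFICIENT_LOADING"  # name assumed from accelerate's FSDP env vars

def enable_ram_efficient_loading():
    os.environ[_ENV_VAR] = "True"

def disable_ram_efficient_loading():
    os.environ[_ENV_VAR] = "False"

enable_ram_efficient_loading()
print(os.environ[_ENV_VAR])  # prints "True"
```

Using functions instead of hand-editing environment variables keeps the toggle discoverable and reversible within a single process.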
Much of this work was inspired by the axolotl library, so very big kudos to their wonderful work.

## Bug Fixes

- `step` when loading the state by @muellerzr in https://github.com/huggingface/accelerate/pull/2992
- `find_tied_params` for models with shared layers by @qubvel in https://github.com/huggingface/accelerate/pull/2986
- `transformer_engine` on import by @oraluben in https://github.com/huggingface/accelerate/pull/3056
- `skip_first_batches` support for `StatefulDataloader` and fix all the tests by @muellerzr in https://github.com/huggingface/accelerate/pull/3068

## What's Changed

- `step` when loading the state by @muellerzr in https://github.com/huggingface/accelerate/pull/2992
- `find_tied_params` for models with shared layers by @qubvel in https://github.com/huggingface/accelerate/pull/2986
- `end_training` by @SunMarc in https://github.com/huggingface/accelerate/pull/3012
- `torchdata.stateful_dataloader.StatefulDataLoader` within the `Accelerator` by @byi8220 in https://github.com/huggingface/accelerate/pull/2895
- `prepare_data_loader()` from `Accelerator` by @siddk in https://github.com/huggingface/accelerate/pull/3047
- `transformer_engine` on import by @oraluben in https://github.com/huggingface/accelerate/pull/3056
- `skip_first_batches` support for `StatefulDataloader` and fix all the tests by @muellerzr in https://github.com/huggingface/accelerate/pull/3068