v1.10.0: N-D Parallelism
Training large models across multiple GPUs can be complex, especially when combining different parallelism strategies (e.g., TP, CP, DP). To simplify this process, we've collaborated with Axolotl to introduce an easy-to-use integration that lets you apply any combination of parallelism strategies directly in your training script. Just pass a `ParallelismConfig` specifying the size of each parallelism dimension: it's that simple.

Learn more about how it works in our latest blog post.
```python
from accelerate import Accelerator, ParallelismConfig
from transformers import AutoModelForCausalLM

# Specify how many ranks each parallelism dimension uses.
parallelism_config = ParallelismConfig(
    dp_shard_size=2,      # sharded data parallelism (FSDP)
    dp_replicate_size=2,  # replicated data parallelism
    cp_size=2,            # context parallelism
    tp_size=2,            # tensor parallelism
)

accelerator = Accelerator(
    parallelism_config=parallelism_config,
    ...
)

# Build the model directly on the device mesh created by Accelerate.
model = AutoModelForCausalLM.from_pretrained(
    "your-model-name", device_mesh=accelerator.torch_device_mesh
)
model = accelerator.prepare(model)
```
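Note that the configured sizes multiply together to give the total number of processes the job must be launched with. A quick sanity check in plain Python (the dimension names below are just labels, not accelerate API):

```python
# The world size must equal the product of all parallelism dimension sizes.
# With every size set to 2, as in the config above, 16 processes are needed.
sizes = {"dp_shard": 2, "dp_replicate": 2, "cp": 2, "tp": 2}

world_size = 1
for size in sizes.values():
    world_size *= size

print(world_size)  # 16
```

So the example configuration above requires launching with 16 GPUs.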
ParallelismConfig from PartialState by @SunMarc in https://github.com/huggingface/accelerate/pull/3720

We've also fixed the ignored modules attribute. With this, it is now possible to train PEFT models whose MoE layers contain q_proj and v_proj parameters, which is especially important for fine-tuning the gpt-oss model.
Full Changelog: https://github.com/huggingface/accelerate/compare/v1.9.0...v1.10.0