Oct 16-20, 2025
- Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations
  - extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
  - small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
  - by default uses AdamW (or NAdamW if nesterov=True) updates if muon not suitable for parameter shape (or excluded via param group flag)
  - like torch impl, select from several LR scale adjustment fns via adjust_lr_fn
  - select from several NS coefficient presets or specify your own via ns_coefficients
- First 2 steps of 'meta' device model initialization supported
  - Fix several ops that were breaking creation under 'meta' device context
  - Add device & dtype factory kwarg support to all models and modules (anything inheriting from nn.Module) in timm
- License fields added to pretrained cfgs in code
- Release 1.0.21
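
The fused-op speedup for the Newton-Schulz (NS) iterations in the Muon notes above can be illustrated with a minimal PyTorch sketch. This is not timm's implementation: the function name `newton_schulz_sketch` is hypothetical, the default coefficients are the quintic values used in the reference KellerJordan/Muon repo, and `torch.addmm` stands in for the fused add + matmul ops (it computes `beta * input + alpha * (mat1 @ mat2)` in one call).

```python
import torch

# Quintic coefficients from the reference Muon repo (KellerJordan/Muon).
# Assumption: used here only as an illustrative default preset.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz_sketch(G, steps=5, coeffs=NS_COEFFS, eps=1e-7):
    """Approximately orthogonalize a 2D matrix (hypothetical helper)."""
    assert G.ndim == 2, "this sketch only handles 2D weights"
    a, b, c = coeffs
    X = G.float()
    # Work on the wide orientation so X @ X.T is the smaller Gram matrix.
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T
    # Normalize so all singular values are at most 1 before iterating.
    X = X / (X.norm() + eps)
    for _ in range(steps):
        A = X @ X.T
        # B = b * A + c * (A @ A), fused into a single addmm call.
        B = torch.addmm(A, A, A, beta=b, alpha=c)
        # X = a * X + B @ X, again as one fused call.
        X = torch.addmm(X, B, X, beta=a)
    if transposed:
        X = X.T
    return X.to(G.dtype)


# Usage: the output keeps the input shape, with singular values pushed toward 1.
grad = torch.randn(16, 32)
update = newton_schulz_sketch(grad)
print(update.shape, torch.linalg.svdvals(update))
```

Batched weights would use `torch.baddbmm` in the same way. The fused form avoids materializing the intermediate `A @ A` and `B @ X` products as separate tensors before the scaled add.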
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.20...v1.0.21
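
The 'meta' device and factory kwarg items above can be sketched in plain PyTorch. The module `TinyBlock` is hypothetical and is not a timm class; it only shows the pattern of forwarding `device` and `dtype` to every parameter-creating call. The sequence shown (create on 'meta', materialize with `to_empty`, then re-initialize) is an assumption about typical usage, not a statement of which steps timm counts as supported.

```python
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """Hypothetical module that forwards device/dtype factory kwargs."""

    def __init__(self, dim, hidden, device=None, dtype=None):
        super().__init__()
        dd = {'device': device, 'dtype': dtype}
        self.fc1 = nn.Linear(dim, hidden, **dd)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim, **dd)
        # Parameters created directly must receive the kwargs as well.
        self.scale = nn.Parameter(torch.ones(dim, **dd))

    def forward(self, x):
        return x + self.scale * self.fc2(self.act(self.fc1(x)))


# Option A: pass the factory kwargs explicitly (no real memory is allocated).
meta_block = TinyBlock(8, 16, device='meta', dtype=torch.bfloat16)

# Option B: rely on the device context manager (PyTorch 2.x).
with torch.device('meta'):
    ctx_block = TinyBlock(8, 16)
was_meta = ctx_block.fc1.weight.is_meta

# Materialize uninitialized storage on a real device (in place, returns self).
real_block = ctx_block.to_empty(device='cpu')

# to_empty leaves values undefined, so re-run initialization afterwards.
for m in real_block.modules():
    if isinstance(m, nn.Linear):
        m.reset_parameters()
nn.init.ones_(real_block.scale)

out = real_block(torch.randn(2, 8))
print(out.shape)
```

A module that hard-codes a device inside `__init__` (for example `torch.zeros(n)` followed by `.cuda()`) is the kind of op that tends to break under a 'meta' context, which is why threading the kwargs through every constructor matters.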