timm shifted focus toward production robustness and distributed training compatibility. Checkpoint loading is now security-hardened, defaulting to weights_only=True, and attention handling across ViT and EVA models was refined for masked and causal tasks. The library also tuned optimizers (Muon, AdamP, SGDP) for distributed scenarios such as FSDP2 and DTensor, and removed torch.jit usage ahead of its official deprecation. A breaking change in ParallelScalingBlock QKV bias handling landed in v1.0.25, though it does not affect released model weights.
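As a rough illustration of what the weights_only default means in practice, here is a minimal sketch that round-trips a checkpoint through PyTorch's restricted loading path and builds one of the tuned optimizers by its registered name. The model choice, file path, and hyperparameters are placeholders, not timm release details.

```python
import torch
import timm
from timm.optim import create_optimizer_v2

# Minimal sketch, assuming a small ViT variant; the model name and
# hyperparameters below are illustrative, not tied to a specific release.
model = timm.create_model("vit_tiny_patch16_224", pretrained=False)

# Round-trip a checkpoint through the hardened loading path: weights_only=True
# restricts unpickling to tensors and plain containers, so a tampered file
# cannot execute arbitrary code on load.
torch.save(model.state_dict(), "vit_tiny_ckpt.pth")
state_dict = torch.load("vit_tiny_ckpt.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict)

# One of the optimizers mentioned above, built by its registered name;
# 'adamp' and the lr/weight_decay values are chosen purely as an example.
optimizer = create_optimizer_v2(model, opt="adamp", lr=1e-3, weight_decay=0.05)
```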
The same release cycle also focused on robustness and efficiency across the core vision transformers: pickle loading was tightened with weights_only=True by default, and attention mask handling in ViT/EVA models was overhauled to correctly resolve boolean masks and propagate causal flags for self-supervised tasks. Patch Representation Refinement was added as a pooling option, and Hiera's attention was moved to scaled dot-product attention, unlocking Flash Attention kernels.
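To make the mask semantics concrete, the sketch below calls PyTorch's scaled_dot_product_attention directly with a boolean padding mask and, separately, with the causal flag; the shapes and padding pattern are invented for illustration and are not timm code.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only: batch, heads, tokens, head dim.
B, H, N, D = 2, 4, 16, 32
q = torch.randn(B, H, N, D)
k = torch.randn(B, H, N, D)
v = torch.randn(B, H, N, D)

# Boolean mask: True marks positions that may be attended to. Here the last
# four tokens of each sequence are treated as padding and masked out.
keep = torch.ones(B, 1, 1, N, dtype=torch.bool)
keep[..., -4:] = False
masked_out = F.scaled_dot_product_attention(q, k, v, attn_mask=keep)

# Causal attention is requested via a flag rather than an explicit mask,
# which lets fused kernels (e.g. Flash Attention on supported GPUs) dispatch.
causal_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Routing attention through this single entry point is what lets a model like Hiera pick up fused kernels automatically when the hardware and mask pattern allow it.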