* Checkpoint loading with `weights_only=True`, add safe_global for ArgParse
* Pass `is_causal` through for SSL tasks

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.25...v1.0.26
* Fix for `ParallelScalingBlock` (& `DiffParallelScalingBlock`); does not impact timm models but could impact downstream use

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.24...v1.0.25
* Train dpwee, dwee, dlittle (differential) ViTs with a small boost over previous runs
* Add timm variant of the CSATv2 model at 512x512 & 640x640
* Move non-persistent buffer creation from `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init
* timm Muon impl, appears more competitive vs AdamW with familiar hparams for image tasks
* Add differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee ViTs
* `LsePlus` and `SimPool`
* `DropBlock2d` (also add support to ByobNet based models)

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.22...v1.0.24
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.22...v1.0.23
Patch release for priority LayerScale initialization regression in 1.0.21
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.21...v1.0.22
* Fall back to SGD (`nesterov=True`) updates if Muon not suitable for parameter shape (or excluded via param group flag)
* Add `adjust_lr_fn` and `ns_coefficients` options for the Muon optimizer
* timm huggingface_hub integration improvements by @Wauplin in https://github.com/huggingface/pytorch-image-models/pull/2592

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.20...v1.0.21
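For context on the Muon bullets above: Muon approximately orthogonalizes each 2D gradient matrix with a quintic Newton-Schulz iteration before applying the update, which is why it falls back to SGD for parameter shapes it does not suit. A NumPy sketch of that iteration, with coefficients taken from the reference Muon implementation rather than timm's code:

```python
import numpy as np

def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5) -> np.ndarray:
    """Approximately orthogonalize a 2D gradient matrix.

    Quintic Newton-Schulz iteration with the coefficients used by the
    reference Muon implementation; a sketch, not timm's exact code.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + 1e-7)  # bound the spectral norm below 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if transposed else x

rng = np.random.default_rng(0)
g = rng.standard_normal((16, 32))
o = newton_schulz_orthogonalize(g)
# Singular values of o are driven towards 1 (approximate orthogonality)
sv = np.linalg.svd(o, compute_uv=False)
```

The iteration only needs matrix multiplies, which is what makes Muon cheap to run on accelerators compared to an exact SVD.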
* Rename DINOv3 pretrained tags lvd_1689m -> lvd1689m to match (same for sat_493m -> sat493m)
* DINOv3 added as a timm model. ViT support done via the EVA base model w/ a new `RotaryEmbeddingDinoV3` to match the DINOv3 specific RoPE impl
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.19...v1.0.20
Patch release for Python 3.9 compat break in 1.0.18
* Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models
* Models based on eva.py (including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT) can now be loaded in NaFlexViT when use_naflex=True passed at model creation time

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.17...v1.0.18
* Add Naver ROPE-ViT models to eva.py, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub

| model | img_size | top1 | top5 | param_count |
|---|---|---|---|---|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.16...v1.0.17
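The ROPE-ViT variants in the table above differ in how the rotary angles are produced (axial position-based vs learned "mixed" frequencies, with or without an absolute position embedding), but all apply the same core rotation to query/key feature pairs. A generic NumPy sketch of that rotation, illustrative only:

```python
import numpy as np

def rope_rotate(x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Rotate consecutive feature pairs of x by angles theta (RoPE core).

    x: (..., 2n) features, theta: (..., n) angles. A generic sketch of
    the rotary mechanism; axial variants build theta from 2D patch
    positions, while mixed mode learns frequencies per depth/head.
    """
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(theta), np.sin(theta)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# A 90-degree rotation maps pair (1, 0) -> (0, 1) and (0, 1) -> (-1, 0),
# and rotations always preserve the feature norm.
x = np.array([1.0, 0.0, 0.0, 1.0])
y = rope_rotate(x, np.full(2, np.pi / 2))
```

Because only relative angles between positions matter after the attention dot product, this encodes relative position without any learned absolute embedding.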
| Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
|---|---|---|---|---|
| naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 |
| naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 |
| naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 |
* Add forward_intermediates and fix some checkpointing bugs. Thanks https://github.com/brianhou0208
* Existing models in vision_transformer.py can be loaded into the NaFlexVit model by adding the use_naflex=True flag to create_model
* train.py and validate.py add the --naflex-loader arg, must be used with a NaFlexVit, e.g. `python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256`
* The --naflex-train-seq-lens argument specifies which sequence lengths to randomly pick from per batch during training
* The --naflex-max-seq-len argument sets the target sequence length for validation
* Adding --model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24 will enable random patch size selection per-batch w/ interpolation
* The --naflex-loss-scale arg changes loss scaling mode per batch relative to the batch size, as timm NaFlex loading changes the batch size for each seq len
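A --naflex-max-seq-len budget implies a grid-fitting step: keep the image's native aspect ratio while capping the patch count. A small sketch of such a step; `fit_patch_grid` is a hypothetical helper for illustration, not timm's loader code:

```python
import math

def fit_patch_grid(img_h: int, img_w: int, max_seq_len: int):
    """Pick a patch grid (gh, gw) that roughly preserves the image's
    aspect ratio subject to gh * gw <= max_seq_len.

    Hypothetical helper sketching the budget-fitting a NaFlex-style
    loader performs; not timm's actual loader code.
    """
    aspect = img_w / img_h
    # Start from the scale that would spend the whole token budget,
    # then shrink the larger side until the integer grid fits.
    gh = max(1, int(math.sqrt(max_seq_len / aspect)))
    gw = max(1, int(round(gh * aspect)))
    while gh * gw > max_seq_len:
        if gh >= gw:
            gh -= 1
        else:
            gw -= 1
    return gh, gw
```

For a square 224px image and a 256-token budget this recovers the familiar 16x16 grid; a 480x640 image instead gets a wider-than-tall grid within the same budget.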
* Add forward_intermediates() and some additional fixes thanks to https://github.com/brianhou0208
* Add forward_intermediates() thanks to https://github.com/brianhou0208
* Add local-dir: pretrained schema, can use local-dir:/path/to/model/folder for model name to source model / pretrained cfg & weights, Hugging Face Hub style models (config.json + weights file) from a local folder
* Remove download argument from torch_kwargs for torchvision ImageNet class by @ryan-caesar-ramos in https://github.com/huggingface/pytorch-image-models/pull/2486
* Fix head_dim reference in AttentionRope class of attention.py by @amorehead in https://github.com/huggingface/pytorch-image-models/pull/2519
* Add forward_intermediates() by @brianhou0208 in https://github.com/huggingface/pytorch-image-models/pull/2501

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.15...v1.0.16
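The local-dir: schema rides on the model-name string, alongside the existing hf-hub: prefix. A toy resolver showing the intended split; `parse_model_name` is a hypothetical helper for illustration, not timm's internal code:

```python
def parse_model_name(model_name: str):
    """Split a model name into (source, name).

    Mirrors the schema described above: 'local-dir:/path/to/folder'
    sources config.json + weights from a local folder, 'hf-hub:' pulls
    from the Hugging Face Hub, and anything else is treated as a
    builtin model name. Illustrative sketch, not timm's internal code.
    """
    for scheme in ("local-dir", "hf-hub"):
        prefix = scheme + ":"
        if model_name.startswith(prefix):
            return scheme, model_name[len(prefix):]
    return "timm", model_name

source, name = parse_model_name("local-dir:/path/to/model/folder")
```

The prefix approach keeps the `create_model` signature unchanged while letting one string select between builtin configs, Hub repos, and local folders.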
* vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k - 88.1% top-1
* vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k - 87.9% top-1
* vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k - 87.3% top-1
* vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k
* `--input-size` fix by @JosuaRieder in https://github.com/huggingface/pytorch-image-models/pull/2417

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.14...v1.0.15
* vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k - 86.7% top-1
* vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k - 87.4% top-1
* vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.13...v1.0.14
* Add support to train and validate in pure bfloat16 or float16
* wandb project name arg added by https://github.com/caojiaolong, use arg.experiment for name
* Add torch.utils.checkpoint.checkpoint() wrapper in timm.models that defaults use_reentrant=False, unless TIMM_REENTRANT_CKPT=1 is set in env
* Add convnext_nano 384x384 ImageNet-12k pretrain & fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
* Add vit_large_patch14_clip_224.dfn2b_s39b weights
* Fix RmsNorm layer & fn to match standard formulation, use PT 2.5 impl when possible. Move old impl to SimpleNorm layer, it's LN w/o centering or bias. There were only two timm models using it, and they have been updated.
* Add cache_dir arg for model creation
* Pass through trust_remote_code for HF datasets wrapper
* inception_next_atto model added by creator
* Support hf-hub: based loading, and thus will work with new Transformers TimmWrapperModel
* Improved Quickstart doc by @ariG23498 in https://github.com/huggingface/pytorch-image-models/pull/2381

Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.12...v1.0.13
* Add list_optimizers, get_optimizer_class, get_optimizer_info to reworked create_optimizer_v2 fn to explore optimizers, get info or class
* Deprecate optim.optim_factory, move fns to optim/_optim_factory.py and optim/_param_groups.py and encourage import via timm.optim
* Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat
Full Changelog: https://github.com/huggingface/pytorch-image-models/compare/v1.0.11...v1.0.12
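The reworked factory exposes discovery helpers (list_optimizers, get_optimizer_info, get_optimizer_class) on top of a registry. A toy registry sketching that pattern; the OptimInfo fields and helper bodies here are illustrative stand-ins, not timm's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class OptimInfo:
    """Metadata record, loosely modelled on the kind of info an
    optimizer registry can expose (field names are illustrative)."""
    name: str
    opt_class: type
    description: str = ""

_registry = {}

def register(info: OptimInfo) -> None:
    _registry[info.name.lower()] = info

def list_optimizers():
    """Sorted names of all registered optimizers."""
    return sorted(_registry)

def get_optimizer_info(name: str) -> OptimInfo:
    """Full metadata record for a registered optimizer."""
    return _registry[name.lower()]

def get_optimizer_class(name: str) -> type:
    """Just the class, for direct instantiation."""
    return _registry[name.lower()].opt_class

class DummySGD:  # stand-in for a real torch.optim.Optimizer subclass
    pass

register(OptimInfo("sgd", DummySGD, "stochastic gradient descent"))
```

Keeping name lookup case-insensitive and metadata separate from the class lets a factory function accept a plain string while tooling can still enumerate and inspect what is available.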
Quick turnaround from 1.0.10 to fix an error impacting 3rd party packages that still import through a deprecated path that isn't tested.
* Deprecation warnings for imports via timm.models.registry, increased priority of existing deprecation warnings to be visible
* InternViT-300M weights available in timm as vit_intern300m_patch14_448

| model | top1 | top5 | param_count | img_size |
|---|---|---|---|---|
| vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k | 87.438 | 98.256 | 64.11 | 384 |
| vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k | 86.608 | 97.934 | 64.11 | 256 |
| vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k | 86.594 | 98.02 | 60.4 | 384 |
| vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k | 85.734 | 97.61 | 60.4 | 256 |
| model | top1 | top5 | param_count | img_size |
|---|---|---|---|---|
| resnet50d.ra4_e3600_r224_in1k | 81.838 | 95.922 | 25.58 | 288 |
| efficientnet_b1.ra4_e3600_r240_in1k | 81.440 | 95.700 | 7.79 | 288 |
| resnet50d.ra4_e3600_r224_in1k | 80.952 | 95.384 | 25.58 | 224 |
| efficientnet_b1.ra4_e3600_r240_in1k | 80.406 | 95.152 | 7.79 | 240 |
| mobilenetv1_125.ra4_e3600_r224_in1k | 77.600 | 93.804 | 6.27 | 256 |
| mobilenetv1_125.ra4_e3600_r224_in1k | 76.924 | 93.234 | 6.27 | 224 |
Add SAM2 (HieraDet) backbone arch & weight loading support
Add Hiera Small weights trained w/ abswin pos embed on in12k & fine-tuned on 1k
| model | top1 | top5 | param_count |
|---|---|---|---|
| hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k | 84.912 | 97.260 | 35.01 |
| hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k | 84.560 | 97.106 | 35.01 |
* Add mobilenet_edgetpu_v2_m weights w/ ra4 mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7 @ 256.

| model | top1 | top1_err | top5 | top5_err | param_count | img_size |
|---|---|---|---|---|---|---|
| mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k | 84.99 | 15.01 | 97.294 | 2.706 | 32.59 | 544 |
| mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k | 84.772 | 15.228 | 97.344 | 2.656 | 32.59 | 480 |
| mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k | 84.64 | 15.36 | 97.114 | 2.886 | 32.59 | 448 |
| mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k | 84.314 | 15.686 | 97.102 | 2.898 | 32.59 | 384 |
| mobilenetv4_conv_aa_large.e600_r384_in1k | 83.824 | 16.176 | 96.734 | 3.266 | 32.59 | 480 |
| mobilenetv4_conv_aa_large.e600_r384_in1k | 83.244 | 16.756 | 96.392 | 3.608 | 32.59 | 384 |
| mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k | 82.99 | 17.01 | 96.67 | 3.33 | 11.07 | 320 |
| mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k | 82.364 | 17.636 | 96.256 | 3.744 | 11.07 | 256 |
| model | top1 | top1_err | top5 | top5_err | param_count | img_size |
|---|---|---|---|---|---|---|
| efficientnet_b0.ra4_e3600_r224_in1k | 79.364 | 20.636 | 94.754 | 5.246 | 5.29 | 256 |
| efficientnet_b0.ra4_e3600_r224_in1k | 78.584 | 21.416 | 94.338 | 5.662 | 5.29 | 224 |
| mobilenetv1_100h.ra4_e3600_r224_in1k | 76.596 | 23.404 | 93.272 | 6.728 | 5.28 | 256 |
| mobilenetv1_100.ra4_e3600_r224_in1k | 76.094 | 23.906 | 93.004 | 6.996 | 4.23 | 256 |
| mobilenetv1_100h.ra4_e3600_r224_in1k | 75.662 | 24.338 | 92.504 | 7.496 | 5.28 | 224 |
| mobilenetv1_100.ra4_e3600_r224_in1k | 75.382 | 24.618 | 92.312 | 7.688 | 4.23 | 224 |
* set_input_size() added to vit and swin v1/v2 models to allow changing image size, patch size, window size after model creation
* In support of set_input_size, always_partition and strict_img_size args have been added to swin __init__ to allow more flexible input size constraints
* Add several tiny < .5M param models for testing that are actually trained on ImageNet-1k

| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct |
|---|---|---|---|---|---|---|---|
| test_efficientnet.r160_in1k | 47.156 | 52.844 | 71.726 | 28.274 | 0.36 | 192 | 1.0 |
| test_byobnet.r160_in1k | 46.698 | 53.302 | 71.674 | 28.326 | 0.46 | 192 | 1.0 |
| test_efficientnet.r160_in1k | 46.426 | 53.574 | 70.928 | 29.072 | 0.36 | 160 | 0.875 |
| test_byobnet.r160_in1k | 45.378 | 54.622 | 70.572 | 29.428 | 0.46 | 160 | 0.875 |
| test_vit.r160_in1k | 42.0 | 58.0 | 68.664 | 31.336 | 0.37 | 192 | 1.0 |
| test_vit.r160_in1k | 40.822 | 59.178 | 67.212 | 32.788 | 0.37 | 160 | 0.875 |
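The set_input_size() changes above have to keep sizes mutually consistent: a ViT's token count is the square of img_size // patch_size, and each Swin stage's feature map must partition evenly into windows unless relaxed via always_partition. A sketch of those two constraints; both helpers are hypothetical illustrations, not timm's actual logic:

```python
def vit_seq_len(img_size: int, patch_size: int) -> int:
    """Token count for a square ViT input (excluding prefix tokens)."""
    grid = img_size // patch_size
    return grid * grid

def swin_window_ok(img_size: int, patch_size: int, window_size: int,
                   num_stages: int = 4) -> bool:
    """True if every stage's feature grid divides evenly into windows.

    Each Swin stage halves the grid; this mimics a strict_img_size-style
    check. Illustrative sketch, not timm's actual validation code.
    """
    grid = img_size // patch_size
    for _ in range(num_stages):
        if grid % window_size != 0:
            return False
        grid //= 2
    return True
```

For example, 224px input with 4px patches and 7-wide windows yields stage grids 56/28/14/7, all divisible by 7, while a 256px input fails at the first stage; that failure mode is what always_partition and strict_img_size exist to relax.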
| model | top1 | top1_err | top5 | top5_err | param_count | img_size |
|---|---|---|---|---|---|---|
| mobilenetv4_hybrid_large.ix_e600_r384_in1k | 84.356 | 15.644 | 96.892 | 3.108 | 37.76 | 448 |
| mobilenetv4_hybrid_large.ix_e600_r384_in1k | 83.990 | 16.010 | 96.702 | 3.298 | 37.76 | 384 |
| mobilenetv4_hybrid_medium.ix_e550_r384_in1k | 83.394 | 16.606 | 96.760 | 3.240 | 11.07 | 448 |
| mobilenetv4_hybrid_medium.ix_e550_r384_in1k | 82.968 | 17.032 | 96.474 | 3.526 | 11.07 | 384 |
| mobilenetv4_hybrid_medium.ix_e550_r256_in1k | 82.492 | 17.508 | 96.278 | 3.722 | 11.07 | 320 |
| mobilenetv4_hybrid_medium.ix_e550_r256_in1k | 81.446 | 18.554 | 95.704 | 4.296 | 11.07 | 256 |
timm trained weights added: