v0.9.0

Release v0.9.0

First non pre-release in a loooong while, changelog from 0.6.x below...

May 11, 2023

  • timm 0.9 released, transition from 0.8.xdev releases

May 10, 2023

  • Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in timm
  • DINOv2 vit feature backbone weights added thanks to Leng Yue
  • FB MAE vit feature backbone weights added
  • OpenCLIP DataComp-XL L/14 feat backbone weights added
  • MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
  • Experimental get_intermediate_layers function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
  • Model creation throws error if pretrained=True and no weights exist (instead of continuing with random initialization)
  • Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
  • bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use bnb prefix, ie bnbadam8bit
  • Misc cleanup and fixes
  • Final testing before switching to a 0.9 and bringing timm out of pre-release state
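
The fail-fast behaviour for pretrained=True can be pictured with a toy factory. This is only a sketch: `_WEIGHTS`, `create_model`, and the `toy_net_*` names below are hypothetical stand-ins, not timm's registry or its actual implementation.

```python
# Hypothetical registry: architecture name -> set of pretrained tags with weights.
# These entries are illustrative only, not a listing of real timm weights.
_WEIGHTS = {
    "toy_net_small": {"in1k"},
    "toy_net_large": set(),  # architecture exists, but no weights are published
}


def create_model(name: str, pretrained: bool = False) -> dict:
    """Toy stand-in for a model factory that refuses to fall back silently."""
    if name not in _WEIGHTS:
        raise ValueError(f"Unknown model: {name}")
    if pretrained and not _WEIGHTS[name]:
        # Fail fast rather than returning a randomly initialized model.
        raise RuntimeError(f"No pretrained weights exist for model '{name}'.")
    return {"name": name, "pretrained": pretrained}
```

With this shape, a missing checkpoint surfaces at creation time instead of showing up later as unexpectedly poor accuracy.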

April 27, 2023

  • 97% of timm models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
  • Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.

April 21, 2023

  • Gradient accumulation support added to train script and tested (--grad-accum-steps), thanks Taeksang Kim
  • More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
  • Added --head-init-scale and --head-init-bias to train.py to scale classifier head and set fixed bias for fine-tune
  • Remove all InplaceABN (inplace_abn) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).
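
As a rough illustration of what gradient accumulation (--grad-accum-steps) buys, the sketch below uses a one-parameter least-squares model in plain Python. The helper names are hypothetical and the train script's real logic is not reproduced here; the point is only that averaging scaled micro-batch gradients gives the same update as one large batch.

```python
def grad(w, x, y):
    """Gradient of the squared error 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def sgd_step_accumulated(w, batch, lr, accum_steps):
    """One optimizer step built from `accum_steps` equally sized micro-batches."""
    if len(batch) % accum_steps != 0:
        raise ValueError("batch size must be divisible by accum_steps")
    micro_size = len(batch) // accum_steps
    accumulated = 0.0
    for i in range(accum_steps):
        micro = batch[i * micro_size:(i + 1) * micro_size]
        micro_grad = sum(grad(w, x, y) for x, y in micro) / len(micro)
        # Scale each micro-batch gradient so the sum is a mean over the full batch.
        accumulated += micro_grad / accum_steps
    return w - lr * accumulated


def sgd_step_full(w, batch, lr):
    """Reference: one step using the mean gradient over the whole batch."""
    full_grad = sum(grad(w, x, y) for x, y in batch) / len(batch)
    return w - lr * full_grad
```

Both functions should agree up to floating-point rounding, which is why accumulation is a common way to emulate a larger batch when memory is limited.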

April 12, 2023

  • Add ONNX export script, validate script, helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + pytorch.
  • Refactor dropout args for vit and vit-like models, separate drop_rate into drop_rate (classifier dropout), proj_drop_rate (block mlp / out projections), pos_drop_rate (position embedding drop), attn_drop_rate (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
  • Add fused F.scaled_dot_product_attention support to more vit models, add env var (TIMM_FUSED_ATTN) to control it, and config interface to enable/disable
  • Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param 'enormous' model, and 336x336 OpenAI ViT model that was missed.
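
The idea behind patch dropout (as used in FLIP) is to train on a random subset of patch tokens. A minimal sketch on a plain Python list follows; the function name, the `num_prefix_tokens` parameter, and the choice to keep the surviving patches in their original order are assumptions made for illustration, not the vit/eva implementation.

```python
import random


def patch_dropout(tokens, drop_prob, num_prefix_tokens=1, rng=None):
    """Randomly drop a fraction of patch tokens, always keeping prefix tokens."""
    rng = rng or random.Random()
    prefix = tokens[:num_prefix_tokens]    # e.g. a class token, never dropped
    patches = tokens[num_prefix_tokens:]
    num_keep = max(1, int(len(patches) * (1.0 - drop_prob)))
    # Sample indices without replacement; sort so the original order is kept.
    keep_idx = sorted(rng.sample(range(len(patches)), num_keep))
    return prefix + [patches[i] for i in keep_idx]
```

Because fewer tokens reach the transformer blocks, each training step costs less, at the price of the model seeing only part of each image.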

April 5, 2023

  • ALL ResNet models pushed to Hugging Face Hub with multi-weight support
  • New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
    • resnetaa50d.sw_in12k_ft_in1k - 81.7 @ 224, 82.6 @ 288
    • resnetaa101d.sw_in12k_ft_in1k - 83.5 @ 224, 84.1 @ 288
    • seresnextaa101d_32x8d.sw_in12k_ft_in1k - 86.0 @ 224, 86.5 @ 288
    • seresnextaa101d_32x8d.sw_in12k_ft_in1k_288 - 86.5 @ 288, 86.7 @ 320

March 31, 2023

  • Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.
| model | top1 | top5 | img_size | param_count | gmacs | macts |
|---|---|---|---|---|---|---|
| convnext_xxlarge.clip_laion2b_soup_ft_in1k | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k | 86.344 | 97.97 | 256 | 88.59 | 20.09 | 37.55 |
  • Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.
| model | top1 | top5 | param_count | img_size |
|---|---|---|---|---|
| eva02_large_patch14_448.mim_m38m_ft_in22k_in1k | 90.054 | 99.042 | 305.08 | 448 |
| eva02_large_patch14_448.mim_in22k_ft_in22k_in1k | 89.946 | 99.01 | 305.08 | 448 |
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.792 | 98.992 | 1014.45 | 560 |
| eva02_large_patch14_448.mim_in22k_ft_in1k | 89.626 | 98.954 | 305.08 | 448 |
| eva02_large_patch14_448.mim_m38m_ft_in1k | 89.57 | 98.918 | 305.08 | 448 |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.56 | 98.956 | 1013.01 | 336 |
| eva_giant_patch14_336.clip_ft_in1k | 89.466 | 98.82 | 1013.01 | 336 |
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.214 | 98.854 | 304.53 | 336 |
| eva_giant_patch14_224.clip_ft_in1k | 88.882 | 98.678 | 1012.56 | 224 |
| eva02_base_patch14_448.mim_in22k_ft_in22k_in1k | 88.692 | 98.722 | 87.12 | 448 |
| eva_large_patch14_336.in22k_ft_in1k | 88.652 | 98.722 | 304.53 | 336 |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.592 | 98.656 | 304.14 | 196 |
| eva02_base_patch14_448.mim_in22k_ft_in1k | 88.23 | 98.564 | 87.12 | 448 |
| eva_large_patch14_196.in22k_ft_in1k | 87.934 | 98.504 | 304.14 | 196 |
| eva02_small_patch14_336.mim_in22k_ft_in1k | 85.74 | 97.614 | 22.13 | 336 |
| eva02_tiny_patch14_336.mim_in22k_ft_in1k | 80.658 | 95.524 | 5.76 | 336 |
  • Multi-weight and HF hub for DeiT and MLP-Mixer based models

March 22, 2023

  • More weights pushed to HF hub along with multi-weight support, including: regnet.py, rexnet.py, byobnet.py, resnetv2.py, swin_transformer.py, swin_transformer_v2.py, swin_transformer_v2_cr.py
  • Swin Transformer models support feature extraction (NCHW feat maps for swinv2_cr_*, and NHWC for all others) and spatial embedding outputs.
  • FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
  • RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
  • More ImageNet-12k pretrained and 1k fine-tuned timm weights:
    • rexnetr_200.sw_in12k_ft_in1k - 82.6 @ 224, 83.2 @ 288
    • rexnetr_300.sw_in12k_ft_in1k - 84.0 @ 224, 84.5 @ 288
    • regnety_120.sw_in12k_ft_in1k - 85.0 @ 224, 85.4 @ 288
    • regnety_160.lion_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288
    • regnety_160.sw_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
  • Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
  • Minor bug fixes and improvements.
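
Model name deprecation and remapping can be thought of as a lookup that warns and then redirects. The sketch below assumes a simple dictionary of old to new names; `_DEPRECATED`, `resolve_model_name`, and the `toy_net_*` names are hypothetical and do not reflect the actual mappings or helper functions in timm.

```python
import warnings

# Hypothetical mapping from a retired name to its `arch.tag` replacement.
_DEPRECATED = {
    "toy_net_small_in21k": "toy_net_small.in21k",
}


def resolve_model_name(name: str) -> str:
    """Return the current name for `name`, warning if it was deprecated."""
    new_name = _DEPRECATED.get(name)
    if new_name is None:
        return name  # not deprecated, use as-is
    warnings.warn(
        f"Model name '{name}' is deprecated, use '{new_name}' instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name
```

Old scripts keep working under this scheme, while the warning nudges users toward the new naming.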

Feb 26, 2023

  • Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
  • Update convnext_xxlarge default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
  • 0.8.15dev0

Feb 20, 2023

  • Add 320x320 convnext_large_mlp.clip_laion2b_ft_320 and convnext_large_mlp.clip_laion2b_ft_soup_320 CLIP image tower weights for features & fine-tune
  • 0.8.13dev0 pypi release for latest changes w/ move to huggingface org

Feb 16, 2023

  • safetensor checkpoint support added
  • Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
  • Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to vit_*, vit_relpos*, coatnet / maxxvit (to start)
  • Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
  • gradient checkpointing works with features_only=True
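
For readers unfamiliar with Lion, its update rule from the linked paper can be written for a single scalar parameter as below. This is a sketch of the published rule only; the function names and default hyperparameters are chosen for illustration, and it does not model the multi-tensor option mentioned above.

```python
def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Return (new_param, new_momentum) after one Lion update on a scalar."""
    # Direction: sign of an interpolation between momentum and current gradient.
    update = sign(beta1 * momentum + (1.0 - beta1) * grad)
    # Decoupled weight decay plus a fixed-magnitude signed step.
    new_param = param - lr * (update + weight_decay * param)
    # Momentum is tracked with a second, slower interpolation factor.
    new_momentum = beta2 * momentum + (1.0 - beta2) * grad
    return new_param, new_momentum
```

Since every step has magnitude lr regardless of gradient scale, Lion is typically run with a smaller learning rate than Adam-style optimizers.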

Feb 7, 2023

  • New inference benchmark numbers added in results folder.
  • Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
    • convnext_base.clip_laion2b_augreg_ft_in1k - 86.2% @ 256x256
    • convnext_base.clip_laiona_augreg_ft_in1k_384 - 86.5% @ 384x384
    • convnext_large_mlp.clip_laion2b_augreg_ft_in1k - 87.3% @ 256x256
    • convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384 - 87.9% @ 384x384
  • Add DaViT models. Supports features_only=True. Adapted from https://github.com/dingmyu/davit by Fredo.
  • Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
  • Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
    • New EfficientFormer-V2 arch, significant refactor from original at (https://github.com/snap-research/EfficientFormer). Supports features_only=True.
    • Minor updates to EfficientFormer.
    • Refactor LeViT models to stages, add features_only=True support to new conv variants, weight remap required.
  • Move ImageNet meta-data (synsets, indices) from /results to timm/data/_info.
  • Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in timm
    • Update inference.py to use, try: python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5
  • Ready for 0.8.10 pypi pre-release (final testing).

Jan 20, 2023

  • Add two convnext 12k -> 1k fine-tunes at 384x384

    • convnext_tiny.in12k_ft_in1k_384 - 85.1 @ 384
    • convnext_small.in12k_ft_in1k_384 - 86.2 @ 384
  • Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for rw base MaxViT and CoAtNet 1/2 models

| model | top1 | top5 | samples / sec | Params (M) | GMAC | Act (M) |
|---|---|---|---|---|---|---|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.53 | 98.64 | 21.76 | 475.77 | 534.14 | 1413.22 |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.32 | 98.54 | 42.53 | 475.32 | 292.78 | 668.76 |
| maxvit_base_tf_512.in21k_ft_in1k | 88.20 | 98.53 | 50.87 | 119.88 | 138.02 | 703.99 |
| maxvit_large_tf_512.in21k_ft_in1k | 88.04 | 98.40 | 36.42 | 212.33 | 244.75 | 942.15 |
| maxvit_large_tf_384.in21k_ft_in1k | 87.98 | 98.56 | 71.75 | 212.03 | 132.55 | 445.84 |
| maxvit_base_tf_384.in21k_ft_in1k | 87.92 | 98.54 | 104.71 | 119.65 | 73.80 | 332.90 |
| maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k | 87.81 | 98.37 | 106.55 | 116.14 | 70.97 | 318.95 |
| maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k | 87.47 | 98.37 | 149.49 | 116.09 | 72.98 | 213.74 |
| coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k | 87.39 | 98.31 | 160.80 | 73.88 | 47.69 | 209.43 |
| maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k | 86.89 | 98.02 | 375.86 | 116.14 | 23.15 | 92.64 |
| maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k | 86.64 | 98.02 | 501.03 | 116.09 | 24.20 | 62.77 |
| maxvit_base_tf_512.in1k | 86.60 | 97.92 | 50.75 | 119.88 | 138.02 | 703.99 |
| coatnet_2_rw_224.sw_in12k_ft_in1k | 86.57 | 97.89 | 631.88 | 73.87 | 15.09 | 49.22 |
| maxvit_large_tf_512.in1k | 86.52 | 97.88 | 36.04 | 212.33 | 244.75 | 942.15 |
| coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k | 86.49 | 97.90 | 620.58 | 73.88 | 15.18 | 54.78 |
| maxvit_base_tf_384.in1k | 86.29 | 97.80 | 101.09 | 119.65 | 73.80 | 332.90 |
| maxvit_large_tf_384.in1k | 86.23 | 97.69 | 70.56 | 212.03 | 132.55 | 445.84 |
| maxvit_small_tf_512.in1k | 86.10 | 97.76 | 88.63 | 69.13 | 67.26 | 383.77 |
| maxvit_tiny_tf_512.in1k | 85.67 | 97.58 | 144.25 | 31.05 | 33.49 | 257.59 |
| maxvit_small_tf_384.in1k | 85.54 | 97.46 | 188.35 | 69.02 | 35.87 | 183.65 |
| maxvit_tiny_tf_384.in1k | 85.11 | 97.38 | 293.46 | 30.98 | 17.53 | 123.42 |
| maxvit_large_tf_224.in1k | 84.93 | 96.97 | 247.71 | 211.79 | 43.68 | 127.35 |
| coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k | 84.90 | 96.96 | 1025.45 | 41.72 | 8.11 | 40.13 |
| maxvit_base_tf_224.in1k | 84.85 | 96.99 | 358.25 | 119.47 | 24.04 | 95.01 |
| maxxvit_rmlp_small_rw_256.sw_in1k | 84.63 | 97.06 | 575.53 | 66.01 | 14.67 | 58.38 |
| coatnet_rmlp_2_rw_224.sw_in1k | 84.61 | 96.74 | 625.81 | 73.88 | 15.18 | 54.78 |
| maxvit_rmlp_small_rw_224.sw_in1k | 84.49 | 96.76 | 693.82 | 64.90 | 10.75 | 49.30 |
| maxvit_small_tf_224.in1k | 84.43 | 96.83 | 647.96 | 68.93 | 11.66 | 53.17 |
| maxvit_rmlp_tiny_rw_256.sw_in1k | 84.23 | 96.78 | 807.21 | 29.15 | 6.77 | 46.92 |
| coatnet_1_rw_224.sw_in1k | 83.62 | 96.38 | 989.59 | 41.72 | 8.04 | 34.60 |
| maxvit_tiny_rw_224.sw_in1k | 83.50 | 96.50 | 1100.53 | 29.06 | 5.11 | 33.11 |
| maxvit_tiny_tf_224.in1k | 83.41 | 96.59 | 1004.94 | 30.92 | 5.60 | 35.78 |
| coatnet_rmlp_1_rw_224.sw_in1k | 83.36 | 96.45 | 1093.03 | 41.69 | 7.85 | 35.47 |
| maxxvitv2_nano_rw_256.sw_in1k | 83.11 | 96.33 | 1276.88 | 23.70 | 6.26 | 23.05 |
| maxxvit_rmlp_nano_rw_256.sw_in1k | 83.03 | 96.34 | 1341.24 | 16.78 | 4.37 | 26.05 |
| maxvit_rmlp_nano_rw_256.sw_in1k | 82.96 | 96.26 | 1283.24 | 15.50 | 4.47 | 31.92 |
| maxvit_nano_rw_256.sw_in1k | 82.93 | 96.23 | 1218.17 | 15.45 | 4.46 | 30.28 |
| coatnet_bn_0_rw_224.sw_in1k | 82.39 | 96.19 | 1600.14 | 27.44 | 4.67 | 22.04 |
| coatnet_0_rw_224.sw_in1k | 82.39 | 95.84 | 1831.21 | 27.44 | 4.43 | 18.73 |
| coatnet_rmlp_nano_rw_224.sw_in1k | 82.05 | 95.87 | 2109.09 | 15.15 | 2.62 | 20.34 |
| coatnext_nano_rw_224.sw_in1k | 81.95 | 95.92 | 2525.52 | 14.70 | 2.47 | 12.80 |
| coatnet_nano_rw_224.sw_in1k | 81.70 | 95.64 | 2344.52 | 15.14 | 2.41 | 15.41 |
| maxvit_rmlp_pico_rw_256.sw_in1k | 80.53 | 95.21 | 1594.71 | 7.52 | 1.85 | 24.86 |

Jan 11, 2023

  • Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT .in12k tags)
    • convnext_nano.in12k_ft_in1k - 82.3 @ 224, 82.9 @ 288 (previously released)
    • convnext_tiny.in12k_ft_in1k - 84.2 @ 224, 84.5 @ 288
    • convnext_small.in12k_ft_in1k - 85.2 @ 224, 85.3 @ 288

Jan 6, 2023

  • Finally got around to adding --model-kwargs and --opt-kwargs to scripts to pass through rare args directly to model classes from cmd line
    • train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu
    • train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12
  • Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.
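
One way to accept `key=value` pairs like the --model-kwargs examples above is a custom argparse action. The sketch below is an assumption about how such parsing could work, not the scripts' actual code; `KeyValueAction` is a hypothetical name.

```python
import argparse
import ast


class KeyValueAction(argparse.Action):
    """Collect `key=value` tokens into a dict, evaluating literals where possible."""

    def __call__(self, parser, namespace, values, option_string=None):
        kwargs = {}
        for item in values:
            key, sep, raw = item.partition("=")
            if not sep:
                parser.error(f"expected key=value, got '{item}'")
            try:
                kwargs[key] = ast.literal_eval(raw)  # "16" -> 16, "True" -> True
            except (ValueError, SyntaxError):
                kwargs[key] = raw  # keep plain strings such as silu
        setattr(namespace, self.dest, kwargs)


parser = argparse.ArgumentParser()
parser.add_argument("--model", default="resnet50")
parser.add_argument("--model-kwargs", nargs="*", default={}, action=KeyValueAction)

args = parser.parse_args(
    ["--model", "resnet50", "--model-kwargs", "output_stride=16", "act_layer=silu"]
)
print(args.model_kwargs)
```

The resulting dict can then be splatted into a model constructor, so rarely used arguments do not each need a dedicated command-line flag.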

Jan 5, 2023

Dec 23, 2022 🎄☃

  • Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
    • NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
  • Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
  • More model pretrained tags and adjustments; some model names changed (deprecation translations are in progress, so treat the main branch as a DEV branch for now and use 0.6.x for stable use)
  • More ImageNet-12k (subset of 22k) pretrain models popping up:
    • efficientnet_b5.in12k_ft_in1k - 85.9 @ 448x448
    • vit_medium_patch16_gap_384.in12k_ft_in1k - 85.5 @ 384x384
    • vit_medium_patch16_gap_256.in12k_ft_in1k - 84.5 @ 256x256
    • convnext_nano.in12k_ft_in1k - 82.9 @ 288x288

Dec 8, 2022

  • Add 'EVA l' to vision_transformer.py, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
| model | top1 | param_count | gmac | macts | hub |
|---|---|---|---|---|---|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | link |
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | link |

Dec 6, 2022

| model | top1 | param_count | gmac | macts | hub |
|---|---|---|---|---|---|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | link |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | link |

Dec 5, 2022

  • Pre-release (0.8.0dev0) of multi-weight support (model_arch.pretrained_tag). Install with pip install --pre timm
    • vision_transformer, maxvit, convnext are the first three model impl w/ support
    • model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
    • bugs are likely, but I need feedback so please try it out
    • if stability is needed, please use 0.6.x pypi releases or clone from 0.6.x branch
  • Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use --torchcompile argument
  • Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
  • Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
| model | top1 | param_count | gmac | macts | hub |
|---|---|---|---|---|---|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | link |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | link |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | link |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | link |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | link |
  • Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
    • There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possibly a missing detail, but the 21k FT did seem sensitive to small preprocessing changes
| model | top1 | param_count | gmac | macts | hub |
|---|---|---|---|---|---|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | link |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | link |
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | link |
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | link |
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | link |
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | link |
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | link |
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | link |
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | link |
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | link |
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | link |
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | link |
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | link |
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | link |
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | link |
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | link |
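
The model_arch.pretrained_tag naming used throughout these tables can be taken apart with a split on the first dot. The helper below is a hypothetical illustration of the convention, not a function from timm.

```python
def split_arch_and_tag(model_name: str, default_tag: str = ""):
    """Split 'arch.tag' into (arch, tag); names without a tag get `default_tag`."""
    arch, sep, tag = model_name.partition(".")
    return arch, (tag if sep else default_tag)


print(split_arch_and_tag("vit_base_patch16_clip_224.laion2b_ft_in12k_in1k"))
print(split_arch_and_tag("resnet50"))
```

Keeping the architecture and the weight tag separate is what lets one architecture carry several pretrained checkpoints.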

Oct 15, 2022

  • Train and validation script enhancements
  • Non-GPU (ie CPU) device support
  • SLURM compatibility for train script
  • HF datasets support (via ReaderHfds)
  • TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
  • in_chans !=3 support for scripts / loader
  • Adan optimizer
  • Can enable per-step LR scheduling via args
  • Dataset 'parsers' renamed to 'readers', more descriptive of purpose
  • AMP args changed, APEX via --amp-impl apex, bfloat16 supported via --amp-dtype bfloat16
  • main branch switched to 0.7.x version, 0.6.x forked for stable release of weight only adds
  • master -> main branch rename
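
To make the difference between per-epoch and per-step LR scheduling concrete, here is a small sketch using a cosine decay. The cosine shape, the function names, and the parameters are assumptions for illustration; the notes above do not say which schedules or arguments the train script uses.

```python
import math


def cosine_lr(progress: float, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr to min_lr as progress goes from 0 to 1."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def lr_at(update_idx, updates_per_epoch, num_epochs, base_lr, per_step):
    """Learning rate at a given optimizer update under either scheduling mode."""
    total_updates = updates_per_epoch * num_epochs
    if per_step:
        # Per-step: progress advances with every optimizer update.
        progress = update_idx / total_updates
    else:
        # Per-epoch: the LR is held constant for all updates within an epoch.
        progress = (update_idx // updates_per_epoch) / num_epochs
    return cosine_lr(progress, base_lr)
```

Per-step scheduling gives a smooth curve, which tends to matter most for short runs where each epoch covers a large share of the total schedule.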

Fetched April 7, 2026