Release v1.0.3

May 14, 2024

  • Support loading PaliGemma JAX weights into SigLIP ViT models with average pooling.
  • Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
  • Add normalize= flag for transforms; return non-normalized torch.Tensor with original dtype (for chug)
  • Version 1.0.3 release
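The normalize= idea above can be illustrated with a minimal pure-Python sketch. This is not timm's transform code; the function names and mean/std values here are invented for illustration. With normalization off, values pass through untouched in their original range.

```python
# Illustrative sketch of a normalize= switch (names and constants invented,
# not timm's API). With normalize=False, pixel values are returned untouched.
def make_transform(normalize=True, mean=0.5, std=0.5):
    def apply(pixels):
        if not normalize:
            return pixels  # skip normalization, keep original values
        return [(p - mean) / std for p in pixels]
    return apply

raw = make_transform(normalize=False)([0.0, 0.5, 1.0])
norm = make_transform()([0.0, 0.5, 1.0])
print(raw, norm)  # [0.0, 0.5, 1.0] [-1.0, 0.0, 1.0]
```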

May 11, 2024

  • Searching for Better ViT Baselines (For the GPU Poor) weights and ViT variants released, exploring model shapes between Tiny and Base.

    model                                               top1    top5    param_count  img_size
    vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k  86.202  97.874  64.11        256
    vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k  85.418  97.48   60.4         256
    vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k      84.322  96.812  63.95        256
    vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k      83.906  96.684  60.23        256
    vit_base_patch16_rope_reg1_gap_256.sbb_in1k         83.866  96.67   86.43        256
    vit_medium_patch16_rope_reg1_gap_256.sbb_in1k       83.81   96.824  38.74        256
    vit_betwixt_patch16_reg4_gap_256.sbb_in1k           83.706  96.616  60.4         256
    vit_betwixt_patch16_reg1_gap_256.sbb_in1k           83.628  96.544  60.4         256
    vit_medium_patch16_reg4_gap_256.sbb_in1k            83.47   96.622  38.88        256
    vit_medium_patch16_reg1_gap_256.sbb_in1k            83.462  96.548  38.88        256
    vit_little_patch16_reg4_gap_256.sbb_in1k            82.514  96.262  22.52        256
    vit_wee_patch16_reg1_gap_256.sbb_in1k               80.256  95.360  13.42        256
    vit_pwee_patch16_reg1_gap_256.sbb_in1k              80.072  95.136  15.25        256
    vit_mediumd_patch16_reg4_gap_256.sbb_in12k          N/A     N/A     64.11        256
    vit_betwixt_patch16_reg4_gap_256.sbb_in12k          N/A     N/A     60.4         256
  • AttentionExtract helper added to extract attention maps from timm models. See example in https://github.com/huggingface/pytorch-image-models/discussions/1232#discussioncomment-9320949
  • forward_intermediates() API refined and added to more models including some ConvNets that have other extraction methods.
  • 1017 of 1047 model architectures support features_only=True feature extraction. The remaining 34 architectures can be supported based on priority requests.
  • Remove torch.jit.script-annotated functions, including old JIT activations. They conflict with dynamo, and dynamo does a much better job without them.
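The AttentionExtract helper above collects the outputs of named sub-modules during a forward pass. A minimal stdlib-only sketch of that hook pattern follows; all class, attribute, and layer names here are invented for illustration and are not timm's internals.

```python
# Stdlib-only sketch of hook-based feature extraction (the idea behind
# helpers like AttentionExtract). All names here are invented.
class Layer:
    def __init__(self, name, fn):
        self.name, self.fn, self.hooks = name, fn, []

    def __call__(self, x):
        out = self.fn(x)
        for hook in self.hooks:
            hook(self.name, out)  # fire each hook with this layer's output
        return out

# A toy 3-layer "model": each layer adds its index to the input.
layers = [Layer(f'blocks.{i}.attn', lambda x, i=i: x + i) for i in range(3)]

captured = {}
for layer in layers:
    layer.hooks.append(lambda name, out: captured.__setitem__(name, out))

x = 0
for layer in layers:
    x = layer(x)

print(captured)  # {'blocks.0.attn': 0, 'blocks.1.attn': 1, 'blocks.2.attn': 3}
```

Real forward hooks in PyTorch work the same way via `module.register_forward_hook`, which is what makes extraction possible without modifying a model's forward code.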

April 11, 2024

  • Prepping for a long-overdue 1.0 release; things have been stable for a while now.
  • A significant long-missing feature: features_only=True support for ViT models with flat hidden states or non-standard module layouts (so far covering 'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*').
  • The above is achieved through a new forward_intermediates() API that can be used with a feature-wrapping module or directly.
import timm
import torch

model = timm.create_model('vit_base_patch16_224')
input = torch.randn(2, 3, 224, 224)
final_feat, intermediates = model.forward_intermediates(input)
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)  # torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)  # torch.Size([2, 768, 14, 14]), once per block (12 total)

print(output.shape)  # torch.Size([2, 1000])
model = timm.create_model(
    'eva02_base_patch16_clip_224', pretrained=True, img_size=512,
    features_only=True, out_indices=(-3, -2),
)
output = model(torch.randn(2, 3, 512, 512))

for o in output:
    print(o.shape)  # torch.Size([2, 768, 32, 32]) for both selected indices
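The out_indices=(-3, -2) argument selects intermediates by position, counting from the end of the block list. A small illustrative sketch of that indexing; the helper name here is invented and timm's internals differ.

```python
# Illustrative only: map possibly-negative out_indices onto absolute block
# positions, the way out_indices=(-3, -2) picks the 10th and 11th of 12 blocks.
def take_indices(num_blocks, out_indices):
    return [i % num_blocks for i in out_indices]

blocks = list(range(12))  # stand-in for 12 transformer block outputs
selected = [blocks[i] for i in take_indices(12, (-3, -2))]
print(selected)  # [9, 10]
```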
  • TinyCLIP vision tower weights added, thanks to Thien Tran

Fetched April 7, 2026