Release v1.0.3

Added a `normalize=` flag for transforms; with `normalize=False`, transforms return a non-normalized `torch.Tensor` with the original dtype (for chug).

Searching for Better ViT Baselines (For the GPU Poor) weights and ViT variants released, exploring model shapes between Tiny and Base.

| model | top-1 (%) | top-5 (%) | params (M) | img_size (px) |
|---|---|---|---|---|
| vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k | 86.202 | 97.874 | 64.11 | 256 |
| vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k | 85.418 | 97.48 | 60.4 | 256 |
| vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k | 84.322 | 96.812 | 63.95 | 256 |
| vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k | 83.906 | 96.684 | 60.23 | 256 |
| vit_base_patch16_rope_reg1_gap_256.sbb_in1k | 83.866 | 96.67 | 86.43 | 256 |
| vit_medium_patch16_rope_reg1_gap_256.sbb_in1k | 83.81 | 96.824 | 38.74 | 256 |
| vit_betwixt_patch16_reg4_gap_256.sbb_in1k | 83.706 | 96.616 | 60.4 | 256 |
| vit_betwixt_patch16_reg1_gap_256.sbb_in1k | 83.628 | 96.544 | 60.4 | 256 |
| vit_medium_patch16_reg4_gap_256.sbb_in1k | 83.47 | 96.622 | 38.88 | 256 |
| vit_medium_patch16_reg1_gap_256.sbb_in1k | 83.462 | 96.548 | 38.88 | 256 |
| vit_little_patch16_reg4_gap_256.sbb_in1k | 82.514 | 96.262 | 22.52 | 256 |
| vit_wee_patch16_reg1_gap_256.sbb_in1k | 80.256 | 95.360 | 13.42 | 256 |
| vit_pwee_patch16_reg1_gap_256.sbb_in1k | 80.072 | 95.136 | 15.25 | 256 |
| vit_mediumd_patch16_reg4_gap_256.sbb_in12k | N/A | N/A | 64.11 | 256 |
| vit_betwixt_patch16_reg4_gap_256.sbb_in12k | N/A | N/A | 60.4 | 256 |
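
One way to read this table on a limited compute budget is to keep only the models that no smaller model beats on top-1. The sketch below does that in plain Python with the numbers copied from the table, restricted to the `.sbb_in1k` rows so ImageNet-1k-only training is compared with itself. The frontier criterion and the `pareto_frontier` helper are illustrative choices, not part of timm.

```python
# (name, top-1 %, params in millions) copied from the table above.
# The .sbb_in12k rows have no ImageNet-1k top-1 and are omitted.
ROWS = [
    ("vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k", 86.202, 64.11),
    ("vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k", 85.418, 60.4),
    ("vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k", 84.322, 63.95),
    ("vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k", 83.906, 60.23),
    ("vit_base_patch16_rope_reg1_gap_256.sbb_in1k", 83.866, 86.43),
    ("vit_medium_patch16_rope_reg1_gap_256.sbb_in1k", 83.81, 38.74),
    ("vit_betwixt_patch16_reg4_gap_256.sbb_in1k", 83.706, 60.4),
    ("vit_betwixt_patch16_reg1_gap_256.sbb_in1k", 83.628, 60.4),
    ("vit_medium_patch16_reg4_gap_256.sbb_in1k", 83.47, 38.88),
    ("vit_medium_patch16_reg1_gap_256.sbb_in1k", 83.462, 38.88),
    ("vit_little_patch16_reg4_gap_256.sbb_in1k", 82.514, 22.52),
    ("vit_wee_patch16_reg1_gap_256.sbb_in1k", 80.256, 13.42),
    ("vit_pwee_patch16_reg1_gap_256.sbb_in1k", 80.072, 15.25),
]


def pareto_frontier(rows):
    """Keep rows whose top-1 beats every row with fewer (or equal) params."""
    frontier = []
    best_top1 = float("-inf")
    # Walk from smallest to largest model; ties on size put higher top-1 first.
    for name, top1, params in sorted(rows, key=lambda r: (r[2], -r[1])):
        if top1 > best_top1:
            frontier.append((name, top1, params))
            best_top1 = top1
    return frontier


# Compare ImageNet-1k-only training runs with each other.
IN1K_ROWS = [r for r in ROWS if r[0].endswith(".sbb_in1k")]
frontier = pareto_frontier(IN1K_ROWS)
for name, top1, params in frontier:
    print(f"{params:7.2f}M  {top1:.3f}  {name}")
```

On these numbers the frontier is `vit_wee`, `vit_little`, and the RoPE variants of `vit_medium`, `vit_betwixt` and `vit_mediumd`; `vit_base_patch16_rope_reg1_gap_256.sbb_in1k` (86.43M) drops off because smaller rows reach a higher top-1.
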

timm models. See example in https://github.com/huggingface/pytorch-image-models/discussions/1232#discussioncomment-9320949

`forward_intermediates()` API refined and added to more models, including some ConvNets that have other extraction methods.

`features_only=True` feature extraction. The remaining 34 architectures can be supported, based on priority requests.

`features_only=True` support for ViT models with flat hidden states or non-standard module layouts (so far covering 'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*').

The `forward_intermediates()` API can be used with a feature wrapping module or directly:

```python
import torch
import timm

model = timm.create_model('vit_base_patch16_224')
x = torch.randn(2, 3, 224, 224)  # batch of 2 RGB images at 224x224
final_feat, intermediates = model.forward_intermediates(x)
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)
# torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)
# torch.Size([2, 768, 14, 14]), printed 12 times (one per transformer block)

print(output.shape)
# torch.Size([2, 1000])
```
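
The shapes printed for `vit_base_patch16_224` follow from patch arithmetic: a 224x224 image cut into 16x16 patches gives a 14x14 grid (196 tokens), and one class token brings the flat sequence to 197. Producing NCHW intermediates from flat hidden states then amounts to dropping the prefix token(s) and folding the remaining tokens back into the grid. The stdlib-only sketch below illustrates that idea on nested lists; `token_grid`, `tokens_to_nchw` and the single-class-token assumption are ours, not timm's API.

```python
def token_grid(img_size, patch_size):
    """Patches along each side of a square image (no padding assumed)."""
    assert img_size % patch_size == 0, "image size must be divisible by patch size"
    return img_size // patch_size


def tokens_to_nchw(tokens, grid, num_prefix_tokens=1):
    """Turn one sample's flat [N, C] token list into a [C, H, W] nested list.

    Drops the leading prefix (e.g. class) tokens, reads the remaining tokens
    row-major into a grid x grid layout, and moves channels to the front.
    """
    patches = tokens[num_prefix_tokens:]
    assert len(patches) == grid * grid, "token count does not match grid"
    channels = len(patches[0])
    return [
        [[patches[row * grid + col][c] for col in range(grid)] for row in range(grid)]
        for c in range(channels)
    ]


grid = token_grid(224, 16)
print(grid, grid * grid + 1)  # 14 197

# Tiny demo: 1 class token followed by a 2x2 grid of 3-channel patch tokens.
toy = [[-1, -1, -1]] + [[i, 10 * i, 100 * i] for i in range(4)]
fmap = tokens_to_nchw(toy, grid=2)
print(fmap[1])  # [[0, 10], [20, 30]]
```
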

```python
import torch
import timm

# Pretrained EVA-02 at 512x512, returning feature maps from the last blocks
model = timm.create_model('eva02_base_patch16_clip_224', pretrained=True, img_size=512, features_only=True, out_indices=(-3, -2))
output = model(torch.randn(2, 3, 512, 512))

for o in output:
    print(o.shape)
# torch.Size([2, 768, 32, 32])
# torch.Size([2, 768, 32, 32])
```
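
In the `features_only=True` example, `out_indices=(-3, -2)` selects blocks counted from the end, and `img_size=512` with 16x16 patches gives a 32x32 grid, which matches the two `[2, 768, 32, 32]` outputs. The sketch below resolves such indices with Python-style negative indexing. It assumes `eva02_base_patch16_clip_224` has 12 blocks with 768-dim embeddings and that negative indices wrap this way, so `resolve_out_indices` and `feature_shape` are hypothetical helpers, not timm functions.

```python
def resolve_out_indices(out_indices, num_blocks):
    """Map possibly-negative block indices to absolute ones.

    Hypothetical helper: assumes -1 is the last block, -2 the one before
    it, and so on, as with Python sequence indexing.
    """
    resolved = []
    for idx in out_indices:
        abs_idx = idx + num_blocks if idx < 0 else idx
        if not 0 <= abs_idx < num_blocks:
            raise IndexError(f"out index {idx} out of range for {num_blocks} blocks")
        resolved.append(abs_idx)
    return resolved


def feature_shape(batch, embed_dim, img_size, patch_size):
    """NCHW shape of a ViT feature map after folding patch tokens into a grid."""
    grid = img_size // patch_size
    return (batch, embed_dim, grid, grid)


# Assumed config: 12 blocks and 768-dim embeddings for eva02_base_patch16_clip_224.
blocks = resolve_out_indices((-3, -2), num_blocks=12)
print(blocks)  # [9, 10]
print(feature_shape(2, 768, 512, 16))  # (2, 768, 32, 32)
```
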
Fetched April 7, 2026