
v0.1-attn-weights


A collection of weights I've trained comparing various types of SE-like (SE, ECA, GC, etc.) blocks, self-attention (bottleneck, halo, lambda) blocks, and related non-attention baselines.

ResNet-26-T series

  • [2, 2, 2, 2] repeat Bottleneck block ResNet architecture
  • ReLU activations
  • 3 layer stem with 24, 32, 64 chs, max-pool
  • avg pool in shortcut downsample
  • self-attn blocks replace the 3x3 conv in both blocks of the last stage, and in the second block of the penultimate stage
| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| botnet26t_256 | 79.246 | 20.754 | 94.53 | 5.47 | 12.49 | 256 | 0.95 | bicubic |
| halonet26t | 79.13 | 20.87 | 94.314 | 5.686 | 12.48 | 256 | 0.95 | bicubic |
| lambda_resnet26t | 79.112 | 20.888 | 94.59 | 5.41 | 10.96 | 256 | 0.94 | bicubic |
| lambda_resnet26rpt_256 | 78.964 | 21.036 | 94.428 | 5.572 | 10.99 | 256 | 0.94 | bicubic |
| resnet26t | 77.872 | 22.128 | 93.834 | 6.166 | 16.01 | 256 | 0.94 | bicubic |

Details:

  • HaloNet - 8 pixel block size, 2 pixel halo (overlap), relative position embedding
  • BotNet - relative position embedding
  • Lambda-ResNet-26-T - 3d lambda conv, kernel = 9
  • Lambda-ResNet-26-RPT - relative position embedding
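
These weights load through timm's standard factory. A minimal sketch, assuming the model names above are registered in a recent timm release:

```python
import torch
import timm

# create one of the models above with its released weights
model = timm.create_model('halonet26t', pretrained=True).eval()

# each timm model carries its eval config (input size, crop pct, interpolation)
cfg = model.default_cfg
print(cfg['input_size'], cfg['crop_pct'], cfg['interpolation'])  # e.g. (3, 256, 256) 0.95 bicubic

with torch.no_grad():
    x = torch.randn(1, *cfg['input_size'])  # stand-in for a preprocessed image
    top5 = model(x).topk(5).indices
print(top5)
```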

Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnet26t | 2967.55 | 86.252 | 256 | 256 | 857.62 | 297.984 | 256 | 256 | 16.01 |
| botnet26t_256 | 2642.08 | 96.879 | 256 | 256 | 809.41 | 315.706 | 256 | 256 | 12.49 |
| halonet26t | 2601.91 | 98.375 | 256 | 256 | 783.92 | 325.976 | 256 | 256 | 12.48 |
| lambda_resnet26t | 2354.1 | 108.732 | 256 | 256 | 697.28 | 366.521 | 256 | 256 | 10.96 |
| lambda_resnet26rpt_256 | 1847.34 | 138.563 | 256 | 256 | 644.84 | 197.892 | 128 | 256 | 10.99 |

Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnet26t | 3691.94 | 69.327 | 256 | 256 | 1188.17 | 214.96 | 256 | 256 | 16.01 |
| botnet26t_256 | 3291.63 | 77.76 | 256 | 256 | 1126.68 | 226.653 | 256 | 256 | 12.49 |
| halonet26t | 3230.5 | 79.232 | 256 | 256 | 1077.82 | 236.934 | 256 | 256 | 12.48 |
| lambda_resnet26rpt_256 | 2324.15 | 110.133 | 256 | 256 | 864.42 | 147.485 | 128 | 256 | 10.99 |
| lambda_resnet26t | Not Supported | | | | | | | | |
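
The NCHW/NHWC split in these benchmarks is PyTorch's channels_last memory format. A minimal sketch of the setup being measured (assuming the tables came from a harness along these lines; this is not the exact benchmark script):

```python
import torch
import timm

# NCHW baseline model, converted to NHWC (channels_last) weight layout
model = timm.create_model('resnet26t').cuda().eval()
model = model.to(memory_format=torch.channels_last)

# inputs must also be channels_last, or PyTorch transposes on the fly
x = torch.randn(256, 3, 256, 256, device='cuda')
x = x.contiguous(memory_format=torch.channels_last)

with torch.no_grad(), torch.autocast('cuda'):  # AMP, as in the tables
    y = model(x)
```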

ResNeXt-26-T series

  • [2, 2, 2, 2] repeat Bottleneck block ResNeXt architecture
  • SiLU activations
  • grouped 3x3 convolutions in bottleneck, 32 channels per group
  • 3 layer stem with 24, 32, 64 chs, max-pool
  • avg pool in shortcut downsample
  • channel attn (active in the non self-attn blocks) between the 3x3 and last 1x1 conv (a minimal ECA sketch follows the table below)
  • when active, self-attn blocks replace the 3x3 conv in both blocks of the last stage, and in the second block of the penultimate stage
| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| eca_halonext26ts | 79.484 | 20.516 | 94.600 | 5.400 | 10.76 | 256 | 0.94 | bicubic |
| eca_botnext26ts_256 | 79.270 | 20.730 | 94.594 | 5.406 | 10.59 | 256 | 0.95 | bicubic |
| bat_resnext26ts | 78.268 | 21.732 | 94.1 | 5.9 | 10.73 | 256 | 0.9 | bicubic |
| seresnext26ts | 77.852 | 22.148 | 93.784 | 6.216 | 10.39 | 256 | 0.9 | bicubic |
| gcresnext26ts | 77.804 | 22.196 | 93.824 | 6.176 | 10.48 | 256 | 0.9 | bicubic |
| eca_resnext26ts | 77.446 | 22.554 | 93.57 | 6.43 | 10.3 | 256 | 0.9 | bicubic |
| resnext26ts | 76.764 | 23.236 | 93.136 | 6.864 | 10.3 | 256 | 0.9 | bicubic |
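
The eca_ models above use Efficient Channel Attention: instead of SE's two FC layers, a single 1d conv slides over the globally pooled channel descriptor. A minimal sketch of the idea (the kernel size here is an illustrative assumption, not the released config):

```python
import torch
import torch.nn as nn

class Eca(nn.Module):
    """Efficient Channel Attention: gate channels via a 1d conv over pooled features."""
    def __init__(self, kernel_size: int = 3):  # kernel size is an assumed default
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = x.mean(dim=(2, 3))          # (B, C, H, W) -> (B, C) global average pool
        y = self.conv(y.unsqueeze(1))   # 1d conv across the channel dimension
        gate = y.sigmoid().squeeze(1)[:, :, None, None]  # (B, C, 1, 1)
        return x * gate
```

Per the list above, this module sits between the 3x3 and the last 1x1 conv of each bottleneck that doesn't use self-attention.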

Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnext26ts | 3006.57 | 85.134 | 256 | 256 | 864.4 | 295.646 | 256 | 256 | 10.3 |
| seresnext26ts | 2931.27 | 87.321 | 256 | 256 | 836.92 | 305.193 | 256 | 256 | 10.39 |
| eca_resnext26ts | 2925.47 | 87.495 | 256 | 256 | 837.78 | 305.003 | 256 | 256 | 10.3 |
| gcresnext26ts | 2870.01 | 89.186 | 256 | 256 | 818.35 | 311.97 | 256 | 256 | 10.48 |
| eca_botnext26ts_256 | 2652.03 | 96.513 | 256 | 256 | 790.43 | 323.257 | 256 | 256 | 10.59 |
| eca_halonext26ts | 2593.03 | 98.705 | 256 | 256 | 766.07 | 333.541 | 256 | 256 | 10.76 |
| bat_resnext26ts | 2469.78 | 103.64 | 256 | 256 | 697.21 | 365.964 | 256 | 256 | 10.73 |

Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09

NOTE: there are performance issues with certain grouped conv configs in channels-last layout; the backwards pass in particular is really slow. This also causes issues for RegNet and NFNet networks.

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnext26ts | 3952.37 | 64.755 | 256 | 256 | 608.67 | 420.049 | 256 | 256 | 10.3 |
| eca_resnext26ts | 3815.77 | 67.074 | 256 | 256 | 594.35 | 430.146 | 256 | 256 | 10.3 |
| seresnext26ts | 3802.75 | 67.304 | 256 | 256 | 592.82 | 431.14 | 256 | 256 | 10.39 |
| gcresnext26ts | 3626.97 | 70.57 | 256 | 256 | 581.83 | 439.119 | 256 | 256 | 10.48 |
| eca_botnext26ts_256 | 3515.84 | 72.8 | 256 | 256 | 611.71 | 417.862 | 256 | 256 | 10.59 |
| eca_halonext26ts | 3410.12 | 75.057 | 256 | 256 | 597.52 | 427.789 | 256 | 256 | 10.76 |
| bat_resnext26ts | 3053.83 | 83.811 | 256 | 256 | 533.23 | 478.839 | 256 | 256 | 10.73 |

ResNet-33-T series

  • [2, 3, 3, 2] repeat Bottleneck block ResNet architecture
  • SiLU activations
  • 3 layer stem with 24, 32, 64 chs, no max-pool, 1st and 3rd conv stride 2
  • avg pool in shortcut downsample
  • channel attn (active in the non self-attn blocks) between the 3x3 and last 1x1 conv
  • when active, self-attn blocks replace the 3x3 conv in the last block of stages 2 and 3, and in both blocks of the final stage
  • FC 1x1 conv between last block and classifier

The 33-layer models have an extra 1x1 FC layer between the last conv block and the classifier. There is both a non-attention 33-layer baseline and a 32-layer baseline without the extra FC.
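
That extra layer is just a 1x1 conv acting as a fully connected layer on the feature map ahead of pooling. A hypothetical sketch of such a head (the 1280 width and the placement of the activation are illustrative assumptions):

```python
import torch.nn as nn

class PreClassifierFC(nn.Module):
    """Hypothetical head: last conv block -> 1x1 'FC' conv -> pool -> classifier."""
    def __init__(self, in_chs: int, fc_chs: int = 1280, num_classes: int = 1000):
        super().__init__()
        self.fc = nn.Conv2d(in_chs, fc_chs, kernel_size=1)  # the extra FC layer
        self.act = nn.SiLU()                                # matches the series' activations
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(fc_chs, num_classes)

    def forward(self, x):
        x = self.act(self.fc(x))
        x = self.pool(x).flatten(1)
        return self.classifier(x)
```

The ~1.7M param gap between resnet33ts and resnet32ts in the table below is roughly what a layer like this would add.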

| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| sehalonet33ts | 80.986 | 19.014 | 95.272 | 4.728 | 13.69 | 256 | 0.94 | bicubic |
| seresnet33ts | 80.388 | 19.612 | 95.108 | 4.892 | 19.78 | 256 | 0.94 | bicubic |
| eca_resnet33ts | 80.132 | 19.868 | 95.054 | 4.946 | 19.68 | 256 | 0.94 | bicubic |
| gcresnet33ts | 79.99 | 20.01 | 94.988 | 5.012 | 19.88 | 256 | 0.94 | bicubic |
| resnet33ts | 79.352 | 20.648 | 94.596 | 5.404 | 19.68 | 256 | 0.94 | bicubic |
| resnet32ts | 79.028 | 20.972 | 94.444 | 5.556 | 17.96 | 256 | 0.94 | bicubic |

Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnet32ts | 2502.96 | 102.266 | 256 | 256 | 733.27 | 348.507 | 256 | 256 | 17.96 |
| resnet33ts | 2473.92 | 103.466 | 256 | 256 | 725.34 | 352.309 | 256 | 256 | 19.68 |
| seresnet33ts | 2400.18 | 106.646 | 256 | 256 | 695.19 | 367.413 | 256 | 256 | 19.78 |
| eca_resnet33ts | 2394.77 | 106.886 | 256 | 256 | 696.93 | 366.637 | 256 | 256 | 19.68 |
| gcresnet33ts | 2342.81 | 109.257 | 256 | 256 | 678.22 | 376.404 | 256 | 256 | 19.88 |
| sehalonet33ts | 1857.65 | 137.794 | 256 | 256 | 577.34 | 442.545 | 256 | 256 | 13.69 |

Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnet32ts | 3306.22 | 77.416 | 256 | 256 | 1012.82 | 252.158 | 256 | 256 | 17.96 |
| resnet33ts | 3257.59 | 78.573 | 256 | 256 | 1002.38 | 254.778 | 256 | 256 | 19.68 |
| seresnet33ts | 3128.08 | 81.826 | 256 | 256 | 950.27 | 268.581 | 256 | 256 | 19.78 |
| eca_resnet33ts | 3127.11 | 81.852 | 256 | 256 | 948.84 | 269.123 | 256 | 256 | 19.68 |
| gcresnet33ts | 2984.87 | 85.753 | 256 | 256 | 916.98 | 278.169 | 256 | 256 | 19.88 |
| sehalonet33ts | 2188.23 | 116.975 | 256 | 256 | 711.63 | 179.03 | 128 | 256 | 13.69 |

ResNet-50(ish) models

In Progress

RegNet"Z" series

  • RegNetZ-inspired architecture: inverted bottleneck, SE attention, pre-classifier FC, essentially an EfficientNet w/ grouped conv instead of depthwise (see the sketch after this list)
  • b, c, and d are three different sizes I put together to cover differing flop ranges; they are not based on the paper (https://arxiv.org/abs/2103.06877) or a search process
  • for comparison to RegNetY and the paper's RegNetZ models: at 224x224 the b, c, and d models are 1.45, 1.92, and 4.58 GMACs respectively; c and d are trained at 256 here, so higher than that (see tables)
  • haloregnetz_b uses halo attention for all of the last stage, and interleaved every 3 blocks (4 in total) of the penultimate stage
  • b and c variants use a stem / 1st stage like the paper, d uses a 3-deep tiered stem with 2-1-2 striding
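
A rough sketch of the block described in the first bullet: an EfficientNet-style inverted bottleneck whose depthwise 3x3 is swapped for a grouped 3x3 (the expansion ratio, group size, and SE reduction below are illustrative assumptions, not the trained configs):

```python
import torch.nn as nn

class SE(nn.Module):
    """Squeeze-and-excitation channel gate."""
    def __init__(self, chs: int, rd_ratio: float = 0.25):
        super().__init__()
        rd_chs = max(1, int(chs * rd_ratio))
        self.fc1 = nn.Conv2d(chs, rd_chs, 1)
        self.fc2 = nn.Conv2d(rd_chs, chs, 1)
        self.act = nn.SiLU()

    def forward(self, x):
        s = x.mean((2, 3), keepdim=True)
        s = self.fc2(self.act(self.fc1(s)))
        return x * s.sigmoid()

class InvertedBottleneck(nn.Module):
    """1x1 expand -> grouped 3x3 (instead of depthwise) -> SE -> 1x1 project."""
    def __init__(self, in_chs: int, out_chs: int, stride: int = 1,
                 exp_ratio: float = 4.0, group_size: int = 16):
        super().__init__()
        mid_chs = int(in_chs * exp_ratio)  # group_size must divide mid_chs
        self.expand = nn.Sequential(
            nn.Conv2d(in_chs, mid_chs, 1, bias=False),
            nn.BatchNorm2d(mid_chs), nn.SiLU())
        self.conv_grouped = nn.Sequential(
            nn.Conv2d(mid_chs, mid_chs, 3, stride, 1,
                      groups=mid_chs // group_size, bias=False),
            nn.BatchNorm2d(mid_chs), nn.SiLU())
        self.se = SE(mid_chs)
        self.project = nn.Sequential(
            nn.Conv2d(mid_chs, out_chs, 1, bias=False),
            nn.BatchNorm2d(out_chs))
        self.use_skip = stride == 1 and in_chs == out_chs

    def forward(self, x):
        out = self.project(self.se(self.conv_grouped(self.expand(x))))
        return x + out if self.use_skip else out
```

Setting group_size=1 would recover the depthwise conv of a standard EfficientNet block; the grouped version trades some of that parameter efficiency for better GPU throughput.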

ImageNet-1k validation at train resolution

| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| regnetz_d | 83.422 | 16.578 | 96.636 | 3.364 | 27.58 | 256 | 0.95 | bicubic |
| regnetz_c | 82.164 | 17.836 | 96.058 | 3.942 | 13.46 | 256 | 0.94 | bicubic |
| haloregnetz_b | 81.058 | 18.942 | 95.2 | 4.8 | 11.68 | 224 | 0.94 | bicubic |
| regnetz_b | 79.868 | 20.132 | 94.988 | 5.012 | 9.72 | 224 | 0.94 | bicubic |

ImageNet-1k validation at optimal test res

| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| regnetz_d | 84.04 | 15.96 | 96.87 | 3.13 | 27.58 | 320 | 0.95 | bicubic |
| regnetz_c | 82.516 | 17.484 | 96.356 | 3.644 | 13.46 | 320 | 0.94 | bicubic |
| haloregnetz_b | 81.058 | 18.942 | 95.2 | 4.8 | 11.68 | 224 | 0.94 | bicubic |
| regnetz_b | 80.728 | 19.272 | 95.47 | 4.53 | 9.72 | 288 | 0.94 | bicubic |

Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | infer_GMACs | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|---|
| regnetz_b | 2703.42 | 94.68 | 256 | 224 | 1.45 | 764.85 | 333.348 | 256 | 224 | 9.72 |
| haloregnetz_b | 2086.22 | 122.695 | 256 | 224 | 1.88 | 620.1 | 411.415 | 256 | 224 | 11.68 |
| regnetz_c | 1653.19 | 154.836 | 256 | 256 | 2.51 | 459.41 | 277.268 | 128 | 256 | 13.46 |
| regnetz_d | 1060.91 | 241.284 | 256 | 256 | 5.98 | 296.51 | 430.143 | 128 | 256 | 27.58 |

Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09

NOTE: channels-last layout is painfully slow for the backward pass here, due to some sort of cuDNN issue.

| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | infer_GMACs | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|---|
| regnetz_b | 4152.59 | 61.634 | 256 | 224 | 1.45 | 399.37 | 639.572 | 256 | 224 | 9.72 |
| haloregnetz_b | 2770.78 | 92.378 | 256 | 224 | 1.88 | 364.22 | 701.386 | 256 | 224 | 11.68 |
| regnetz_c | 2512.4 | 101.878 | 256 | 256 | 2.51 | 376.72 | 338.372 | 128 | 256 | 13.46 |
| regnetz_d | 1456.05 | 175.8 | 256 | 256 | 5.98 | 111.32 | 1148.279 | 128 | 256 | 27.58 |
