v0.9.0 — timm (pytorch-image-models)

First non pre-release in a loooong while, changelog from 0.6.x below...

May 11, 2023

timm 0.9 released, transition from 0.8.xdev releases

May 10, 2023

Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in timm
DINOv2 vit feature backbone weights added thanks to Leng Yue
FB MAE vit feature backbone weights added
OpenCLIP DataComp-XL L/14 feat backbone weights added
MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
Experimental get_intermediate_layers function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
Model creation throws error if pretrained=True and no weights exist (instead of continuing with random initialization)
Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use bnb prefix, ie bnbadam8bit
Misc cleanup and fixes
Final testing before switching to a 0.9 and bringing timm out of pre-release state
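
The change to `pretrained=True` handling noted above can be pictured as a fail-fast check. The following is a minimal sketch, not timm's code: the registry `_PRETRAINED_TAGS`, the `toy_net_*` names, and this `create_model` stand-in are all hypothetical and exist only to show the behavior.

```python
# Hypothetical registry: architecture name -> set of available pretrained tags.
# These entries are illustrative only and do not reflect timm's real registry.
_PRETRAINED_TAGS = {
    "toy_net_small": {"in1k"},
    "toy_net_large": set(),  # architecture exists, but no weights are published
}


def create_model(name, pretrained=False):
    """Return a stand-in 'model' dict, failing fast when weights are missing."""
    if name not in _PRETRAINED_TAGS:
        raise ValueError(f"Unknown model: {name}")
    if pretrained and not _PRETRAINED_TAGS[name]:
        # Raise instead of silently continuing with random initialization.
        raise RuntimeError(f"No pretrained weights exist for {name}.")
    return {"name": name, "pretrained": pretrained}
```

With this shape, a missing checkpoint surfaces at creation time rather than as an unexplained accuracy drop later in fine-tuning.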

April 27, 2023

97% of timm models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.
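
For readers unfamiliar with what the fused attention path computes, here is a small pure-Python reference of scaled dot-product attention, softmax(QK^T / sqrt(d)) V. It is only a sketch of the math: no masking, no dropout, no batching, and it is not how PyTorch or timm implement the fused kernel.

```python
import math


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(q @ k^T / sqrt(d)) @ v for lists of row vectors."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        # Scaled similarity of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Numerically stable softmax over the keys.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value rows.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out
```

A fused implementation produces the same result in a single kernel, which is where the speed and memory savings come from.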

April 21, 2023

Gradient accumulation support added to train script and tested (--grad-accum-steps), thanks Taeksang Kim
More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
Added --head-init-scale and --head-init-bias to train.py to scale classifier head and set fixed bias for fine-tune
Remove all InplaceABN (inplace_abn) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).
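
The idea behind `--grad-accum-steps` can be checked with a toy example. The sketch below uses a one-parameter linear model in plain Python and assumes the common convention of dividing each micro-batch gradient by the number of accumulation steps; the data and helper names are made up for illustration and this is not the train script's code.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the model y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

# Reference: one gradient over the full batch of 8 samples.
full_grad = grad_mse(w, xs, ys)

# Accumulate over 4 micro-batches of 2 samples each.
accum_steps = 4
micro = len(xs) // accum_steps
accum_grad = 0.0
for i in range(accum_steps):
    sl = slice(i * micro, (i + 1) * micro)
    # Scale each micro-batch gradient so the sum matches the full-batch mean.
    accum_grad += grad_mse(w, xs[sl], ys[sl]) / accum_steps
# An optimizer step would be taken here, once per accum_steps micro-batches.
```

With equal-sized micro-batches, `accum_grad` matches `full_grad` up to floating-point rounding, so the effective batch size grows without extra memory per step.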

April 12, 2023

Add ONNX export script, validate script, helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + pytorch.
Refactor dropout args for vit and vit-like models, separate drop_rate into drop_rate (classifier dropout), proj_drop_rate (block mlp / out projections), pos_drop_rate (position embedding drop), attn_drop_rate (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
Add fused F.scaled_dot_product_attention support to more vit models, add env var (TIMM_FUSED_ATTN) to control, and config interface to enable/disable
Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param 'enormous' model, and 336x336 OpenAI ViT model that was missed.
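
Patch dropout, as mentioned in the dropout refactor above, randomly discards a fraction of patch tokens during training while keeping prefix tokens such as the class token. The following is a rough sketch over a plain list of tokens; the function name, the `num_prefix_tokens` default, and the choice to preserve the original token order are assumptions for illustration, not timm's implementation.

```python
import random


def patch_dropout(tokens, prob, num_prefix_tokens=1, rng=None):
    """Keep prefix tokens and a random (1 - prob) fraction of patch tokens."""
    rng = rng or random.Random()
    prefix = tokens[:num_prefix_tokens]
    patches = tokens[num_prefix_tokens:]
    # Always keep at least one patch token.
    num_keep = max(1, int(len(patches) * (1.0 - prob)))
    # Sample indices without replacement, then restore the original order.
    keep_idx = sorted(rng.sample(range(len(patches)), num_keep))
    return prefix + [patches[i] for i in keep_idx]
```

For a sequence of one class token plus 16 patches and `prob=0.5`, the result has 9 tokens: the class token and 8 surviving patches.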

April 5, 2023

ALL ResNet models pushed to Hugging Face Hub with multi-weight support
- All past timm trained weights added with recipe based tags to differentiate
- All ResNet strikes back A1/A2/A3 (seed 0) and R50 example B/C1/C2/D weights available
- Add torchvision v2 recipe weights to existing torchvision originals
- See comparison table in https://huggingface.co/timm/seresnextaa101d_32x8d.sw_in12k_ft_in1k_288#model-comparison
New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
- resnetaa50d.sw_in12k_ft_in1k - 81.7 @ 224, 82.6 @ 288
- resnetaa101d.sw_in12k_ft_in1k - 83.5 @ 224, 84.1 @ 288
- seresnextaa101d_32x8d.sw_in12k_ft_in1k - 86.0 @ 224, 86.5 @ 288
- seresnextaa101d_32x8d.sw_in12k_ft_in1k_288 - 86.5 @ 288, 86.7 @ 320

March 31, 2023

Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.

model	top1	top5	img_size	param_count	gmacs	macts
convnext_xxlarge.clip_laion2b_soup_ft_in1k	88.612	98.704	256	846.47	198.09	124.45
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384	88.312	98.578	384	200.13	101.11	126.74
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320	87.968	98.47	320	200.13	70.21	88.02
convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384	87.138	98.212	384	88.59	45.21	84.49
convnext_base.clip_laion2b_augreg_ft_in12k_in1k	86.344	97.97	256	88.59	20.09	37.55

Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.

model	top1	top5	param_count	img_size
eva02_large_patch14_448.mim_m38m_ft_in22k_in1k	90.054	99.042	305.08	448
eva02_large_patch14_448.mim_in22k_ft_in22k_in1k	89.946	99.01	305.08	448
eva_giant_patch14_560.m30m_ft_in22k_in1k	89.792	98.992	1014.45	560
eva02_large_patch14_448.mim_in22k_ft_in1k	89.626	98.954	305.08	448
eva02_large_patch14_448.mim_m38m_ft_in1k	89.57	98.918	305.08	448
eva_giant_patch14_336.m30m_ft_in22k_in1k	89.56	98.956	1013.01	336
eva_giant_patch14_336.clip_ft_in1k	89.466	98.82	1013.01	336
eva_large_patch14_336.in22k_ft_in22k_in1k	89.214	98.854	304.53	336
eva_giant_patch14_224.clip_ft_in1k	88.882	98.678	1012.56	224
eva02_base_patch14_448.mim_in22k_ft_in22k_in1k	88.692	98.722	87.12	448
eva_large_patch14_336.in22k_ft_in1k	88.652	98.722	304.53	336
eva_large_patch14_196.in22k_ft_in22k_in1k	88.592	98.656	304.14	196
eva02_base_patch14_448.mim_in22k_ft_in1k	88.23	98.564	87.12	448
eva_large_patch14_196.in22k_ft_in1k	87.934	98.504	304.14	196
eva02_small_patch14_336.mim_in22k_ft_in1k	85.74	97.614	22.13	336
eva02_tiny_patch14_336.mim_in22k_ft_in1k	80.658	95.524	5.76	336

Multi-weight and HF hub for DeiT and MLP-Mixer based models

March 22, 2023

More weights pushed to HF hub along with multi-weight support, including: regnet.py, rexnet.py, byobnet.py, resnetv2.py, swin_transformer.py, swin_transformer_v2.py, swin_transformer_v2_cr.py
Swin Transformer models support feature extraction (NCHW feat maps for swinv2_cr_*, and NHWC for all others) and spatial embedding outputs.
FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
More ImageNet-12k pretrained and 1k fine-tuned timm weights:
- rexnetr_200.sw_in12k_ft_in1k - 82.6 @ 224, 83.2 @ 288
- rexnetr_300.sw_in12k_ft_in1k - 84.0 @ 224, 84.5 @ 288
- regnety_120.sw_in12k_ft_in1k - 85.0 @ 224, 85.4 @ 288
- regnety_160.lion_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288
- regnety_160.sw_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
Minor bug fixes and improvements.
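
Model name deprecation and remapping can be thought of as a lookup table with a warning attached. The sketch below is one way such a mechanism might look; the `_DEPRECATED_NAMES` table, the `toy_net` names, and `resolve_model_name` are hypothetical and are not taken from timm's actual mappings.

```python
import warnings

# Hypothetical old -> new name mappings, for illustration only.
_DEPRECATED_NAMES = {
    "toy_net_in21k": "toy_net.in21k",
    "toy_net_in21k_ft_1k": "toy_net.in21k_ft_in1k",
}


def resolve_model_name(name):
    """Map a deprecated model name to its replacement, warning the caller."""
    new_name = _DEPRECATED_NAMES.get(name)
    if new_name is None:
        return name  # already a current name
    warnings.warn(
        f"Model name '{name}' is deprecated, mapping to '{new_name}'.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name
```

Old scripts keep working under this scheme, while the warning points users at the new `arch.tag` style name.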

Feb 26, 2023

Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
Update convnext_xxlarge default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
0.8.15dev0

Feb 20, 2023

Add 320x320 convnext_large_mlp.clip_laion2b_ft_320 and convnext_large_mlp.clip_laion2b_ft_soup_320 CLIP image tower weights for features & fine-tune
0.8.13dev0 pypi release for latest changes w/ move to huggingface org

Feb 16, 2023

safetensor checkpoint support added
Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to vit_*, vit_relpos*, coatnet / maxxvit (to start)
Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
gradient checkpointing works with features_only=True
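
As a quick reference for the Lion optimizer listed above, here is a scalar sketch of its update rule as described in the linked paper: a sign update from an interpolation of momentum and gradient, decoupled weight decay, and a separately tracked momentum. It covers a single parameter in plain Python and says nothing about the multi-tensor option; the function name and signature are illustrative.

```python
def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Apply one Lion update to a scalar parameter; return (param, momentum)."""

    def sign(x):
        return (x > 0) - (x < 0)

    # Interpolate momentum and gradient, then keep only the sign.
    update = sign(beta1 * momentum + (1.0 - beta1) * grad)
    # Decoupled weight decay followed by the fixed-magnitude sign step.
    param = param * (1.0 - lr * weight_decay) - lr * update
    # Momentum is tracked with a second, slower coefficient.
    momentum = beta2 * momentum + (1.0 - beta2) * grad
    return param, momentum
```

Because every step has magnitude `lr`, Lion is typically run with a smaller learning rate than AdamW, and it stores only one state value per parameter.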

Feb 7, 2023

New inference benchmark numbers added in results folder.
Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
- convnext_base.clip_laion2b_augreg_ft_in1k - 86.2% @ 256x256
- convnext_base.clip_laiona_augreg_ft_in1k_384 - 86.5% @ 384x384
- convnext_large_mlp.clip_laion2b_augreg_ft_in1k - 87.3% @ 256x256
- convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384 - 87.9% @ 384x384
Add DaViT models. Supports features_only=True. Adapted from https://github.com/dingmyu/davit by Fredo.
Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
- New EfficientFormer-V2 arch, significant refactor from original at (https://github.com/snap-research/EfficientFormer). Supports features_only=True.
- Minor updates to EfficientFormer.
- Refactor LeViT models to stages, add features_only=True support to new conv variants, weight remap required.
Move ImageNet meta-data (synsets, indices) from /results to timm/data/_info.
Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in timm
- Update inference.py to use, try: python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5
Ready for 0.8.10 pypi pre-release (final testing).

Jan 20, 2023

Add two convnext 12k -> 1k fine-tunes at 384x384
- convnext_tiny.in12k_ft_in1k_384 - 85.1 @ 384
- convnext_small.in12k_ft_in1k_384 - 86.2 @ 384
Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for rw base MaxViT and CoAtNet 1/2 models

model	top1	top5	samples / sec	Params (M)	GMAC	Act (M)
maxvit_xlarge_tf_512.in21k_ft_in1k	88.53	98.64	21.76	475.77	534.14	1413.22
maxvit_xlarge_tf_384.in21k_ft_in1k	88.32	98.54	42.53	475.32	292.78	668.76
maxvit_base_tf_512.in21k_ft_in1k	88.20	98.53	50.87	119.88	138.02	703.99
maxvit_large_tf_512.in21k_ft_in1k	88.04	98.40	36.42	212.33	244.75	942.15
maxvit_large_tf_384.in21k_ft_in1k	87.98	98.56	71.75	212.03	132.55	445.84
maxvit_base_tf_384.in21k_ft_in1k	87.92	98.54	104.71	119.65	73.80	332.90
maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k	87.81	98.37	106.55	116.14	70.97	318.95
maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k	87.47	98.37	149.49	116.09	72.98	213.74
coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k	87.39	98.31	160.80	73.88	47.69	209.43
maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k	86.89	98.02	375.86	116.14	23.15	92.64
maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k	86.64	98.02	501.03	116.09	24.20	62.77
maxvit_base_tf_512.in1k	86.60	97.92	50.75	119.88	138.02	703.99
coatnet_2_rw_224.sw_in12k_ft_in1k	86.57	97.89	631.88	73.87	15.09	49.22
maxvit_large_tf_512.in1k	86.52	97.88	36.04	212.33	244.75	942.15
coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k	86.49	97.90	620.58	73.88	15.18	54.78
maxvit_base_tf_384.in1k	86.29	97.80	101.09	119.65	73.80	332.90
maxvit_large_tf_384.in1k	86.23	97.69	70.56	212.03	132.55	445.84
maxvit_small_tf_512.in1k	86.10	97.76	88.63	69.13	67.26	383.77
maxvit_tiny_tf_512.in1k	85.67	97.58	144.25	31.05	33.49	257.59
maxvit_small_tf_384.in1k	85.54	97.46	188.35	69.02	35.87	183.65
maxvit_tiny_tf_384.in1k	85.11	97.38	293.46	30.98	17.53	123.42
maxvit_large_tf_224.in1k	84.93	96.97	247.71	211.79	43.68	127.35
coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k	84.90	96.96	1025.45	41.72	8.11	40.13
maxvit_base_tf_224.in1k	84.85	96.99	358.25	119.47	24.04	95.01
maxxvit_rmlp_small_rw_256.sw_in1k	84.63	97.06	575.53	66.01	14.67	58.38
coatnet_rmlp_2_rw_224.sw_in1k	84.61	96.74	625.81	73.88	15.18	54.78
maxvit_rmlp_small_rw_224.sw_in1k	84.49	96.76	693.82	64.90	10.75	49.30
maxvit_small_tf_224.in1k	84.43	96.83	647.96	68.93	11.66	53.17
maxvit_rmlp_tiny_rw_256.sw_in1k	84.23	96.78	807.21	29.15	6.77	46.92
coatnet_1_rw_224.sw_in1k	83.62	96.38	989.59	41.72	8.04	34.60
maxvit_tiny_rw_224.sw_in1k	83.50	96.50	1100.53	29.06	5.11	33.11
maxvit_tiny_tf_224.in1k	83.41	96.59	1004.94	30.92	5.60	35.78
coatnet_rmlp_1_rw_224.sw_in1k	83.36	96.45	1093.03	41.69	7.85	35.47
maxxvitv2_nano_rw_256.sw_in1k	83.11	96.33	1276.88	23.70	6.26	23.05
maxxvit_rmlp_nano_rw_256.sw_in1k	83.03	96.34	1341.24	16.78	4.37	26.05
maxvit_rmlp_nano_rw_256.sw_in1k	82.96	96.26	1283.24	15.50	4.47	31.92
maxvit_nano_rw_256.sw_in1k	82.93	96.23	1218.17	15.45	4.46	30.28
coatnet_bn_0_rw_224.sw_in1k	82.39	96.19	1600.14	27.44	4.67	22.04
coatnet_0_rw_224.sw_in1k	82.39	95.84	1831.21	27.44	4.43	18.73
coatnet_rmlp_nano_rw_224.sw_in1k	82.05	95.87	2109.09	15.15	2.62	20.34
coatnext_nano_rw_224.sw_in1k	81.95	95.92	2525.52	14.70	2.47	12.80
coatnet_nano_rw_224.sw_in1k	81.70	95.64	2344.52	15.14	2.41	15.41
maxvit_rmlp_pico_rw_256.sw_in1k	80.53	95.21	1594.71	7.52	1.85	24.86

Jan 11, 2023

Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT .in12k tags)
- convnext_nano.in12k_ft_in1k - 82.3 @ 224, 82.9 @ 288 (previously released)
- convnext_tiny.in12k_ft_in1k - 84.2 @ 224, 84.5 @ 288
- convnext_small.in12k_ft_in1k - 85.2 @ 224, 85.3 @ 288

Jan 6, 2023

Finally got around to adding --model-kwargs and --opt-kwargs to scripts to pass through rare args directly to model classes from cmd line
- train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu
- train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12
Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.
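
To illustrate how `key=value` pairs such as those passed to `--model-kwargs` can be turned into a dict, here is a small argparse sketch. The `ParseKeyValue` action is a hypothetical helper written for this example and is not claimed to match the parsing used by the timm scripts; values are literal-evaluated when possible and kept as strings otherwise.

```python
import argparse
import ast


class ParseKeyValue(argparse.Action):
    """Collect 'key=value' tokens into a dict (hypothetical helper)."""

    def __call__(self, parser, namespace, values, option_string=None):
        kwargs = {}
        for item in values:
            key, sep, raw = item.partition("=")
            if not sep:
                parser.error(f"expected key=value, got '{item}'")
            try:
                # Numbers, booleans, tuples, etc. become Python literals.
                kwargs[key] = ast.literal_eval(raw)
            except (ValueError, SyntaxError):
                # Anything else (e.g. 'silu') stays a plain string.
                kwargs[key] = raw
        setattr(namespace, self.dest, kwargs)


parser = argparse.ArgumentParser()
parser.add_argument("--model", default="resnet50")
parser.add_argument("--model-kwargs", nargs="*", default={}, action=ParseKeyValue)

args = parser.parse_args(
    ["--model", "resnet50", "--model-kwargs", "output_stride=16", "act_layer=silu"]
)
# args.model_kwargs is now {'output_stride': 16, 'act_layer': 'silu'}
```

The resulting dict can then be unpacked into a model constructor, which is what makes rarely used arguments reachable from the command line.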

Jan 5, 2023

ConvNeXt-V2 models and weights added to existing convnext.py
- Paper: ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
- Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)

Dec 23, 2022 🎄☃

Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
- NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
More model pretrained tags and adjustments; some model names changed (working on deprecation translations; consider the main branch a DEV branch right now, use 0.6.x for stable use)
More ImageNet-12k (subset of 22k) pretrain models popping up:
- efficientnet_b5.in12k_ft_in1k - 85.9 @ 448x448
- vit_medium_patch16_gap_384.in12k_ft_in1k - 85.5 @ 384x384
- vit_medium_patch16_gap_256.in12k_ft_in1k - 84.5 @ 256x256
- convnext_nano.in12k_ft_in1k - 82.9 @ 288x288

Dec 8, 2022

Add 'EVA l' to vision_transformer.py, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
- original source: https://github.com/baaivision/EVA

model	top1	param_count	gmac	macts	hub
eva_large_patch14_336.in22k_ft_in22k_in1k	89.2	304.5	191.1	270.2	link
eva_large_patch14_336.in22k_ft_in1k	88.7	304.5	191.1	270.2	link
eva_large_patch14_196.in22k_ft_in22k_in1k	88.6	304.1	61.6	63.5	link
eva_large_patch14_196.in22k_ft_in1k	87.9	304.1	61.6	63.5	link

Dec 6, 2022

Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to beit.py.
- original source: https://github.com/baaivision/EVA
- paper: https://arxiv.org/abs/2211.07636

model	top1	param_count	gmac	macts	hub
eva_giant_patch14_560.m30m_ft_in22k_in1k	89.8	1014.4	1906.8	2577.2	link
eva_giant_patch14_336.m30m_ft_in22k_in1k	89.6	1013	620.6	550.7	link
eva_giant_patch14_336.clip_ft_in1k	89.4	1013	620.6	550.7	link
eva_giant_patch14_224.clip_ft_in1k	89.1	1012.6	267.2	192.6	link

Dec 5, 2022

Pre-release (0.8.0dev0) of multi-weight support (model_arch.pretrained_tag). Install with pip install --pre timm
- vision_transformer, maxvit, convnext are the first three model impl w/ support
- model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
- bugs are likely, but I need feedback so please try it out
- if stability is needed, please use 0.6.x pypi releases or clone from 0.6.x branch
Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use --torchcompile argument
Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models

model	top1	param_count	gmac	macts	hub
vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k	88.6	632.5	391	407.5	link
vit_large_patch14_clip_336.openai_ft_in12k_in1k	88.3	304.5	191.1	270.2	link
vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k	88.2	632	167.4	139.4	link
vit_large_patch14_clip_336.laion2b_ft_in12k_in1k	88.2	304.5	191.1	270.2	link
vit_large_patch14_clip_224.openai_ft_in12k_in1k	88.2	304.2	81.1	88.8	link
vit_large_patch14_clip_224.laion2b_ft_in12k_in1k	87.9	304.2	81.1	88.8	link
vit_large_patch14_clip_224.openai_ft_in1k	87.9	304.2	81.1	88.8	link
vit_large_patch14_clip_336.laion2b_ft_in1k	87.9	304.5	191.1	270.2	link
vit_huge_patch14_clip_224.laion2b_ft_in1k	87.6	632	167.4	139.4	link
vit_large_patch14_clip_224.laion2b_ft_in1k	87.3	304.2	81.1	88.8	link
vit_base_patch16_clip_384.laion2b_ft_in12k_in1k	87.2	86.9	55.5	101.6	link
vit_base_patch16_clip_384.openai_ft_in12k_in1k	87	86.9	55.5	101.6	link
vit_base_patch16_clip_384.laion2b_ft_in1k	86.6	86.9	55.5	101.6	link
vit_base_patch16_clip_384.openai_ft_in1k	86.2	86.9	55.5	101.6	link
vit_base_patch16_clip_224.laion2b_ft_in12k_in1k	86.2	86.6	17.6	23.9	link
vit_base_patch16_clip_224.openai_ft_in12k_in1k	85.9	86.6	17.6	23.9	link
vit_base_patch32_clip_448.laion2b_ft_in12k_in1k	85.8	88.3	17.9	23.9	link
vit_base_patch16_clip_224.laion2b_ft_in1k	85.5	86.6	17.6	23.9	link
vit_base_patch32_clip_384.laion2b_ft_in12k_in1k	85.4	88.3	13.1	16.5	link
vit_base_patch16_clip_224.openai_ft_in1k	85.3	86.6	17.6	23.9	link
vit_base_patch32_clip_384.openai_ft_in12k_in1k	85.2	88.3	13.1	16.5	link
vit_base_patch32_clip_224.laion2b_ft_in12k_in1k	83.3	88.2	4.4	5	link
vit_base_patch32_clip_224.laion2b_ft_in1k	82.6	88.2	4.4	5	link
vit_base_patch32_clip_224.openai_ft_in1k	81.9	88.2	4.4	5	link
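
The names in the table above follow the `model_arch.pretrained_tag` pattern introduced with multi-weight support. As a small sketch of how such names decompose, the helpers below split on the first dot and group tags by architecture; `split_model_name` and `group_tags_by_arch` are hypothetical names written for this example, not timm functions.

```python
def split_model_name(name):
    """Split 'arch.tag' into (arch, tag); tag is '' when no tag is given."""
    arch, _, tag = name.partition(".")
    return arch, tag


def group_tags_by_arch(names):
    """Group pretrained tags under their architecture name."""
    grouped = {}
    for name in names:
        arch, tag = split_model_name(name)
        grouped.setdefault(arch, []).append(tag)
    return grouped


names = [
    "vit_base_patch32_clip_224.laion2b_ft_in12k_in1k",
    "vit_base_patch32_clip_224.laion2b_ft_in1k",
    "vit_base_patch32_clip_224.openai_ft_in1k",
]
grouped = group_tags_by_arch(names)
```

Here one architecture, `vit_base_patch32_clip_224`, carries three different pretrained weight sets, which is the situation multi-weight support is meant to handle.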

Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
- There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possibly a detail is missing, but the 21k FT did seem sensitive to small preprocessing changes

model	top1	param_count	gmac	macts	hub
maxvit_xlarge_tf_512.in21k_ft_in1k	88.5	475.8	534.1	1413.2	link
maxvit_xlarge_tf_384.in21k_ft_in1k	88.3	475.3	292.8	668.8	link
maxvit_base_tf_512.in21k_ft_in1k	88.2	119.9	138	704	link
maxvit_large_tf_512.in21k_ft_in1k	88	212.3	244.8	942.2	link
maxvit_large_tf_384.in21k_ft_in1k	88	212	132.6	445.8	link
maxvit_base_tf_384.in21k_ft_in1k	87.9	119.6	73.8	332.9	link
maxvit_base_tf_512.in1k	86.6	119.9	138	704	link
maxvit_large_tf_512.in1k	86.5	212.3	244.8	942.2	link
maxvit_base_tf_384.in1k	86.3	119.6	73.8	332.9	link
maxvit_large_tf_384.in1k	86.2	212	132.6	445.8	link
maxvit_small_tf_512.in1k	86.1	69.1	67.3	383.8	link
maxvit_tiny_tf_512.in1k	85.7	31	33.5	257.6	link
maxvit_small_tf_384.in1k	85.5	69	35.9	183.6	link
maxvit_tiny_tf_384.in1k	85.1	31	17.5	123.4	link
maxvit_large_tf_224.in1k	84.9	211.8	43.7	127.4	link
maxvit_base_tf_224.in1k	84.9	119.5	24	95	link
maxvit_small_tf_224.in1k	84.4	68.9	11.7	53.2	link
maxvit_tiny_tf_224.in1k	83.4	30.9	5.6	35.8	link

Oct 15, 2022

Train and validation script enhancements
Non-GPU (ie CPU) device support
SLURM compatibility for train script
HF datasets support (via ReaderHfds)
TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
in_chans != 3 support for scripts / loader
Adan optimizer
Can enable per-step LR scheduling via args
Dataset 'parsers' renamed to 'readers', more descriptive of purpose
AMP args changed, APEX via --amp-impl apex, bfloat16 supported via --amp-dtype bfloat16
main branch switched to 0.7.x version, 0.6.x forked for stable release of weight only adds
master -> main branch rename
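
The per-step LR scheduling option in the Oct 15, 2022 entry changes how often the learning rate is updated. The sketch below contrasts per-epoch and per-step updates for a plain cosine decay; the schedule function, the epoch and step counts, and the base learning rate are illustrative assumptions, not the train script's scheduler.

```python
import math


def cosine_lr(progress, base_lr=0.1, min_lr=0.0):
    """Cosine decay where progress runs from 0.0 (start) to 1.0 (end)."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


epochs, steps_per_epoch = 3, 4
total_steps = epochs * steps_per_epoch

# Per-epoch: the LR changes only at epoch boundaries.
per_epoch = [
    cosine_lr((step // steps_per_epoch) / epochs) for step in range(total_steps)
]
# Per-step: the LR is updated on every optimizer step.
per_step = [cosine_lr(step / total_steps) for step in range(total_steps)]
```

The per-epoch list holds only three distinct values, one per epoch, while the per-step list decreases smoothly on every update, which matters most for short runs with few epochs.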