
Commit 208e791

Missed one of the ablation model entrypoints, update README
1 parent 2f884a0 commit 208e791

3 files changed: 21 additions & 38 deletions


README.md

Lines changed: 3 additions & 35 deletions
```diff
@@ -2,6 +2,9 @@
 
 ## What's New
 
+### May 12, 2020
+* Add ResNeSt models (code adapted from https://github.com/zhanghang1989/ResNeSt, paper https://arxiv.org/abs/2004.08955)
+
 ### May 3, 2020
 * Pruned EfficientNet B1, B2, and B3 (https://arxiv.org/abs/2002.08258) contributed by [Yonathan Aflalo](https://github.com/yoniaflalo)
 
@@ -70,41 +73,6 @@
 * Add RandAugment trained EfficientNet-B0 weight with 77.7 top-1. Trained by [Michael Klachko](https://github.com/michaelklachko) with this code and recent hparams (see Training section)
 * Add `avg_checkpoints.py` script for post training weight averaging and update all scripts with header docstrings and shebangs.
 
-### Dec 30, 2019
-* Merge [Dushyant Mehta's](https://github.com/mehtadushy) PR for SelecSLS (Selective Short and Long Range Skip Connections) networks. Good GPU memory consumption and throughput. Original: https://github.com/mehtadushy/SelecSLS-Pytorch
-
-### Dec 28, 2019
-* Add new model weights and training hparams (see Training Hparams section)
-  * `efficientnet_b3` - 81.5 top-1, 95.7 top-5 at default res/crop, 81.9, 95.8 at 320x320 1.0 crop-pct
-    * trained with RandAugment, ended up with an interesting but less than perfect result (see training section)
-  * `seresnext26d_32x4d` - 77.6 top-1, 93.6 top-5
-    * deep stem (32, 32, 64), avgpool downsample
-    * stem/downsample from bag-of-tricks paper
-  * `seresnext26t_32x4d` - 78.0 top-1, 93.7 top-5
-    * deep tiered stem (24, 48, 64), avgpool downsample (a modified 'D' variant)
-    * stem sizing mods from Jeremy Howard and fastai devs discussing ResNet architecture experiments
-
-### Dec 23, 2019
-* Add RandAugment trained MixNet-XL weights with 80.48 top-1.
-* `--dist-bn` argument added to train.py, will distribute BN stats between nodes after each train epoch, before eval
-
-### Dec 4, 2019
-* Added weights from the first training from scratch of an EfficientNet (B2) with my new RandAugment implementation. Much better than my previous B2 and very close to the official AdvProp ones (80.4 top-1, 95.08 top-5).
-
-### Nov 29, 2019
-* Brought EfficientNet and MobileNetV3 up to date with my https://github.com/rwightman/gen-efficientnet-pytorch code. Torchscript and ONNX export compat excluded.
-  * AdvProp weights added
-  * Official TF MobileNetV3 weights added
-* EfficientNet and MobileNetV3 hook based 'feature extraction' classes added. Will serve as basis for using models as backbones in obj detection/segmentation tasks. Lots more to be done here...
-* HRNet classification models and weights added from https://github.com/HRNet/HRNet-Image-Classification
-* Consistency in global pooling, `reset_classifier`, and `forward_features` across models
-  * `forward_features` always returns unpooled feature maps now
-* Reasonable chance I broke something... let me know
-
-### Nov 22, 2019
-* Add ImageNet training RandAugment implementation alongside AutoAugment. PyTorch Transform compatible format, using PIL. Currently training two EfficientNet models from scratch with promising results... will update.
-* `drop-connect` cmd line arg finally added to `train.py`, no need to hack model fns. Works for efficientnet/mobilenetv3 based models, ignored otherwise.
-
 ## Introduction
 
 For each competition, personal, or freelance project involving images + Convolutional Neural Networks, I build on top of an evolving collection of code and models. This repo contains a (somewhat) cleaned up and pared down iteration of that code. Hopefully it'll be of use to others.
```
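The ResNeSt additions announced above are registered with timm's model factory like every other architecture. As a quick sanity check, a minimal sketch (assuming a timm install that includes this commit) using the public `timm.list_models` and `timm.create_model` helpers:

```python
import timm

# Enumerate the ResNeSt variants registered by the new resnest module.
for name in timm.list_models('resnest*'):
    print(name)

# Build one of them; pretrained=False avoids depending on which converted
# weights happen to be published for a given variant.
model = timm.create_model('resnest50d', pretrained=False)
```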

timm/models/layers/split_attn.py

Lines changed: 1 addition & 3 deletions
```diff
@@ -68,14 +68,12 @@ def forward(self, x):
         x_gap = x
         x_gap = F.adaptive_avg_pool2d(x_gap, 1)
         x_gap = self.fc1(x_gap)
-
         if self.bn1 is not None:
             x_gap = self.bn1(x_gap)
         x_gap = self.act1(x_gap)
-
         x_attn = self.fc2(x_gap)
-        x_attn = self.rsoftmax(x_attn).view(B, -1, 1, 1)
 
+        x_attn = self.rsoftmax(x_attn).view(B, -1, 1, 1)
         if self.radix > 1:
             out = (x * x_attn.reshape((B, self.radix, RC // self.radix, 1, 1))).sum(dim=1)
         else:
```
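For context, the lines being shuffled here sit in ResNeSt's split-attention block: `fc2` emits radix × channels attention logits and `rsoftmax` normalizes them across the radix feature splits before the weighted sum. A standalone sketch of that step (illustrative shapes only, not the module's exact code, assuming radix=2 and cardinality=1):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: batch 2, radix 2, cardinality 1, 8 channels per split.
B, C, H, W, radix, cardinality = 2, 8, 4, 4, 2, 1

x = torch.randn(B, radix, C, H, W)        # feature splits from the grouped conv
x_attn = torch.randn(B, radix * C, 1, 1)  # attention logits, as produced by fc2

# Radix softmax: softmax across the radix splits for each channel group,
# so the splits compete for attention weight.
a = x_attn.view(B, cardinality, radix, -1).transpose(1, 2)
a = F.softmax(a, dim=1)
a = a.reshape(B, -1, 1, 1)

# Weight each split and sum them, mirroring the `radix > 1` branch above.
out = (x * a.reshape(B, radix, C, 1, 1)).sum(dim=1)
print(out.shape)  # torch.Size([2, 8, 4, 4])
```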

timm/models/resnest.py

Lines changed: 17 additions & 0 deletions
```diff
@@ -237,8 +237,25 @@ def resnest269e(pretrained=False, num_classes=1000, in_chans=3, **kwargs):
     return model
 
 
+@register_model
+def resnest50d_4s2x40d(pretrained=False, num_classes=1000, in_chans=3, **kwargs):
+    """ResNeSt-50 4s2x40d from https://github.com/zhanghang1989/ResNeSt/blob/master/ablation.md
+    """
+    default_cfg = default_cfgs['resnest50d_4s2x40d']
+    model = ResNet(
+        ResNestBottleneck, [3, 4, 6, 3], num_classes=num_classes, in_chans=in_chans,
+        stem_type='deep', stem_width=32, avg_down=True, base_width=40, cardinality=2,
+        block_args=dict(radix=4, avd=True, avd_first=True), **kwargs)
+    model.default_cfg = default_cfg
+    if pretrained:
+        load_pretrained(model, default_cfg, num_classes, in_chans)
+    return model
+
+
 @register_model
 def resnest50d_1s4x24d(pretrained=False, num_classes=1000, in_chans=3, **kwargs):
+    """ResNeSt-50 1s4x24d from https://github.com/zhanghang1989/ResNeSt/blob/master/ablation.md
+    """
     default_cfg = default_cfgs['resnest50d_1s4x24d']
     model = ResNet(
         ResNestBottleneck, [3, 4, 6, 3], num_classes=num_classes, in_chans=in_chans,
```
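Once registered, the previously missed ablation entrypoint is reachable through the normal factory path. A minimal usage sketch (assuming this commit is installed; pretrained=False since weight availability for the variant isn't shown here):

```python
import torch
import timm

# Instantiate the newly registered ResNeSt ablation model.
model = timm.create_model('resnest50d_4s2x40d', pretrained=False)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```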
