Note
A detailed analysis of scale generalization for various models is given in our preprint
Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks
Thomas Altstidl, An Nguyen, Leo Schwinn, Franz Köferl, Christopher Mutschler, Björn Eskofier, Dario Zanca
This repository contains the official source code accompanying our preprint. If you are reading this, you likely fall into one or more of the following groups. Click on those that apply to you to get started.
I am interested in using the Scaled and Translated Image Recognition (STIR) dataset.
- Download one or more data files from Zenodo.
- Grab a copy of dataset.py.
- Example usage that loads training data from `emoji.npz` for scales 17 through 64:
```python
from dataset import STIRDataset

dataset = STIRDataset('data/emoji.npz')
# Obtain images and labels for training
images, labels = dataset.to_torch(split='train', scales=range(17, 65), shuffle=True)
# Obtain known scales and positions for above
scales, positions = dataset.get_latents(split='train', scales=range(17, 65), shuffle=True)
# Get metadata and label descriptions
metadata = dataset.metadata
label_descriptions = dataset.labeldata
```
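If you want to iterate over the returned data in mini-batches, the tensors can be wrapped in standard PyTorch utilities. This is a minimal sketch, assuming `to_torch` returns tensors whose first dimension indexes samples; the batch size is arbitrary:

```python
from torch.utils.data import DataLoader, TensorDataset

# Assumes images and labels come from dataset.to_torch(...) above
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)
for batch_images, batch_labels in loader:
    pass  # replace with your training step
```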
I am interested in reviewing your results.
We provide a subset of our results for review. The remaining results are larger in size and available upon request.
- `clean.csv` contains testing accuracy and training time (columns `metrics.test_acc` and `metrics.train_time`)
- `generalization.csv` contains accuracies per scale (columns `s17` through `s64`)
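As a quick way to inspect these, the per-scale columns can be aggregated with pandas. A minimal sketch, assuming `generalization.csv` contains one row per run with the columns named above:

```python
import pandas as pd

# Assumes one row per run with per-scale accuracy columns s17 through s64
df = pd.read_csv('generalization.csv')
scale_cols = [f's{s}' for s in range(17, 65)]
print(df[scale_cols].mean())  # mean accuracy at each scale across runs
```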
I am interested in using the proposed layer in my own work.
- Grab a copy of layers.py.
- Example usage that applies one 7x7 scaled convolutional layer followed by pixel-wise pooling.
```python
from torch import nn

from layers import SiConv2d, ScalePool


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # 7x7 base kernel rescaled to 29 different scales
        self.conv = SiConv2d(3, 16, 29, 7, interp_mode='bicubic')
        self.pool = ScalePool(mode='pixel')

    def forward(self, x):
        x = self.conv(x)
        x = self.pool(x)
        return x
```
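A quick smoke test; the input size is illustrative, and the exact output shape depends on how `SiConv2d` and `ScalePool` handle borders:

```python
import torch

model = MyModel()
x = torch.randn(1, 3, 64, 64)  # one 3-channel 64x64 image (illustrative size)
out = model(x)
print(out.shape)
```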
The remainder of this document will focus on reproducing the results given in our preprint.
Warning
While we have taken great care to document everything, the scope of this project makes it likely that minor details may still be missing. If you have trouble recreating our experiments on your own machines, please create a new issue and we'd be more than happy to assist.
The provided code should work in most environments and has been tested to work at least in Windows 10/11 (local environment) and Linux (cluster node environment). Python 3.8 was used, although newer versions should also work. We recommend creating a new virtual environment and installing all requirements there:
```
cd /path/to/provided/code
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
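On Windows, one of the environments we tested, the activation step differs; in a command prompt use:

```
.venv\Scripts\activate
```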
Before training a model, you will need to either create or download the respective data files you intend to use. These can be downloaded from Zenodo. Then, execute the following script with your selected parameters to train a single model; an example invocation is given after the parameter list.
```
python scripts/train.py [...]
```
- `--model {standard, pixel_pool, slice_pool, energy_pool, conv3d, ensemble, spatial_transform, xu, kanazawa, hermite, disco}` Name of (scale-equivariant) model that should be trained. Implementations are given in `siconvnet/models.py`.
- `--dataset {emoji, mnist, trafficsign, aerial}` Name of dataset on which the model should be trained. The respective `[d].npz` file needs to be in the current working directory. See paper Fig. 3.
- `--evaluation {1, 2, 3, 4}` Evaluation scenario on which the model should be trained. Defines scales for training and evaluation. See paper Fig. 3.
- `--kernel-size {3, 7, 11, 15}` Kernel size of all convolutions. Defines size $k \times k$ of trainable kernel weights. Fixed to 7 in the paper.
- `--interpolation {nearest, bilinear, bicubic, area}` Interpolation method used to generate larger kernels. Only applies to our models. Fixed to bicubic in the paper.
- `--lr {1e-2, 1e-3}` Learning rate of the Adam optimizer used to train the model.
- `--seed number` Seed used to initialize random number generators for reproducibility. Seeds used in the paper are 1 through 50.
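For example, a single training run with the paper's fixed settings might look like the following; the particular combination of values is illustrative, not prescriptive:

```
python scripts/train.py --model pixel_pool --dataset emoji --evaluation 1 --kernel-size 7 --interpolation bicubic --lr 1e-3 --seed 1
```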
The training script writes results to MLflow. Before proceeding with the evaluation, you need to export all runs. Unless you changed the tracking destination, this is done using the following command. We provide our own filtered export in clean.csv.
```
mlflow experiments csv -x 0 -o clean.csv
```
Then, execute the following script with your selected parameters to evaluate all models; an example invocation is given after the parameter list.
```
python scripts/eval.py [...]
```
- `--runs path/to/clean.csv` Path to the exported runs from MLflow. Should point to the file exported using the above command.
- `--models path/to/models` Path to the run artifacts saved by MLflow. Should be `mlruns/0` when run locally.
- `--data {emoji, mnist, trafficsign, aerial}` Name of dataset for which models should be evaluated.
- `--generalization` Flag for scale generalization. If enabled, will write `generalization_*.csv` files.
- `--equivariance` Flag for scale equivariance. If enabled, will write `eval/*/errors.npz` files.
- `--index-correlation` Flag for pooling scale correlation. If enabled, will write `eval/*/indices.npz` files.
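For example, to compute scale generalization results for the emoji dataset from a local MLflow store (the paths are illustrative and depend on where you trained and exported):

```
python scripts/eval.py --runs clean.csv --models mlruns/0 --data emoji --generalization
```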
To recreate the plots given in the paper and in the supplementary document, you may use the scripts provided in the `plots/` directory. We provide `clean.csv` and `generalization.csv` here; the remaining files are larger in size and available upon request.

- `equivariance.py` was used for Fig. 6 & Suppl. Fig. 3 and requires both `scripts/clean.csv` and `plots/eval/*/errors.npz`
- `generalization.py` was used for Fig. 5 & Suppl. Fig. 2 and requires both `scripts/clean.csv` and `plots/generalization_*.csv`
- `hyperparam.py` was used for Fig. 4 & Suppl. Fig. 1 and requires only `scripts/clean.csv`
- `indices.py` was used for Fig. 7 & Suppl. Fig. 4 and requires both `scripts/clean.csv` and `plots/eval/*/indices.npz`
- `time.py` was used for Tab. 2 and requires only `scripts/clean.csv`