Training Protocol

Model

The main segmentation model is an EfficientUnet++ with an EfficientNet-b5 encoder.

Losses

The model was trained using a combination of three losses and a loss weighting scheme:
$L_{total} = L_{dice} + \alpha \cdot L_{boundary} + L_{focal}$

where α is the weight of the boundary loss. It is ramped up over the first 100 epochs and is defined as:
$\alpha = \min(0.01 \cdot epoch, 0.99)$
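As a minimal sketch, the weighting scheme above can be written in a few lines of Python. The function names `boundary_weight` and `total_loss` are hypothetical, the epoch counter is assumed to start at 0, and the three loss values are passed in as plain floats for illustration only.

```python
def boundary_weight(epoch: int) -> float:
    """alpha = min(0.01 * epoch, 0.99); epoch is assumed to be 0-based."""
    return min(0.01 * epoch, 0.99)


def total_loss(l_dice: float, l_boundary: float, l_focal: float, epoch: int) -> float:
    """L_total = L_dice + alpha * L_boundary + L_focal."""
    return l_dice + boundary_weight(epoch) * l_boundary + l_focal
```

With this schedule the boundary term contributes nothing at epoch 0 and its weight stays capped at 0.99 from epoch 99 onward.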

Generalized Dice Loss

...

Boundary Loss

...

Focal Loss

...

Training

The following PyTorch Lightning settings and training tricks were used:

batch_size: 32
Adam optimizer with a learning rate of 0.0003 (scheduled using CosineAnnealingLR with T_max=10)
mixed precision training
gradient clipping enabled (mode: norm, value: 0.5)
stochastic weight averaging
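To illustrate how the learning rate evolves under these settings, the following sketch evaluates the closed-form cosine annealing formula that CosineAnnealingLR follows. It assumes the scheduler is stepped once per epoch and that the minimum learning rate `eta_min` is 0 (the PyTorch default); neither is stated above. The function name `cosine_annealing_lr` is hypothetical.

```python
import math


def cosine_annealing_lr(epoch: int, base_lr: float = 0.0003,
                        t_max: int = 10, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing: starts at base_lr, reaches eta_min at t_max."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


# Print the assumed schedule for the first 20 epochs.
for epoch in range(21):
    print(f"epoch {epoch:2d}: lr = {cosine_annealing_lr(epoch):.6f}")
```

Because the formula is periodic, the learning rate climbs back towards 0.0003 after epoch 10 and completes a full cycle every 20 epochs when training runs longer than T_max.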

Data

Data samples of size 4x256x256 (CxWxH; channels: R,G,B,NIR), normalized
Data augmentation:
- HorizontalFlip or VerticalFlip, p=0.5
- RandomRotate90, p=0.5
- RandomBrightnessContrast, brightness_limit=0.2, contrast_limit=0.15, brightness_by_max=False
- Normalize
Normalization: 4-channel mean/std computed over all data from 2017-2020
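The per-channel normalization can be sketched as follows for a single pixel. The statistics `MEAN` and `STD` are hypothetical placeholders, not the values computed from the 2017-2020 data, and the plain `(value - mean) / std` form is an assumption about how the statistics are applied.

```python
# Hypothetical placeholder statistics in channel order (R, G, B, NIR).
MEAN = [0.5, 0.5, 0.5, 0.5]
STD = [0.25, 0.25, 0.25, 0.25]


def normalize_pixel(pixel, mean=MEAN, std=STD):
    """Normalize one (R, G, B, NIR) pixel channel by channel."""
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]
```

In practice the same operation would be applied to every pixel of a 4x256x256 sample, typically by the Normalize step of the augmentation pipeline.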