Scroll down for the examples of the JSON configuration files that can be used to apply this algorithm.
The sparsity algorithms zero weights in convolutional and fully-connected layers in a non-structured way, so that zero values can appear anywhere inside the tensor rather than in whole filters or channels. Most sparsity algorithms set the less important weights to zero, but they differ in the criteria used to decide which weights are less important. The framework contains several implementations of sparsity methods.
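This section describes the Regularization-Based Sparsity (RB-Sparsity) algorithm implemented in this framework. The method is based on $L_0$-regularization, which penalizes the number of non-zero parameters and thereby drives model weights toward zero:
$$\|\theta\|_0 = \sum_{i=1}^{|\theta|} [\theta_i \neq 0]$$
We then reparametrize the network's weights as follows:
$$\theta_{sparse}^{(i)} = \theta_i \cdot \epsilon_i, \quad \epsilon_i \sim \mathcal{B}(p_i)$$
Here, $\mathcal{B}(p_i)$ is the Bernoulli distribution with parameter $p_i$, and $\epsilon_i$ can be interpreted as a binary mask that selects which weights are zeroed. A regularization term that encourages the desired sparsity level is then added to the objective function.
During training, we store and optimize the probabilities $p_i$ in logit form, where $\sigma$ is the sigmoid function,
$$s_i = \sigma^{-1}(p_i) = \log\frac{p_i}{1 - p_i},$$
and reparametrize the sampling of $\epsilon_i$ as follows:
$$\epsilon_i = \left[\sigma\left(s_i + \sigma^{-1}(\xi_i)\right) > \tfrac{1}{2}\right], \quad \xi_i \sim \mathcal{U}(0, 1)$$
With this reparametrization, the probability of keeping a particular weight during the forward pass is exactly $\mathbb{P}(\epsilon_i = 1) = p_i$.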
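The sampling rule can be checked numerically. Below is a minimal, framework-independent sketch (not the framework's implementation; all helper names are hypothetical) that stores one logit s = log(p / (1 - p)) per weight, draws xi uniformly from (0, 1), and keeps the weight when sigmoid(s + logit(xi)) > 1/2. For a fixed p, the empirical keep rate over many draws should be close to p.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function sigma(x).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logit(p: float) -> float:
    # Inverse sigmoid: sigma^{-1}(p) = log(p / (1 - p)), defined for 0 < p < 1.
    return math.log(p / (1.0 - p))


def sample_mask_entry(s: float, xi: float) -> int:
    # Reparametrized Bernoulli sample: 1 if sigma(s + sigma^{-1}(xi)) > 1/2, else 0.
    return 1 if sigmoid(s + logit(xi)) > 0.5 else 0


def draw_xi(rng: random.Random) -> float:
    # Uniform sample kept strictly inside (0, 1) so that logit(xi) stays finite.
    return rng.uniform(1e-12, 1.0 - 1e-12)


def sparsify(weights, logits, rng):
    # Training-time forward pass: multiply each weight by a freshly sampled mask entry.
    return [w * sample_mask_entry(s, draw_xi(rng)) for w, s in zip(weights, logits)]


def estimate_keep_rate(p: float, n: int = 100_000, seed: int = 0) -> float:
    # Monte Carlo estimate of P(eps = 1) for a weight whose keep probability is p.
    rng = random.Random(seed)
    s = logit(p)
    return sum(sample_mask_entry(s, draw_xi(rng)) for _ in range(n)) / n


print(estimate_keep_rate(0.3))  # should be close to 0.3
```

Because sigmoid(z) > 1/2 exactly when z > 0, the weight is kept when xi > 1 - p, which happens with probability p; the gradient-based training of the logits themselves is not shown here.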
The method requires a long training schedule to minimize the accuracy drop.
NOTE: A known limitation of the method is that the sparsified CNN must include Batch Normalization layers, which make the training process more stable.
After the compression-related changes in the model have been committed, the statistics of the batchnorm layers (per-channel running means and variances of activation tensors) can be updated by passing several batches of data through the model before fine-tuning starts. This corrects the compression-induced bias in the model and reduces the corresponding accuracy drop even before fine-tuning. This option is shared by the quantization, magnitude sparsity, and filter pruning algorithms. It can be enabled by setting a non-zero value of num_bn_adaptation_samples in the batchnorm_adaptation section of the initializer configuration (see example below).
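To make "updating the statistics" concrete, here is a simplified sketch of what passing data through a batchnorm layer does to its running statistics. It assumes the PyTorch-style update rule (running = (1 - momentum) * running + momentum * batch_stat, with unbiased batch variance) and flattens each sample to one activation value per channel; the function names are hypothetical and this is not the framework's code.

```python
def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """One batch-norm statistics update.

    batch: list of samples; each sample holds one activation value per channel.
    Assumes the PyTorch-style momentum rule with unbiased batch variance.
    """
    n = len(batch)
    new_mean, new_var = [], []
    for c in range(len(running_mean)):
        values = [sample[c] for sample in batch]
        mu = sum(values) / n
        var = sum((v - mu) ** 2 for v in values) / (n - 1)
        new_mean.append((1 - momentum) * running_mean[c] + momentum * mu)
        new_var.append((1 - momentum) * running_var[c] + momentum * var)
    return new_mean, new_var


def adapt_batchnorm(batches, running_mean, running_var, momentum=0.1):
    # Forward passes only: no weights change, only the running statistics are refreshed.
    for batch in batches:
        running_mean, running_var = update_running_stats(
            running_mean, running_var, batch, momentum
        )
    return running_mean, running_var


# Stale statistics (mean 0, var 1) drift toward the post-compression activation
# statistics (mean 3, var 2) as more batches are passed through the layer.
mean, var = adapt_batchnorm([[[2.0], [4.0]]] * 50, [0.0], [1.0])
print(mean, var)
```

More adaptation data moves the stored statistics closer to the compressed model's real activation statistics, which is the effect the num_bn_adaptation_samples setting controls.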
NOTE: In all our sparsity experiments, we used the Adam optimizer with an initial learning rate of 0.001 for both the model weights and the sparsity mask.
The magnitude sparsity method implements a naive approach based on the assumption that weights with smaller magnitudes contribute less to the output and can therefore be pruned. After each training epoch, the method calculates a threshold from the current sparsity ratio and zeros all weights whose magnitudes fall below it. Two options are available:
- Weights are used as is during the threshold calculation procedure.
- Weights are normalized before the threshold calculation.
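The threshold procedure and its two options can be sketched as follows. This is an illustrative sketch under stated assumptions, not the framework's implementation: it assumes a single threshold computed globally over all layers, and it assumes "normalized" means dividing each layer's magnitudes by that layer's L2 norm. The function names are hypothetical.

```python
import math


def importance(layer, normalized):
    # Option 1: raw magnitudes |w|.
    # Option 2 (assumed L2 normalization): |w| / ||W||_2 per layer, which puts
    # layers with different weight scales on a comparable footing.
    if normalized:
        norm = math.sqrt(sum(w * w for w in layer))
        return [abs(w) / norm if norm > 0 else 0.0 for w in layer]
    return [abs(w) for w in layer]


def magnitude_sparsify(layers, sparsity_ratio, normalized=False):
    scores = [importance(layer, normalized) for layer in layers]
    all_scores = sorted(s for layer_scores in scores for s in layer_scores)
    k = int(sparsity_ratio * len(all_scores))  # number of weights to zero
    if k == 0:
        return [list(layer) for layer in layers]
    threshold = all_scores[k - 1]
    # Zero every weight whose score does not exceed the threshold
    # (ties can zero slightly more than k weights).
    return [
        [w if s > threshold else 0.0 for w, s in zip(layer, layer_scores)]
        for layer, layer_scores in zip(layers, scores)
    ]


layers = [[0.1, -0.5, 2.0, -3.0], [0.2, 0.05]]
print(magnitude_sparsify(layers, 0.5))                   # raw magnitudes
print(magnitude_sparsify(layers, 0.5, normalized=True))  # per-layer normalized
```

With raw magnitudes the small-scale second layer is pruned entirely; with normalization its largest weight survives, which illustrates why normalizing before a global threshold can matter.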
The constant sparsity algorithm is a special algorithm that takes no additional parameters. Use it when you want to load a checkpoint already trained with another sparsity algorithm and apply other compression without changing the sparsity mask.
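The idea of keeping a previously learned mask fixed can be illustrated with a small sketch. It assumes the mask is recovered from the zero pattern of the loaded checkpoint and re-applied after every weight update; the function names are hypothetical, and plain SGD is used only for simplicity.

```python
def mask_from_checkpoint(weights):
    # Keep exactly the positions that are non-zero in the loaded (already sparse) checkpoint.
    return [w != 0.0 for w in weights]


def masked_sgd_step(weights, grads, mask, lr):
    # Plain SGD step followed by re-applying the frozen mask, so pruned weights stay zero.
    return [(w - lr * g) if keep else 0.0 for w, g, keep in zip(weights, grads, mask)]


checkpoint = [0.0, 1.5, 0.0, -2.0]
mask = mask_from_checkpoint(checkpoint)
print(masked_sgd_step(checkpoint, [1.0, 1.0, 1.0, 1.0], mask, lr=0.25))
```

The surviving weights keep training (for example, under quantization-aware fine-tuning), while the sparsity pattern stays exactly as it was in the checkpoint.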
For the full list of algorithm configuration parameters that can be set via the config file, see the corresponding section of the NNCF config schema.
- Apply magnitude sparsity with default parameters (0 to 90% sparsity over 90 epochs of training, sparsity increased polynomially with each epoch):
{
"input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary
"compression": {
"algorithm": "magnitude_sparsity"
}
}
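The default schedule above increases sparsity polynomially with each epoch. As a rough illustration only, the sketch below assumes the common form sparsity(e) = target - (target - init) * (1 - e / target_epoch)^power, held constant after target_epoch. The function name and the power parameter are hypothetical, and the framework's actual curve and default exponent may differ.

```python
def polynomial_sparsity(epoch, init=0.0, target=0.9, target_epoch=90, power=1.0):
    # Interpolate from `init` to `target`; `power` shapes the curve (power=1 is linear).
    progress = min(epoch / target_epoch, 1.0)  # clamp: level stays at `target` afterwards
    return target - (target - init) * (1.0 - progress) ** power


for e in (0, 30, 60, 90, 120):
    print(e, round(polynomial_sparsity(e, power=2.0), 3))
```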
- Apply magnitude sparsity, increasing sparsity level step-wise from 0 to 70% in 3 steps at given training epoch indices:
{
"input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary
"compression": {
"algorithm": "magnitude_sparsity",
"params": {
"schedule": "multistep",
"multistep_steps": [10, 20],
"multistep_sparsity_levels": [0, 0.35, 0.7] // first level applied immediately (epoch 0), 0.35 - at epoch 10, 0.7 - at epoch 20
}
}
}
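The epoch-to-level mapping described in the comments of the multistep example can be reproduced in a few lines. This sketch only mirrors those comments; the function name is hypothetical.

```python
import bisect


def multistep_sparsity(epoch, steps, levels):
    # levels[0] applies from epoch 0; levels[i] applies from epoch steps[i - 1] onward.
    if len(levels) != len(steps) + 1:
        raise ValueError("need exactly one more level than steps")
    return levels[bisect.bisect_right(steps, epoch)]


steps, levels = [10, 20], [0, 0.35, 0.7]
print([multistep_sparsity(e, steps, levels) for e in (0, 9, 10, 19, 20, 50)])
```

bisect_right counts how many step boundaries have already been reached, so the level switches exactly at the listed epoch indices.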
- Apply magnitude sparsity, immediately setting sparsity level to 10%, performing batch-norm adaptation to potentially recover accuracy, and exponentially increasing sparsity to 50% over 30 epochs of training:
{
"input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary
"compression": {
"algorithm": "magnitude_sparsity",
"sparsity_init": 0.1, // already applied before the beginning of epoch 0 of training
"params": {
"schedule": "exponential",
"sparsity_target": 0.5,
"sparsity_target_epoch": 30 // "sparsity_target" fully reached at the beginning of epoch 30
},
"initializer": {
"batchnorm_adaptation": {
"num_bn_adaptation_samples": 100
}
}
}
}
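For intuition about the exponential schedule in this example, the sketch below assumes that the fraction of kept weights (density) decays geometrically from 1 - sparsity_init at epoch 0 to 1 - sparsity_target at sparsity_target_epoch, and stays constant afterwards. This shape is an assumption for illustration, and the function name is hypothetical.

```python
def exponential_sparsity(epoch, init=0.1, target=0.5, target_epoch=30):
    # Assumed shape: density (1 - sparsity) decays geometrically from 1 - init
    # at epoch 0 to 1 - target at target_epoch, then stays constant.
    progress = min(epoch / target_epoch, 1.0)
    density = (1.0 - init) * ((1.0 - target) / (1.0 - init)) ** progress
    return 1.0 - density


print([round(exponential_sparsity(e), 3) for e in (0, 10, 20, 30, 40)])
```

Under this assumption sparsity grows fastest in the early epochs and flattens out as it approaches the target.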
- Apply RB-sparsity to UNet, increasing sparsity level exponentially from 1% to 60% over 100 epochs, keeping the sparsity mask trainable until epoch 110 (after which the mask is frozen and the model is allowed to fine-tune with a fixed sparsity level), and excluding parts of the model from sparsification:
{
"input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary
"compression": {
"algorithm": "rb_sparsity",
"sparsity_init": 0.01,
"params": {
"sparsity_target": 0.60,
"sparsity_target_epoch": 100,
"sparsity_freeze_epoch": 110
},
"ignored_scopes": [
// assuming PyTorch model
"{re}UNet/ModuleList\\[up_path\\].*", // do not sparsify decoder
"UNet/NNCFConv2d[last]/conv2d_0" // do not sparsify final convolution
]
}
}
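The ignored_scopes list in this example mixes a regular-expression entry (prefixed with {re}) and an exact scope name. The sketch below shows one way such a list could be matched against layer scope names. Whether the framework uses a full match or a substring search is an assumption here, the function name is hypothetical, and the up_path/down_path scope strings are made-up examples.

```python
import re


def is_ignored(scope: str, ignored_scopes) -> bool:
    # Entries prefixed with "{re}" are treated as regular expressions (assumed full match);
    # all other entries must match the scope name exactly.
    for pattern in ignored_scopes:
        if pattern.startswith("{re}"):
            if re.fullmatch(pattern[len("{re}"):], scope):
                return True
        elif pattern == scope:
            return True
    return False


IGNORED = [
    r"{re}UNet/ModuleList\[up_path\].*",  # JSON "\\[" becomes the regex escape "\["
    "UNet/NNCFConv2d[last]/conv2d_0",
]

print(is_ignored("UNet/ModuleList[up_path]/UNetUpBlock[0]/NNCFConv2d[conv]/conv2d_0", IGNORED))
```

Note the double backslashes in the JSON config: JSON string escaping turns "\\[" into "\[", which the regex engine then reads as a literal bracket.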