FerKD: Surgical Label Adaptation for Efficient Distillation, Zhiqiang Shen, ICCV 2023.
🚀🚀 FerKD (Faster Knowledge Distillation) is an efficient knowledge distillation framework that combines partial soft-hard label adaptation with a region-calibration mechanism. The approach stems from the observation that standard data augmentations, such as RandomResizedCrop, transform inputs into regions of varying difficulty: easy positives, hard positives, or hard negatives. Traditional distillation frameworks treat all of these transformed samples equally, using the predictive probabilities of a pretrained teacher. However, relying solely on the teacher's prediction values neglects the reliability of those soft-label predictions. To address this, we propose a new scheme that calibrates the less-confident regions, treating them as context, using softened hard ground-truth labels. The approach consists of two steps: hard-region mining and calibration (see the sketch below).
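As a rough illustration of this idea, the sketch below partitions teacher predictions for random crops into easy positives, hard positives, and hard negatives, discards the noisy negatives, and replaces the soft labels of the hard positives with a label-smoothed ground truth. All function names, thresholds, and the smoothing value are illustrative assumptions, not the paper's actual implementation:

```python
import torch

def adapt_soft_labels(teacher_probs, gt_labels, num_classes,
                      easy_thresh=0.8, noise_thresh=0.1, smooth_eps=0.3):
    """Hypothetical sketch of FerKD-style partial soft-hard label adaptation.

    teacher_probs: (N, C) teacher softmax outputs for each random crop.
    gt_labels:     (N,) ground-truth class index of the source image.
    Returns adapted (N, C) targets and a boolean keep mask.
    """
    idx = torch.arange(len(gt_labels))
    gt_conf = teacher_probs[idx, gt_labels]

    # Easy positives: the teacher is already confident on the ground-truth
    # class; their soft labels are kept as-is.
    easy = gt_conf >= easy_thresh
    # Hard negatives: the crop likely contains little of the object (the
    # teacher assigns almost no mass to the ground truth); discard them.
    keep = gt_conf > noise_thresh
    # Hard positives: uncertain regions; calibrate them with a softened
    # (label-smoothed) hard ground-truth label instead of the raw soft label.
    hard = keep & ~easy

    smoothed = torch.full_like(teacher_probs, smooth_eps / (num_classes - 1))
    smoothed[idx, gt_labels] = 1.0 - smooth_eps

    targets = teacher_probs.clone()
    targets[hard] = smoothed[hard]
    return targets, keep
```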
```
@inproceedings{shen2023ferkd,
  title={FerKD: Surgical Label Adaptation for Efficient Distillation},
  author={Zhiqiang Shen},
  year={2023},
  booktitle={ICCV}
}
```
Please check the soft labels generated by different giant teacher models.
FerKD follows the FKD training code and procedure but uses different preprocessed soft labels; please download the soft labels for FerKD at the link.
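For reference, here is a minimal sketch of what one training step with such preprocessed soft labels might look like; `model`, `images`, and `soft_targets` are placeholders, and the actual data-loading pipeline lives in the FKD repository:

```python
import torch.nn.functional as F

def train_step(model, images, soft_targets, optimizer):
    """One optimization step against stored (already adapted) soft labels."""
    optimizer.zero_grad()
    logits = model(images)
    # Soft cross-entropy: minimize the KD objective against the
    # preprocessed targets loaded from disk.
    loss = -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```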
Method | Network | Top-1 accuracy (%) | Weights |
---|---|---|---|
FerKD | ResNet-50 | 81.2 | Download |
FerKD* | ResNet-50 | 81.4 | Download |
We thank High-Flyer AI for providing the deep learning platform and computational resources for this work. We would especially like to thank Yanhong Xu, Xiaowen Sun, Wenjie Wu, and Le Su from High-Flyer AI for helping us organize the computing resources.
Zhiqiang Shen (zhiqiangshen0214 at gmail.com)