# Do CIFAR-10 Classifiers Generalize to CIFAR-10?

Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar. Do CIFAR-10 Classifiers Generalize to CIFAR-10? 2018.

## tl;dr

- To understand overfitting, the authors collected new (but similar) test data for CIFAR-10.
- Most deep learning models show a large drop in accuracy on the new test set, though models with higher original accuracy see a smaller drop.
- Evidence that current accuracy numbers are brittle.

## CIFAR-10

One of the classic computer vision benchmarks, CIFAR-10 has images from 10 classes (e.g. dog), with each class covering multiple corresponding keywords. CIFAR-10 was drawn from a larger dataset called Tiny Images. When collecting and labelling new images from Tiny Images, the authors ensured that the keyword distribution of the new data roughly matched the original, with a bias toward more common keywords.
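
As a rough illustration of that matching step (not the authors' actual pipeline), the sketch below samples candidate Tiny Images so that their keyword frequencies mirror those of the original test set. The function name, inputs, and quota logic are all assumptions made for illustration.

```python
import random
from collections import Counter

def sample_matching_keywords(candidates, original_keywords, n_new, seed=0):
    """candidates: list of (image_id, keyword) pairs drawn from Tiny Images.
    original_keywords: keyword for each image in the original test set.
    Returns image_ids whose keyword frequencies mirror the original set."""
    rng = random.Random(seed)
    target = Counter(original_keywords)  # original keyword counts
    total = sum(target.values())

    # Group candidate images by their Tiny Images keyword.
    by_keyword = {}
    for image_id, kw in candidates:
        by_keyword.setdefault(kw, []).append(image_id)

    chosen = []
    for kw, count in target.items():
        # Draw new images for this keyword in proportion to its share
        # of the original distribution (rare keywords may fall short).
        quota = round(n_new * count / total)
        pool = by_keyword.get(kw, [])
        rng.shuffle(pool)
        chosen.extend(pool[:quota])
    return chosen
```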

## Performance Results

The authors tested over 20 deep learning models, spanning conventional architectures (VGG, ResNet) to state-of-the-art ones (Shake-Drop), using publicly available code. All models see a drop in accuracy on the new test set (e.g. VGG saw an 8% drop).
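
A minimal sketch of what that comparison amounts to, assuming a model object with a scikit-learn style `predict` method and both test sets already loaded as NumPy arrays; none of these names come from the paper's released code.

```python
import numpy as np

def accuracy(model, images, labels):
    """Fraction of images the model classifies correctly."""
    return float(np.mean(model.predict(images) == labels))

def accuracy_drop(model, orig_images, orig_labels, new_images, new_labels):
    """Absolute accuracy gap between the original and new test sets."""
    orig_acc = accuracy(model, orig_images, orig_labels)
    new_acc = accuracy(model, new_images, new_labels)
    return orig_acc, new_acc, orig_acc - new_acc

# Across models, new-test accuracy tracks original-test accuracy roughly
# linearly; with per-model accuracy pairs one could fit that relationship
# directly, e.g. np.polyfit(orig_accs, new_accs, deg=1).
```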

## Explanations

The original CIFAR-10 dataset contains near duplicates, which could let models overfit while test accuracy stays high; however, this explains at most a 1% difference. Another possible explanation is that hyperparameter tuning has overfit to CIFAR-10, but the results hold even after re-tuning: no settings could be found that produce significantly higher accuracy on the new test set.
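
For intuition only, one crude way to flag such near-duplicates is a brute-force nearest-neighbor check in raw pixel space; the paper's actual analysis may differ, and the threshold below is an arbitrary placeholder rather than a value from the paper.

```python
import numpy as np

def near_duplicate_flags(test_images, train_images, threshold=1000.0):
    """test_images: (n_test, d) float array of flattened images.
    train_images: (n_train, d) float array of flattened images.
    Returns a boolean mask over the test set marking images with a
    very close training-set neighbor in L2 pixel distance."""
    flags = np.zeros(len(test_images), dtype=bool)
    for i, x in enumerate(test_images):
        # Brute force: distance from this test image to every train image.
        dists = np.linalg.norm(train_images - x, axis=1)
        flags[i] = dists.min() < threshold
    return flags
```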

The implication seems to be that years of training and testing on CIFAR-10 have overfit the field's models to the CIFAR-10 test data.

## What's next?

Other common datasets (ImageNet, MIMIC-III) could certainly be given new test sets and existing algorithms re-examined against them. More broadly, we want to ensure that our models are not overfitting and are truly robust.