This repository includes resources for the Caltech Fish Counting – Domain Adaptive Object Detection (CFC-DAOD) dataset introduced in Align and Distill: Unifying and Improving Domain Adaptive Object Detection. It is an extension of the Caltech Fish Counting Dataset (ECCV 2022) that includes additional data for unsupervised domain adaptation.
Below we provide download links for all data and annotations. Please see the Align and Distill (ALDI) codebase to train DAOD models on CFC-DAOD.
Like other DAOD benchmarks, CFC-DAOD consists of data from two domains, source and target.
Source data
- Train: In CFC-DAOD, the source-domain training set consists of training data from the original CFC data release, i.e., video frames from the 'Kenai left bank' location. We use the 3-channel 'Baseline++' format introduced in the original CFC paper. For experiments in the ALDI paper, we subsampled empty frames to be around 10% of the total data, resulting in 76,619 training images. For reproducibility, we release the exact subsampled set below. When publishing results on CFC-DAOD, however, researchers may use the original CFC training set however they see fit and are not required to use our subsampled 'Baseline++' data.
- Validation: The CFC-DAOD Kenai (source) validation set is the same as the original CFC validation set. We use the 3-channel 'Baseline++' format from the original CFC paper. There are 30,454 validation images.
Target data
- Train: In CFC-DAOD, the target-domain 'training' set consists of new data from the 'Kenai Channel' location in CFC. These frames should be treated as unlabeled for DAOD methods, but labeled for Oracle methods. We also use the 'Baseline++' format. There are 29,089 target train images.
- Test: The CFC-DAOD target-domain test set is the same as the 'Kenai Channel' test set from CFC. We use the 'Baseline++' format. There are 13,091 target test images. Researchers should publish final mAP@IoU=0.5 numbers on this data (see the evaluation sketch below), and may use this data for model selection for fair comparison with prior methods.
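For reference, below is a minimal evaluation sketch using pycocotools. The annotation and prediction file names are placeholders, not files shipped with the dataset; the number to report is AP at IoU=0.5 on the target test set.

```python
# Minimal sketch of COCO-style evaluation at IoU=0.5 using pycocotools.
# File names are placeholders -- substitute the CFC-DAOD target test
# annotations and your detector's COCO-format predictions.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("cfc_channel_test_coco.json")        # hypothetical ground-truth path
coco_dt = coco_gt.loadRes("predictions_coco.json")  # COCO-format detection results

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()

ap50 = evaluator.stats[1]  # AP @ IoU=0.50 (the number to report)
print(f"mAP@IoU=0.5: {ap50:.3f}")
```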
Labels: All annotations are in COCO format.
Data can be downloaded from CaltechDATA using the following links.
Images:
CFC Kenai (source) train images (16 GB)
- Running `md5sum cfc_train.zip` should return `935b4cd5ae5812035051f24e6707ee17 cfc_train.zip`
CFC Kenai (source) val images (4.1 GB)
- Running `md5sum cfc_val.zip` should return `e662ae8318621d1a636f0befadddaf48 cfc_val.zip`
CFC Channel (target) train images (2.9 GB)
- Running `md5sum cfc_channel_train.zip` should return `d17e0485674327df3d7611a5d6b999b1 cfc_channel_train.zip`
CFC Channel (target) test images (2.8 GB)
- Running `md5sum cfc_channel_test.zip` should return `9c15b9c9dc6784cce9dba21e81cb514a cfc_channel_test.zip`
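As an alternative to `md5sum`, the checksums above can be verified with a short Python script. A minimal sketch, assuming the archives sit in the current working directory:

```python
# Verify the downloaded CFC-DAOD archives against the checksums listed above.
import hashlib

EXPECTED = {
    "cfc_train.zip": "935b4cd5ae5812035051f24e6707ee17",
    "cfc_val.zip": "e662ae8318621d1a636f0befadddaf48",
    "cfc_channel_train.zip": "d17e0485674327df3d7611a5d6b999b1",
    "cfc_channel_test.zip": "9c15b9c9dc6784cce9dba21e81cb514a",
}

for name, expected in EXPECTED.items():
    md5 = hashlib.md5()
    with open(name, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MB chunks
            md5.update(chunk)
    status = "OK" if md5.hexdigest() == expected else "MISMATCH"
    print(f"{name}: {status}")
```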
Labels:
CFC Kenai (source) train labels
CFC Channel (target) train labels
CFC Channel (target) test labels
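Since the annotations are standard COCO JSON, they can be inspected with pycocotools. A minimal sketch, where the file name is a placeholder for whichever label file you downloaded:

```python
# Inspect a CFC-DAOD annotation file with pycocotools.
from pycocotools.coco import COCO

coco = COCO("cfc_train_coco.json")  # hypothetical path to a downloaded label file

print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])
print("images:", len(coco.getImgIds()))
print("annotations:", len(coco.getAnnIds()))

# Bounding boxes follow the COCO convention: [x, y, width, height] in pixels.
first_ann = coco.loadAnns(coco.getAnnIds())[0]
print("example bbox:", first_ann["bbox"])
```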
Justin Kay, Timm Haucke, Suzanne Stathatos, Siqi Deng, Erik Young, Pietro Perona, Sara Beery, and Grant Van Horn.
Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on addressing this challenge. Unfortunately, we identify systemic benchmarking pitfalls that call past results into question and hamper further progress: (a) Overestimation of performance due to underpowered baselines, (b) Inconsistent implementation practices preventing transparent comparisons of methods, and (c) Lack of generality due to outdated backbones and lack of diversity in benchmarks. We address these problems by introducing: (1) A unified benchmarking and implementation framework, Align and Distill (ALDI), enabling comparison of DAOD methods and supporting future development, (2) A fair and modern training and evaluation protocol for DAOD that addresses benchmarking pitfalls, (3) A new DAOD benchmark dataset, CFC-DAOD, enabling evaluation on diverse real-world data, and (4) A new method, ALDI++, that achieves state-of-the-art results by a large margin. ALDI++ outperforms the previous state-of-the-art by +3.5 AP50 on Cityscapes → Foggy Cityscapes, +5.7 AP50 on Sim10k → Cityscapes (where ours is the only method to outperform a fair baseline), and +2.0 AP50 on CFC Kenai → Channel. Our framework, dataset, and state-of-the-art method offer a critical reset for DAOD and provide a strong foundation for future research.
If you find our work useful in your research, please consider citing our paper:
@misc{kay2024align,
title={Align and Distill: Unifying and Improving Domain Adaptive Object Detection},
author={Justin Kay and Timm Haucke and Suzanne Stathatos and Siqi Deng and Erik Young and Pietro Perona and Sara Beery and Grant Van Horn},
year={2024},
eprint={2403.12029},
archivePrefix={arXiv},
primaryClass={cs.CV}
}