How to install datasets

Acknowledgement: This readme file for installing datasets has been borrowed directly from CoOp's official repository with a few modifications.

We suggest putting all datasets under the same folder (say $DATA; the default root is ./data) to ease management, and organizing them as described below so that the source code does not need to be modified. The file structure looks like:

$DATA/
|–– imagenet/
|–– caltech-101/
|–– oxford_pets/
|–– stanford_cars/

If some datasets are already installed somewhere else, you can create symbolic links at $DATA/dataset_name that point to the original data to avoid downloading them again.
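
The symbolic-link approach can be sketched in Python as follows. This is a minimal illustration, not part of the repository: the helper name `link_dataset` and its arguments are hypothetical, and it assumes the existing dataset folder already has the layout described in this file.

```python
import os
from pathlib import Path


def link_dataset(original, data_root, dataset_name):
    """Create <data_root>/<dataset_name> as a symlink to an existing dataset folder."""
    source = Path(original).expanduser().resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {source}")

    root = Path(data_root).expanduser()
    root.mkdir(parents=True, exist_ok=True)

    link = root / dataset_name
    if link.is_symlink() or link.exists():
        # Something is already there; leave it untouched.
        return link

    os.symlink(source, link, target_is_directory=True)
    return link
```

For example, `link_dataset("/path/to/existing/oxford_pets", "./data", "oxford_pets")` would make the existing folder visible as ./data/oxford_pets without copying any files.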

Datasets list:

The instructions to prepare each dataset are detailed below. To ensure reproducibility and fair comparison for future work, we provide fixed train/val/test splits for all datasets except ImageNet, where the validation set is used as the test set. The fixed splits are either taken from the original datasets (if available) or created by us.
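
To confirm that a downloaded split file is intact, a quick count of the entries per split can help. The sketch below assumes that each split_zhou_*.json file is a JSON object whose top-level keys are split names (such as train, val, and test) mapping to lists of samples; that layout is an assumption here, so check it against your copy of the file. The function name `summarize_split` is hypothetical.

```python
import json


def summarize_split(split_file):
    """Return {split_name: number_of_entries} for one split JSON file."""
    with open(split_file, "r", encoding="utf-8") as f:
        split = json.load(f)
    # Assumption: the top level looks like
    # {"train": [...], "val": [...], "test": [...]}.
    return {name: len(items) for name, items in split.items()}
```

Calling it on, say, $DATA/oxford_pets/split_zhou_OxfordPets.json should return one count per split.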

ImageNet

  • Create a folder named imagenet/ under $DATA.
  • Create images/ under imagenet/.
  • Download the dataset from the official website and extract the training and validation sets to $DATA/imagenet/images. The directory structure should look like
imagenet/
|–– images/
|   |–– train/ # contains 1,000 folders like n01440764, n01443537, etc.
|   |–– val/
  • If you have already downloaded the ImageNet dataset, you can create symbolic links that map the training and validation sets to $DATA/imagenet/images.
  • Download the classnames.txt to $DATA/imagenet/ from this link. The class names are copied from CLIP.
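
The ImageNet layout above can be checked with a short script before training. This is only a sketch: the function name `check_imagenet` is hypothetical, and it verifies nothing beyond what this section states (the two image folders, classnames.txt, and 1,000 class folders under train/).

```python
from pathlib import Path


def check_imagenet(data_root):
    """Return a list of problems found under <data_root>/imagenet (empty if none)."""
    root = Path(data_root) / "imagenet"
    problems = []

    for sub in ("images/train", "images/val"):
        if not (root / sub).is_dir():
            problems.append(f"missing folder: {root / sub}")

    if not (root / "classnames.txt").is_file():
        problems.append(f"missing file: {root / 'classnames.txt'}")

    train = root / "images" / "train"
    if train.is_dir():
        # train/ should hold one folder per class, e.g. n01440764.
        n_classes = sum(1 for p in train.iterdir() if p.is_dir())
        if n_classes != 1000:
            problems.append(f"expected 1000 class folders in {train}, found {n_classes}")

    return problems
```

An empty list means the folder matches the structure described here; otherwise each entry names what is missing.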

Caltech101

The directory structure should look like

caltech-101/
|–– 101_ObjectCategories/
|–– split_zhou_Caltech101.json

OxfordPets

The directory structure should look like

oxford_pets/
|–– images/
|–– annotations/
|–– split_zhou_OxfordPets.json

StanfordCars

The directory structure should look like

stanford_cars/
|–– cars_test/
|–– cars_test_annos_withlabels.mat
|–– cars_train/
|–– devkit/
|–– split_zhou_StanfordCars.json

Flowers102

The directory structure should look like

oxford_flowers/
|–– cat_to_name.json
|–– imagelabels.mat
|–– jpg/
|–– split_zhou_OxfordFlowers.json

Food101

The directory structure should look like

food-101/
|–– images/
|–– license_agreement.txt
|–– meta/
|–– README.txt
|–– split_zhou_Food101.json

FGVCAircraft

The directory structure should look like

fgvc_aircraft/
|–– images/
|–– ... # a bunch of .txt files

SUN397

The directory structure should look like

sun397/
|–– SUN397/
|–– split_zhou_SUN397.json
|–– ... # a bunch of .txt files

DTD

The directory structure should look like

dtd/
|–– images/
|–– imdb/
|–– labels/
|–– split_zhou_DescribableTextures.json

EuroSAT

The directory structure should look like

eurosat/
|–– 2750/
|–– split_zhou_EuroSAT.json

UCF101

  • Create a folder named ucf101/ under $DATA.
  • Download the zip file UCF-101-midframes.zip from here and extract it to $DATA/ucf101/. This zip file contains the extracted middle video frames.
  • Download split_zhou_UCF101.json from this link.

The directory structure should look like

ucf101/
|–– UCF-101-midframes/
|–– split_zhou_UCF101.json
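
The extraction step can also be done from Python with the standard zipfile module. This is a minimal sketch with a hypothetical helper name, `extract_midframes`; it assumes the archive unpacks to a top-level UCF-101-midframes/ folder, as the structure above suggests, and it does not download anything.

```python
import zipfile
from pathlib import Path


def extract_midframes(zip_path, data_root):
    """Extract UCF-101-midframes.zip into <data_root>/ucf101/ and return that folder."""
    target = Path(data_root) / "ucf101"
    target.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)

    # Assumption: the archive contains a top-level UCF-101-midframes/ folder.
    frames = target / "UCF-101-midframes"
    if not frames.is_dir():
        raise FileNotFoundError(f"expected {frames} after extraction")
    return target
```

After extraction, split_zhou_UCF101.json still has to be placed in the same ucf101/ folder.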

PACS

Download link: google drive.

File structure:

pacs/
|–– images/
|–– splits/

You do not have to download this dataset manually. When you run tools/train.py, the code checks whether the dataset exists and automatically downloads it to $DATA if it is missing. This also applies to VLCS, Office-Home-DG, and Digits-DG.

VLCS

Download link: google drive (credit to https://github.com/fmcarlucci/JigenDG#vlcs)

File structure:

VLCS/
|–– CALTECH/
|–– LABELME/
|–– PASCAL/
|–– SUN/

Office-Home-DG

Download link: google drive.

File structure:

office_home_dg/
|–– art/
|–– clipart/
|–– product/
|–– real_world/

DomainNet

Download link: http://ai.bu.edu/M3SDA/. (Please download the cleaned version of the split files.)

File structure:

domainnet/
|–– clipart/
|–– infograph/
|–– painting/
|–– quickdraw/
|–– real/
|–– sketch/
|–– splits/
|   |–– clipart_train.txt
|   |–– clipart_test.txt
|   |–– ...
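
To sanity-check a split file such as splits/clipart_train.txt, it can be parsed into (image path, label) pairs. The sketch below assumes each non-empty line holds a relative image path followed by an integer class label, separated by whitespace; this format is an assumption and should be checked against the downloaded files. The function name `read_domainnet_split` is hypothetical.

```python
def read_domainnet_split(split_file):
    """Parse one split file into a list of (relative_image_path, label) pairs."""
    items = []
    with open(split_file, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Assumption: the label is the last whitespace-separated field.
            impath, label = line.rsplit(maxsplit=1)
            items.append((impath, int(label)))
    return items
```

Each returned path can then be joined with $DATA/domainnet to check that the image file exists.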