Can't get the same accuracy as the PyTorch transforms on the iNaturalist2018 dataset #5721

lyyi599 opened this issue Nov 21, 2024 · 8 comments
Labels: question (Further information is requested)

lyyi599 commented Nov 21, 2024

Thanks for this awesome tool.

Describe the question.

For my experiments, I need to modify a piece of open-source code. Because the transforms are time-consuming and the iNaturalist2018 dataset is large (over 400,000 images), I plan to switch to DALI for data loading.
Below is the original open-source code snippet (from https://github.com/shijxcs/LIFT/blob/661ead9b78368f05ba79abe4672d63154467f823/trainer.py#L102):

    transform_train = transforms.Compose([
        transforms.RandomResizedCrop(resolution),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean, std),
    ])
    if cfg.tte:
        if cfg.tte_mode == "fivecrop":
            transform_test = transforms.Compose([
                transforms.Resize(resolution + expand),
                transforms.FiveCrop(resolution),
                transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(crop) for crop in crops])),
                transforms.Normalize(mean, std),
            ])
        elif cfg.tte_mode == "tencrop":
            transform_test = transforms.Compose([
                transforms.Resize(resolution + expand),
                transforms.TenCrop(resolution),
                transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(crop) for crop in crops])),
                transforms.Normalize(mean, std),
            ])
        elif cfg.tte_mode == "randaug":
            _resize_and_flip = transforms.Compose([
                transforms.RandomResizedCrop(resolution),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
            ])
            transform_test = transforms.Compose([
                transforms.Lambda(lambda image: torch.stack([_resize_and_flip(image) for _ in range(cfg.randaug_times)])),
                transforms.Normalize(mean, std),
            ])
    else:
        transform_test = transforms.Compose([
            transforms.Resize(resolution * 8 // 7),
            transforms.CenterCrop(resolution),
            transforms.Lambda(lambda crop: torch.stack([transforms.ToTensor()(crop)])),
            transforms.Normalize(mean, std),
        ])
    self.train_loader = DataLoader(train_dataset,
        batch_size=cfg.micro_batch_size, shuffle=True,
        num_workers=cfg.num_workers, pin_memory=True)

    self.test_loader = DataLoader(test_dataset,
        batch_size=64, shuffle=False,
        num_workers=cfg.num_workers, pin_memory=True)

Referring to the DALI example, which constructs the pipeline along the lines of train_pipe = create_dali_pipeline(batch_size=args.batch_size, ...), I have successfully adapted the code for ImageNet_LT and achieved results comparable to the original paper (77.0 with LA loss). The modified code is as follows:

    import nvidia.dali.fn as fn
    import nvidia.dali.types as types
    from nvidia.dali import pipeline_def

    @pipeline_def
    def create_dali_pipeline(data_dir, data_list_dir, crop, size, shard_id, num_shards,
                             dali_cpu=False, is_training=True):
        images, labels = fn.readers.file(file_root=data_dir,
                                         file_list=data_list_dir,
                                         shard_id=shard_id,
                                         num_shards=num_shards,
                                         random_shuffle=is_training,
                                         pad_last_batch=True,
                                         name="Reader")
        dali_device = 'cpu' if dali_cpu else 'gpu'
        decoder_device = 'cpu' if dali_cpu else 'mixed'
        # ask HW NVJPEG to allocate memory ahead of time for the biggest image
        # in the dataset, to avoid reallocations at runtime
        preallocate_width_hint = 5980 if decoder_device == 'mixed' else 0
        preallocate_height_hint = 6430 if decoder_device == 'mixed' else 0
        if is_training:
            # decode + random crop: the counterpart of transforms.RandomResizedCrop
            images = fn.decoders.image_random_crop(images,
                                                   device=decoder_device, output_type=types.RGB,
                                                   preallocate_width_hint=preallocate_width_hint,
                                                   preallocate_height_hint=preallocate_height_hint,
                                                   random_aspect_ratio=[0.8, 1.25],
                                                   random_area=[0.1, 1.0],
                                                   num_attempts=100)
            images = fn.resize(images,
                               device=dali_device,
                               resize_x=crop,
                               resize_y=crop,
                               interp_type=types.INTERP_TRIANGULAR)
            # the counterpart of transforms.RandomHorizontalFlip
            mirror = fn.random.coin_flip(probability=0.5)
        else:
            images = fn.decoders.image(images,
                                       device=decoder_device,
                                       output_type=types.RGB)
            # resize so the shorter side is `size`: the counterpart of transforms.Resize
            images = fn.resize(images,
                               device=dali_device,
                               size=size,
                               mode="not_smaller",
                               interp_type=types.INTERP_TRIANGULAR)
            mirror = False

        # fused crop / mirror + ToTensor + Normalize
        images = fn.crop_mirror_normalize(images.gpu(),
                                          dtype=types.FLOAT,
                                          output_layout="CHW",
                                          crop=(crop, crop),
                                          mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                          std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
                                          mirror=mirror)

        # for tte: add a leading dimension, mirroring torch.stack([...]) in the test transform
        if not is_training:
            images = fn.stack(images, axis=0)
        labels = labels.gpu()
        return images, labels
  
    from nvidia.dali.plugin.pytorch import DALIClassificationIterator, LastBatchPolicy

    crop_size = 224
    val_size = 256  # needs further changes when using five-crop TTE
    train_pipe = create_dali_pipeline(
        batch_size=cfg.batch_size,
        num_threads=cfg.num_workers,
        device_id=torch.cuda.current_device(),
        seed=12 + torch.cuda.current_device(),
        data_dir=cfg.root,
        data_list_dir=train_root_lt,
        crop=crop_size,
        size=val_size,
        dali_cpu=cfg.dali_cpu,
        shard_id=torch.cuda.current_device(),
        num_shards=torch.cuda.device_count(),
        is_training=True
    )
    train_pipe.build()
    self.train_loader = DALIClassificationIterator(train_pipe, reader_name="Reader",
                                                   last_batch_policy=LastBatchPolicy.PARTIAL,
                                                   auto_reset=True)

    test_pipe = create_dali_pipeline(
        batch_size=cfg.batch_size,
        num_threads=cfg.num_workers,
        device_id=torch.cuda.current_device(),
        seed=12 + torch.cuda.current_device(),
        data_dir=cfg.root,
        data_list_dir=test_root_lt,
        crop=crop_size,
        size=val_size,
        dali_cpu=cfg.dali_cpu,
        shard_id=torch.cuda.current_device(),
        num_shards=torch.cuda.device_count(),
        is_training=False
    )
    test_pipe.build()
    self.test_loader = DALIClassificationIterator(test_pipe, reader_name="Reader",
                                                  last_batch_policy=LastBatchPolicy.PARTIAL,
                                                  auto_reset=True)
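
For reference, the DALIClassificationIterator yields, at each step, a list with one dictionary per pipeline, keyed by "data" and "label". A minimal consumption sketch (variable names are illustrative):

    # each yielded element is a list with one dict per pipeline;
    # "data" and "label" are the default DALIClassificationIterator keys
    for data in self.train_loader:
        images = data[0]["data"]                      # float CHW batch, already on the GPU
        labels = data[0]["label"].squeeze(-1).long()  # class indices
        # ... forward / backward as usual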

As mentioned earlier, the code has been validated on the ImageNet_LT dataset. However, when applied to the iNaturalist2018 dataset, it results in significant differences, specifically:

  1. Under the same settings, the DALI pipeline achieves only 67.6% accuracy, while using PyTorch transforms achieves the paper's reported result of 79.1%.
  2. In the experiment logs using PyTorch transforms, the model initially learns samples from head classes (categories with more samples that are easier to collect), as observed in the first 50 batches.
epoch [1/20] batch [10/3419] time 0.761 (2.635) data 0.433 (2.208) loss 6.9438 (7.0018) acc 1.5625 (2.2394) (mean 0.2032 many 1.4897 med 0.0981 few 0.0000) lr 1.0000e-02 eta 2 days, 2:02:49
epoch [1/20] batch [20/3419] time 0.334 (1.935) data 0.000 (1.556) loss 6.7982 (6.7163) acc 7.0312 (4.1451) (mean 0.4500 many 2.4381 med 0.3216 few 0.0931) lr 1.0000e-02 eta 1 day, 12:44:06
epoch [1/20] batch [30/3419] time 0.331 (1.696) data 0.000 (1.332) loss 6.1816 (6.5292) acc 9.3750 (5.8511) (mean 0.6444 many 3.3323 med 0.5228 few 0.0962) lr 1.0000e-02 eta 1 day, 8:11:31
epoch [1/20] batch [40/3419] time 0.331 (1.587) data 0.000 (1.232) loss 6.1855 (6.3015) acc 5.4688 (6.2589) (mean 0.7358 many 3.6615 med 0.5883 few 0.1582) lr 1.0000e-02 eta 1 day, 6:08:02
epoch [1/20] batch [50/3419] time 0.329 (1.721) data 0.000 (1.370) loss 5.4283 (6.0245) acc 12.5000 (8.0597) (mean 0.9444 many 4.2423 med 0.7868 few 0.2823) lr 1.0000e-02 eta 1 day, 8:39:37

In contrast, with the DALI pipeline the model prioritizes learning from tail classes (categories with fewer samples that are harder to collect). The corresponding log is as follows:

epoch [1/20] batch [10/3419] time 0.327 (0.372) data 0.001 (0.001) loss 9.2174 (9.8390) acc 0.0000 (0.0461) (mean 0.0123 many 0.0000 med 0.0000 few 0.0310) lr 1.0000e-02 eta 7:03:39
epoch [1/20] batch [20/3419] time 0.326 (0.349) data 0.001 (0.001) loss 8.7870 (9.2437) acc 0.0000 (0.2400) (mean 0.0167 many 0.0000 med 0.0000 few 0.0422) lr 1.0000e-02 eta 6:37:59
epoch [1/20] batch [30/3419] time 0.326 (0.342) data 0.001 (0.001) loss 8.3188 (8.8392) acc 0.0000 (0.1540) (mean 0.0163 many 0.0000 med 0.0000 few 0.0413) lr 1.0000e-02 eta 6:29:29
epoch [1/20] batch [40/3419] time 0.327 (0.338) data 0.001 (0.001) loss 8.4082 (8.6090) acc 0.0000 (0.1240) (mean 0.0164 many 0.0000 med 0.0000 few 0.0415) lr 1.0000e-02 eta 6:25:10
epoch [1/20] batch [50/3419] time 0.326 (0.336) data 0.001 (0.001) loss 8.5320 (8.5095) acc 0.0000 (0.2166) (mean 0.0190 many 0.0000 med 0.0000 few 0.0480) lr 1.0000e-02 eta 6:22:21

It is evident that the learning processes of the two methods differ significantly. However, such a large discrepancy is not observed on the ImageNet_LT dataset.

Of course, after a certain number of epochs, the logs for both PyTorch transforms and the DALI pipeline eventually show lower accuracy for head classes and higher accuracy for tail classes. However, the overall accuracy drops by over 10%, which is an unacceptable difference.

Possible Explanations

The sampling process of the DALI pipeline differs from that of the PyTorch DataLoader. However, I don't think that alone would cause such a significant discrepancy; see the sketch below.
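
For example, DataLoader(shuffle=True) draws a fresh permutation of the whole dataset every epoch, while the DALI reader shuffles through a fixed-size buffer. A sketch of how this could be ruled out by enlarging the buffer (initial_fill is an existing fn.readers.file argument; the value below is simply the iNat2018 train-set size):

    # fn.readers.file shuffles via a buffer of initial_fill samples (1024 by
    # default); growing it toward the dataset size approximates a full shuffle
    images, labels = fn.readers.file(file_root=data_dir,       # same arguments as
                                     file_list=data_list_dir,  # in the pipeline above
                                     random_shuffle=True,
                                     initial_fill=437513,      # iNat2018 train-set size
                                     name="Reader")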

Help would be much appreciated. Thank you for any suggestions!

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report
lyyi599 added the question (Further information is requested) label on Nov 21, 2024
JanuszL (Contributor) commented Nov 21, 2024

Hi @lyyi599,

Thank you for reaching out and using DALI.

> Under the same settings, the DALI pipeline achieves only 67.6% accuracy, while using PyTorch transforms achieves the paper's reported result of 79.1%.
> In the experiment logs using PyTorch transforms, the model initially learns samples from head classes (categories with more samples that are easier to collect), as observed in the first 50 batches.

The DALI file reader does the following:

  • initially shuffles the whole list of files, to make sure it doesn't sample only from the first classes
  • reads N samples into a shuffling buffer, where N is 1024 by default
  • randomly samples from that buffer

So it should not favor less-represented samples. Can you gather statistics of the classes the DALI reader returns? Do they match the sample distribution in the dataset?
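
Something along these lines should do it (a minimal sketch, assuming the training iterator built in your snippet):

    from collections import Counter

    # histogram of the labels the reader returns over one full epoch;
    # compare it with the per-class counts of your dataset / file list
    counts = Counter()
    for data in train_loader:
        counts.update(data[0]["label"].squeeze(-1).long().cpu().tolist())
    print("classes seen:", len(counts), "samples seen:", sum(counts.values()))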

lyyi599 (Author) commented Nov 22, 2024

> The DALI file reader does the following:
>
>   • initially shuffles the whole list of files, to make sure it doesn't sample only from the first classes
>   • reads N samples into a shuffling buffer, where N is 1024 by default
>   • randomly samples from that buffer
>
> So it should not favor less-represented samples. Can you gather statistics of the classes the DALI reader returns? Do they match the sample distribution in the dataset?

Hi @JanuszL,

Thank you for your reply. After verification, it is indeed the case that each epoch samples every training example exactly once, and the classes returned match the sample distribution in the dataset. However, the code still produces the results mentioned earlier. For example, after the first epoch, the accuracy on the val set with PyTorch transforms is many: 20.0%, med: 21.0%, few: 22.2%, average: 25.4%, while with DALI it is many: 3.4%, med: 9.3%, few: 16.5%, average: 11.5%.

This is very confusing to me. The main training code I'm currently using is as follows (including the pipeline and transform). Simply switching the dali_dataset flag to include iNaturalist2018 (which corresponds to using the DALI pipeline for data loading) results in the significant accuracy drop mentioned above. Could you help me check whether I am using DALI correctly? (attached: trainer.py.txt)

It is worth mentioning that the iNaturalist2018 dataset uses a .txt file for indexing, so the create_dali_pipeline function includes the corresponding data_list_dir parameter for reading it.
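
For reference, each line of that file_list pairs a path (relative to file_root) with a numeric label, for example (the paths below are made up):

    train_val2018/Plantae/1234/img_0001.jpg 0
    train_val2018/Plantae/1234/img_0002.jpg 0
    train_val2018/Aves/5678/img_0001.jpg 1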

Thank you for any suggestions you may have.

mzient (Contributor) commented Nov 22, 2024

Hello @lyyi599,
Where do the iNaturalist18_train.txt and iNaturalist18_val.txt come from? They're not part of the original dataset. Perhaps they were simply generated incorrectly and the samples are mislabeled during training?

lyyi599 (Author) commented Nov 22, 2024

Hi @mzient,

Thank you for your reply, and I apologize for not explaining the issue more clearly. Let me provide some context: this is about long-tailed recognition. Many real-world problems exhibit a long-tailed distribution, meaning the number of training samples varies greatly across classes. The iNaturalist2018 dataset is an example of this and has been widely studied. You can find existing examples here: https://github.com/shijxcs/LIFT/tree/661ead9b78368f05ba79abe4672d63154467f823/datasets/iNaturalist18. Therefore, the issue is likely not related to the .txt file used for indexing.

In the trainer.py code above, the only difference is how iNaturalist18 is loaded, specifically the use of the DALI pipeline and iterator. This difference is causing the accuracy drop, so I suspect there might be an issue with how I'm using DALI, or a bug in DALI that I haven't identified. For comparison, the ImageNet_LT dataset is also indexed with a .txt file and uses a similar pipeline and iterator, yet I obtain comparable results there.

JanuszL (Contributor) commented Nov 22, 2024

> Therefore, the issue is likely not related to the .txt file used for indexing.
>
> In the trainer.py code above, the only difference is how iNaturalist18 is loaded, specifically the use of the DALI pipeline and iterator. This difference is causing the accuracy drop, so I suspect there might be an issue with how I'm using DALI, or a bug in DALI that I haven't identified. For comparison, the ImageNet_LT dataset is also indexed with a .txt file and uses a similar pipeline and iterator, yet I obtain comparable results there.

We are not saying it is the cause, but we want to make sure we are looking at the same things. If the index file misses some samples, DALI will not return them, underrepresenting some classes and overrepresenting others. It would be great if you could also share how you generated these files, or the files themselves.
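
For example, a quick check like this (file name hypothetical) would reveal whether any class is missing or truncated in the index:

    from collections import Counter

    # per-class histogram of the index file; compare against the known
    # class counts of iNaturalist2018
    with open("iNaturalist18_train.txt") as f:
        labels = [int(line.split()[-1]) for line in f if line.strip()]
    per_class = Counter(labels)
    print("classes:", len(per_class), "samples:", len(labels))
    print("min/max per class:", min(per_class.values()), max(per_class.values()))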

lyyi599 (Author) commented Nov 22, 2024

> We are not saying it is the cause, but we want to make sure we are looking at the same things. If the index file misses some samples, DALI will not return them, underrepresenting some classes and overrepresenting others. It would be great if you could also share how you generated these files, or the files themselves.

I see. I downloaded the files from the repo https://github.com/shijxcs/LIFT/tree/661ead9b78368f05ba79abe4672d63154467f823/datasets/iNaturalist18 and made sure that they properly index the corresponding data.

mzient (Contributor) commented Nov 25, 2024

Hello @lyyi599
I've tried to run the attached trainer.py.txt but it doesn't work out of the box. Can you please post a link to your branch with the code you use with DALI?

lyyi599 (Author) commented Nov 25, 2024

> I've tried to run the attached trainer.py.txt but it doesn't work out of the box. Can you please post a link to your branch with the code you use with DALI?

Thank you for your reply, and sorry for the late response; I made many changes over the past few days while debugging. To make sure it runs properly, I have retested the code, and it is now complete.

First, I would like to point out that the iNaturalist 2018 dataset is quite large, so you may have difficulty running the code given how long the dataset takes to download. I have therefore kept the log files mentioned in our previous discussion, so you can review them for reference.

[file0] MLA\output\inat2018_clip_vit_b16_adaptformer_True_num_epochs_20\log.txt-2024-11-20-16-35-42: the complete log of training with PyTorch transforms, accuracy: 79.4%

[file1] MLA\output\inat2018_clip_vit_b16_adaptformer_True_loss_type_LA_num_epochs_20_lr_0.01\log.txt-2024-11-19-20-24-00: the complete log of training with DALI, many: 40.7%, med: 67.0%, few: 75.3%, average: 67.6%

[file2] MLA\output\inat2018_clip_vit_b16_adaptformer_True_loss_type_LA_num_epochs_20_lr_0.01\log.txt-2024-11-22-14-51-57: the one-epoch log of training with PyTorch transforms, many: 20.0%, med: 21.0%, few: 22.2%

[file3] MLA\output\inat2018_clip_vit_b16_adaptformer_True_loss_type_LA_num_epochs_20_lr_0.01\log.txt-2024-11-22-16-21-59: the one-epoch log of training with DALI, many: 3.4%, med: 9.3%, few: 16.5%

From the logs, file0 reaches roughly 10 points higher accuracy than file1. With only one epoch, file2 also performs significantly better than file3. Additionally, file3 shows a substantially different training process, with the accuracy on the "many" and "few" categories growing very differently at the start. I appreciate everyone's outstanding work, as the training time has been reduced from 2 days to 8 hours (on a single 3090 GPU). However, for reasons currently unknown, there is a significant drop in accuracy, which is troubling me.

If it's convenient for you to run the code, I have published a runnable version in this repository: https://github.com/lyyi599/MLA. You can refer to the repository's README for instructions on how to run it.

In any case, thank you for your help. If you have any good solutions, or if I am using DALI incorrectly, please let me know. Thank you again.
