
Missing Key Code/Files for Training & Validation on Vimeo-90K and REDS Datasets #16

Walnutes opened this issue Dec 3, 2024 · 6 comments



Walnutes commented Dec 3, 2024

Hi,

Thank you for sharing your excellent work! I encountered a few issues while trying to train on the Vimeo-90K and REDS datasets:

  1. Vimeo-90K Dataset Process and Training Code/Scripts:
    The dataset directory only contains reds_dataset.py, and train.py seems to import this class exclusively. Could you kindly provide the complete data loading code for Vimeo-90K, as well as any relevant configuration files like config_vimeo.yaml and metadata.txt?
  2. Validation Output for REDS Dataset:
    While validation_steps is defined, I couldn’t locate the corresponding log, output images (e.g., super-resolved images), or evaluation metrics mentioned in your paper. Could you clarify this part?
  3. Training Parameters and Key Files:
    In the train.sh script, VALIDATION_IMAGE points to the train/gt directory. Should this be corrected to train/lq? Also, the required metadata.txt file for this appears to be missing. I attempted to create it based on REDS_train_metadata.txt and the training settings mentioned in your paper as follows:
    000 100\n 011 100\n 015 100\n 020 100
    Could you confirm if this matches your setup? If not, could you kindly upload the correct file? Additionally, if Vimeo-90K also requires such files/configs, could you provide them as well?

Thank you in advance for your support and for providing additional details. Your assistance would be invaluable in reproducing your results!


claudiom4sir commented Dec 3, 2024

Hi @Walnutes,

Vimeo-90K Dataset Process and Training Code/Scripts:
The dataset directory only contains reds_dataset.py, and train.py seems to import this class exclusively. Could you kindly provide the complete data loading code for Vimeo-90K, as well as any relevant configuration files like config_vimeo.yaml and metadata.txt?

As this project was part of an internship, I don't have access to the old files anymore, so I probably won't be able to provide the necessary code for Vimeo-90-K. I apologize for this.

Validation Output for REDS Dataset:
While validation_steps is defined, I couldn’t locate the corresponding log, output images (e.g., super-resolved images), or evaluation metrics mentioned in your paper. Could you clarify this part?

The validation output is on TensorBoard. There, you can inspect a sequence of frames visually to see how results change (and hopefully improve) for a random video clip (in this case, sequence 020 was chosen). If you want, you can monitor multiple sequences with a slight code modification. No metrics were computed at this point. This is because computing metrics on only a single (and short) sequence is quite useless, and running a full validation involving multiple long sequences is very time-consuming.
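
For anyone looking to monitor more than one clip, here is a minimal sketch (not the repository's actual code) of how the TensorBoard logging could cover several sequences. The dictionary of clips and the `run_pipeline` argument are hypothetical placeholders for the real pipeline call; only the TensorBoard API is standard.

```python
# Minimal sketch, assuming a callable that super-resolves a list of frame paths.
# Tags, paths, and `run_pipeline` are illustrative, not from the repository.
from torch.utils.tensorboard import SummaryWriter

# One entry per clip to monitor: tag -> ';'-separated low-quality frame paths.
validation_sequences = {
    "val/seq_000": "REDS/train/lq/000/00000000.png;REDS/train/lq/000/00000001.png",
    "val/seq_020": "REDS/train/lq/020/00000000.png;REDS/train/lq/020/00000001.png",
}

def log_validation_multi(writer: SummaryWriter, run_pipeline, step: int) -> None:
    """Log one image grid per monitored clip. `run_pipeline` is assumed to take a
    list of frame paths and return a (T, C, H, W) tensor of super-resolved frames in [0, 1]."""
    for tag, frame_paths in validation_sequences.items():
        sr_frames = run_pipeline(frame_paths.split(";"))
        writer.add_images(tag, sr_frames, global_step=step)
```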

Training Parameters and Key Files:
In the train.sh script, VALIDATION_IMAGE points to the train/gt directory. Should this be corrected to train/lq?

For VALIDATION_IMAGE, you are right, the correct path is train/lq. I probably forgot to change this when I cleaned the code for publication.

Also, the required metadata.txt file for this appears to be missing. I attempted to create it based on REDS_train_metadata.txt and the training settings mentioned in your paper as follows:
000 100\n 011 100\n 015 100\n 020 100
Could you confirm if this matches your setup? If not, could you kindly upload the correct file? Additionally, if Vimeo-90K also requires such files/configs, could you provide them as well?

For validation, you don't need any metadata.txt, since images are loaded directly (see the log_validation function at line 77 in train.py). You just need to use a sequence of frame paths separated by ; as in train.sh.
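
For reference, a hypothetical train.sh excerpt with the corrected path (the directory layout and frame names below are illustrative, not taken from the repository):

```bash
# Hypothetical excerpt from train.sh: VALIDATION_IMAGE is a ';'-separated list of
# low-quality frame paths from a single clip (here sequence 020), pointing to
# train/lq rather than train/gt as clarified above.
VALIDATION_IMAGE="REDS/train/lq/020/00000000.png;REDS/train/lq/020/00000001.png;REDS/train/lq/020/00000002.png"
```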


Walnutes commented Dec 3, 2024

Thank you very much for your prompt and detailed response!

Could you please provide some reliable references/template code (e.g., sr_vimeo90k_multiple_gt_dataset.py) from other official repositories (such as BasicVSR++, which seems to use a different configuration approach based on mmedit) that are similar to your implementation, to help me reproduce the missing data loading code for Vimeo-90K?

Any guidance or pointers to relevant parts of other repositories would be incredibly helpful for replicating your setup.

Looking forward to your further response!


claudiom4sir commented Dec 4, 2024

From BasicSR, you need the file vimeo90k_dataset.py. You have to modify the class Vimeo90KRecurrentDataset (check my reds_dataset.py). You can find metadata.txt for Vimeo here. You may need to remove the image size from that file. The config_vimeo90k.yaml should be quite easy to set up; check config_reds.yaml. Finally, in train.py you have to change the dataset import and modify train.sh accordingly.
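
To make the adaptation concrete, here is a rough sketch of what the modified dataset class might look like, modelled on BasicSR's Vimeo90KRecurrentDataset and the reds_dataset.py mentioned above. Option names (dataroot_gt, dataroot_lq, meta_info_file, num_frame) follow BasicSR conventions and the meta file format is assumed to list one septuplet per line; this is not the repository's actual implementation.

```python
# Sketch of an adapted Vimeo-90K dataset; augmentation and tensor conversion omitted.
import os
from torch.utils.data import Dataset
from PIL import Image

class Vimeo90KRecurrentDataset(Dataset):
    def __init__(self, opt):
        self.gt_root = opt["dataroot_gt"]
        self.lq_root = opt["dataroot_lq"]
        self.num_frame = opt.get("num_frame", 3)
        # meta_info_file is assumed to list one septuplet per line, e.g. "00001/0001"
        # (image sizes removed, as discussed above).
        with open(opt["meta_info_file"]) as f:
            self.keys = [line.split(" ")[0].strip() for line in f if line.strip()]
        # Take num_frame consecutive frames centred on im4 of the septuplet (im1..im7).
        center, half = 4, self.num_frame // 2
        self.neighbor_list = list(range(center - half, center + half + 1))

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, index):
        key = self.keys[index]  # e.g. "00001/0001"
        lq_frames, gt_frames = [], []
        for i in self.neighbor_list:
            lq_frames.append(Image.open(os.path.join(self.lq_root, key, f"im{i}.png")))
            gt_frames.append(Image.open(os.path.join(self.gt_root, key, f"im{i}.png")))
        # Cropping, flipping, and conversion to tensors would go here,
        # mirroring what reds_dataset.py does for REDS.
        return {"lq": lq_frames, "gt": gt_frames, "key": key}
```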


Walnutes commented Dec 4, 2024

Thank you so much for your detailed guidance! It has been incredibly helpful.

I’ve modified vimeo90k_dataset.py based on reds_dataset.py and can now successfully run the training code on the Vimeo-90K dataset! But I still have a few key questions regarding specific parameters and configurations to ensure accurate reproduction:

  1. num_frame and interval_list:
    In the REDS dataset, these values are set to 3 and 1, respectively, to generate the neighbor_list parameter, which determines the number of frames (t) per batch. However, in the Vimeo-90K dataloader, the default self.neighbor_list = [1, 2, 3, 4, 5, 6, 7]. Should this be modified to match the REDS setting, or is the default configuration suitable for Vimeo-90K?

  2. Data augmentation details:

    • Crop size (gt_size):
      In the REDS dataset, is the cropping size (gt_size=256) specified for the ground truth (img_gts) and low-quality images (img_lqs) based on their resolution of 1280x720? It's worth noting that the original resolution of Vimeo-90K images is 448x256. Should the crop size be adjusted accordingly, or could you kindly provide a reasonable value setting?
    • Sequence flipping (flip_sequence):
      In the Vimeo-90K dataloader, there is an additional self.flip_sequence parameter that flips the sequence, effectively doubling the number of frames from 7 to 14. It seems similar to the parameter self.opt['use_hflip']? This is implemented as follows:
      img_lqs = torch.cat([img_lqs, img_lqs.flip(0)], dim=0)
      img_gts = torch.cat([img_gts, img_gts.flip(0)], dim=0)
      Could you clarify whether this parameter should be used in training, and if so, under what conditions?

Thank you again for your time and support! Looking forward to your response.


Walnutes commented Dec 5, 2024

Besides the training-related questions mentioned earlier, I also have an issue about the model weights you uploaded to HuggingFace. As mentioned in #15, the Vimeo-90K and REDS datasets require separate training, yet the StableVSR weights on HuggingFace only provide a single set of weights. Could you clarify:

  1. Are these the final weights trained on the Vimeo-90K dataset?
  2. Or the REDS dataset?
  3. Or just pre-trained weights on other datasets without fine-tuning on Vimeo-90K or REDS?

If it’s the third case, could you kindly share the final weights specifically fine-tuned on the Vimeo-90K and REDS datasets?
This would greatly help in reproducing the results as described in your paper. Thank you once again for your time and support!


claudiom4sir commented Dec 5, 2024

In the REDS dataset, these values are set to 3 and 1, respectively, to generate the neighbor_list parameter, which determines the number of frames (t) per batch. However, in the Vimeo-90K dataloader, the default self.neighbor_list = [1, 2, 3, 4, 5, 6, 7]. Should this be modified to match the REDS setting, or is the default configuration suitable for Vimeo-90K?

You can use 3 and 1 for Vimeo-90K as well; in this case, you consider just the central frame of the sequence plus the previous and next ones.
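
To make the relationship between num_frame, interval_list, and neighbor_list concrete, here is a small illustrative snippet (variable names follow the BasicSR-style options discussed in this thread, not necessarily the exact code):

```python
import random

num_frame = 3
interval_list = [1]

interval = random.choice(interval_list)
center_frame = 4  # Vimeo-90K septuplets are im1..im7, so im4 is the central frame
half = num_frame // 2

# For num_frame=3 and interval=1 this gives [3, 4, 5]:
# the central frame plus its previous and next neighbours.
neighbor_list = [center_frame + i * interval for i in range(-half, half + 1)]
print(neighbor_list)  # [3, 4, 5]
```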

In the REDS dataset, is the cropping size (gt_size=256) specified for the ground truth (img_gts) and low-quality images (img_lqs) based on their resolution of 1280x720? It's worth noting that the original resolution of Vimeo-90K images is 448x256. Should the crop size be adjusted accordingly, or could you kindly provide a reasonable value setting?

256 is OK for Vimeo-90K, too: you are cropping in width but not in height. You could use a smaller value, like 128 or 192, for more data augmentation, but the dataset is quite large, so I think 256 is fine.

In the Vimeo-90K dataloader, there is an additional self.flip_sequence parameter that flips the sequence, effectively doubling the frames from 7 to 14.

I used flip_sequence: False. The .yaml file options are almost the same as for REDS, but without val_partition: 'REDS4'.
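
For illustration, a possible config_vimeo90k.yaml fragment built from the options discussed in this thread. Field names mirror config_reds.yaml / BasicSR conventions; the paths and any values not mentioned above are placeholders, not the author's actual file.

```yaml
# Hypothetical config_vimeo90k.yaml fragment; only options discussed above are shown.
dataset:
  dataroot_gt: /path/to/vimeo90k/GT
  dataroot_lq: /path/to/vimeo90k/BIx4
  meta_info_file: /path/to/meta_info_Vimeo90K_train_GT.txt
  num_frame: 3
  interval_list: [1]
  gt_size: 256
  use_hflip: true
  flip_sequence: false
  # no val_partition here: 'REDS4' applies only to the REDS config
```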

Besides the training-related questions mentioned earlier, I also have an issue about the model weights you uploaded to HuggingFace. As mentioned in #15, the Vimeo-90K and REDS datasets require separate training, yet the StableVSR weights on HuggingFace only provide a single set of weights. Could you clarify:
Are these the final weights trained on the Vimeo-90K dataset?
Or the REDS dataset?
Or just pre-trained weights on other datasets without fine-tuning on Vimeo-90K or REDS?
If it’s the third case, could you kindly share the final weights specifically fine-tuned on the Vimeo-90K and REDS datasets?
This would greatly help in reproducing the results as described in your paper. Thank you once again for your time and support!

Option 2 is correct. The pre-trained model uploaded on HuggingFace allows you to replicate the REDS4 results reported in the paper.
