Transfer learning on a large dataset #50

Open
szjyoo opened this issue May 27, 2024 · 4 comments

@szjyoo

szjyoo commented May 27, 2024

Hello author. I am trying to train CAT-Net on the DocTamper dataset (120,000 images). In data_core.py, should I change self.smallest = 1869 to self.smallest = 120000, or should I train on a subset of the full dataset in each round? I look forward to your answer.

@CauchyComplete
Collaborator

CauchyComplete commented May 27, 2024

Hello :)

If you are adding the new DocTamper dataset (120k images) to the existing dataset setup, the smallest dataset is still IMD, so self.smallest should be 1869 (the number of images in IMD).
If you are using only the DocTamper dataset without any other datasets, then it would be correct to set self.smallest to 120k. However, this would mean that 120k images are used in one epoch, which would take too long. Since the original training method of CAT-Net uses 1869*10 images per epoch, it might be a good idea to set self.smallest to 1869*10.
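To illustrate the idea behind the answer above, here is a minimal sketch of per-epoch subset sampling. It is not the actual code in data_core.py: the class name EpochSubsetSampler, its methods, and the seed parameter are hypothetical and only assume that each epoch draws up to `smallest` images from every dataset.

```python
import random


class EpochSubsetSampler:
    """Hypothetical helper (not part of CAT-Net): each epoch draws up to
    `smallest` random items from every dataset."""

    def __init__(self, dataset_sizes, smallest, seed=0):
        self.dataset_sizes = dict(dataset_sizes)  # dataset name -> image count
        self.smallest = smallest                  # per-dataset cap per epoch
        self.rng = random.Random(seed)

    def __len__(self):
        # Number of images seen in one epoch.
        return sum(min(self.smallest, n) for n in self.dataset_sizes.values())

    def sample_epoch(self):
        # Draw a fresh random subset (without replacement) from each dataset.
        picks = []
        for name, n in self.dataset_sizes.items():
            k = min(self.smallest, n)
            picks.extend((name, i) for i in self.rng.sample(range(n), k))
        self.rng.shuffle(picks)
        return picks


# DocTamper only, capped at 1869 * 10 images per epoch.
sampler = EpochSubsetSampler({"DocTamper": 120_000}, smallest=1869 * 10)
epoch = sampler.sample_epoch()
print(len(epoch))  # 18690
```

Because a new subset is drawn every epoch, the model still sees different parts of the 120k images over time, while each epoch stays short enough to validate and checkpoint at a reasonable interval.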

@szjyoo
Author

szjyoo commented May 27, 2024

Thank you very much for your answer. I'm only using DocTamper as a dataset. My validation and test sets contain 10,000 and 30,000 images respectively. Considering both training efficiency and training performance, I want to know whether setting self.smallest to 10,000 or to 1869*10 will give better results. Looking forward to your answer.

@1513691610

Hello, I am also training CAT-Net with DocTamper. Could you share your contact information so we can discuss it together? Thank you.

@Ridha15

Ridha15 commented Jul 31, 2024

Hey. I want to try the same. Can we connect to discuss?
