Optimizing MIDAS on very large/complex datasets #20

Open

neuro30 opened this issue Mar 26, 2022 · 0 comments

neuro30 commented Mar 26, 2022

On very large datasets (~30,000 samples x 1,000,000 features) with complex relationships (e.g. cancer omics data), MIDAS can take a very long time to run (days?), even on a single GPU. I would still like to take advantage of the 'overimpute' feature for hyperparameter tuning, but this becomes prohibitive because this very useful feature retrains the model multiple times to evaluate different settings.

Would random downsampling of samples (rows) and/or features (columns) allow the optimal hyperparameters to generalize to the larger dataset? For instance, a random subset of 500-1,000 samples with 5,000-10,000 features. The goal would specifically be to determine the optimal number of nodes and layers, the learning rate, and the number of training epochs. I would think batch size (which can speed up training) is a function of the dataset size, so that setting would not generalize.
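
For concreteness, here is a minimal sketch of the subsampling idea, assuming the Python implementation (MIDASpy) and a pandas data frame with samples as rows. The file name is hypothetical, and the parameter names (layer_structure, learn_rate, input_drop, spikein, training_epochs, report_ival) are written from memory of the MIDASpy documentation, so please check them against your installed version:

```python
import numpy as np
import pandas as pd
import MIDASpy as md

rng = np.random.default_rng(42)

# Full matrix: ~30,000 samples (rows) x ~1,000,000 features (columns).
# "omics_matrix.csv" is a placeholder; continuous features are assumed
# to be pre-scaled as MIDAS expects.
full_data = pd.read_csv("omics_matrix.csv", index_col=0)

# Randomly subsample ~1,000 samples and ~10,000 features for tuning.
row_labels = rng.choice(full_data.index, size=1_000, replace=False)
col_labels = rng.choice(full_data.columns, size=10_000, replace=False)
subset = full_data.loc[row_labels, col_labels]

# Small grid over architecture and learning rate. overimpute() spikes in
# extra missingness and reports reconstruction error, which serves as the
# tuning criterion for each setting.
for layers in ([128, 128], [256, 256], [256, 256, 256]):
    for lr in (1e-3, 1e-4):
        imputer = md.Midas(layer_structure=layers,
                           learn_rate=lr,
                           input_drop=0.75,
                           seed=89)
        imputer.build_model(subset)
        imputer.overimpute(spikein=0.1,
                           training_epochs=50,
                           report_ival=10,
                           plot_vars=False)
```

Whichever setting gives the lowest overimputation error on the subset would then be re-checked on the full data, with the batch size chosen separately for the full dataset size, since (as noted above) batch size is unlikely to transfer.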

Any help would be greatly appreciated.
