
inference results for 11.03.23 #15

Merged: 14 commits, merged Dec 2, 2023


bkotzen (Contributor) commented Nov 3, 2023

No description provided.

bkotzen (Contributor Author) commented Nov 3, 2023

@martinjankowiak new inference results! I had to subsample the GISAID metadata to fit it in memory; hopefully there will be a more elegant solution later.

bkotzen (Contributor Author) commented Nov 17, 2023

@martinjankowiak updated with the 11.17 results as well!

martinjankowiak (Collaborator) commented Nov 18, 2023

@bkotzen if you're doing data subsampling, can you note that with a new entry in the corresponding pair of metadata.txt files? Something like:
Percent of raw data included in inference: 93%

EDIT: actually why are the numbers of regions/sequences so different between the two??

bkotzen (Contributor Author) commented Nov 28, 2023

Hey @martinjankowiak, yes, I am doing some significant downsampling to meet compute restrictions. However, I'm finalizing a fix right now and will re-run the results for the dates where downsampling occurred. I should be able to produce better results by end of week (EOW), and that will resolve this issue!

bkotzen (Contributor Author) commented Nov 29, 2023

Just added some more recent results to make sure everything works. Now that I know it works on current data, I can fix the 11.17.2023 run.

martinjankowiak (Collaborator) commented

@bkotzen how does the subsampling work? When you update the PR, can you please make sure to:

  • delete all irrelevant/stale files
  • update metadata.txt everywhere necessary to indicate that data was subsampled

bkotzen (Contributor Author) commented Nov 30, 2023

The subsampling randomly selected a proportion of the samples in the GISAID metadata, and PyR0 and BVAS were run only on those samples.
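To make the random subsampling concrete, here is a minimal sketch, assuming the GISAID metadata has already been loaded as a list of row dictionaries. The helper names `subsample_metadata` and `subsampling_note`, the row format, and the fixed seed are hypothetical and are not taken from the actual PyR0/BVAS preprocessing code; the note string simply follows the metadata.txt entry format suggested above ("Percent of raw data included in inference: 93%").

```python
import random


def subsample_metadata(rows, fraction, seed=0):
    """Hypothetical helper: keep a uniformly random `fraction` of metadata rows.

    Assumes `rows` is a list of per-sequence records (e.g. dicts).
    Sampling is without replacement and the original row order is preserved.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    k = round(len(rows) * fraction)
    # Draw k distinct row indices, then sort them to keep the input order.
    kept_idx = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in kept_idx]


def subsampling_note(n_kept, n_total):
    """Format a metadata.txt entry recording how much raw data was used."""
    pct = round(100 * n_kept / n_total)
    return f"Percent of raw data included in inference: {pct}%"


if __name__ == "__main__":
    # Hypothetical stand-in for the GISAID metadata table.
    rows = [{"strain": f"seq{i}"} for i in range(1000)]
    kept = subsample_metadata(rows, 0.93, seed=1)
    print(len(kept))
    print(subsampling_note(len(kept), len(rows)))
```

Recording the seed alongside the percentage would make a subsampled run reproducible; whether the real pipeline did this is not stated in the thread.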

I have now made some tweaks to my compute infrastructure and the preprocessing code in order to handle the large number of sequences.

The timeline is:
  • 11/03: started subsampling; I'm now re-doing this run
  • 11/17: significant subsampling, but I just deleted the original results (5c68de7) and replaced them with non-subsampled results (e10667a)
  • 11/28: by this point I had a solution that no longer required subsampling, so these results are fine

martinjankowiak (Collaborator) commented

@bkotzen I see, so we're waiting for you to replace the 11/03 run?

bkotzen (Contributor Author) commented Dec 1, 2023

Yep! The 11/03 run is almost done; I'll post it here ASAP.

bkotzen (Contributor Author) commented Dec 1, 2023

Ok @martinjankowiak! Everything is all set now 💪🏼 😤

martinjankowiak (Collaborator) left a comment

Great, thanks @bkotzen!

martinjankowiak merged commit 8460e2f into broadinstitute:main on Dec 2, 2023
2 of 3 checks passed