Fix: Use a consistent country name across usvi samples both submitted to genbank and not yet #41

@j23414 j23414 commented Feb 26, 2024

Description of proposed changes

Use a consistent country name across USVI samples, both those already submitted to GenBank and those not yet submitted.

Related issue(s)

Checklist

  • Checks pass

@j23414 j23414 requested a review from a team February 26, 2024 20:26
KY328289 date 2016-05-15
MW165884 country USVI
Member

not a request for changes

I was expecting a change in the (local) geolocation rules, but this works too!

Contributor

That was my expectation too, since this is overwriting the geolocation rules from ncov-ingest, which include:

$ grep -i 'virgin islands' source-data/gisaid_geoLocationRules.tsv 
*/Virgin Islands/St. Thomas/	North America/USA/Virgin Islands/St. Thomas
North America/USA/Vi/*	North America/USA/Virgin Islands/*
North America/USA/Virgin Islands of the U.S./*	North America/USA/Virgin Islands/*
North America/USA/U.S. Virgin Islands/*	North America/USA/Virgin Islands/*
North America/USA/Us Virgin Islands/*	North America/USA/Virgin Islands/*
North America/USA/Virgin Islands Of The U.S./*	North America/USA/Virgin Islands/*
North America/Caribbean/British Virgin Islands/	Europe/United Kingdom/British Virgin Islands/
North America/British Virgin Islands/*/*	Europe/United Kingdom/British Virgin Islands/*
North America/U.S. Virgin Islands//	North America/USA/Virgin Islands/
North America/U.S. Virgin Islands/U.S. Virgin Islands/	North America/USA/Virgin Islands/
North America/U.S. Virgin Islands/St. Thomas/	North America/USA/Virgin Islands/St. Thomas
North America/U.S. Virgin Islands/St. Croix/	North America/USA/Virgin Islands/St. Croix
South America/British Virgin Islands//	Europe/United Kingdom/British Virgin Islands/
South America/British Virgin Islands/British Virgin Islands/	Europe/United Kingdom/British Virgin Islands/
South America/British Virgin Islands/Caribbean/	Europe/United Kingdom/British Virgin Islands/

If we want to special-case Zika to use USVI, I think this can be done with the local geolocation rule:

North America/USA/Virgin Islands/*	North America/USVI/USVI/*
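
To illustrate how a wildcard rule like this one would rewrite a record, here is a minimal sketch. It is not the actual ncov-ingest or augur implementation. It assumes that each rule is a tab-separated pair of `region/country/division/location` strings, that `*` on the left matches any value, and that `*` on the right keeps the record's original value. The helper names `parse_rule` and `apply_rule` are hypothetical.

```python
# Simplified sketch of wildcard geolocation-rule matching.
# Assumption: exact, case-sensitive comparison; the real tooling may differ.

def parse_rule(line):
    """Split a tab-separated rule into (pattern, replacement) field lists."""
    raw, annotated = line.split("\t")
    return raw.split("/"), annotated.split("/")


def apply_rule(record, rule):
    """Return the rewritten record if the rule matches, else None.

    `*` in the pattern matches any value; `*` in the replacement keeps
    the record's original value for that field.
    """
    pattern, replacement = rule
    if len(pattern) != len(record) or len(replacement) != len(record):
        return None
    if any(p != "*" and p != value for p, value in zip(pattern, record)):
        return None
    return [value if r == "*" else r for r, value in zip(replacement, record)]


# The local rule proposed above.
local_rule = parse_rule("North America/USA/Virgin Islands/*\tNorth America/USVI/USVI/*")

# A record as it would look after the shared ncov-ingest rules have run.
record = ["North America", "USA", "Virgin Islands", "St. Thomas"]
print(apply_rule(record, local_rule))
# ['North America', 'USVI', 'USVI', 'St. Thomas']

# A record from another division is left alone (no match).
other = ["North America", "USA", "Puerto Rico", ""]
print(apply_rule(other, local_rule))
# None
```

Under these assumptions, the local rule only needs to match the already-normalized "Virgin Islands" form, because the shared rules listed above collapse the other spellings first.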

Contributor Author

Is my understanding correct that we want to label all "Virgin Islands" records as "USVI", not just the ones associated with the https://github.com/blab/zika-usvi/ study?

If so, is there a reason the zika build should use "USVI" while ncov, mpox, and other pathogens use "Virgin Islands"? I'm happy to add it here; I'm just thinking through whether this is a good design decision.

Alternatively, I'm happy to revert this change and use "Virgin Islands" instead. The 2nd solution here was suggested as the better one going forward:

  1. Change the spiked-in sequences to "Virgin Islands" to match GenBank.

Member

Yes, that's my understanding based upon the language used in the paper and the country strings we were using while maintaining the Zika tree during the outbreak.

Contributor Author

Thanks all! Switched to the local geolocation rule, and tried to summarize the rationale in the commit message: 927222f

Please let me know if you catch anything else

@j23414 j23414 force-pushed the 40-use-a-consistent-country-name-across-usvi-samples-both-submitted-to-genbank-and-not-yet branch from 6215812 to 927222f Compare March 1, 2024 21:26
j23414 added 2 commits March 4, 2024 13:20
Label all "Virgin Islands" records as "USVI" to be consistent with text from publication:

Black, A. et al., 2017. Genetic characterization of the Zika virus epidemic in the US Virgin Islands. bioRxiv, p.113100.
https://www.biorxiv.org/content/10.1101/113100v2
@j23414 j23414 force-pushed the 40-use-a-consistent-country-name-across-usvi-samples-both-submitted-to-genbank-and-not-yet branch from 927222f to 04387cd Compare March 4, 2024 21:21
@j23414 j23414 merged commit a91e575 into main Mar 4, 2024
32 checks passed
@j23414 j23414 deleted the 40-use-a-consistent-country-name-across-usvi-samples-both-submitted-to-genbank-and-not-yet branch March 4, 2024 21:25
Successfully merging this pull request may close these issues.

Use a consistent "country" name across USVI samples (both submitted to GenBank and not-yet)