Fix: Use a consistent country name across usvi samples both submitted to genbank and not yet #41
ingest/config/annotations.tsv
KY328289 date 2016-05-15
MW165884 country USVI
not a request for changes
I was expecting a change in the (local) geolocation rules, but this works too!
That was my expectation too, since this overwrites the geolocation rules from ncov-ingest, which include:
$ grep -i 'virgin islands' source-data/gisaid_geoLocationRules.tsv
*/Virgin Islands/St. Thomas/ North America/USA/Virgin Islands/St. Thomas
North America/USA/Vi/* North America/USA/Virgin Islands/*
North America/USA/Virgin Islands of the U.S./* North America/USA/Virgin Islands/*
North America/USA/U.S. Virgin Islands/* North America/USA/Virgin Islands/*
North America/USA/Us Virgin Islands/* North America/USA/Virgin Islands/*
North America/USA/Virgin Islands Of The U.S./* North America/USA/Virgin Islands/*
North America/Caribbean/British Virgin Islands/ Europe/United Kingdom/British Virgin Islands/
North America/British Virgin Islands/*/* Europe/United Kingdom/British Virgin Islands/*
North America/U.S. Virgin Islands// North America/USA/Virgin Islands/
North America/U.S. Virgin Islands/U.S. Virgin Islands/ North America/USA/Virgin Islands/
North America/U.S. Virgin Islands/St. Thomas/ North America/USA/Virgin Islands/St. Thomas
North America/U.S. Virgin Islands/St. Croix/ North America/USA/Virgin Islands/St. Croix
South America/British Virgin Islands// Europe/United Kingdom/British Virgin Islands/
South America/British Virgin Islands/British Virgin Islands/ Europe/United Kingdom/British Virgin Islands/
South America/British Virgin Islands/Caribbean/ Europe/United Kingdom/British Virgin Islands/
If we want to special-case Zika to use USVI, I think this can be done with a local geolocation rule:
North America/USA/Virgin Islands/* North America/USVI/USVI/*
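To illustrate how a wildcard rule like the one above could rewrite a record, here is a minimal Python sketch. It is not the actual ncov-ingest or Augur implementation: the helper names `parse_rule` and `apply_rules` are hypothetical, and the sketch assumes tab-separated rules, exact case-sensitive matching, first-match-wins ordering, and that `*` on the right-hand side keeps the record's original value.

```python
def parse_rule(line):
    """Split a 'raw<TAB>annotated' rule into two tuples of
    (region, country, division, location) fields."""
    raw, annotated = line.split("\t")
    return tuple(raw.split("/")), tuple(annotated.split("/"))


def apply_rules(record, rules):
    """Return the record rewritten by the first matching rule, else unchanged.

    Assumption: '*' on the raw side matches any value, and '*' on the
    annotated side keeps the record's original value for that field.
    """
    for raw, annotated in rules:
        if all(r == "*" or r == v for r, v in zip(raw, record)):
            return tuple(v if a == "*" else a for a, v in zip(annotated, record))
    return record


# The local rule proposed in this thread (tab between the two sides).
rules = [parse_rule("North America/USA/Virgin Islands/*\tNorth America/USVI/USVI/*")]

# A Virgin Islands record is relabeled; its location field is preserved.
print(apply_rules(("North America", "USA", "Virgin Islands", "St. Thomas"), rules))
# ('North America', 'USVI', 'USVI', 'St. Thomas')

# Records from other divisions pass through untouched.
print(apply_rules(("North America", "USA", "Florida", "Miami"), rules))
# ('North America', 'USA', 'Florida', 'Miami')
```

Under these assumptions, a local rule applied after the shared ncov-ingest rules would relabel every record that the shared rules normalized to "Virgin Islands", without editing individual accessions in annotations.tsv.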
Is my understanding correct that we want to label all "Virgin Islands" records as "USVI", not just the ones associated with this https://github.com/blab/zika-usvi/ study?
If so, is there a reason the Zika build should use "USVI" while ncov, mpox, and other pathogens use "Virgin Islands"? I'm happy to add it here; I'm just thinking through whether this is a good design decision.
Alternatively, I'm happy to revert this change and use "Virgin Islands" instead, following the second solution proposed here, which was described as the better solution going forward:
- Change the spiked-in sequences to "Virgin Islands" to match GenBank.
Yes, that's my understanding based upon the language used in the paper and the country strings we were using while maintaining the Zika tree during the outbreak.
Thanks all! I switched to the local geolocation rule and tried to summarize the rationale in the commit message: 927222f
Please let me know if you catch anything else.
Label all "Virgin Islands" records as "USVI" to be consistent with text from publication: Black, A. et al., 2017. Genetic characterization of the Zika virus epidemic in the US Virgin Islands. bioRxiv, p.113100. https://www.biorxiv.org/content/10.1101/113100v2
Description of proposed changes
Use a consistent country name across USVI samples, both those already submitted to GenBank and those not yet submitted.