invalid values #229
Comments
I think these come from the output of bacdiveR. @jwokaty, I've been using this spreadsheet; is there a newer version? The values from "biosafety level" seem to be the result of incorrect parsing. I'll add the remaining values to the extdata/attributes.tsv file.
@sdgamboa I've created a new spreadsheet, and the biosafety level, country, and geographic location appear to be formatted correctly; however, I have not yet replaced the BacDive sheet. I wanted to give you the opportunity to look at it first: https://docs.google.com/spreadsheets/d/1P4Ic6-N9GVXcX1CdfoamFt6eozfHqt-sxfIRTBvYHWk/edit?usp=sharing. If it looks good, I want to upload it as a new version of the BacDive document.
@jwokaty, thanks! Values for biosafety level seem fine now and I no longer get 'X' columns when parsing the file. I added the URL to this code: Lines 21 to 29 in ed8b40f
library(bugphyzz)
bl <- physiologies('biosafety level')[[1]]
#> Finished biosafety level.
#> Warning: Missing columns in biosafety level. Missing columns are: Genome_ID,
#> Accession_ID
unique(bl$Attribute)
#> [1] "biosafety level 1" "biosafety level 2" "biosafety level 3"
#> [4] "biosafety level 1+" "biosafety level 3**" "biosafety level L1"
Created on 2023-09-20 with reprex v2.0.2
@sdgamboa I'm glad that it's working better. I think that we should use the original URL so that we can make use of Google Sheets versioning. It only keeps a version history of 30 days, but it will allow us to upload a new version without changing the URL in bugphyzz.
@jwokaty, agreed. I'll switch back to the original URL when the spreadsheet gets updated.
I've updated the Google Sheet!
The following line in bugphyzzExports is identifying invalid values and dropping them. @sdgamboa please raise such curation issues here and discuss whether they should be resolved by correcting the invalid values, adding to the allowed vocabulary, or continuing to drop these values. For some, dropping certainly does seem like the right choice for ASR, but for others (like aerophilicity and shapes) I'm not so sure.
https://github.com/waldronlab/bugphyzzExports/blob/a9fc18914cb3b1d9ea3a3d1c0121ccac5c8d482a/inst/scripts/export_bugphyzz.R#L126
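As a rough illustration of the check being discussed, the sketch below splits attribute values into those that match an allowed vocabulary and those that do not. It is not the code at the linked line of export_bugphyzz.R: `check_vocabulary` is a hypothetical helper, and the `allowed` vector is an assumption made up for the example (the thread suggests the real vocabulary is kept in extdata/attributes.tsv). The observed values are the ones printed by `unique(bl$Attribute)` in this thread.

```r
# Hypothetical helper (not part of bugphyzz or bugphyzzExports):
# split the unique values into those found in the allowed vocabulary
# and those that are not.
check_vocabulary <- function(values, allowed) {
  values <- unique(values)
  is_valid <- values %in% allowed
  list(valid = values[is_valid], invalid = values[!is_valid])
}

# Attribute values reported in this thread for 'biosafety level'.
observed <- c(
  "biosafety level 1", "biosafety level 2", "biosafety level 3",
  "biosafety level 1+", "biosafety level 3**", "biosafety level L1"
)

# Assumed vocabulary, for illustration only.
allowed <- paste("biosafety level", 1:3)

res <- check_vocabulary(observed, allowed)

# Values that would need to be corrected, added to the vocabulary, or dropped.
res$invalid
```

With these inputs, the flagged values are "biosafety level 1+", "biosafety level 3**", and "biosafety level L1". Returning the invalid values rather than silently dropping them makes it easier to decide, attribute by attribute, which of the three resolutions applies.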