WIP Set up import from Zenodo, GitHub hash; remove .downloadZ #240
@jwokaty, thanks! This URL is not being recognized: https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv
library(bugphyzz)
bp <- importBugphyzz(version = "d3fd894", force_download = TRUE)
#> Importing multistate data...
#> Downloading, bugphyzz_multistate.tsv.
#> Warning: download failed
#> web resource path: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#> local file path: '/home/user/.cache/R/bugphyzz/a41c3436b429_bugphyzz_multistate.csv'
#> reason: Not Found (HTTP 404).
#> Warning: bfcadd() failed; resource removed
#> rid: BFC31
#> fpath: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#> reason: download failed
#> Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...): bfcadd() failed; see warnings()
bp <- bugphyzz:::.downloadResource(version = "devel", force_download = TRUE)
#> Importing multistate data...
#> Downloading, bugphyzz_multistate.tsv.
#> Warning: download failed
#> web resource path: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#> local file path: '/home/user/.cache/R/bugphyzz/a41c9912ae5_bugphyzz_multistate.csv'
#> reason: Not Found (HTTP 404).
#> Warning: bfcadd() failed; resource removed
#> rid: BFC32
#> fpath: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#> reason: download failed
#> Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...): bfcadd() failed; see warnings()
Created on 2024-03-29 with reprex v2.1.0
Also, I noticed a few differences between the releases of typicalMicrobiomeSignaturesExports and bugsigdbExports. The first is a zip file, which I think is created automatically with each new release. The second has individual files, which I think are added manually after a release is made in bugsigdbExports (is this correct?). I think bugphyzzExports has been set up like the first case, so we'll probably need to download a zip instead of individual files.
Thanks for noticing this and the issue with the multistate file. I don't remember how TMSE was created. I've been manually creating the BugSigDBExport releases because it didn't work automatically. I will add back the functionality that was in
@jwokaty, I think the GitHub API could be used for devel or hash:
gh_hash <- "d3fd894"
.downloadGH <- function(version = "devel") {
    base_url <-
        "https://api.github.com/repos/waldronlab/bugphyzzExports/contents/"
    # Any version other than "devel" is passed to the API as a git ref
    if (version != "devel") {
        hash <- paste0("?ref=", version)
        base_url <- paste0(base_url, hash)
    }
    req <- httr2::request(base_url = base_url) |>
        httr2::req_headers("Accept" = "application/vnd.github.raw+json")
    res <- httr2::req_perform(req = req)
    l <- httr2::resp_body_json(res)
    # Collect the HTML link of every entry in the repository listing
    urls <- purrr::map_chr(l, ~ .x[["_links"]][["html"]]) |>
        purrr::discard(is.na)
    # Keep the three export files and turn the "blob" links into "raw" links
    urls |>
        grep(
            pattern = "bugphyzz_(multistate|binary|numeric)\\.csv",
            x = _, value = TRUE
        ) |>
        sub("/blob/", "/raw/", x = _)
}
links <- .downloadGH(version = "devel")
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_binary.csv"
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_numeric.csv"
links <- .downloadGH(version = gh_hash)
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_binary.csv"
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_numeric.csv"
temp_file <- tempfile()
download.file(url = links[[1]], destfile = temp_file)
dat <- read.csv(temp_file, skip = 1)
dplyr::glimpse(dat)
#> Rows: 327,877
#> Columns: 11
#> $ NCBI_ID <int> 1042312, 1042312, 117743, 117743, 117747, 11774…
#> $ Taxon_name <chr> "Armatimonadia", "Armatimonadia", "Flavobacteri…
#> $ Rank <chr> "class", "class", "class", "class", "class", "c…
#> $ Attribute <chr> "animal pathogen", "animal pathogen", "animal p…
#> $ Attribute_value <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, T…
#> $ Evidence <chr> "asr", "asr", "asr", "asr", "asr", "asr", "asr"…
#> $ Frequency <chr> "sometimes", "rarely", "sometimes", "rarely", "…
#> $ Score <dbl> 0.6085182, 0.3914818, 0.7517647, 0.2482353, 0.3…
#> $ Attribute_source <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Confidence_in_curation <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Attribute_type <chr> "binary", "binary", "binary", "binary", "binary…
Created on 2024-03-29 with reprex v2.1.0
Or maybe no need for the API.
gh_hash <- "d3fd894"
.downloadGH <- function(version = "devel") {
    # "devel" maps to the main branch; any other value is used as the ref
    if (version == "devel") {
        version <- "main"
    }
    file_suffix <- c("binary", "multistate", "numeric")
    paste0(
        "https://github.com/waldronlab/bugphyzzExports/raw/",
        version, "/bugphyzz_", file_suffix, ".csv"
    )
}
links <- .downloadGH(version = "devel")
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_binary.csv"
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_numeric.csv"
links <- .downloadGH(version = gh_hash)
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_binary.csv"
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_numeric.csv"
Created on 2024-03-29 with reprex v2.1.0
We can do whichever is easier or whichever you prefer. Maybe there is more stability with the API? How about we rename main to devel in the Exports repository so that we don't have to change the code? I'll change the GitHub Action.
Yeah, renaming main to devel in bugphyzzExports sounds good. I think the URLs could be stable if main is renamed to devel. We could try without the API and see how things work out.
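To illustrate the point about not having to change the code: if the default branch of bugphyzzExports is renamed to devel, the "devel"-to-"main" mapping in the helper above becomes unnecessary and the version string can be used directly as the git ref. This is only a minimal sketch under that assumption; `.exportUrls` is a hypothetical name, not a function in bugphyzz.

```r
# Hypothetical helper (not part of bugphyzz).
# Assumption: the default branch of bugphyzzExports is named "devel",
# so `version` can be used as the git ref without any translation.
.exportUrls <- function(version = "devel") {
    stopifnot(is.character(version), length(version) == 1L, nzchar(version))
    file_suffix <- c("binary", "multistate", "numeric")
    paste0(
        "https://github.com/waldronlab/bugphyzzExports/raw/",
        version, "/bugphyzz_", file_suffix, ".csv"
    )
}

.exportUrls("devel")    # branch name used as the ref
.exportUrls("d3fd894")  # commit hash used as the ref
```

Until the rename happens, the "devel" URLs built this way would not resolve, so the mapping to main would still be needed.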
…m Zenodo (not tested yet).
I added the code discussed above in this commit 54949ee. importBugphyzz can use a hash, devel, or zenodo (zenodo not tested yet) for downloading. I already have some tests for importBugphyzz, which should ensure the output is correct: https://github.com/waldronlab/bugphyzz/blob/devel/tests/testthat/test-importBugphyzz.R. I'll add some more. @jwokaty, are these the tests you use for bugsigdb?
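One way to picture the three accepted inputs is a small dispatcher that decides which source a `version` string refers to. The sketch below is not the code from commit 54949ee; `.classifyVersion` is a hypothetical helper, and the two patterns (a Zenodo DOI of the form `10.5281/zenodo.<record id>` and a 7 to 40 character lowercase hexadecimal commit hash) are assumptions about what the inputs look like.

```r
# Hypothetical helper (not part of bugphyzz): classify a `version` string.
.classifyVersion <- function(version) {
    stopifnot(is.character(version), length(version) == 1L, !is.na(version))
    if (identical(version, "devel")) {
        "devel"
    } else if (grepl("^10\\.5281/zenodo\\.[0-9]+$", version)) {
        # Assumption: Zenodo DOIs look like 10.5281/zenodo.<record id>
        "zenodo"
    } else if (grepl("^[0-9a-f]{7,40}$", version)) {
        # Assumption: short or full commit hash in lowercase hexadecimal
        "hash"
    } else {
        stop(
            "`version` must be 'devel', a GitHub commit hash, or a Zenodo DOI",
            call. = FALSE
        )
    }
}

.classifyVersion("devel")
.classifyVersion("d3fd894")
# Placeholder DOI, for illustration only; not a real bugphyzz release
.classifyVersion("10.5281/zenodo.1234567")
```

Checking "devel" first matters because the later branches only run when the string is not the branch keyword; anything unrecognised fails early with a readable message instead of a 404 from the download step.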
@sdgamboa Great! Yes, I was going to suggest you add to this PR. For bugsigdbr, I ran all the tests available in the package. Since we can now import data, we can run all the tests in the bugphyzz package.
…ata, so it's included with the package
I'm adding a GitHub Actions workflow to test the package. When running the tests, I noticed that growth temp, coding genes, and genome size have NAs so they fail the tests in 170 and 180 in tests/testthat/test-importBugphyzz.R: (I can probably remove the pkgdown action since the bioc-check action can also do it.)
bugphyzz is passing tests on linux and windows. The mac environment is using R 4.5 so we can disregard it. I'm going to remove the pkgdown workflow as I mentioned. (I am testing on my fork.) If you think bugphyzzExports is in good shape, maybe we should consider making the Zenodo release, setting the version to the DOI, and merging this PR. Then you could follow up with Lori that changes have been made to use data from Zenodo so that bugphyzz can be assigned a reviewer. We can also talk about it at the lab meeting on Thursday.
@jwokaty, thanks for setting up the GAs. I'm making some final reviews and then we can release the first version to Zenodo.
I saw the comment about data being stored in GitHub, so I tried to set up the patterns used in https://github.com/waldronlab/bugsigdbr/blob/devel/R/bugsigdb.R to import by a Zenodo DOI by default but also have the capability to download devel and GitHub hashes. It's currently set to the hash of the prerelease, which shouldn't be in the merged code but we can use that to evaluate the release. Before this is merged, the release should be made on Zenodo and at least that hash should be changed to the Zenodo DOI. Then I think we could ask someone to assess if the package can be evaluated again.
It should probably give attribution that this draws from code in bugsigdbr, and maybe tests could be written to ensure that it can download when version is "devel", a GitHub hash, or a Zenodo DOI.
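As a rough idea of what such tests could look like, here is a sketch using testthat that checks URL construction only, so it runs without network access. The `.downloadGH` definition is copied from the second proposal in this thread just to give the tests something to run against; the function in the package may differ. A Zenodo DOI case is left out because the thread does not show how Zenodo URLs are built.

```r
library(testthat)

# Copied from the thread's second proposal so the tests are self-contained;
# the packaged implementation may differ.
.downloadGH <- function(version = "devel") {
    if (version == "devel") version <- "main"
    file_suffix <- c("binary", "multistate", "numeric")
    paste0(
        "https://github.com/waldronlab/bugphyzzExports/raw/",
        version, "/bugphyzz_", file_suffix, ".csv"
    )
}

test_that("devel resolves to the main branch", {
    urls <- .downloadGH("devel")
    expect_length(urls, 3)
    expect_true(all(grepl("/raw/main/bugphyzz_", urls, fixed = TRUE)))
})

test_that("a commit hash is used as the ref", {
    urls <- .downloadGH("d3fd894")
    expect_true(all(grepl("/raw/d3fd894/", urls, fixed = TRUE)))
    expect_match(urls[[2]], "bugphyzz_multistate\\.csv$")
})
```

Tests that actually download the files would complement these, but they depend on the network and on the Exports repository, so they may be better guarded or skipped on build machines.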