WIP Set up import from Zenodo, GitHub hash; remove .downloadZ #240

Merged: sdgamboa merged 24 commits into devel from import-other-resources on Apr 17, 2024.

@jwokaty (Contributor) commented Mar 29, 2024

I saw the comment about data being stored in GitHub, so I tried to follow the patterns used in https://github.com/waldronlab/bugsigdbr/blob/devel/R/bugsigdb.R: import from a Zenodo DOI by default, but also allow downloading the devel version or a specific GitHub hash. The version is currently set to the hash of the prerelease. That hash shouldn't be in the merged code, but we can use it to evaluate the release. Before this is merged, the release should be made on Zenodo, and at least that hash should be replaced with the Zenodo DOI. Then I think we could ask someone to assess whether the package can be evaluated again.

It should probably credit bugsigdbr as the source of this code, and tests could be written to ensure that data can be downloaded when version is "devel", a GitHub hash, or a Zenodo DOI.
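Tests like these need a way to tell the three kinds of `version` apart. A minimal sketch of that dispatch follows; `classify_version()` is a hypothetical helper (not from bugphyzz or bugsigdbr), and it assumes Zenodo DOIs look like `10.5281/zenodo.<record id>` and that GitHub hashes are 7 to 40 lowercase hex characters. The DOI below is a placeholder.

```r
## Hypothetical helper: decide which source a `version` string refers to.
## Assumptions: Zenodo DOIs have the form "10.5281/zenodo.<record id>" and a
## GitHub commit hash is 7 to 40 lowercase hexadecimal characters.
classify_version <- function(version) {
    stopifnot(is.character(version), length(version) == 1L)
    if (identical(version, "devel")) {
        "devel"
    } else if (grepl("^10\\.5281/zenodo\\.[0-9]+$", version)) {
        "zenodo"
    } else if (grepl("^[0-9a-f]{7,40}$", version)) {
        "github_hash"
    } else {
        stop("Unrecognized version: ", version, call. = FALSE)
    }
}

classify_version("devel")                   # returns "devel"
classify_version("d3fd894")                 # returns "github_hash"
classify_version("10.5281/zenodo.1234567")  # returns "zenodo" (placeholder DOI)
```

In this sketch, anything else (for example a tag such as "v1.0.0") raises an error instead of being passed through, so a typo fails loudly rather than as an HTTP 404 later.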

@jwokaty jwokaty requested a review from sdgamboa March 29, 2024 13:35
@jwokaty jwokaty changed the title Set up import from Zenodo, GitHub hash; remove .downloadZ WIP Set up import from Zenodo, GitHub hash; remove .downloadZ Mar 29, 2024
@sdgamboa (Contributor)

@jwokaty, thanks!

This URL returns a 404 (Not Found): https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv

library(bugphyzz)
bp <- importBugphyzz(version = "d3fd894", force_download = TRUE)
#> Importing multistate data...
#> Downloading, bugphyzz_multistate.tsv.
#> Warning: download failed
#>   web resource path: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#>   local file path: '/home/user/.cache/R/bugphyzz/a41c3436b429_bugphyzz_multistate.csv'
#>   reason: Not Found (HTTP 404).
#> Warning: bfcadd() failed; resource removed
#>   rid: BFC31
#>   fpath: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#>   reason: download failed
#> Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...): bfcadd() failed; see warnings()
bp <- bugphyzz:::.downloadResource(version = "devel", force_download = TRUE)
#> Importing multistate data...
#> Downloading, bugphyzz_multistate.tsv.
#> Warning: download failed
#>   web resource path: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#>   local file path: '/home/user/.cache/R/bugphyzz/a41c9912ae5_bugphyzz_multistate.csv'
#>   reason: Not Found (HTTP 404).
#> Warning: bfcadd() failed; resource removed
#>   rid: BFC32
#>   fpath: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#>   reason: download failed
#> Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...): bfcadd() failed; see warnings()

Created on 2024-03-29 with reprex v2.1.0

@sdgamboa (Contributor)

Also, I noticed a few differences between typicalMicrobiomeSignaturesExports and bugsigdbExports.

The first provides a zip file, which I think is created automatically with each new release. The second provides individual files, which I think are added manually after a release is made in bugsigdbExports (is this correct?).

I think bugphyzzExports has been set up like the first case, so we'll probably need to download a zip instead of individual files.
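If the release is a single zip, one option is to list its entries and extract only the three bugphyzz CSVs. A rough sketch using base R's `utils::unzip()`; both helper names are hypothetical, and it assumes the files may sit under a top-level folder inside the zip (as in GitHub source archives), so matching is done on each entry's base name.

```r
## Hypothetical helpers: pick and extract the bugphyzz CSVs from a release zip.
## Assumption: entries may be nested under a top-level folder, so only the
## base name of each entry is matched.
bugphyzz_entries <- function(entries) {
    pattern <- "^bugphyzz_(binary|multistate|numeric)\\.csv$"
    entries[grepl(pattern, basename(entries))]
}

extract_bugphyzz_csvs <- function(zip_file, exdir = tempfile("bugphyzz_")) {
    ## unzip(list = TRUE) returns a data frame whose `Name` column has the entries
    entries <- utils::unzip(zip_file, list = TRUE)$Name
    wanted <- bugphyzz_entries(entries)
    if (length(wanted) == 0L)
        stop("No bugphyzz CSV files found in ", zip_file, call. = FALSE)
    ## junkpaths = TRUE drops the folder part; exdir is created if needed.
    ## Returns the paths of the extracted files.
    utils::unzip(zip_file, files = wanted, exdir = exdir, junkpaths = TRUE)
}

bugphyzz_entries(c("repo-d3fd894/", "repo-d3fd894/bugphyzz_binary.csv",
                   "repo-d3fd894/README.md"))
```

The entry names in the last call are made up for illustration; the real archive layout would need to be checked once the Zenodo release exists.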

@jwokaty (Contributor, Author) commented Mar 29, 2024

Thanks for noticing this and the issue with the multistate file. I don't remember how TMSE was created. I've been creating the bugsigdbExports releases manually because the automatic release didn't work. I will add back the functionality that was in .downloadZ and fix the issue with the multistate file.

@sdgamboa (Contributor)

@jwokaty, I think the GitHub API could be used to get the file URLs for devel or a hash:

gh_hash <- "d3fd894"
.downloadGH <- function(version = "devel") {
    ## List the repository's root contents through the GitHub REST API
    base_url <-
        "https://api.github.com/repos/waldronlab/bugphyzzExports/contents/"
    if (version != "devel") {
        ## Any other version is passed as the ref (e.g., a commit hash)
        hash <- paste0("?ref=", version)
        base_url <- paste0(base_url, hash)
    }
    req <- httr2::request(base_url = base_url) |>
        httr2::req_headers("Accept" = "application/vnd.github.raw+json")
    res <- httr2::req_perform(req = req)
    l <- httr2::resp_body_json(res)
    ## Take each entry's HTML link (github.com/.../blob/...) and drop NA values
    urls <- purrr::map_chr(l, ~ .x[["_links"]][["html"]]) |>
        purrr::discard(is.na)
    ## Keep only the three bugphyzz CSVs and turn /blob/ links into /raw/ links
    urls |>
        grep(
            pattern = "bugphyzz_(multistate|binary|numeric)\\.csv",
            x = _, value = TRUE
        ) |>
        sub("/blob/", "/raw/", x = _)
}

links <- .downloadGH(version = "devel")
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_binary.csv"    
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_numeric.csv"
links <- .downloadGH(version = gh_hash)
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_binary.csv"    
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_numeric.csv"
temp_file <- tempfile()
download.file(url = links[[1]], destfile = temp_file)
dat <- read.csv(temp_file, skip = 1)
dplyr::glimpse(dat)
#> Rows: 327,877
#> Columns: 11
#> $ NCBI_ID                <int> 1042312, 1042312, 117743, 117743, 117747, 11774…
#> $ Taxon_name             <chr> "Armatimonadia", "Armatimonadia", "Flavobacteri…
#> $ Rank                   <chr> "class", "class", "class", "class", "class", "c…
#> $ Attribute              <chr> "animal pathogen", "animal pathogen", "animal p…
#> $ Attribute_value        <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, T…
#> $ Evidence               <chr> "asr", "asr", "asr", "asr", "asr", "asr", "asr"…
#> $ Frequency              <chr> "sometimes", "rarely", "sometimes", "rarely", "…
#> $ Score                  <dbl> 0.6085182, 0.3914818, 0.7517647, 0.2482353, 0.3…
#> $ Attribute_source       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Confidence_in_curation <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Attribute_type         <chr> "binary", "binary", "binary", "binary", "binary…

Created on 2024-03-29 with reprex v2.1.0
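As far as I know, file entries in the contents API response also carry a `download_url` field with a direct raw link, which could replace the `/blob/` to `/raw/` rewrite. A small sketch that works on the parsed list (as returned by `httr2::resp_body_json()`); the helper name is hypothetical, and it assumes each entry has `name` and `download_url` fields, with `download_url` being null (parsed as `NULL`) for directories.

```r
## Hypothetical helper: pick raw download URLs from a parsed contents response.
## Assumption: each entry has `name` and `download_url`; directories have a
## null download_url, which resp_body_json() parses as NULL.
raw_urls_from_contents <- function(contents) {
    pattern <- "^bugphyzz_(binary|multistate|numeric)\\.csv$"
    keep <- vapply(
        contents,
        function(entry) {
            is.character(entry$name) && grepl(pattern, entry$name) &&
                is.character(entry$download_url)
        },
        logical(1)
    )
    vapply(contents[keep], function(entry) entry$download_url, character(1))
}

## Offline example with a mocked response (URLs are illustrative)
mock <- list(
    list(name = "README.md", download_url = "https://example.org/README.md"),
    list(name = "bugphyzz_binary.csv",
         download_url = "https://example.org/bugphyzz_binary.csv"),
    list(name = ".github", download_url = NULL)
)
raw_urls_from_contents(mock)
```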

@sdgamboa (Contributor)

Or maybe there's no need for the API:

gh_hash <- "d3fd894"
.downloadGH <- function(version = "devel") {
    ## "devel" maps to the default branch (main); any other value is used as-is
    if (version == "devel")
        version <- "main"
    file_suffix <- c("binary", "multistate", "numeric")
    paste0("https://github.com/waldronlab/bugphyzzExports/raw/",
           version, "/bugphyzz_", file_suffix, ".csv"
    )
}
links <- .downloadGH(version = "devel")
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_binary.csv"    
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_numeric.csv"
links <- .downloadGH(version = gh_hash)
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_binary.csv"    
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_numeric.csv"

Created on 2024-03-29 with reprex v2.1.0
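Once the URLs are built, the three files could be read into one named list. A minimal sketch; the helper name is hypothetical, and it assumes (from the `skip = 1` in the earlier reprex) that each CSV has one line before the column header. `read.csv()` accepts both local paths and URLs.

```r
## Hypothetical helper: read the bugphyzz CSVs into a list named by type.
## Assumption (from the earlier reprex's `skip = 1`): each file has one line
## before the column header.
read_bugphyzz_csvs <- function(paths) {
    ## e.g. "binary" from ".../bugphyzz_binary.csv"
    types <- sub("^bugphyzz_(.*)\\.csv$", "\\1", basename(paths))
    out <- lapply(paths, utils::read.csv, skip = 1)
    names(out) <- types
    out
}
```

With the function above, `read_bugphyzz_csvs(.downloadGH(version = gh_hash))` would return a list with `binary`, `multistate`, and `numeric` elements.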

@jwokaty (Contributor, Author) commented Mar 29, 2024

We can go with whichever is easier or whichever you prefer. Maybe the API is more stable? How about we rename main to devel in the bugphyzzExports repository so that we don't have to change the code? I'll update the GitHub Action.

@sdgamboa (Contributor)

Yeah, renaming main to devel in bugphyzzExports sounds good. I think the URLs should be stable once main is renamed to devel. We could try it without the API and see how things work out.
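With the default branch renamed to devel, the `"devel"` to `"main"` mapping is no longer needed and `version` can be used directly as the Git ref. A sketch under that assumption; `bugphyzz_raw_urls()` is a hypothetical name.

```r
## Sketch assuming bugphyzzExports' default branch is named "devel", so the
## version string ("devel" or a commit hash) is used directly as the ref.
bugphyzz_raw_urls <- function(version = "devel") {
    file_suffix <- c("binary", "multistate", "numeric")
    paste0(
        "https://github.com/waldronlab/bugphyzzExports/raw/",
        version, "/bugphyzz_", file_suffix, ".csv"
    )
}

bugphyzz_raw_urls("devel")[1]
## returns "https://github.com/waldronlab/bugphyzzExports/raw/devel/bugphyzz_binary.csv"
```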

@sdgamboa (Contributor) commented Apr 1, 2024

I added the code discussed above in commit 54949ee. importBugphyzz can now download from a hash, devel, or Zenodo (Zenodo not tested yet). I already have some tests for importBugphyzz, which should ensure the output is correct: https://github.com/waldronlab/bugphyzz/blob/devel/tests/testthat/test-importBugphyzz.R. I'll add some more. @jwokaty, are these the kind of tests you used for bugsigdbr?

@jwokaty (Contributor, Author) commented Apr 1, 2024

@sdgamboa Great! Yes, I was going to suggest you add to this PR.

For bugsigdbr, I ran all the tests available in the package. Since we can now import data, we can run all the tests in the bugphyzz package.

Review threads (outdated, resolved): R/bugphyzz.R (2), inst/extdata/README.md

@jwokaty (Contributor, Author) commented Apr 2, 2024

I'm adding a GitHub Actions workflow to test the package. When running the tests, I noticed that growth temp, coding genes, and genome size contain NAs, so they fail the tests at lines 170 and 180 of tests/testthat/test-importBugphyzz.R: expect_true(all(map_lgl(bp, checkNAs))).

(I can probably remove the pkgdown action since the bioc-check action can also do it.)
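To see which attributes trip that check, a per-data-frame report of NA-containing columns can help. This is a hypothetical helper, not the package's `checkNAs()` (whose definition isn't shown here); it assumes `bp` is a named list of data frames, as in the test.

```r
## Hypothetical helper (not bugphyzz's checkNAs()): for each data frame in a
## named list, return the names of the columns that contain NA values.
## `cols` optionally restricts the check to specific columns.
na_columns <- function(df_list, cols = NULL) {
    lapply(df_list, function(df) {
        check <- if (is.null(cols)) names(df) else intersect(cols, names(df))
        check[vapply(check, function(col) anyNA(df[[col]]), logical(1))]
    })
}

## Illustrative data, not real bugphyzz output
bp <- list(
    growth_temp = data.frame(NCBI_ID = 1:3, Attribute_value = c(30, NA, 37)),
    aerophilicity = data.frame(NCBI_ID = 1:2,
                               Attribute_value = c("aerobic", "anaerobic"))
)
na_columns(bp)
```

`Filter(length, na_columns(bp))` keeps only the data frames that actually have NAs, which is a quick way to list the failing attributes.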

@jwokaty (Contributor, Author) commented Apr 3, 2024

bugphyzz is passing tests on Linux and Windows. The macOS environment is using R 4.5, so we can disregard it. I'm going to remove the pkgdown workflow as I mentioned. (I am testing on my fork.)

If you think bugphyzzExports is in good shape, maybe we should make the Zenodo release, set the version to the DOI, and merge this PR. Then you could follow up with Lori to let her know that bugphyzz now uses data from Zenodo, so that it can be assigned a reviewer. We can also talk about it at the lab meeting on Thursday.
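Setting the version to the DOI means the code has to turn a DOI into file URLs. A minimal sketch, assuming the DOI has the form `10.5281/zenodo.<record id>` and that Zenodo serves record files at `https://zenodo.org/records/<id>/files/<name>?download=1`; the helper name and the DOI in the example are placeholders.

```r
## Hypothetical helper: build file download URLs for a Zenodo record.
## Assumptions: DOI form "10.5281/zenodo.<record id>" and file URLs of the
## form https://zenodo.org/records/<id>/files/<name>?download=1
zenodo_file_urls <- function(doi, files) {
    pattern <- "^10\\.5281/zenodo\\.([0-9]+)$"
    if (!grepl(pattern, doi))
        stop("Not a Zenodo DOI: ", doi, call. = FALSE)
    record_id <- sub(pattern, "\\1", doi)
    paste0("https://zenodo.org/records/", record_id, "/files/", files,
           "?download=1")
}

zenodo_file_urls("10.5281/zenodo.1234567", "bugphyzz_binary.csv")
## returns "https://zenodo.org/records/1234567/files/bugphyzz_binary.csv?download=1"
```

If the release turns out to be a single zip, `files` would just be the zip's name.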

@sdgamboa (Contributor) commented Apr 3, 2024

@jwokaty, thanks for setting up the GitHub Actions. I'm doing some final reviews, and then we can release the first version to Zenodo.

@sdgamboa sdgamboa merged commit 4b18dc3 into devel Apr 17, 2024
6 of 8 checks passed
@sdgamboa sdgamboa deleted the import-other-resources branch April 19, 2024 19:27
2 participants