WIP Set up import from Zenodo, GitHub hash; remove .downloadZ #240

jwokaty · 2024-03-29T13:27:17Z

I saw the comment about data being stored in GitHub, so I tried to set up the patterns used in https://github.com/waldronlab/bugsigdbr/blob/devel/R/bugsigdb.R to import by a Zenodo DOI by default but also have the capability to download devel and GitHub hashes. It's currently set to the hash of the prerelease, which shouldn't be in the merged code but we can use that to evaluate the release. Before this is merged, the release should be made on Zenodo and at least that hash should be changed to the Zenodo DOI. Then I think we could ask someone to assess if the package can be evaluated again.

This should probably give attribution that it draws on code from bugsigdbr, and tests could be written to ensure that it can download when version is "devel", a GitHub hash, or a Zenodo DOI.
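
The three kinds of version value mentioned here could be told apart by a small helper before any download is attempted. The following is only a sketch: `.classifyVersion` is a hypothetical name, the DOI used in the calls is a placeholder, and the regular expressions are assumptions about what a DOI and an abbreviated or full Git hash look like. It is not code taken from bugphyzz or bugsigdbr.

```r
# Hypothetical helper (not part of bugphyzz or bugsigdbr): classify a
# `version` string as "devel", a Zenodo DOI, or a Git commit hash.
.classifyVersion <- function(version) {
    stopifnot(
        is.character(version), length(version) == 1L, !is.na(version)
    )
    if (version == "devel") {
        return("devel")
    }
    # Assumption: a DOI looks like "10.<registrant>/<suffix>".
    if (grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", version)) {
        return("doi")
    }
    # Assumption: a hash is 7 to 40 lowercase hexadecimal characters.
    if (grepl("^[0-9a-f]{7,40}$", version)) {
        return("hash")
    }
    stop(
        "`version` must be \"devel\", a Zenodo DOI, or a Git hash.",
        call. = FALSE
    )
}

.classifyVersion("devel")
.classifyVersion("d3fd894")
.classifyVersion("10.5281/zenodo.0000000") # placeholder DOI, not a real record
```

A test suite could then cover one case per class, plus one input that should be rejected.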

sdgamboa · 2024-03-29T16:44:57Z

@jwokaty, thanks!

This URL is not being recognized: https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv

library(bugphyzz)
bp <- importBugphyzz(version = "d3fd894", force_download = TRUE)
#> Importing multistate data...
#> Downloading, bugphyzz_multistate.tsv.
#> Warning: download failed
#>   web resource path: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#>   local file path: '/home/user/.cache/R/bugphyzz/a41c3436b429_bugphyzz_multistate.csv'
#>   reason: Not Found (HTTP 404).
#> Warning: bfcadd() failed; resource removed
#>   rid: BFC31
#>   fpath: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#>   reason: download failed
#> Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...): bfcadd() failed; see warnings()
bp <- bugphyzz:::.downloadResource(version = "devel", force_download = TRUE)
#> Importing multistate data...
#> Downloading, bugphyzz_multistate.tsv.
#> Warning: download failed
#>   web resource path: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#>   local file path: '/home/user/.cache/R/bugphyzz/a41c9912ae5_bugphyzz_multistate.csv'
#>   reason: Not Found (HTTP 404).
#> Warning: bfcadd() failed; resource removed
#>   rid: BFC32
#>   fpath: 'https://media.github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv'
#>   reason: download failed
#> Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...): bfcadd() failed; see warnings()

Created on 2024-03-29 with reprex v2.1.0
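
The failure above comes down to a URL that returns 404. A download wrapper could report that more directly. This is a minimal sketch under assumptions: `.safeDownload` is a hypothetical helper, and it uses base R's `utils::download.file()` rather than the BiocFileCache calls that appear in the log.

```r
# Hypothetical helper: download `url` to `destfile` and stop with a clear
# message if the resource cannot be retrieved (for example, HTTP 404).
.safeDownload <- function(url, destfile = tempfile(fileext = ".csv")) {
    status <- tryCatch(
        suppressWarnings(
            utils::download.file(url, destfile, quiet = TRUE, mode = "wb")
        ),
        error = function(e) e
    )
    # download.file() returns 0 on success; failures either raise an error
    # or return a non-zero code, depending on the download method.
    if (inherits(status, "error") || !isTRUE(status == 0)) {
        unlink(destfile)
        stop(
            "Could not download '", url, "'. ",
            "Check that the file name and version exist.",
            call. = FALSE
        )
    }
    destfile
}
```

On success the function returns the local path, which can be passed to `read.csv()`.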

sdgamboa · 2024-03-29T16:58:17Z

Also, I noticed a few differences between TypicalMicrobiomeSignaturesExports and bugsigdbExports.

The first provides a zip file, which I think is created automatically with each new release. The second has individual files, which I think are added manually after a release is made in bugsigdbExports (is this correct?).

I think bugphyzzExports is set up like the first case, so we'll probably need to download a zip instead of individual files.

jwokaty · 2024-03-29T20:34:22Z

Thanks for noticing this and the issue with the multistate file. I don't remember how TMSE was created. I've been manually creating the BugSigDBExport releases because it didn't work automatically. I will add back the functionality that was in .downloadZ and fix the issue with the multistate file.

sdgamboa · 2024-03-29T21:16:36Z

@jwokaty, I think the GitHub API could be used for devel or a hash:

gh_hash <- "d3fd894"
.downloadGH <- function(version = "devel") {
    base_url <- 
        "https://api.github.com/repos/waldronlab/bugphyzzExports/contents/"
    if (version != "devel") {
        hash <- paste0("?ref=", version)
        base_url <- paste0(base_url, hash)
    }
    req <- httr2::request(base_url = base_url) |> 
        httr2::req_headers("Accept" = "application/vnd.github.raw+json")
    res <- httr2::req_perform(req = req)
    l <- httr2::resp_body_json(res)
    urls <- purrr::map_chr(l, ~ .x[["_links"]][["html"]]) |> 
        purrr::discard(is.na)
    urls |> 
        grep(
            pattern = "bugphyzz_(multistate|binary|numeric)\\.csv",
            x = _, value = TRUE
        ) |> 
        sub("/blob/", "/raw/", x = _)
}

links <- .downloadGH(version = "devel")
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_binary.csv"    
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_numeric.csv"
links <- .downloadGH(version = gh_hash)
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_binary.csv"    
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_numeric.csv"
temp_file <- tempfile()
download.file(url = links[[1]], destfile = temp_file)
dat <- read.csv(temp_file, skip = 1)
dplyr::glimpse(dat)
#> Rows: 327,877
#> Columns: 11
#> $ NCBI_ID                <int> 1042312, 1042312, 117743, 117743, 117747, 11774…
#> $ Taxon_name             <chr> "Armatimonadia", "Armatimonadia", "Flavobacteri…
#> $ Rank                   <chr> "class", "class", "class", "class", "class", "c…
#> $ Attribute              <chr> "animal pathogen", "animal pathogen", "animal p…
#> $ Attribute_value        <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, T…
#> $ Evidence               <chr> "asr", "asr", "asr", "asr", "asr", "asr", "asr"…
#> $ Frequency              <chr> "sometimes", "rarely", "sometimes", "rarely", "…
#> $ Score                  <dbl> 0.6085182, 0.3914818, 0.7517647, 0.2482353, 0.3…
#> $ Attribute_source       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Confidence_in_curation <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Attribute_type         <chr> "binary", "binary", "binary", "binary", "binary…

Created on 2024-03-29 with reprex v2.1.0

sdgamboa · 2024-03-29T21:24:03Z

Or maybe no need for the API.

gh_hash <- "d3fd894"
.downloadGH <- function(version = "devel") {
    # "devel" maps to the main branch; any other value (e.g., a hash)
    # is used as-is.
    if (version == "devel") {
        version <- "main"
    }
    file_suffix <- c("binary", "multistate", "numeric")
    paste0("https://github.com/waldronlab/bugphyzzExports/raw/",
           version, "/bugphyzz_", file_suffix, ".csv"
    )
}
links <- .downloadGH(version = "devel")
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_binary.csv"    
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/main/bugphyzz_numeric.csv"
links <- .downloadGH(version = gh_hash)
links
#> [1] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_binary.csv"    
#> [2] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_multistate.csv"
#> [3] "https://github.com/waldronlab/bugphyzzExports/raw/d3fd894/bugphyzz_numeric.csv"

Created on 2024-03-29 with reprex v2.1.0

jwokaty · 2024-03-29T21:29:48Z

We can do whatever is easier or whatever you prefer. Maybe there is more stability with the API? How about we rename main to devel in the Exports repository so that we don't have to change the code? I'll change the GitHub Action.

sdgamboa · 2024-03-29T21:53:32Z

Yeah, changing main to devel in bugphyzzExports sounds good. I think the URLs would be stable once main is renamed to devel. We could try without the API and see how things work out.

sdgamboa · 2024-04-01T15:15:08Z

I added the code discussed above in commit 54949ee. importBugphyzz can use a hash, devel, or Zenodo (Zenodo not tested yet) for downloading. I already have some tests for importBugphyzz, which should ensure the output is correct: https://github.com/waldronlab/bugphyzz/blob/devel/tests/testthat/test-importBugphyzz.R. I'll add some more. @jwokaty, are these the tests you use for bugsigdb?

jwokaty · 2024-04-01T16:16:00Z

@sdgamboa Great! Yes, I was going to suggest you add to this PR.

For bugsigdbr, I ran all the tests available in the package. Since we can now import data, we can run all the tests in the bugphyzz package.

jwokaty · 2024-04-02T01:38:25Z

I'm adding a GitHub Actions workflow to test the package. When running the tests, I noticed that growth temp, coding genes, and genome size have NAs, so they fail the tests at lines 170 and 180 of tests/testthat/test-importBugphyzz.R: expect_true(all(map_lgl(bp, checkNAs))).
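
One way to narrow down which data frames and columns trigger this kind of failure is a helper that reports the columns containing NAs. This is a sketch under assumptions: `.columnsWithNAs` is a hypothetical helper, not the `checkNAs` function used in the test file, and the toy data below (names, columns, and values) is made up for illustration.

```r
# Hypothetical helper: return the names of the columns in `df` that
# contain at least one NA.
.columnsWithNAs <- function(df, columns = names(df)) {
    stopifnot(is.data.frame(df), all(columns %in% names(df)))
    has_na <- vapply(df[columns], anyNA, logical(1))
    names(has_na)[has_na]
}

# Toy list of data frames standing in for the output of importBugphyzz().
bp_toy <- list(
    "growth temperature" = data.frame(
        NCBI_ID = c(1L, 2L),
        Attribute_value = c(37, NA)
    ),
    "aerophilicity" = data.frame(
        NCBI_ID = c(3L, 4L),
        Attribute_value = c("aerobic", "anaerobic")
    )
)

# For each data frame, list the columns with NAs; keep only non-empty results.
na_report <- lapply(bp_toy, .columnsWithNAs)
na_report[lengths(na_report) > 0]
```

A report like this separates columns where NAs are expected from columns where they indicate a problem in the export.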

(I can probably remove the pkgdown action since the bioc-check action can also do it.)

jwokaty · 2024-04-03T00:01:11Z

bugphyzz is passing tests on Linux and Windows. The macOS environment is using R 4.5, so we can disregard it. I'm going to remove the pkgdown workflow as I mentioned. (I am testing on my fork.)

If you think bugphyzzExports is in good shape, maybe we should consider making the Zenodo release, setting the version to the DOI, and merging this PR. Then you could follow up with Lori that changes have been made to use data from Zenodo so that bugphyzz can be assigned a reviewer. We can also talk about it at the lab meeting on Thursday.

sdgamboa · 2024-04-03T00:07:37Z

@jwokaty, thanks for setting up the GitHub Actions. I'm making some final reviews and then we can release the first version to Zenodo.

Set up import from Zenodo, GitHub hash; remove .downloadZ

8ec8443

jwokaty requested a review from sdgamboa March 29, 2024 13:35

jwokaty changed the title ~~Set up import from Zenodo, GitHub hash; remove .downloadZ~~ WIP Set up import from Zenodo, GitHub hash; remove .downloadZ Mar 29, 2024

jwokaty mentioned this pull request Mar 29, 2024

First release + release protocol waldronlab/bugphyzzExports#5

Closed

Add function for downloading specific hash from GH and devel, and from Zenodo (not tested yet).

54949ee

add a few more tests for importBugphyzz

ba67696

jwokaty commented Apr 1, 2024

R/bugphyzz.R (outdated)

jwokaty commented Apr 1, 2024

R/bugphyzz.R (outdated)

sdgamboa added 11 commits April 1, 2024 13:06

update test with checkNAs

47973b5

combine testing for devel and hash in a single if statement

cff6102

remove default version in unexported functions used by importBugphyzz

114eabd

update attributes table according to the tests

1451df3

update curation tests

57b2b2f

update test physiologies

2c69d8f

update README with TODOs, for other branches

79b8ed4

fix quotation in attributes.tsv

9c4325a

remove unnecessary LICENSE file

25c5fce

Validation data was hosted on github. The data was downloaded to extdata, so it's included with the package

e08266a

Remove line of reference to GitHub when importing validation data

30f8a92

jwokaty commented Apr 1, 2024

inst/extdata/README.md (outdated)

Add check-bioc

43872f6

sdgamboa and others added 3 commits April 2, 2024 11:37

add description for the files in extdata

606cedf

update hash

4477e42

Remove separate pkgdown workflow

d7d785b

sdgamboa added 6 commits April 4, 2024 22:35

Update hash in test

4040b28

fix indents and length of lines

338836a

update hash of github resource

701073c

update PICRUSt2 reference for NSTI definition

2f3439d

update tests

8e189a9

update importBugphyzz and tests with Zenodo DOI

6c63337

sdgamboa merged commit 4b18dc3 into devel Apr 17, 2024
6 of 8 checks passed

sdgamboa deleted the import-other-resources branch April 19, 2024 19:27
