Allow the use of a custom datafiles in db_load_* functions #72

arendsee · 2024-01-24T01:11:32Z

Description

This commit adds a "path" keyword argument to the db_download_* functions that allows the user to bypass the taxonomy database download step and use a custom local file.

Related Issue

Fixes #71, or at least provides a workaround.

Example

# This command will
#  1. download the NCBI taxonomy dump as "taxdmp.zip" using curl and a hard-coded URL
#  2. unzip the file and build the sqlite database
db_download_ncbi()

# This command bypasses the first download step. This allows
# the user to retrieve the data from a different source, modify it,
# or use a different tool to retrieve it (e.g., wget, rsync, or whatever).
db_download_ncbi(path="taxdmp.zip")
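The branching that the new argument introduces can be sketched roughly as follows. This is a minimal illustration, not the package's code: `fetch_archive` and its `downloader` parameter are hypothetical names, and the `is.null(path)` check is an assumption about how the missing argument is detected.

```r
# Hypothetical helper, not taxizedb's implementation: either download the
# archive or copy a user-supplied file to the location the build step reads.
fetch_archive <- function(db_url, db_path_file, path = NULL,
                          downloader = curl::curl_download) {
  if (is.null(path)) {
    # No custom file given: download from the URL
    downloader(db_url, db_path_file, quiet = TRUE)
  } else {
    # Custom file given: skip the download and copy the file into place
    stopifnot(file.exists(path))
    file.copy(path, db_path_file)
  }
  invisible(db_path_file)
}

# Usage with a stand-in local file, so no network access is needed
custom <- tempfile(fileext = ".zip")
writeLines("placeholder", custom)
cached <- tempfile(fileext = ".zip")
fetch_archive("ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip", cached, path = custom)
file.exists(cached)
```

Because the default `downloader` is only evaluated on the download branch, the copy branch runs even where the curl package is unavailable.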

arendsee · 2024-01-24T02:31:25Z

The issue reported in #71 about curl timing out is happening in the tests here. It appears to be unrelated to my new code.

On my personal Linux machine this works:

> db_url <- 'ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip'
> db_path_file = "z.zip"
> curl::curl_download(db_url, db_path_file, quiet = TRUE)

This also works:

> taxizedb::db_download_ncbi()

Using libcurl 8.5.0, OpenSSL/3.2.0, R 4.3.2, and curl_5.2.0

On my server at work I encountered the same timeout problem that the check here shows and that @bergalu reported in #71.

stitam

Thanks @arendsee for opening this PR. I thought about the issue some more and suggested an alternative solution that does not require an additional argument. Just a thought, I'm okay with your implementation as well.

If you want to go with your version, then I think overwrite rules should be more explicit for file.copy(). Let me know what you think.

stitam · 2024-02-05T19:52:07Z

R/db_download.R

+    # download data
+    mssg(verbose, 'downloading...')
+    curl::curl_download(db_url, db_path_file, quiet = TRUE)
+  } else {


Thanks @arendsee for this. I am wondering if it is possible to implement a solution that does not require an extra argument. How do you feel about something like this?

if (!file.exists(db_path_file) | overwrite) {
  # download data
  mssg(verbose, 'downloading...')
  curl::curl_download(db_url, db_path_file, quiet = TRUE)
}

This approach is more complex for users because it requires them to copy the manually downloaded file to the right location and rename it (we could rename db_path_file in line 50 to taxdmp.zip to eliminate the need to rename). On the other hand it resolves the issue without changing/complicating the user interface. Maybe we could add a note/guide in the documentation for those who want to use their own, manually downloaded database. What do you think of this alternative?
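To make the proposed condition concrete, here is a small sketch of how it would behave once a user has placed a file in the cache by hand. `needs_download` is a hypothetical helper used only for illustration, and the temporary file stands in for the real cache location. The sketch uses `||` rather than `|` because both operands are single logical values.

```r
# Hypothetical helper illustrating the suggested condition
needs_download <- function(db_path_file, overwrite = FALSE) {
  !file.exists(db_path_file) || overwrite
}

cache_file <- tempfile(fileext = ".zip")

# Nothing cached yet: a download would be triggered
r_empty <- needs_download(cache_file)

# Simulate the user copying a manually downloaded archive into the cache
writeLines("placeholder", cache_file)

# File already present: the download is skipped
r_cached <- needs_download(cache_file)

# overwrite = TRUE forces a fresh download even though the file exists
r_forced <- needs_download(cache_file, overwrite = TRUE)

c(r_empty, r_cached, r_forced)
```

Under this scheme a manually placed file is used only while `overwrite` stays FALSE, which is worth stating in any user-facing guide.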

stitam · 2024-02-05T21:31:48Z

R/db_download.R

+    mssg(verbose, 'downloading...')
+    curl::curl_download(db_url, db_path_file, quiet = TRUE)
+  } else {
+    file.copy(path, db_path_file)


If there is already a taxdump.zip in the cache directory, then file.copy() will not do anything because the function defaults to overwrite = FALSE. However, if we call db_download_ncbi() with overwrite = TRUE and a manually downloaded taxdump.zip, we would expect the new, manually downloaded file to replace the old cache file. Maybe the overwrite argument should be passed to file.copy() explicitly?

file.copy(path, db_path_file, overwrite = overwrite)

This way if db_download_ncbi() is called with overwrite = TRUE then the manually downloaded taxdump.zip will overwrite matching files in the cache directory.
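The difference can be checked in isolation with base R. The sketch below uses temporary files as stand-ins for the cached archive and the manually downloaded one; the variable names are illustrative only.

```r
# Stand-ins for the existing cache file and the manually downloaded file
old_cache <- tempfile(fileext = ".zip")
manual    <- tempfile(fileext = ".zip")
writeLines("old cache", old_cache)
writeLines("manual download", manual)

# Default overwrite = FALSE: the copy is refused and the cache is unchanged
copied_default <- file.copy(manual, old_cache)
after_default  <- readLines(old_cache)

# Explicit overwrite = TRUE: the cache file is replaced
copied_forced <- file.copy(manual, old_cache, overwrite = TRUE)
after_forced  <- readLines(old_cache)

copied_default  # FALSE
after_default   # "old cache"
copied_forced   # TRUE
after_forced    # "manual download"
```

Note that file.copy() signals the refused copy only through its FALSE return value, so a caller that ignores the result would silently keep using the stale file.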

Allow the use of a custom datafiles in db_load_* functions

a2cedd5

stitam requested changes Feb 5, 2024

KaiAragaki mentioned this pull request Mar 29, 2024

update ncbi endpoint #73

Closed