-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow the use of a custom datafiles in db_load_* functions #72
base: master
Are you sure you want to change the base?
Conversation
The issue reported in #71 about curl timing out is happening in the tests here. It appears to be unrelated to my new code. On my personal Linux machine this works: > db_url <- 'ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip'
> db_path_file = "z.zip"
> curl::curl_download(db_url, db_path_file, quiet = TRUE) This also works: > taxizedb::db_download_ncbi() Using libcurl 8.5.0, OpenSSL/3.2.0, R 4.3.2, and curl_5.2.0 On my server at work I encountered the same time out problem as the check here and @bergalu in #71. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @arendsee for opening this PR. I thought about the issue some more and suggested an alternative solution that does not require an additional argument. Just a thought, I'm okay with your implementation as well.
If you want to go with your version, then I think overwrite rules should be more explicit for file.copy()
. Let me know what you think.
# download data | ||
mssg(verbose, 'downloading...') | ||
curl::curl_download(db_url, db_path_file, quiet = TRUE) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @arendsee for this. I am wondering if it is possible to implement a solution that does not require an extra argument. How do you feel about something like this?
if (!file.exists(db_path_file) | overwrite) {
# download data
mssg(verbose, 'downloading...')
curl::curl_download(db_url, db_path_file, quiet = TRUE)
}
This approach is more complex for users because it requires them to copy the manually downloaded file to the right location and rename it (we could rename db_path_file
in line 50 to taxdmp.zip
to eliminate the need to rename). On the other hand it resolves the issue without changing/complicating the user interface. Maybe we could add a note/guide in the documentation for those who want to use their own, manually downloaded database. What do you think of this alternative?
mssg(verbose, 'downloading...') | ||
curl::curl_download(db_url, db_path_file, quiet = TRUE) | ||
} else { | ||
file.copy(path, db_path_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is already a taxdump.zip
in the cache directory, then file.copy()
will not do anything because the default behaviour of the function is overwrite = FALSE
. However if we call db_download_ncbi()
with overwrite = TRUE
and a manually downloaded taxdump.zip
, we might expect to overwrite the old cache file with the new, manually downloaded file. Maybe the overwrite
argument should be called explicitly?
file.copy(path, db_path_file, overwrite = overwrite)
This way if db_download_ncbi()
is called with overwrite = TRUE
then the manually downloaded taxdump.zip
will overwrite matching files in the cache directory.
Description
This commit adds a "path" keyword argument to the
db_download_*
files that allows the user to bypass the taxonomy database download step and use a custom local file.Related Issue
Fixes #71, or at least provides a workaround.
Example