Install pangolin-assignment from UCSC download server instead of github #430

Draft: wants to merge 6 commits into base: master

AngieHinrichs
Member

Up to this point, all data dependencies have been github cov-lineages repositories. The cache file in pangolin-assignment exceeded the github file size limit so we changed the pangolin-assignment repository to use git-lfs. Thanks @pvanheus for pointing out that github has storage and bandwidth quotas for Git LFS usage, and that by default the pangolin-assignment release tarball from github does not include the cache file; it can be added to the release tarball, but will count further against the storage and bandwidth quotas.

Since the cache file is generated at UCSC, which has ample web server storage and bandwidth, this PR adds a new mechanism that searches for the latest versioned tarball in a web directory (instead of querying the github API), compares its version to the locally installed package if present (using the same pip/__init__.py __version__ mechanism), and installs the tarball from the web directory (instead of github).
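
For readers who want a concrete picture of that flow, here is a minimal sketch, not the code in this PR. The tarball naming scheme (`pangolin-assignment-v<version>.tar.gz`) and the helper names `fetch_listing`, `latest_tarball`, `version_key` and `needs_update` are assumptions made for illustration only. It scans a directory listing for versioned tarballs, picks the newest, and decides whether the installed version is older.

```python
import re
import urllib.request

# Assumed (hypothetical) naming scheme: pangolin-assignment-v1.2.133.tar.gz
TARBALL_RE = re.compile(r"pangolin-assignment-v(\d+(?:\.\d+)*)\.tar\.gz")


def fetch_listing(url):
    """Download the HTML index page of the web directory as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def version_key(version):
    """Turn '1.2.133' (or 'v1.2.133') into (1, 2, 133) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def latest_tarball(listing_html):
    """Return (version, filename) of the newest tarball in the listing, or None."""
    found = {m.group(1): m.group(0) for m in TARBALL_RE.finditer(listing_html)}
    if not found:
        return None
    newest = max(found, key=version_key)
    return newest, found[newest]


def needs_update(installed_version, latest_version):
    """True if nothing is installed or the web directory has a newer version."""
    if installed_version is None:
        return True
    return version_key(latest_version) > version_key(installed_version)
```

In this sketch, `installed_version` would come from the installed package's `__version__` (or `None` if the import fails), and the chosen filename would be appended to the directory URL and handed to `pip install`, which accepts a tarball URL. Comparing integer tuples rather than raw strings avoids ranking `1.2.9` above `1.2.133`.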

Synchronizing release with pangolin-data

Ideally, releases of pangolin-data and pangolin-assignment will be synchronized, i.e. released at the same time with the same pango-designation version. When both are github repositories, that is pretty straightforward to manage using the github interface. However, when pangolin-assignment changes to a UCSC web directory, "release" for pangolin-assignment means that a new file is copied into https://hgdownload.gi.ucsc.edu/goldenPath/wuhCor1/pangolin-assignment/ (which usually must be done by sys admins at UCSC although I can request special privileges to add files there). If this PR is merged, then @aineniamh and I will have to coordinate more closely to make sure that updates to pangolin-data and pangolin-assignment are released at the same time.

Note: currently the URL for pangolin-assignment uses the hgdownload-test server; this will need to be changed to hgdownload after some testing and before release.
…ssignment versions. There may be patch releases that make sense for pangolin-data but not pangolin-assignment (e.g. pangoLEARN patch update), and the suggestion to run --update-data is not helpful because that's how the versions came to be installed in the first place.
Also fix option name typo in github query exception message.
@AngieHinrichs AngieHinrichs requested a review from aineniamh April 12, 2022 19:36
@AngieHinrichs AngieHinrichs marked this pull request as draft May 4, 2022 16:31