Install pangolin-assignment from UCSC download server instead of github #430
Up to this point, all data dependencies have been github cov-lineages repositories. The cache file in pangolin-assignment exceeded the github file size limit so we changed the pangolin-assignment repository to use git-lfs. Thanks @pvanheus for pointing out that github has storage and bandwidth quotas for Git LFS usage, and that by default the pangolin-assignment release tarball from github does not include the cache file; it can be added to the release tarball, but will count further against the storage and bandwidth quotas.
Since the cache file is generated at UCSC which has ample web server storage and bandwidth, this adds a new mechanism to search for the latest versioned tarball in a web directory (instead of querying the github API), compare its version to the locally installed package if present (using the same pip/__init__.py __version__ mechanism), and install the tarball from the web directory (instead of github).

Synchronizing release with pangolin-data
Ideally, releases of pangolin-data and pangolin-assignment will be synchronized, i.e. released at the same time with the same pango-designation version. When both are github repositories, that is pretty straightforward to manage using the github interface. However, when pangolin-assignment changes to a UCSC web directory, "release" for pangolin-assignment means that a new file is copied into https://hgdownload.gi.ucsc.edu/goldenPath/wuhCor1/pangolin-assignment/ (which usually must be done by sysadmins at UCSC, although I can request special privileges to add files there). If this PR is merged, then @aineniamh and I will have to coordinate more closely to make sure that updates to pangolin-data and pangolin-assignment are released at the same time.
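
For reviewers, here is a rough sketch of the "find the latest versioned tarball in a web directory and compare it to the installed version" step described above. It is not the code in this PR. The tarball naming pattern (`pangolin-assignment-v<version>.tar.gz`) and the helper names `parse_version`, `latest_tarball`, `needs_update` and `fetch_listing` are all assumptions made for illustration, and the real directory listing may be formatted differently.

```python
import re
import urllib.request

# ASSUMPTION: tarballs are named like pangolin-assignment-v1.2.3.tar.gz.
# The real naming scheme in the UCSC directory may differ.
TARBALL_RE = re.compile(r"pangolin-assignment-v?(\d+(?:\.\d+)*)\.tar\.gz")


def parse_version(text):
    """Turn a dotted version string such as '1.10.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def latest_tarball(listing_html):
    """Return (version_tuple, filename) for the newest tarball named in a
    directory listing, or None if no tarball name matches."""
    found = {}
    for match in TARBALL_RE.finditer(listing_html):
        found[parse_version(match.group(1))] = match.group(0)
    if not found:
        return None
    newest = max(found)  # tuples compare numerically, so 1.10 > 1.2
    return newest, found[newest]


def needs_update(installed_version, listing_html):
    """Return the filename to install, or None if the listing has no tarball
    or the installed version is already at least as new.

    installed_version is the local __version__ string, or None if the
    package is not installed."""
    latest = latest_tarball(listing_html)
    if latest is None:
        return None
    version, filename = latest
    if installed_version is not None:
        if parse_version(installed_version.lstrip("v")) >= version:
            return None
    return filename


def fetch_listing(url):
    """Download the HTML index page of a web directory (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

In this sketch the caller would pass the directory URL to `fetch_listing`, hand the result to `needs_update` together with the local `__version__`, and, if a filename comes back, pip-install the directory URL joined with that filename. The same comparison could also be used as a sanity check that the pangolin-assignment version in the web directory matches the current pangolin-data release.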