Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow assignment cache to be saved to and loaded from datadir #444

Merged

Conversation

pvanheus
Copy link
Contributor

@pvanheus pvanheus commented May 5, 2022

  • Support download assignment cache to datadir and using from datadir
  • Switch to using pip for all installs (thanks for tip from Wolfgang Maier)
  • Change logic for finding constellations files to use latest version

pvanheus added 3 commits May 5, 2022 11:37
* Switch to using pip for all installs (thanks for tip from Wolfgang Maier)
* Change logic for finding constellations files to use latest version
Copy link
Contributor

@wm75 wm75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool work @pvanheus !

pangolin/utils/update.py Outdated Show resolved Hide resolved
pangolin/utils/initialising.py Outdated Show resolved Hide resolved
pangolin/utils/initialising.py Outdated Show resolved Hide resolved
@wm75
Copy link
Contributor

wm75 commented May 5, 2022

One general remark: I know you didn't touch this in this PR, but I'm somewhat unhappy with the earlier decision to accept a --datadir only if the pango data in it is newer than the package-level one.

I think if a user specifies --datadir, they are fully responsible for its contents. If they want to use an older version of data, let them do so. In fact, for Galaxy the current version check creates some extra effort because with data managers we want to be able to enforce use of an old data version for reproducibility of prior jobs. Right now we're cheating by prepending the datadir version with a "v" to make it appear more recent to the check:
https://github.com/galaxyproject/tools-iuc/blob/bfc2dbccaec2ebeaf235ac833edbcead2758d792/data_managers/data_manager_pangolearn/data_manager/pangolearn_dm.py#L77
;)

With the changes in this PR, it could also make sense to use whatever data components are found in --datadir even if there is no pangolin-data in it. E.g. a user might want to use only a datadir constellations, but go with the installed versions of everything else.

@AngieHinrichs
Copy link
Member

AngieHinrichs commented May 5, 2022

Thanks so much for this @pvanheus!

I agree with @wm75's comments (yikes about prepending the "v"). I think it would be fine to simply accept what the user provides in datadir; also fine to warn the user if the datadir version is older than the installed version; but heavy-handed to ignore datadir if the version is equal to or less than the installed version, and not OK to ignore without warning the user, as I think is done for constellations unless I'm missing something.

@aineniamh
Copy link
Member

I also agree, I can see cases where people would want to use historical data. My only concern is that there isn't always going to be backwards compatibility, but I guess if it finds the correct files then it'll work!

@pvanheus
Copy link
Contributor Author

pvanheus commented May 6, 2022

I also agree, I can see cases where people would want to use historical data. My only concern is that there isn't always going to be backwards compatibility, but I guess if it finds the correct files then it'll work!

I have added a flag (--use_old_datadir) to allow data to be used from datadir even if it is older than the pre-installed data. This lets people use custom versions if they like and otherwise get the latest version.

In terms of backwards compatibility, can a __schema_version__ be added to the various data modules (pangolin_data, constellations and pangolin_assignments) that can be incremented when the data schema changes?

Copy link
Member

@AngieHinrichs AngieHinrichs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the changes, @pvanheus, this is looking much cleaner now!

I have one more question and a couple nitpicky requests, but it's looking really good now with the more uniform version-finding and pip usage.

pangolin/command.py Outdated Show resolved Hide resolved
pangolin/utils/update.py Outdated Show resolved Hide resolved
pangolin/utils/update.py Show resolved Hide resolved
@AngieHinrichs AngieHinrichs merged commit bafc1d8 into cov-lineages:master May 7, 2022
@AngieHinrichs
Copy link
Member

This looks terrific now, thanks so much Peter for all of your work on this and Wolfgang for the helpful tips!

@aineniamh
Copy link
Member

Thanks so much @pvanheus, @wm75 and @AngieHinrichs! I think this will be really useful for pangolin users. 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants