Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update zika instructions #147

Merged
merged 3 commits into from
Nov 29, 2023
Merged

Update zika instructions #147

merged 3 commits into from
Nov 29, 2023

Conversation

j23414
Copy link
Contributor

@j23414 j23414 commented Nov 21, 2023

Description of proposed changes

The old instructions were written for ViPR which became obsolete and was replaced by BV-BRC. The old instructions no longer work and we have since adopted NCBI datasets for downloading sequences and metadata files.

The filtering by length and collection year are already part of the phylogenetic build steps so are no longer a consideration during ingest.

This PR updates our instructions on how to ingest recent zika data and push to the nextstrain data endpoint. This PR also points to the current phylogenetic build steps.

Related issue(s)

Checklist

  • Checks pass

Sorry, something went wrong.

@j23414 j23414 requested a review from a team November 21, 2023 23:55
--fstem zika
git clone https://github.com/nextstrain/zika.git
cd zika
git checkout persephone
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm out of the loop. What's persephone and why use it instead of the default branch?

Copy link
Contributor Author

@j23414 j23414 Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! The reasoning was that the default branch doesn't include any ingest rules. Since we're still in the midst of designing the golden path but to meet the need to keep our zika build up-to-date, I created a persephone branch to "live long enough to update the build" but will be replaced by the official path in the future. The persephone branch also includes changes to the build parameters (e.g. indexing by "accession" instead of strain name) which were a necessary divergence from the default branch.

This PR is documenting the "reality" of updating the build, but—I agree—should point at the default branch when the default branch includes ingest and subsequent build parameter modifications.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation. Was the latest update to nextstrain.org/zika made using persephone or the default branch? If using persephone, I'd think to merge that into the default branch even if the golden path is not yet set in stone. Otherwise, what's seen at github.com/nextstrain/zika would be outdated.

Copy link
Contributor Author

@j23414 j23414 Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest update was indeed using persephone, the old method (ViPR) doesn't work. The changes to ingest and phylogenetic are here:

Feel free to glance through, I’m certainly willing to submit a persephone PR but am certain it will encounter a block, given that some of the modifications overlap with an ongoing Dengue PR that is currently delayed and unmerged (nextstrain/dengue#13). Additionally, there are specific changes related to zika, such as the inclusion of fauna-zika-data processing steps and the merging of USVI data. And given that some people might be pointed to zika as a “template for new pathogens”, it might be more prudent to cherry-pick generally accepted changes from persephone as smaller PRs later. (Open to suggestions on how to split these out.)

The current github.com/nextstrain/zika is outdated but so is the current github.com/nextstrain/dengue .

You make good points! (And may yet be incorporated in subsequent PRs.) But I'd still scope this fauna PR as replacing the non-working ViPR-ingest documentation with some currently working zika pipeline documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m certainly willing to submit a persephone PR but am certain it will encounter a block

I'm sorry if the review process has been a hindrance to your work! I did not realize you were running builds off of branches. I would make a PR and merge it so that the default branch is not outdated. We can do post-merge reviews and incremental updates later. For reproducibility and keeping everyone on the same page, I think it's prudent that the production builds are made using the default branch.

Copy link
Contributor Author

@j23414 j23414 Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that some people might be pointed to zika as a “template for new pathogens”, it might be more prudent to cherry-pick generally accepted changes from persephone as smaller PRs later

I still think we can cherry-pick the phylogenetic modifications in smaller PRs later, the review process is fine as long as the live builds can be updated concurrently and those steps are documented somewhere.

Could I at least merge this fauna PR? This way we have appropriately documented the status quo?

I can submit a persephone zika PR separately but don’t want it to be a pre-requisite to this PR. And if persephone is eventually merged, I'm happy to make the subsequent change to builds/ZIKA.md by basically deleting a few persephones from link urls and removing a git checkout statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could I at least merge this fauna PR? This way we have appropriately documented the status quo?

Sure, I'm not against merging this. For future reference, I think it would be less likely to get out-of-sync and/or forgotten if documented within the zika repo itself.

builds/ZIKA.md Outdated Show resolved Hide resolved
j23414 and others added 3 commits November 28, 2023 11:23
The old instructions were written for ViPR which became obsolete and was replaced
by BV-BRC. The old instructions no longer work and we have since moved to using
NCBI datasets for downloading sequences and metadata files.

The filtering steps are already part of the phylogenetic build steps so are
no longer a consideration during ingest. Point team members to how to ingest
recent zika data and push to the nextstrain data endpoint. Point team members
to the current phylogenetic build steps.
Co-authored-by: Victor Lin <13424970+victorlin@users.noreply.github.com>
Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the new instructions no longer use or refer to fauna, I'd be inclined to just add link to the zika repo's README and add the detailed instructions there. This way we only have to maintain a single set of instructions.

@j23414
Copy link
Contributor Author

j23414 commented Nov 28, 2023

Agree, my rewording basically refer to the READMEs in zika/ingest and zika/phylogenetic:

After thinking about the audience and purpose of this document, I envisioned a new hire pointed at builds/ZIKA.md to update our public zika build. The purpose of the page is to provide a soft on-boarding for a specific use case: allowing an internal person to update our public builds.

Therefore, the instructions should walk that new hire through pulling the appropriate branch, ingesting the data, uploading that data, and updating the build. Each section provides commands they can copy and paste, while also linking to the official READMEs for more detail.

In comparison, the official zika README is more general purpose, for an external user to run their own zika builds or modify it for their pathogen or biological question of interest.

@j23414 j23414 merged commit 656100e into master Nov 29, 2023
@j23414 j23414 deleted the update_zika branch November 29, 2023 01:55
@victorlin
Copy link
Member

victorlin commented Nov 29, 2023

This PR has a well-defined small scope – to update outdated instructions – and I think great to have it merged. The outdated instructions prompted two broader discussion points which we'll continue on Slack:

  1. Should we impose a rule that live builds (e.g. nextstrain.org/zika) must reflect the outcome from running on the default branch? This has been a personal assumption of mine but I can see how live builds might want to benefit from newer practices that have not yet gained consensus through peer review. (Slack thread)
  2. Where should we put instructions to run the core builds? Currently the majority are under nextstrain/fauna/builds, but that is becoming less accurate/useful and more historical (fauna/builds/NCOV.md is obviously outdated, and now fauna/builds/ZIKA.md is unrelated to fauna). (Slack thread)

@j23414 j23414 mentioned this pull request Jan 19, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants