Connect ingest and phylogenetic workflows to follow pathogen-repo-guide #19

kimandrews · 2024-03-11T18:25:47Z

Description of proposed changes

Connect ingest and phylogenetic workflows, following the pathogen-repo-guide:

Upload ingest output to S3
Download ingest output from S3 to phylogenetic directory
Use accession column as ID column for phylogenetic analysis (addresses issue #11, "Use accession number in phylogenetic workflow")
Add color scheme matching the new region name format
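
To illustrate what using the accession column as the ID column implies for the metadata, here is a minimal Python sketch that indexes a metadata TSV by `accession` and rejects missing or duplicate IDs. The function name, the `strain` column, and the example records are hypothetical and are not taken from this repository.

```python
import csv
import io


def index_by_accession(tsv_text, id_column="accession"):
    """Index metadata rows by the ID column, rejecting duplicate IDs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if id_column not in (reader.fieldnames or []):
        raise KeyError(f"metadata has no {id_column!r} column")
    records = {}
    for row in reader:
        key = row[id_column]
        if key in records:
            raise ValueError(f"duplicate ID {key!r}")
        records[key] = row
    return records


# Hypothetical metadata; the accessions and strain names are made up.
example_tsv = (
    "accession\tstrain\n"
    "ACC0001\thypothetical_strain_a\n"
    "ACC0002\thypothetical_strain_b\n"
)
metadata = index_by_accession(example_tsv)
```

Keying on the accession makes each record's ID unique by construction, which is the property the phylogenetic workflow relies on when matching sequences to metadata.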

This PR focuses on making changes that are necessary to move forward with creating a locus-specific workflow in a later PR.

Related issue(s)

#11

Checklist

Checks pass

jameshadfield

Thanks @kimandrews!

I skimmed through the code changes and they look alright, but I didn't review them in depth. Thanks for uploading the new metadata/sequences to S3.

I ran through the phylogenetic workflow and it works as expected. The steps to upload the dataset to s3://nextstrain-data (i.e. to make it live) aren't here yet, but with the new ingest & phylogenetic workflows becoming part of the main branch, I think the new dataset should be made live. You could do this manually now and add the code in a later PR, but we should try to keep the pipeline in sync with the live dataset.

Auspice flags the following error, which is not part of your PR but should be fixed at some point:

[Genome annotation] V has length 899 which is not a multiple of 3
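
For reference, the check behind this kind of warning can be sketched in a few lines of Python. This is not Auspice's implementation. It assumes 1-based, inclusive start/end coordinates, and the coordinates below are hypothetical values chosen only so that V has length 899.

```python
def cds_length_problems(annotations):
    """Return {name: length} for annotations whose length is not a multiple of 3.

    `annotations` maps a gene name to (start, end), assumed 1-based inclusive.
    """
    problems = {}
    for name, (start, end) in annotations.items():
        length = end - start + 1
        if length % 3 != 0:
            problems[name] = length
    return problems


# Hypothetical coordinates, not the real annotation for this dataset.
example_annotations = {"V": (1, 899), "N": (1, 1578)}
problems = cds_length_problems(example_annotations)
```

A length of 899 leaves a remainder of 2 when divided by 3, so the coding sequence cannot be split into whole codons.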

An aside I noticed here: We should allow augur tree and augur align to run with more than one thread. You can see an example of how to connect snakemake and augur's --nthreads argument here.
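
As a rough illustration of the idea, the Python sketch below builds `augur align` and `augur tree` command lines with a configurable `--nthreads` value. In a Snakemake rule, the value would typically come from the rule's `threads` directive. The helper name and file paths are hypothetical, and this is not the example linked in the comment.

```python
def augur_command(subcommand, args, nthreads=1):
    """Build an argv list for `augur align` or `augur tree` with --nthreads."""
    if subcommand not in {"align", "tree"}:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    if nthreads < 1:
        raise ValueError("nthreads must be at least 1")
    return ["augur", subcommand, *args, "--nthreads", str(nthreads)]


# Hypothetical file paths, for illustration only.
align_cmd = augur_command(
    "align",
    ["--sequences", "results/filtered.fasta", "--output", "results/aligned.fasta"],
    nthreads=4,
)
tree_cmd = augur_command(
    "tree",
    ["--alignment", "results/aligned.fasta", "--output", "results/tree_raw.nwk"],
    nthreads=4,
)
```

Keeping the thread count as a single parameter means the scheduler's allocation and the tool's actual usage stay in agreement.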

joverlee521

Changes here look good to me; I've only left a couple of non-blocking comments.

Feel free to merge and get other fixes in via separate PRs.

phylogenetic/defaults/config.yaml

phylogenetic/example_data/metadata.tsv

The same samples are used for example data as were previously used, but now they have been pulled from the ingest output. Ambiguous dates were manually fixed in the example data for samples JN635406.1, JN635408.1, and EF565859.1, based on dates found here: https://github.com/nextstrain/fauna/blob/dd20a1fad51433e0bdc206f4300635c0f93f8599/source-data/measles_date_fix.tsv
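
The manual fix could be reproduced with a small script along these lines. This is only a sketch: it assumes ambiguous dates are those that are not fully specified `YYYY-MM-DD` values (for example `2000-XX-XX`), and that fixes are available as a mapping from ID to date. The IDs and dates shown are hypothetical, and the actual layout of `measles_date_fix.tsv` is not assumed here.

```python
import re

COMPLETE_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_ambiguous(date):
    """True when the date is not a fully specified YYYY-MM-DD value."""
    return COMPLETE_DATE.fullmatch(date) is None


def apply_date_fixes(rows, fixes, id_column="accession"):
    """Return copies of rows, replacing ambiguous dates that have a known fix."""
    fixed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if is_ambiguous(row["date"]) and row[id_column] in fixes:
            new_row["date"] = fixes[row[id_column]]
        fixed.append(new_row)
    return fixed


# Hypothetical records and fixes, for illustration only.
rows = [
    {"accession": "ACC0001", "date": "2000-XX-XX"},
    {"accession": "ACC0002", "date": "2001-05-17"},
]
fixes = {"ACC0001": "2000-03-09"}
fixed_rows = apply_date_fixes(rows, fixes)
```

Only ambiguous dates are replaced, so a complete date in the ingest output is never overwritten by an entry in the fix table.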

kimandrews added 2 commits March 8, 2024 15:52

Upload ingest output to S3

385b8d7

Download ingest output from S3

71a2744

kimandrews requested a review from a team March 11, 2024 18:28

jameshadfield reviewed Mar 11, 2024

joverlee521 approved these changes Mar 12, 2024

joverlee521 mentioned this pull request Mar 14, 2024

ingest/build-configs/nextstrain-automation: Add README nextstrain/pathogen-repo-guide#36

Merged

kimandrews force-pushed the use-new-ingest-output branch from 51c63ec to 5390f10 Compare March 14, 2024 21:47

kimandrews added 2 commits March 14, 2024 16:21

Add color scheme for new region name format

9e09647

kimandrews force-pushed the use-new-ingest-output branch from 5390f10 to 9e09647 Compare March 14, 2024 23:34

kimandrews added 2 commits March 15, 2024 16:50

Update maintainers

fc59560

Update Changelog

2cce6c9

kimandrews merged commit 0855c99 into main Mar 16, 2024
32 checks passed

kimandrews deleted the use-new-ingest-output branch March 16, 2024 00:10
