Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assign genotypes using Nextclade dataset and visualize on tree #36

Merged
merged 6 commits into from
Jun 7, 2024

Conversation

kimandrews
Copy link
Contributor

@kimandrews kimandrews commented Jun 5, 2024

Description of proposed changes

This PR does the following:

  1. Uses the Nextclade dataset during ingest to assign MeV genotypes to each sequence. Genotype assignments and other Nextclade output are added as columns to the metadata output.
  2. Adds MeV genotype coloring to the N450 and genome trees (https://nextstrain.org/measles/N450 and https://nextstrain.org/measles/genome).
  3. Updates the metadata for the phylogenetic workflow example data to match the new ingest metadata output format.

The Nextclade dataset is currently on the "master" version of Nextclade dataset server. After it is released, we can remove --server=https://data.master.clades.nextstrain.org/v3 from the get_nextclade_dataset rule.

This PR addresses #32

Related issue(s)

#32

Checklist

  • Remove --server=https://data.master.clades.nextstrain.org/v3 from the get_nextclade_dataset rule after measles Nextclade dataset is released
  • Checks pass

@kimandrews kimandrews requested a review from a team June 5, 2024 18:25
--name={params.dataset_name:q} \
--output-zip={output.dataset} \
--verbose \
--server=https://data.master.clades.nextstrain.org/v3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if it's worth parameterizing this too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the nextclade dataset is officially released, the --server argument will be completely removed, and so I think it's better not to parameterize this, unless I'm misunderstanding your comment.

@trvrb
Copy link
Member

trvrb commented Jun 6, 2024

Looks great to me. In testing, this just worked locally as well.

Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of non-blocking suggestions, but this LGTM!

Comment on lines 20 to 28
privateNucMutations.reversionSubstitutions private_reversionSubstitutions
privateNucMutations.labeledSubstitutions private_labeledSubstitutions
privateNucMutations.unlabeledSubstitutions private_unlabeledSubstitutions
privateNucMutations.totalReversionSubstitutions private_totalReversionSubstitutions
privateNucMutations.totalLabeledSubstitutions private_totalLabeledSubstitutions
privateNucMutations.totalUnlabeledSubstitutions private_totalUnlabeledSubstitutions
privateNucMutations.totalPrivateSubstitutions private_totalPrivateSubstitutions
qc.snpClusters.clusteredSNPs private_SNPclusters
qc.snpClusters.totalSNPs private_totalSNPclusters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight nitpick here to either use all camelCase or all snake_case for the column names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 356f569

Comment on lines +32 to +33
nextclade="results/nextclade.tsv",
alignment="results/alignment.fasta",
translations="results/translations.zip",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are outputing these files, should they get uploaded to S3 in our automated workflow?

This should just involve adding them to the files_to_upload config parameter in ingest/build-configs/nextstrain-automation/config.yaml.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For nextclade? Do we do this for other pathogens?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, we upload some combination of Nextclade output files in ncov-ingest and mpox/ingest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah of course! I was reading this as the nextclade dataset creation outputs but it's not!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8c981b1

Call genotypes using Nextclade dataset and add them to the metadata output of ingest. After the measles Nextclade dataset is officially released, we can remove `--server=https://data.master.clades.nextstrain.org/v3` from the `get_nextclade_dataset` rule.
1. Add Nextclade genotype calls as coloring in auspice
2. Set Nextclade genotypes as the default coloring
3. Use auspice config parameter naming and ordering schemes that match those of the Nextclade dataset tree, following changes made here: 07bf737
The changes made to the ingest workflow in a previous commit result in new columns of data being added to the metadata output of ingest. This commit adds those columns to the metadata of the example data in the phylogenetic workflow.
The nextclade dataset has now been released, and so we no longer need to point to the master version of the dataset server.
@kimandrews kimandrews force-pushed the use-remote-nextclade-dataset branch from 47441f6 to 31a3fdf Compare June 7, 2024 20:48
@kimandrews kimandrews merged commit dc3cd4b into main Jun 7, 2024
32 checks passed
@kimandrews kimandrews deleted the use-remote-nextclade-dataset branch June 7, 2024 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants