Assign genotypes using Nextclade dataset and visualize on tree #36
ingest/rules/nextclade.smk
Outdated
--name={params.dataset_name:q} \
--output-zip={output.dataset} \
--verbose \
--server=https://data.master.clades.nextstrain.org/v3
I wonder if it's worth parameterizing this too?
After the Nextclade dataset is officially released, the `--server` argument will be removed completely, so I think it's better not to parameterize this, unless I'm misunderstanding your comment.
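For comparison, parameterizing the server could look roughly like the following. This is a minimal sketch, not the workflow's actual rule: the function `nextclade_dataset_get_args`, its parameters, and the placeholder dataset name are hypothetical, and only the flags themselves are taken from the diff above.

```python
from typing import List, Optional


def nextclade_dataset_get_args(
    name: str,
    output_zip: str,
    server: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `nextclade dataset get` (hypothetical helper)."""
    args = [
        "nextclade",
        "dataset",
        "get",
        f"--name={name}",
        f"--output-zip={output_zip}",
        "--verbose",
    ]
    # Only pass --server when one is configured, so the flag disappears
    # once the dataset is available on the default server.
    if server is not None:
        args.append(f"--server={server}")
    return args


# "example-dataset" is a placeholder, not the real dataset name.
pre_release = nextclade_dataset_get_args(
    "example-dataset",
    "data/nextclade_dataset.zip",
    server="https://data.master.clades.nextstrain.org/v3",
)
post_release = nextclade_dataset_get_args("example-dataset", "data/nextclade_dataset.zip")
```

Since the flag is expected to go away entirely after the release, hard-coding it for now and deleting it later, as proposed in the reply above, avoids carrying an extra config parameter.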
Looks great to me. In testing, this just worked locally as well.
I left a couple of non-blocking suggestions, but this LGTM!
privateNucMutations.reversionSubstitutions private_reversionSubstitutions
privateNucMutations.labeledSubstitutions private_labeledSubstitutions
privateNucMutations.unlabeledSubstitutions private_unlabeledSubstitutions
privateNucMutations.totalReversionSubstitutions private_totalReversionSubstitutions
privateNucMutations.totalLabeledSubstitutions private_totalLabeledSubstitutions
privateNucMutations.totalUnlabeledSubstitutions private_totalUnlabeledSubstitutions
privateNucMutations.totalPrivateSubstitutions private_totalPrivateSubstitutions
qc.snpClusters.clusteredSNPs private_SNPclusters
qc.snpClusters.totalSNPs private_totalSNPclusters
Slight nitpick here to either use all camelCase or all snake_case for the column names.
Fixed in 356f569
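The naming nitpick can be checked mechanically. Below is a minimal sketch, assuming snake_case is the chosen convention; the thread does not say which convention commit 356f569 settled on, and `to_snake_case` is a hypothetical helper, not part of the workflow.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase or mixed-style column name to snake_case (hypothetical helper)."""
    # Insert "_" at each boundary between a lowercase letter or digit
    # and an uppercase letter, then lowercase the whole name.
    with_boundaries = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return with_boundaries.lower()


# Output column names copied from the diff above.
columns = [
    "private_reversionSubstitutions",
    "private_totalPrivateSubstitutions",
    "private_SNPclusters",
]
renamed = {column: to_snake_case(column) for column in columns}
```

Note that a run of capitals such as `SNP` is lowercased as a unit rather than split letter by letter.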
nextclade="results/nextclade.tsv",
alignment="results/alignment.fasta",
translations="results/translations.zip",
Since we are outputting these files, should they get uploaded to S3 in our automated workflow?
This should just involve adding them to the `files_to_upload` config parameter in `ingest/build-configs/nextstrain-automation/config.yaml`.
For nextclade? Do we do this for other pathogens?
Yup, we upload some combination of Nextclade output files in ncov-ingest and mpox/ingest.
Ah of course! I was reading this as the nextclade dataset creation outputs but it's not!
Done in 8c981b1
Call genotypes using Nextclade dataset and add them to the metadata output of ingest. After the measles Nextclade dataset is officially released, we can remove `--server=https://data.master.clades.nextstrain.org/v3` from the `get_nextclade_dataset` rule.
1. Add Nextclade genotype calls as coloring in auspice
2. Set Nextclade genotypes as the default coloring
3. Use auspice config parameter naming and ordering schemes that match those of the Nextclade dataset tree, following changes made here: 07bf737
The changes made to the ingest workflow in a previous commit result in new columns of data being added to the metadata output of ingest. This commit adds those columns to the metadata of the example data in the phylogenetic workflow.
The nextclade dataset has now been released, and so we no longer need to point to the master version of the dataset server.
47441f6 to 31a3fdf
Description of proposed changes
This PR does the following:
The Nextclade dataset is currently on the "master" version of the Nextclade dataset server. After it is released, we can remove `--server=https://data.master.clades.nextstrain.org/v3` from the `get_nextclade_dataset` rule.
This PR addresses #32
Related issue(s)
#32
Checklist
Remove `--server=https://data.master.clades.nextstrain.org/v3` from the `get_nextclade_dataset` rule after the measles Nextclade dataset is released