annotate mutations between the inferred root and the reference sequence #296

Closed
jameshadfield opened this issue Nov 20, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@jameshadfield
Member

Feature request from WA-DoH (via Slack): annotating mutations between the inferred root and the reference sequence is necessary for datasets to be used within Nextclade. Without this, masking leaves the inferred root sequence with large runs of As, which leads to incorrect mutations being called against new sequences within Nextclade.
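To illustrate what "mutations between the inferred root and the reference" means here, a toy sketch follows. It assumes two already-aligned, equal-length sequences and reports differences in the usual `<ref><position><alt>` form. The helper name `root_mutations` and the example sequences are made up for illustration; this is not how augur computes them.

```python
def root_mutations(reference: str, root: str) -> list[str]:
    """List substitutions that turn `reference` into `root` (1-based positions).

    Hypothetical helper for illustration only; assumes both sequences are
    aligned to the same coordinate system and have equal length.
    """
    if len(reference) != len(root):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference.upper(), root.upper()), start=1)
        if ref != alt
    ]


# Toy data: the inferred root carries a short run of As where the
# reference has other bases.
reference = "ACGTTGCA"
inferred_root = "ACAAAGCA"
print(root_mutations(reference, inferred_root))  # ['G3A', 'T4A', 'T5A']
```

If differences like these are not recorded at the root, a tool that places new sequences relative to the reference has no way to know the root already differs from it.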

Adding a config flag which allows workflows to toggle this on/off is a solution.

Here's a diff of the changes (without the ability for a config to modulate this):

--- a/phylogenetic/rules/annotate_phylogeny.smk
+++ b/phylogenetic/rules/annotate_phylogeny.smk
@@ -27,6 +27,7 @@ rule ancestral:
     input:
         tree=build_dir + "/{build_name}/tree.nwk",
         alignment=build_dir + "/{build_name}/masked.fasta",
+        reference=config["reference"],
     output:
         node_data=build_dir + "/{build_name}/nt_muts.json",
     params:
@@ -36,6 +37,7 @@ rule ancestral:
         augur ancestral \
             --tree {input.tree} \
             --alignment {input.alignment} \
+            --root-sequence {input.reference} \
             --output-node-data {output.node_data} \
             --inference {params.inference}
         """
@jameshadfield jameshadfield added the enhancement New feature or request label Nov 20, 2024
@joverlee521
Contributor

> Adding a config flag which allows workflows to toggle this on/off is a solution.

I can add the config option to make this togglable in the workflow.

However, it's not clear to me whether Nextstrain builds need this option to support Nextclade use, given that the official Nextclade datasets already exist.

I just checked and the core all-clades build errors out on Nextclade

Screenshot 2024-11-20 at 3 35 16 PM

and the core clade-IIb build shows the extra mutations

Screenshot 2024-11-20 at 3 37 24 PM

@DOH-SML1303

Hello! I'm maintaining a WA-focused mpox build and was wondering if you have an ETA for when this feature will be available. There's no need to rush; I'm just trying to figure out the best way to keep the changes we made to the Snakefile while still pulling in new changes to maintain the build until this feature is available.

@joverlee521
Contributor

Hey @DOH-SML1303, I've made the changes in #297. I'll wait for feedback and will most likely merge it later next week.

@DOH-SML1303

Sounds good, thank you so much!

@joverlee521
Contributor

Hi @DOH-SML1303, the PR has been merged! If you pull down the latest master branch, you can use the ancestral_root_seq config param to specify the root sequence file to pass to augur ancestral.

Here is an example in the CI workflow config:

reference: "defaults/reference.fasta"
ancestral_root_seq: "defaults/reference.fasta"
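As a rough sketch of how an optional config key like this can toggle the flag, the Python below builds the `augur ancestral` argument list and appends `--root-sequence` only when `ancestral_root_seq` is set. The helper name `ancestral_args`, the `results/...` paths, and the `joint` inference default are assumptions for illustration; the actual workflow logic in #297 may look different.

```python
def ancestral_args(
    config: dict,
    tree: str,
    alignment: str,
    node_data: str,
    inference: str = "joint",  # assumed default for this sketch
) -> list[str]:
    """Build an `augur ancestral` command line (hypothetical helper).

    `--root-sequence` is added only when `ancestral_root_seq` is present
    and non-empty in the config, so omitting the key keeps the old behavior.
    """
    args = [
        "augur", "ancestral",
        "--tree", tree,
        "--alignment", alignment,
        "--output-node-data", node_data,
        "--inference", inference,
    ]
    root_seq = config.get("ancestral_root_seq")
    if root_seq:
        args += ["--root-sequence", root_seq]
    return args


# Example paths are placeholders, not the workflow's real build directory.
config = {
    "reference": "defaults/reference.fasta",
    "ancestral_root_seq": "defaults/reference.fasta",
}
print(" ".join(ancestral_args(
    config, "results/tree.nwk", "results/masked.fasta", "results/nt_muts.json"
)))
```

Leaving `ancestral_root_seq` out of the config yields the same command without `--root-sequence`.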

@DOH-SML1303

Hi @joverlee521, amazing! Thank you so much. I will pull this down and run the build this week and let you know if I have any problems. I appreciate you all working on this!


3 participants