Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

augur align --method nextclade should wrap nextclade run #1462

Closed
genehack opened this issue May 15, 2024 · 5 comments
Closed

augur align --method nextclade should wrap nextclade run #1462

genehack opened this issue May 15, 2024 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@genehack
Copy link
Contributor

Context

Slack context

Description

Quoting from Slack:

Along these lines it would be nice to update augur align to take --method nextclade to make swapping / upgrading easier.

@genehack genehack added the enhancement New feature or request label May 15, 2024
@huddlej
Copy link
Contributor

huddlej commented May 15, 2024

Related issue from back when "nextclade" was called "nextalign": #706

@genehack genehack self-assigned this Jul 3, 2024
@genehack
Copy link
Contributor Author

genehack commented Aug 5, 2024

Tagging @rneher and @huddlej for input:

I started looking into this, and very quickly ran into the issue of what command line flags to support for nextclade run.

augur align right now runs mafft in two of two different ways, depending on whether it is extending an existing alignment or making a new one:

new alignment:

mafft --reorder --anysymbol --nomemsave --adjustdirection \
      --thread <nthreads> input_seqs.fasta 1> aligned.fasta 2> log

existing alignment:

mafft --add existing_align.fasta --keeplength \
      --reorder --anysymbol --nomemsave --adjustdirection \
      --thread <nthreads> input_seqs.fasta 1> aligned.fasta 2> log

(i.e., they are the same except for the first two additional arguments when expanding an alignment)

It doesn't seem like nextclade run supports extending an existing alignment, so obviously we should throw an error if somebody tries augur align --method nextclade --existing-alignment.

I have the impression (perhaps erroneous?) that it's okay for augur to be somewhat opinionated when wrapping another command -- in other words, we don't need to support the full range of nextclade run alignment options ...but we should probably support some, maybe? (Which would then need to be exclusive of --method mafft).

The "base" nextclade run command will look something like this:

nextclade run -r reference.fasta -o aligned.fasta sequences.fasta 2> log

What additional flags would we support? The ones that jump out at me are:

  • --in-order false, to match the --reorder option with mafft,
  • --retry-reverse-complement=true, to match the --adjustdirection option.

Are there any others that feel like "must have" options to be able to set? I am inclined towards keeping this set small to start out with, with expansion in the future based on requests for additional params based on real-world usage.

@huddlej
Copy link
Contributor

huddlej commented Aug 9, 2024

I know both @trvrb and @rneher have requested this functionality, but I don't fully agree that augur align should wrap nextclade. My main reasons are:

  1. augur align has historically produced a multiple sequence alignment with a third-party tool, while nextclade produces a reference-based alignment. These are two distinctly different approaches to alignment that the user needs to make a thoughtful decision about (in the context of all the other decisions they make for a phylogenetic workflow). I don't mean to say that the choice between MSA or reference alignment should be intentionally complicated by the existence of two different commands. The issue is more that augur align is short-hand for "MSA" and nextclade is short-hand for "reference alignment". If augur align supports both, users will need to be more careful about picking the type of alignment they want.

  2. nextclade has a really complicated interface with multiple input and output files and dozens of arguments to control alignment. It has sane defaults for SARS-CoV-2 alignment, but each new pathogen will require some thoughtful tuning of alignment parameters to get high-quality alignments. In practice, this requires users to get familiar with Nextclade parameters through the docs and/or the --help output on the command line and run nextclade many times until they settle on the best parameters. Wrapping nextclade in an augur align command will only complicate that tuning process by adding another layer between the user and the parameters they need to tune. This tuning process is in contrast to default augur align behavior with our mafft defaults which "just works" for most viral pathogens.

  3. Wrapper commands like augur align and augur tree exist mostly to provide a standard interface for alignments and trees that hides the tool-specific behavior of mafft, IQ-TREE, etc. We don't control the interface for those tools, so we have to wrap them to get the interface we want. We do control Nextclade's interface, so we can make it follow whatever standards we want for our bioinformatics tools without wrapping it.

@trvrb's comment on Slack was:

Along these lines it would be nice to update augur align to take --method nextclade to make swapping / upgrading easier.

My main point in the comments above is that I don't think swapping from mafft to Nextclade will actually be as easy as changing a --method argument and that wrapping Nextclade will require more work for the user to figure out how the corresponding augur align --alignment-builder-args argument maps to a nextclade command. As a specific example, the seasonal-flu workflow uses the following Nextclade command:

        nextclade3 run\
            {input.sequences} \
            -r {input.reference} \
            -m {input.annotation} \
            --gap-alignment-side right \
            --cds-selection {params.genes} \
            --jobs {threads} \
            --include-reference \
            --output-fasta {output.alignment} \
            --output-translations "{output.translations}/{{cds}}.fasta"

This command would translate to an augur align command like this (copying the pattern of --tree-builder-args from augur tree here):

augur align \
    --method nextclade \
    --sequences {input.sequences} \
    --reference-sequence {input.reference} \
    --nthreads {threads} \
    --output {output.alignment} \
    --alignment-builder-args="-m {input.annotation} --gap-alignment-side right --cds-selection {params.genes} --output-translations \"{output.translations}/{{cds}}.fasta\""

This seems slightly more complicated than the original Nextclade command above because it requires the user to know which augur align arguments map to nextclade arguments and which nextclade arguments have to be specified separately in the custom builder args argument.

@jameshadfield
Copy link
Member

I know both @trvrb and @rneher have requested this functionality, but I don't fully agree that augur align should wrap nextclade.

I had drafted a similar (but briefer) argument, but John's sentiments roughly align with my own. In terms of migrating from augur align ... to nextclade ... I'd think docs (including real world examples) is a better investment than building a wrapper.

@genehack
Copy link
Contributor Author

Okay, based on feedback from @huddlej and @jameshadfield I'm going to close this out as WONTFIX — thanks y'all.

@genehack genehack closed this as not planned Won't fix, can't repro, duplicate, stale Aug 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants