diff --git a/redirects.yml b/redirects.yml index 59be9d20..5150085b 100644 --- a/redirects.yml +++ b/redirects.yml @@ -159,14 +159,26 @@ from_url: /tutorials/quickstart.html to_url: /tutorials/running-a-workflow.html +- type: page + from_url: /tutorials/running-a-workflow.html + to_url: /tutorials/running-a-phylogenetic-workflow.html + - type: page from_url: /tutorials/zika.html to_url: /tutorials/creating-a-workflow.html +- type: page + from_url: /tutorials/creating-a-workflow.html + to_url: /tutorials/creating-a-phylogenetic-workflow.html + - type: page from_url: /tutorials/tb_tutorial.html to_url: /tutorials/creating-a-bacterial-pathogen-workflow.html +- type: page + from_url: /tutorials/creating-a-bacterial-pathogen-workflow.html + to_url: /tutorials/creating-a-bacterial-phylogenetic-workflow.html + - type: page from_url: /guides/share/nextstrain-groups.html to_url: /guides/share/groups/index.html diff --git a/src/guides/share/groups/index.rst b/src/guides/share/groups/index.rst index 28b21081..c55ad8ae 100644 --- a/src/guides/share/groups/index.rst +++ b/src/guides/share/groups/index.rst @@ -11,7 +11,7 @@ Share via Nextstrain Groups This how-to guide assumes familiarity with the :doc:`Nextstrain Groups ` feature and the :doc:`Nextstrain dataset files ` produced by :doc:`running a pathogen workflow - `. We recommend reading about those first + `. We recommend reading about those first if you're not familiar with them. Log in with the Nextstrain CLI diff --git a/src/index.rst b/src/index.rst index 4c96ad13..a2e5d160 100644 --- a/src/index.rst +++ b/src/index.rst @@ -52,10 +52,10 @@ team and other Nextstrain users provide assistance. 
For private inquiries, :hidden: Installing - tutorials/running-a-workflow - tutorials/creating-a-workflow + tutorials/running-a-phylogenetic-workflow + tutorials/creating-a-phylogenetic-workflow Exploring SARS-CoV-2 evolution - tutorials/creating-a-bacterial-pathogen-workflow + tutorials/creating-a-bacterial-phylogenetic-workflow tutorials/narratives-how-to-write Analyzing genomes with Nextclade diff --git a/src/install.rst b/src/install.rst index ad4f5ac9..b34ec0eb 100644 --- a/src/install.rst +++ b/src/install.rst @@ -325,7 +325,7 @@ Try running Augur and Auspice Next steps ========== -With Nextstrain installed, try :doc:`tutorials/running-a-workflow` next. +With Nextstrain installed, try :doc:`tutorials/running-a-phylogenetic-workflow` next. Alternate installation methods diff --git a/src/learn/augur-to-auspice.rst b/src/learn/augur-to-auspice.rst index 552e2cc9..134d28d7 100644 --- a/src/learn/augur-to-auspice.rst +++ b/src/learn/augur-to-auspice.rst @@ -24,7 +24,7 @@ Auspice (visualization) components It's helpful to start in Auspice and then work backwards to Augur. In this section, we will walk through various components of Auspice and how -they relate to the :term:`dataset JSON ` (sometimes called an Auspice JSON). +they relate to the :term:`dataset JSON ` (sometimes called an Auspice JSON). Phylogeny Tree Panel and Core Controls -------------------------------------- @@ -226,7 +226,7 @@ various components: .. image:: ../images/auspice-components-diversity-panel.png :alt: Annotated screenshot of Auspice's diversity (entropy) panel -The diversity panel is enabled by data in the :term:`dataset JSON `. +The diversity panel is enabled by data in the :term:`dataset JSON `. 
The top-level ``meta.genome_annotations`` provides the genome annotations displayed and the individual tree nodes provide the mutations via ``node.branch_attrs.mutations``, which are used to calculate the entropy @@ -337,9 +337,9 @@ Exporting data via Augur ======================== We now consider how information flows through Augur, specifically -``augur export v2`` which produces the :term:`dataset (Auspice) JSON ` +``augur export v2`` which produces the :term:`dataset (Auspice) JSON ` described above. This process combines data inputs with parameters configuring -aspects of the visualisation and produces :term:`dataset files ` for +aspects of the visualisation and produces :term:`dataset files ` for Auspice to visualise. .. graphviz:: diff --git a/src/learn/parts.rst b/src/learn/parts.rst index 4a412361..21a79a14 100644 --- a/src/learn/parts.rst +++ b/src/learn/parts.rst @@ -79,7 +79,7 @@ example, you visit `nextstrain.org/mumps/na Auspice displaying Mumps genomes from North America. -:term:`Datasets` are produced by Augur and +:term:`Datasets` are produced by Augur and visualized by Auspice. These files are often referred to as :term:`JSONs` colloquially because they use a generic data format called JSON. @@ -118,7 +118,7 @@ colloquially because they use a generic data format called JSON. Augur -> jsons -> Auspice; } -:term:`Builds` are recipes of code and data that produce these :term:`datasets`. +A :term:`build` is a recipe of several commands and data that produce a single :term:`dataset`. .. graphviz:: :align: center @@ -165,9 +165,13 @@ colloquially because they use a generic data format called JSON. metadata -> filter; } -Builds run several commands and are often automated by workflow managers such as `Snakemake `__, `Nextflow `__ and `WDL `__. A :term:`workflow` bundles one or more related :term:`builds` which each produce a :term:`dataset` for visualization with :term:`Auspice`. 
+A :term:`workflow` can bundle one or more related :term:`builds` and is often automated by workflow managers +such as `Snakemake `__, `Nextflow `__ +and `WDL `__. -As an example, our core workflows are organized as `Git repositories `__ hosted on `GitHub `__. Each contains a :doc:`Snakemake workflow ` using Augur, configuration, and data. +Our :term:`pathogen repositories` are organized as `Git repositories `__ +hosted on `GitHub `__. Each repository can contain +one or more workflows. .. graphviz:: :align: center @@ -176,7 +180,7 @@ As an example, our core workflows are organized as `Git repositories dataset0 - build1 -> dataset1 - build2 -> dataset2 - build3 -> dataset3 + build0 -> output0; + build1 -> output1; + build2 -> output2; + build3 -> output3; + build4 -> output4; + build5 -> output5; + build6 -> output6; + build7 -> output7; + build8 -> output8; + build9 -> output9; + build10 -> output10; + build11 -> output11; { - edge[style=invis] - dataset0 -> build1 // arrange clusters on same row - ellipses1 -> ellipses2 + edge[style=invis]; + output0 -> build3; // arrange clusters on same row + output3 -> build5; // arrange clusters on same row + ellipses1 -> ellipses2; } } @@ -242,5 +298,5 @@ quality checks, and phylogenetic placement. Nextclade can be used independently of other Nextstrain tools as well as integrated into workflows. With this overview, you'll be better prepared to :doc:`install Nextstrain -` and :doc:`run a workflow ` or :doc:`contribute +` and :doc:`run a workflow ` or :doc:`contribute to development `. diff --git a/src/reference/data-files.rst b/src/reference/data-files.rst index 72147ce3..3b80a4d5 100644 --- a/src/reference/data-files.rst +++ b/src/reference/data-files.rst @@ -26,14 +26,14 @@ Workflow files Files which correspond to several :term:`builds ` visible on nextstrain.org, e.g. all of builds under . These often include the full metadata table, sequences FASTA, titer matrix, etc.
- We often call these "inputs" colloquially because they're often the top-level inputs to a :term:`workflow`, but some of the files are actually workflow-level outputs. + We often call these "inputs" colloquially because they're often the top-level inputs to a :term:`phylogenetic workflow`, but some of the files are actually workflow-level outputs. (Albeit, outputs that can be used as time-saving inputs in later workflow runs.) Build files Files which correspond to a specific single :term:`build` visible on nextstrain.org, e.g. <`nextstrain.org/ncov/open/global/6m `__>. - These often include the subsampled metadata table, sequences FASTA, and Newick tree as well as the final :term:`dataset` JSONs. + These often include the subsampled metadata table, sequences FASTA, and Newick tree as well as the final :term:`phylogenetic dataset` JSONs. - We often call these "outputs" colloquially because they're produced by running a :term:`workflow`, but some of the files are actually the specific, subsampled inputs that went into the specific build. + We often call these "outputs" colloquially because they're produced by running a :term:`phylogenetic workflow`, but some of the files are actually the specific, subsampled inputs that went into the specific build. Workflow and build files for public data are available from: diff --git a/src/reference/glossary.rst b/src/reference/glossary.rst index 780b7fb9..dacc3388 100644 --- a/src/reference/glossary.rst +++ b/src/reference/glossary.rst @@ -12,10 +12,24 @@ Glossary A web application used for phylogenetic visualization and analysis. :doc:`Documentation` + pathogen repository + + A version-controlled folder containing all files necessary to run a pathogen's :term:`workflows`. + + core repository + + A :term:`pathogen repository` maintained by the Nextstrain team. 
+ workflow - also *pathogen workflow*, *pathogen analysis*, *Nextstrain workflow* - A reproducible process comprised of one or more :term:`builds` producing :term:`datasets`, which can be visualized by :term:`Auspice`. Implementation varies per workflow, but generally they are run by workflow managers such as Snakemake. + A reproducible process comprised of one or more :term:`builds` producing :term:`datasets`. + Implementation varies per workflow, but generally they are run by workflow managers such as Snakemake. + + A Nextstrain :term:`pathogen repository` typically consists of these different workflows: + + 1. :term:`phylogenetic workflow` + 2. :term:`ingest workflow` + 3. :term:`Nextclade workflow` Our :term:`core workflows` can be divided into two types: @@ -26,19 +40,35 @@ Glossary The individual builds in a multi-build workflow are also "workflows" in the definition of workflow managers like Snakemake. - core workflow + phylogenetic workflow + also *Nextstrain workflow* + + A :term:`workflow` consisting of :term:`build(s)` that execute bioinformatic analyses with :term:`Augur` to generate + :term:`phylogenetic dataset(s)` for visualization with :term:`Auspice`. + + The phylogenetic workflow is often considered the primary workflow in a pathogen repository + (e.g. "the Zika workflow" typically means "the phylogenetic workflow in the Zika pathogen repository"). + + ingest workflow + + A :term:`workflow` consisting of :term:`build(s)` that curate public metadata and sequences to generate + :term:`ingest dataset(s)` that are typically used as input files for + :term:`phylogenetic workflows` and :term:`Nextclade workflows`. - A :term:`workflow` maintained by the Nextstrain team. + Nextclade workflow - workflow repository - also *pathogen workflow repository* + A :term:`workflow` consisting of :term:`build(s)` that generate :doc:`reference tree(s)` to be packaged with other + dataset files to create :term:`Nextclade dataset(s)`.
- A version-controlled folder containing all files necessary to run a :term:`workflow`. + core workflow + + A default :term:`workflow` maintained by the Nextstrain team that can usually be run without additional + configurations or customizations. build - also *Nextstrain build* + also *Nextstrain build*, *phylogenetic build*, *ingest build*, *Nextclade build* - *(noun)* A sequence of commands, parameters and input files which work together to reproducibly execute bioinformatic analyses and generate a :term:`dataset` for visualization with :term:`Auspice`. + *(noun)* A sequence of commands, parameters and input files which work together to reproducibly generate a :term:`dataset`. build (verb) @@ -49,20 +79,49 @@ Glossary A modular instruction of a :term:`build` which can be run standalone (e.g. ``augur filter``), often with clear input and output files. dataset + + A collection of output files produced by a :term:`build`. + A Nextstrain :term:`pathogen repository` typically produces multiple types of datasets: + + 1. :term:`phylogenetic dataset` + 2. :term:`ingest dataset` + 3. :term:`Nextclade dataset` + + phylogenetic dataset also *Auspice JSONs* - A collection of :term:`JSONs` produced by a :term:`build`. It is also the shared file prefix of the JSONs. For example ``flu/seasonal/h3n2/ha/2y`` identifies a dataset which corresponds to the files - : + A :term:`dataset` consisting of :term:`JSONs` produced by a :term:`build` of a :term:`phylogenetic workflow`. + It is also the shared file prefix of the JSONs. + For example ``flu/seasonal/h3n2/ha/2y`` identifies a dataset which corresponds to the files: - ``flu_seasonal_h3n2_ha_2y_meta.json`` - ``flu_seasonal_h3n2_ha_2y_tree.json`` - ``flu_seasonal_h3n2_ha_2y_tip-frequencies.json`` - Some :term:`workflows` produce a single, synonymous dataset, like Zika. Others, like seasonal flu, produce many datasets. + Some phylogenetic workflows produce a single, synonymous dataset, like Zika.
Others, like seasonal flu, produce many datasets. + The phylogenetic dataset is often considered the primary dataset in a pathogen repository + (e.g. "the Zika dataset" typically means "the phylogenetic dataset from the Zika pathogen repository"). + + ingest dataset + + A :term:`dataset` consisting of curated files produced by a :term:`build` of an :term:`ingest workflow`. + It typically consists of the files: + + * metadata.tsv + * sequences.fasta + + If the ingest workflow includes Nextclade :term:`build steps`, then the dataset will typically include + :doc:`Nextclade output files` as well. + + Nextclade dataset + + A :term:`dataset` consisting of files required for a :doc:`Nextclade` analysis, usually produced + by a :term:`build` of a :term:`Nextclade workflow`. + See :doc:`documentation` for more details. narrative - A method of data-driven storytelling with interactive views of :term:`datasets ` displayed alongside multiple pages (or slides) of text and images. + A method of data-driven storytelling with interactive views of :term:`phylogenetic datasets` displayed alongside multiple pages (or slides) of text and images. Saved as a Markdown file with extended syntax to support additional displays. Viewable on nextstrain.org or with :term:`Auspice` via the :doc:`cli:commands/view` or :doc:`auspice view ` commands. @@ -70,7 +129,7 @@ Glossary See also :doc:`/guides/communicate/narratives-intro` and :doc:`/tutorials/narratives-how-to-write`. JSONs - Special ``.json`` files produced by :term:`Augur` and visualized by :term:`Auspice`. These files make up a :term:`dataset`. + Special ``.json`` files produced by :term:`Augur` and visualized by :term:`Auspice`. These files make up a :term:`phylogenetic dataset`. See :doc:`data formats`.
Nextstrain CLI diff --git a/src/tutorials/creating-a-bacterial-pathogen-workflow.rst b/src/tutorials/creating-a-bacterial-phylogenetic-workflow.rst similarity index 97% rename from src/tutorials/creating-a-bacterial-pathogen-workflow.rst rename to src/tutorials/creating-a-bacterial-phylogenetic-workflow.rst index 5d5747ee..21806b57 100644 --- a/src/tutorials/creating-a-bacterial-pathogen-workflow.rst +++ b/src/tutorials/creating-a-bacterial-phylogenetic-workflow.rst @@ -1,8 +1,8 @@ -====================================== -Creating a bacterial pathogen workflow -====================================== +========================================== +Creating a bacterial phylogenetic workflow +========================================== -This tutorial explains how to create a :term:`single-build Nextstrain workflow` for Tuberculosis sequences. However, much of it will be applicable to any run where you are starting with `VCF `_ files rather than `FASTA `_ files. We'll create a Snakefile step-by-step for each step of the analysis. +This tutorial explains how to create a :term:`single-build Nextstrain workflow` for Tuberculosis sequences. However, much of it will be applicable to any run where you are starting with `VCF `_ files rather than `FASTA `_ files. We'll create a Snakefile step-by-step for each step of the analysis. .. contents:: Table of Contents :local: @@ -12,12 +12,12 @@ Prerequisites ============= 1. :doc:`Install Nextstrain `. -2. Run through the :doc:`first tutorial`. This will verify your installation. +2. Run through the :doc:`first tutorial`. This will verify your installation. Setup ===== -1. Download the example :term:`workflow repository` and enter the new directory. +1. Download the example :term:`pathogen repository` and enter the new directory. .. 
code-block:: bash diff --git a/src/tutorials/creating-a-workflow.rst b/src/tutorials/creating-a-phylogenetic-workflow.rst similarity index 94% rename from src/tutorials/creating-a-workflow.rst rename to src/tutorials/creating-a-phylogenetic-workflow.rst index 77643af4..0cac4ba1 100644 --- a/src/tutorials/creating-a-workflow.rst +++ b/src/tutorials/creating-a-phylogenetic-workflow.rst @@ -1,8 +1,8 @@ -============================ -Creating a pathogen workflow -============================ +================================ +Creating a phylogenetic workflow +================================ -This tutorial dissects the :term:`single-build workflow` used in the previous tutorial. We will first make the build step-by-step. Then we will automate this stepwise process in a :term:`workflow`. +This tutorial dissects the :term:`single-build workflow` used in the previous tutorial. We will first make the build step-by-step. Then we will automate this stepwise process in a :term:`workflow`. .. note:: @@ -16,12 +16,12 @@ Prerequisites ============= 1. :doc:`Install Nextstrain `. -2. Run through the :doc:`previous tutorial`. This will verify your installation. +2. Run through the :doc:`previous tutorial`. This will verify your installation. Setup ===== -1. Change directory to the Zika :term:`workflow repository` downloaded in the previous tutorial. +1. Change directory to the Zika :term:`pathogen repository` downloaded in the previous tutorial. .. code-block:: bash @@ -278,7 +278,7 @@ To stop Auspice and return to the command line when you are done viewing your da Automate the Build with Snakemake ================================= -While it is instructive to run all of the above commands manually, it is more practical to automate their execution with a workflow manager. Nextstrain implements these automated builds with `Snakemake `_ by defining a ``Snakefile`` like `this Snakefile `_ used in the :doc:`previous tutorial `. 
+While it is instructive to run all of the above commands manually, it is more practical to automate their execution with a workflow manager. Nextstrain implements these automated builds with `Snakemake `_ by defining a ``Snakefile`` like `this Snakefile `_ used in the :doc:`previous tutorial `. From the ``zika-tutorial/`` directory, delete the previously generated results. @@ -301,4 +301,4 @@ Next steps - Learn more about :doc:`Augur commands `. - Learn more about :doc:`Auspice visualizations `. -- Fork the `Zika tutorial pathogen repository on GitHub `_, modify the Snakefile to make your own pathogen workflow, and learn :doc:`how to contribute to nextstrain.org `. +- Fork the `Zika tutorial pathogen repository on GitHub `_, modify the Snakefile to make your own phylogenetic workflow, and learn :doc:`how to contribute to nextstrain.org `. diff --git a/src/tutorials/narratives-how-to-write.rst b/src/tutorials/narratives-how-to-write.rst index 2d8291c0..ff3e3dc3 100644 --- a/src/tutorials/narratives-how-to-write.rst +++ b/src/tutorials/narratives-how-to-write.rst @@ -55,14 +55,14 @@ We'll introduce the basic functionality via an example Markdown file below, whic 3-slide narrative to introduce how to write narratives. This narrative is intended to be used as part out the [Writing a Narrative](https://docs.nextstrain.org/en/latest/tutorials/narratives-how-to-write.html) - tutorial. + tutorial. This opening slide is looking at monkeypox genomes focusing on the current outbreak. This view into the data is taken from the associated URL: https://nextstrain.org/monkeypox/hmpxv1?d=map&p=full&c=region " --- - + # [Monkeypox](https://nextstrain.org/monkeypox/hmpxv1?d=tree&p=full&c=region) We've now changed the view from the map to the phylogenetic tree. @@ -119,7 +119,7 @@ To introduce this functionality, You can see the titles of the three slides and their associated datasets. Hover over one to see the full appearance of the slide. 
-To the right of each title is the associate :term:`dataset` and a series of icons representing the main + sidecar :term:`JSONs` associated with the dataset. +To the right of each title is the associated :term:`phylogenetic dataset` and a series of icons representing the main + sidecar :term:`JSONs` associated with the dataset. The icons represent whether the dataset exists on nextstrain.org -- in this case, they are all green (success) or grey (not attempted). When writing a narrative, it's easy to make syntax errors that result in invalid datasets. To observe this, try changing a dataset URL in the Markdown file and dragging the file back onto the debugger. @@ -156,7 +156,7 @@ There are plenty of ways to approach the task, but we find the following workflo auspice [shape="tab" style="filled" fillcolor="#c7e9b4" label="nextstrain.org/...\nto choose desired \nview of data"] md [shape="note" style="filled" fillcolor="#41b6c4" label="Narrative file\nwe are writing\n(Markdown)"] debugger [shape="tab" style="filled" fontcolor="white" fillcolor="#225ea8" label="Narratives debugger\nto test narrative\nas we go"] - + auspice -> md [label="copy\nURL" fontcolor="#7fcdbb" fillcolor="#7fcdbb" color="#7fcdbb"] md -> auspice [label="repeat" fontcolor="#7fcdbb" fillcolor="#7fcdbb" color="#7fcdbb" splines=curved] md -> debugger [label="drag &\ndrop" fontcolor="#1d91c0" fillcolor="#1d91c0" color="#1d91c0"] diff --git a/src/tutorials/running-a-workflow.rst b/src/tutorials/running-a-phylogenetic-workflow.rst similarity index 74% rename from src/tutorials/running-a-workflow.rst rename to src/tutorials/running-a-phylogenetic-workflow.rst index 34d73bf9..35ec7a7c 100644 --- a/src/tutorials/running-a-workflow.rst +++ b/src/tutorials/running-a-phylogenetic-workflow.rst @@ -1,8 +1,8 @@ -=========================== -Running a pathogen workflow -=========================== +=============================== +Running a phylogenetic workflow +=============================== -This tutorial uses
the :term:`Nextstrain CLI` to help you get started running :term:`pathogen workflows` and viewing the :term:`datasets` you see on `nextstrain.org `_. +This tutorial uses the :term:`Nextstrain CLI` to help you get started running :term:`phylogenetic workflows` and viewing the :term:`datasets` you see on `nextstrain.org `_. It assumes you are comfortable using the command line and installing software on your computer. If you need help when following this tutorial, please create a post at `discussion.nextstrain.org `_. @@ -17,10 +17,10 @@ Prerequisites 1. :doc:`Install Nextstrain `. These instructions will install all of the software you need to complete this tutorial and others. -Download the example Zika workflow repository +Download the example Zika pathogen repository ============================================= -:term:`Pathogen workflows` are stored in :term:`workflow repositories` (version-controlled folders) to track changes over time. Download the `example Zika workflow repository `_. +:term:`Pathogen workflows` are stored in :term:`pathogen repositories` (version-controlled folders) to track changes over time. Download the `example Zika pathogen repository `_. .. code-block:: @@ -33,7 +33,7 @@ When it's done, you'll have a new directory called ``zika-tutorial/``. Run the workflow ================ -:term:`Pathogen workflows` use the :term:`Augur` bioinformatics toolkit to subsample data, align sequences, build a phylogeny, estimate phylogeographic patterns, and save the results in a format suitable for visualization with :term:`Auspice`. +:term:`Phylogenetic workflows` use the :term:`Augur` bioinformatics toolkit to subsample data, align sequences, build a phylogeny, estimate phylogeographic patterns, and save the results in a format suitable for visualization with :term:`Auspice`. Run the workflow with the :term:`Nextstrain CLI`. 
@@ -51,7 +51,7 @@ Output files will be in the directories ``zika-tutorial/data/``, ``zika-tutorial Visualize results ================= -View the resulting :term:`dataset` using Nextstrain's visualizations. +View the resulting :term:`phylogenetic dataset` using Nextstrain's visualizations. .. code-block:: @@ -71,6 +71,6 @@ Next steps ========== * :doc:`Learn how to interpret Nextstrain's visualizations `. -* :doc:`Learn how to create the workflow in this tutorial `. +* :doc:`Learn how to create the workflow in this tutorial `. * Learn more about the CLI by running ``nextstrain --help`` and ``nextstrain --help``. * Explore the :term:`Nextstrain runtime` by running ad-hoc commands inside it using ``nextstrain shell zika-tutorial/``.
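
Review note on ``redirects.yml``: the redirects added in this diff chain onto existing entries, so ``/tutorials/quickstart.html`` now reaches its final page by way of ``/tutorials/running-a-workflow.html``. The following is a minimal sketch, not code from this repository, of how such chains could be resolved to their final targets and checked for loops. It assumes the entries are already loaded as a list of ``from_url``/``to_url`` mappings; the function name ``resolve_redirects`` and the in-memory list are hypothetical, and whether the hosting service follows chained redirects itself is not stated in the diff.

```python
def resolve_redirects(redirects):
    """Map every from_url to the final to_url reached by following chains."""
    direct = {entry["from_url"]: entry["to_url"] for entry in redirects}
    resolved = {}
    for start in direct:
        seen = {start}
        target = direct[start]
        # Keep following while the current target is itself redirected.
        while target in direct:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = direct[target]
        resolved[start] = target
    return resolved


# Hypothetical in-memory copy of three entries shown in this diff.
redirects = [
    {"type": "page",
     "from_url": "/tutorials/quickstart.html",
     "to_url": "/tutorials/running-a-workflow.html"},
    {"type": "page",
     "from_url": "/tutorials/running-a-workflow.html",
     "to_url": "/tutorials/running-a-phylogenetic-workflow.html"},
    {"type": "page",
     "from_url": "/tutorials/zika.html",
     "to_url": "/tutorials/creating-a-workflow.html"},
]

final = resolve_redirects(redirects)
for source, destination in final.items():
    print(f"{source} -> {destination}")
```

With only these three entries, ``/tutorials/zika.html`` stops at ``/tutorials/creating-a-workflow.html``; including the new ``creating-a-workflow.html`` entry from the diff would extend that chain by one more hop.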