Merge pull request #238 from nextstrain/update-phylo-to-template
Phylogenetic updates to match pathogen-repo-guide
joverlee521 authored Feb 26, 2024
2 parents 3f2a361 + ce5c2d5 commit e439235
Showing 41 changed files with 684 additions and 745 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -21,7 +21,7 @@ jobs:
run: |
nextstrain build \
phylogenetic \
- --configfile profiles/ci/builds.yaml
+ --configfiles build-configs/ci/config.yaml
artifact-name: output-${{ matrix.runtime }}
artifact-paths: |
phylogenetic/auspice/
2 changes: 1 addition & 1 deletion .github/workflows/rebuild-hmpxv1-big.yaml
@@ -63,5 +63,5 @@ jobs:
--env SLACK_CHANNELS \
. \
notify_on_deploy \
- --configfiles $BUILD_DIR/config/$BUILD_NAME/config.yaml $BUILD_DIR/config/nextstrain_automation.yaml \
+ --configfiles $BUILD_DIR/defaults/$BUILD_NAME/config.yaml $BUILD_DIR/build-configs/nextstrain-automation/config.yaml \
$CONFIG_OVERRIDES --directory $BUILD_DIR --snakefile $BUILD_DIR/Snakefile
2 changes: 1 addition & 1 deletion .github/workflows/rebuild-hmpxv1.yaml
@@ -63,5 +63,5 @@ jobs:
--env SLACK_CHANNELS \
. \
notify_on_deploy \
- --configfiles $BUILD_DIR/config/$BUILD_NAME/config.yaml $BUILD_DIR/config/nextstrain_automation.yaml \
+ --configfiles $BUILD_DIR/defaults/$BUILD_NAME/config.yaml $BUILD_DIR/build-configs/nextstrain-automation/config.yaml \
$CONFIG_OVERRIDES --directory $BUILD_DIR --snakefile $BUILD_DIR/Snakefile
2 changes: 1 addition & 1 deletion .github/workflows/rebuild-mpxv.yaml
@@ -63,5 +63,5 @@ jobs:
--env SLACK_CHANNELS \
. \
notify_on_deploy \
- --configfiles $BUILD_DIR/config/$BUILD_NAME/config.yaml $BUILD_DIR/config/nextstrain_automation.yaml \
+ --configfiles $BUILD_DIR/defaults/$BUILD_NAME/config.yaml $BUILD_DIR/build-configs/nextstrain-automation/config.yaml \
$CONFIG_OVERRIDES --directory $BUILD_DIR --snakefile $BUILD_DIR/Snakefile
53 changes: 21 additions & 32 deletions phylogenetic/README.md
@@ -11,7 +11,7 @@ for Nextstrain's suite of software tools.
## Usage

If you're unfamiliar with Nextstrain builds, you may want to follow our
- [Running a Pathogen Workflow guide][] first and then come back here.
+ [Running a Pathogen Workflow guide](https://docs.nextstrain.org/en/latest/tutorials/running-a-workflow.html) first and then come back here.

The easiest way to run this pathogen build is using the Nextstrain
command-line tool from within the `phylogenetic/` directory:
@@ -28,7 +28,7 @@ Once you've run the build, you can view the results with:
You can run an example build using the example data provided in this repository via:

```
- nextstrain build . --configfile profiles/ci/builds.yaml
+ nextstrain build . --configfile build-configs/ci/config.yaml
```

When the build has finished running, view the output Auspice trees via:
@@ -61,43 +61,21 @@ nextstrain build . data/sequences.fasta data/metadata.tsv
Run pipeline to produce the "overview" tree for `/mpox/all-clades` with:

```bash
- nextstrain build . --configfile config/mpxv/config.yaml
+ nextstrain build . --configfile defaults/mpxv/config.yaml
```

Run pipeline to produce the "clade IIb" tree for `/mpox/clade-IIb` with:

```bash
- nextstrain build . --configfile config/hmpxv1/config.yaml
+ nextstrain build . --configfile defaults/hmpxv1/config.yaml
```

Run pipeline to produce the "lineage B.1" tree for `/mpox/lineage-B.1` with:

```bash
- nextstrain build . --configfile config/hmpxv1_big/config.yaml
+ nextstrain build . --configfile defaults/hmpxv1_big/config.yaml
```

- ### Deploy
-
- ⚠️ The below is outdated and needs to be adjusted for the new build names (mpox instead of monkeypox, etc.)
-
- <details>
-
- Run the python script [`scripts/deploy.py`](scripts/deploy.py) to deploy the staging build to production.
-
- This will also automatically create a dated build where each node has a unique (random) ID so it can be targeted in shared links/narratives.
-
- ```bash
- python scripts/deploy.py --build-names hmpxv1 mpxv
- ```
-
- If a dated build already exists it is not overwritten by default. To overwrite, pass `-f`.
-
- To deploy a locally built build to staging, use the `--staging` flag.
-
- To not deploy a dated build to production, add the `--no-dated` flag.
-
- </details>
-
### Visualize results

View results with:
@@ -108,19 +86,30 @@ nextstrain view .

## Configuration

- Configuration takes place in `config/*/config.yaml` files for each build.
- The analysis pipeline is contained in `workflow/snakemake_rule/core.smk`.
+ The default configuration takes place in `defaults/*/config.yaml` files for each build.
+ The analysis pipeline is contained in `rules/core.smk`.
This can be read top-to-bottom, each rule specifies its file inputs and output and pulls its parameters from `config`.
There is little redirection and each rule should be able to be reasoned with on its own.

+ ### Custom build configs
+
+ The build-configs directory contains configs and customizations that override and/or extend the default workflow.
+
+ - [chores](build-configs/chores/) - internal Nextstrain chores such as [updating the example data](#update-example-data).
+ - [ci](build-configs/ci/) - CI build that run the [example build](#example-build) with the [example data](example_data/).
+ - [nextstrain-automation](build-configs/nextstrain-automation/) - internal Nextstrain automated builds
+
## Update example data

- [Example data](./example_data/) is used by [CI](https://github.com/nextstrain/mpox/actions/workflows/ci.yaml). It can also be used as a small subset of real-world data.
+ [Example data](./example_data/) is used by [CI](https://github.com/nextstrain/mpox/actions/workflows/ci.yaml).
+ It can also be used as a small subset of real-world data.

- Example data should be updated every time metadata schema is changed or a new clade/lineage emerges. To update, run:
+ Example data should be updated every time metadata schema is changed or a new clade/lineage emerges.
+ To update, run:

```sh
- nextstrain build . update_example_data -F
+ nextstrain build . update_example_data -F \
+ --configfiles build-configs/ci/config.yaml build-configs/chores/config.yaml
```
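
Reviewer note: the updated command passes two config files, and both of them define `custom_rules`. The sketch below is a minimal Python model of how layered config files might combine, assuming that later files override earlier ones key by key, that nested mappings are merged, and that lists are replaced rather than concatenated. The helper name `merge_configs` and the sample dictionaries are hypothetical; this is not Snakemake's own code, so the real behaviour should be confirmed against the Snakemake documentation.

```python
from copy import deepcopy


def merge_configs(*configs):
    """Merge config dicts left to right; values from later dicts win.

    Assumption (not taken from the diff): nested dicts are merged
    recursively, while lists and scalars are replaced wholesale.
    """
    merged = {}
    for cfg in configs:
        _update(merged, deepcopy(cfg))
    return merged


def _update(base, override):
    """Recursively copy `override` into `base`, in place."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _update(base[key], value)
        else:
            base[key] = value
    return base


# Hypothetical stand-ins for the two files named in the command above.
ci_config = {
    "custom_rules": ["build-configs/ci/copy_example_data.smk"],
    "build_name": "hmpxv1",
    "filter": {"min_date": 2017, "min_length": 100000},
}
chores_config = {
    "custom_rules": ["build-configs/chores/chores.smk"],
}

merged = merge_configs(ci_config, chores_config)
print(merged["custom_rules"])
print(merged["filter"])
```

Under these assumptions the merged `custom_rules` holds only the chores rule file, because the second list replaces the first, while `build_name` and `filter` carry over unchanged from the CI config.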

## Data use
28 changes: 7 additions & 21 deletions phylogenetic/Snakefile
@@ -12,18 +12,16 @@ if version.parse(augur_version) < version.parse(min_augur_version):

if not config:

- configfile: "config/hmpxv1/config.yaml"
+ configfile: "defaults/hmpxv1/config.yaml"


build_dir = "results"


auspice_dir = "auspice"

prefix = config.get("auspice_prefix", None)
AUSPICE_PREFIX = ("trial_" + prefix + "_") if prefix is not None else ""
- AUSPICE_FILENAME = AUSPICE_PREFIX + config.get("auspice_name")

+ # Defaults to the `build_name` if no `auspice_name` is provided in the config
+ AUSPICE_FILENAME = AUSPICE_PREFIX + config.get("auspice_name", config["build_name"])

rule all:
input:
@@ -39,22 +37,10 @@ rule all:
"""


- if config.get("data_source", None) == "lapis":
-
- include: "workflow/snakemake_rules/download_via_lapis.smk"
-
- else:
-
- include: "workflow/snakemake_rules/prepare.smk"
-
-
- include: "workflow/snakemake_rules/chores.smk"
- include: "workflow/snakemake_rules/core.smk"
-
-
- if config.get("deploy_url", False):
-
- include: "workflow/snakemake_rules/nextstrain_automation.smk"
+ include: "rules/prepare_sequences.smk"
+ include: "rules/construct_phylogeny.smk"
+ include: "rules/annotate_phylogeny.smk"
+ include: "rules/export.smk"


# Include custom rules defined in the config.
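
Reviewer note: the new `AUSPICE_FILENAME` line falls back to `build_name` when `auspice_name` is absent. The sketch below restates the two assignments from the Snakefile hunk as a plain Python function so the fallback can be exercised outside Snakemake. The function names `auspice_filename` and `describe` and the sample configs are hypothetical and exist only for illustration.

```python
def auspice_filename(config):
    """Restate the prefix and filename logic from the Snakefile hunk."""
    prefix = config.get("auspice_prefix", None)
    auspice_prefix = ("trial_" + prefix + "_") if prefix is not None else ""
    # The default argument is evaluated eagerly, so `build_name` must be
    # present even when `auspice_name` is set.
    return auspice_prefix + config.get("auspice_name", config["build_name"])


def describe(config):
    """Return the filename, or a short message if a required key is missing."""
    try:
        return auspice_filename(config)
    except KeyError as err:
        return f"missing key: {err.args[0]}"


print(describe({"build_name": "hmpxv1", "auspice_name": "mpox_clade-IIb"}))
print(describe({"build_name": "hmpxv1"}))
print(describe({"build_name": "hmpxv1", "auspice_prefix": "test"}))
print(describe({"auspice_name": "mpox_clade-IIb"}))
```

The last call shows a side effect of passing `config["build_name"]` as the default: a config that sets `auspice_name` but omits `build_name` raises `KeyError`. All default configs shown in this diff define `build_name`, so they are unaffected.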
@@ -1,3 +1,6 @@
+ # I was hoping to use the Snakemake `default_target` directive to make this the
+ # default target when including this rule via `custom_rules`, but that is
+ # currently not possible: https://github.com/snakemake/snakemake/issues/2056
rule update_example_data:
"""This updates the files under example_data/ based on latest available data from data.nextstrain.org.
2 changes: 2 additions & 0 deletions phylogenetic/build-configs/chores/config.yaml
@@ -0,0 +1,2 @@
+ custom_rules:
+ - build-configs/chores/chores.smk
@@ -1,15 +1,15 @@
custom_rules:
- - profiles/ci/copy_example_data.smk
+ - build-configs/ci/copy_example_data.smk

- reference: "config/reference.fasta"
- genemap: "config/genemap.gff"
- genbank_reference: "config/reference.gb"
- include: "config/hmpxv1/include.txt"
- clades: "config/clades.tsv"
- lat_longs: "config/lat_longs.tsv"
- auspice_config: "config/hmpxv1/auspice_config.json"
- description: "config/description.md"
- tree_mask: "config/tree_mask.tsv"
+ reference: "defaults/reference.fasta"
+ genemap: "defaults/genemap.gff"
+ genbank_reference: "defaults/reference.gb"
+ include: "defaults/hmpxv1/include.txt"
+ clades: "defaults/clades.tsv"
+ lat_longs: "defaults/lat_longs.tsv"
+ auspice_config: "defaults/hmpxv1/auspice_config.json"
+ description: "defaults/description.md"
+ tree_mask: "defaults/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/monkeypox/issues/33
@@ -20,7 +20,7 @@ build_name: "hmpxv1"
auspice_name: "mpox_clade-IIb"

filter:
- exclude: "config/exclude_accessions.txt"
+ exclude: "defaults/exclude_accessions.txt"
min_date: 2017
min_length: 100000

@@ -81,4 +81,4 @@ recency: true
mask:
from_beginning: 800
from_end: 6422
- maskfile: "config/mask.bed"
+ maskfile: "defaults/mask.bed"
@@ -1,5 +1,8 @@
# Optional configs to include for automated Nextstrain builds
# Intended to be used internally by the Nextstrain team

+ custom_rules:
+ - build-configs/nextstrain-automation/nextstrain-automation.smk
+
# deploy
deploy_url: "s3://nextstrain-data"
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -14,7 +14,7 @@ Our bioinformatic processing workflow can be found at [github.com/nextstrain/mpo
- masking several regions of the genome, including the first 1350 and last 6422 base pairs and multiple repetitive regions of variable length
- phylogenetic reconstruction using [IQTREE-2](http://www.iqtree.org/)
- ancestral state reconstruction and temporal inference using [TreeTime](https://github.com/neherlab/treetime)
- - clade assignment via [clade definitions defined here](https://github.com/nextstrain/mpox/blob/master/config/clades.tsv), to label broader MPXV clades I, IIa and IIb and to label hMPXV1 lineages A, A.1, A.1.1, etc...
+ - clade assignment via [clade definitions defined here](https://github.com/nextstrain/mpox/blob/master/defaults/clades.tsv), to label broader MPXV clades I, IIa and IIb and to label hMPXV1 lineages A, A.1, A.1.1, etc...

#### Underlying data
We curate sequence data and metadata from the [NCBI Datasets command line tools](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install/),
File renamed without changes.
File renamed without changes.
@@ -1,12 +1,12 @@
- reference: "config/reference.fasta"
- genemap: "config/genemap.gff"
- genbank_reference: "config/reference.gb"
- include: "config/hmpxv1/include.txt"
- clades: "config/clades.tsv"
- lat_longs: "config/lat_longs.tsv"
- auspice_config: "config/hmpxv1/auspice_config.json"
- description: "config/description.md"
- tree_mask: "config/tree_mask.tsv"
+ reference: "defaults/reference.fasta"
+ genemap: "defaults/genemap.gff"
+ genbank_reference: "defaults/reference.gb"
+ include: "defaults/hmpxv1/include.txt"
+ clades: "defaults/clades.tsv"
+ lat_longs: "defaults/lat_longs.tsv"
+ auspice_config: "defaults/hmpxv1/auspice_config.json"
+ description: "defaults/description.md"
+ tree_mask: "defaults/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/mpox/issues/33
@@ -17,7 +17,7 @@ build_name: "hmpxv1"
auspice_name: "mpox_clade-IIb"

filter:
- exclude: "config/exclude_accessions.txt"
+ exclude: "defaults/exclude_accessions.txt"
min_date: 2017
min_length: 100000

@@ -78,4 +78,4 @@ recency: true
mask:
from_beginning: 800
from_end: 6422
- maskfile: "config/mask.bed"
+ maskfile: "defaults/mask.bed"
File renamed without changes.
@@ -1,12 +1,12 @@
- reference: "config/reference.fasta"
- genemap: "config/genemap.gff"
- genbank_reference: "config/reference.gb"
- include: "config/hmpxv1_big/include.txt"
- clades: "config/clades.tsv"
- lat_longs: "config/lat_longs.tsv"
- auspice_config: "config/hmpxv1_big/auspice_config.json"
- description: "config/description.md"
- tree_mask: "config/tree_mask.tsv"
+ reference: "defaults/reference.fasta"
+ genemap: "defaults/genemap.gff"
+ genbank_reference: "defaults/reference.gb"
+ include: "defaults/hmpxv1_big/include.txt"
+ clades: "defaults/clades.tsv"
+ lat_longs: "defaults/lat_longs.tsv"
+ auspice_config: "defaults/hmpxv1_big/auspice_config.json"
+ description: "defaults/description.md"
+ tree_mask: "defaults/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/mpox/issues/33
@@ -17,7 +17,7 @@ build_name: "hmpxv1_big"
auspice_name: "mpox_lineage-B.1"

filter:
- exclude: "config/exclude_accessions.txt"
+ exclude: "defaults/exclude_accessions.txt"
min_date: 2022
min_length: 180000

@@ -57,4 +57,4 @@ recency: true
mask:
from_beginning: 800
from_end: 6422
- maskfile: "config/mask.bed"
+ maskfile: "defaults/mask.bed"
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -1,12 +1,12 @@
- auspice_config: "config/mpxv/auspice_config.json"
- include: "config/mpxv/include.txt"
- reference: "config/reference.fasta"
- genemap: "config/genemap.gff"
- genbank_reference: "config/reference.gb"
- lat_longs: "config/lat_longs.tsv"
- description: "config/description.md"
- clades: "config/clades.tsv"
- tree_mask: "config/tree_mask.tsv"
+ auspice_config: "defaults/mpxv/auspice_config.json"
+ include: "defaults/mpxv/include.txt"
+ reference: "defaults/reference.fasta"
+ genemap: "defaults/genemap.gff"
+ genbank_reference: "defaults/reference.gb"
+ lat_longs: "defaults/lat_longs.tsv"
+ description: "defaults/description.md"
+ clades: "defaults/clades.tsv"
+ tree_mask: "defaults/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/mpox/issues/33
@@ -17,7 +17,7 @@ build_name: "mpxv"
auspice_name: "mpox_all-clades"

filter:
- exclude: "config/exclude_accessions.txt"
+ exclude: "defaults/exclude_accessions.txt"
min_date: 1950
min_length: 100000

@@ -74,4 +74,4 @@ recency: true
mask:
from_beginning: 1350
from_end: 6422
- maskfile: "config/mask.bed"
+ maskfile: "defaults/mask.bed"
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
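
Reviewer note: every config hunk in this commit rewrites `config/...` paths to `defaults/...`, so a leftover reference to the old directory is the most likely regression. The sketch below is one hypothetical way to scan a loaded config for such leftovers. The function `stale_paths` and the sample dictionary are illustrative only and are not part of the repository.

```python
def stale_paths(config, old_prefix="config/"):
    """Return (key path, value) pairs for string values that start with old_prefix.

    Walks nested dicts and lists; non-string leaves are ignored.
    """
    found = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else str(key))
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]")
        elif isinstance(node, str) and node.startswith(old_prefix):
            found.append((path, node))

    walk(config, "")
    return found


# Hypothetical config with one path that was missed during the rename.
sample = {
    "reference": "defaults/reference.fasta",
    "filter": {"exclude": "config/exclude_accessions.txt", "min_date": 2017},
    "mask": {"maskfile": "defaults/mask.bed"},
}

for key_path, value in stale_paths(sample):
    print(f"{key_path}: {value}")
```

A check like this only inspects values in the config mapping. Paths hard-coded inside rule files or documentation would still need a separate text search.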