Move phylogenetic workflow from top-level to folder phylogenetic (#198)

* Move phylogenetic workflow from top-level to folder `phylogenetic`
* wip: use the experimental workflow from nextstrain/.github#57
nextstrain · Sep 26, 2023 · 03f9a25
1 parent dddf628
commit 03f9a25
Showing 51 changed files with 125 additions and 86 deletions.
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -9,7 +9,10 @@ on:
 
 jobs:
   pathogen-ci:
-    uses: nextstrain/.github/.github/workflows/pathogen-repo-ci.yaml@master
+    uses: nextstrain/.github/.github/workflows/pathogen-repo-ci.yaml@dec0880059017dac7facf100435c5737bf1386c8
+    with:
+      workflow-root: phylogenetic
+
 
   lint:
      runs-on: ubuntu-latest

diff --git a/.github/workflows/rebuild-hmpxv1-big.yaml b/.github/workflows/rebuild-hmpxv1-big.yaml
@@ -43,7 +43,7 @@ jobs:
             --env GITHUB_RUN_ID \
             --env SLACK_TOKEN \
             --env SLACK_CHANNELS \
-            . \
+            phylogenetic \
               notify_on_deploy \
                 --configfiles config/hmpxv1_big/config.yaml config/nextstrain_automation.yaml \
                 --config auspice_prefix=$TRIAL_NAME
diff --git a/.github/workflows/rebuild-hmpxv1.yaml b/.github/workflows/rebuild-hmpxv1.yaml
@@ -43,7 +43,7 @@ jobs:
           --env GITHUB_RUN_ID \
           --env SLACK_TOKEN \
           --env SLACK_CHANNELS \
-          . \
+          phylogenetic \
             notify_on_deploy \
               --configfiles config/hmpxv1/config.yaml config/nextstrain_automation.yaml \
               --config auspice_prefix=$TRIAL_NAME
diff --git a/.github/workflows/rebuild-mpxv.yaml b/.github/workflows/rebuild-mpxv.yaml
@@ -43,7 +43,7 @@ jobs:
           --env GITHUB_RUN_ID \
           --env SLACK_TOKEN \
           --env SLACK_CHANNELS \
-          . \
+          phylogenetic \
             notify_on_deploy \
               --configfiles config/mpxv/config.yaml config/nextstrain_automation.yaml \
               --config auspice_prefix=$TRIAL_NAME
diff --git a/README.md b/README.md
@@ -1,106 +1,40 @@
-# nextstrain.org/monkeypox
+# Nextstrain repository for mpox virus
 
-This is the [Nextstrain](https://nextstrain.org) build for MPXV (mpox virus). Output from this build is visible at [nextstrain.org/monkeypox](https://nextstrain.org/monkeypox).
-The lineages within the recent mpox outbreaks in humans are defined in a separate [lineage-designation repository](https://github.com/mpxv-lineages/lineage-designation).
+This repository contains two workflows for the analysis of mpox virus (MPXV) data:
 
-## Software requirements
+- `ingest/` - Download data from GenBank, clean and curate it, and upload it to S3
+- `phylogenetic/` - Make phylogenetic trees for nextstrain.org
 
-Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html) for Nextstrain's suite of software tools.
+Each folder contains a README.md with more information.
 
-## Usage
+## CI
 
-### Provision input data
+This repository uses GitHub Actions for CI. The workflows are defined in `.github/workflows/`.
 
-Input sequences and metadata can be retrieved from data.nextstrain.org
-
-* [sequences.fasta.xz](https://data.nextstrain.org/files/workflows/monkeypox/sequences.fasta.xz)
-* [metadata.tsv.gz](https://data.nextstrain.org/files/workflows/monkeypox/metadata.tsv.gz)
-
-Note that these data are generously shared by many labs around the world.
-If you analyze and plan to publish using these data, please contact these labs first.
-
-Within the analysis pipeline, these data are fetched from data.nextstrain.org and written to `data/` with:
-
-```bash
-nextstrain build . data/sequences.fasta data/metadata.tsv
-```
-
-### Run analysis pipeline
-
-Run pipeline to produce the "overview" tree for `/mpox/all-clades` with:
-
-```bash
-nextstrain build . --configfile config/mpxv/config.yaml
-```
+## Development
 
-Run pipeline to produce the "clade IIb" tree for `/mpox/clade-IIb` with:
+### Pre-commit
 
-```bash
-nextstrain build . --configfile config/hmpxv1/config.yaml
-```
+This repository uses [pre-commit](https://pre-commit.com/) to run checks on the code before committing.
 
-Run pipeline to produce the "lineage B.1" tree for `/mpox/lineage-B.1` with:
+To install pre-commit on macOS, run:
 
 ```bash
-nextstrain build . --configfile config/hmpxv1_big/config.yaml
+brew install pre-commit
 ```
 
-### Deploy
-
-⚠️ The below is outdated and needs to be adjusted for the new build names (mpxv instead of monkeypox, etc.)
-
-<details>
-
-Run the python script [`scripts/deploy.py`](scripts/deploy.py) to deploy the staging build to production.
-
-This will also automatically create a dated build where each node has a unique (random) ID so it can be targeted in shared links/narratives.
+To install pre-commit on Ubuntu, run:
 
 ```bash
-python scripts/deploy.py --build-names hmpxv1 mpxv
+sudo apt install pre-commit
 ```
 
-If a dated build already exists it is not overwritten by default. To overwrite, pass `-f`.
-
-To deploy a locally built build to staging, use the `--staging` flag.
-
-To not deploy a dated build to production, add the `--no-dated` flag.
-
-</details>
-
-### Visualize results
-
-View results with:
+To activate pre-commit, run:
 
 ```bash
-nextstrain view .
-```
-
-## Configuration
-
-Configuration takes place in `config/*/config.yaml` files for each build.
-The analysis pipeline is contained in `workflow/snakemake_rule/core.smk`.
-This can be read top-to-bottom, each rule specifies its file inputs and output and pulls its parameters from `config`.
-There is little redirection and each rule should be able to be reasoned with on its own.
-
-## Update example data
-
-[Example data](./example_data/) is used by [CI](https://github.com/nextstrain/monkeypox/actions/workflows/ci.yaml). It can also be used as a small subset of real-world data.
-
-Example data should be updated every time metadata schema is changed or a new clade/lineage emerges. To update, run:
-
-```sh
-nextstrain build . update_example_data -F
+pre-commit install
 ```
 
-## Data use
-
-We gratefully acknowledge the authors, originating and submitting laboratories of the genetic
-sequences and metadata for sharing their work. Please note that although data generators have
-generously shared data in an open fashion, that does not mean there should be free license to
-publish on this data. Data generators should be cited where possible and collaborations should be
-sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if
-uncertain.
-
 ## Development
 
 [![pre-commit.ci status](https://results.pre-commit.ci/badge/github/nextstrain/monkeypox/master.svg)](https://results.pre-commit.ci/latest/github/nextstrain/monkeypox/master)

diff --git a/phylogenetic/README.md b/phylogenetic/README.md
@@ -0,0 +1,102 @@
+# nextstrain.org/monkeypox
+
+This is the [Nextstrain](https://nextstrain.org) build for MPXV (mpox virus). Output from this build is visible at [nextstrain.org/monkeypox](https://nextstrain.org/monkeypox).
+The lineages within the recent mpox outbreaks in humans are defined in a separate [lineage-designation repository](https://github.com/mpxv-lineages/lineage-designation).
+
+## Software requirements
+
+Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html) for Nextstrain's suite of software tools.
+
+## Usage
+
+### Provision input data
+
+Input sequences and metadata can be retrieved from data.nextstrain.org:
+
+* [sequences.fasta.xz](https://data.nextstrain.org/files/workflows/monkeypox/sequences.fasta.xz)
+* [metadata.tsv.gz](https://data.nextstrain.org/files/workflows/monkeypox/metadata.tsv.gz)
+
+Note that these data are generously shared by many labs around the world.
+If you analyze and plan to publish using these data, please contact these labs first.
+
+Within the analysis pipeline, these data are fetched from data.nextstrain.org and written to `data/` with:
+
+```bash
+nextstrain build . data/sequences.fasta data/metadata.tsv
+```
+
+### Run analysis pipeline
+
+Run pipeline to produce the "overview" tree for `/mpox/all-clades` with:
+
+```bash
+nextstrain build . --configfile config/mpxv/config.yaml
+```
+
+Run pipeline to produce the "clade IIb" tree for `/mpox/clade-IIb` with:
+
+```bash
+nextstrain build . --configfile config/hmpxv1/config.yaml
+```
+
+Run pipeline to produce the "lineage B.1" tree for `/mpox/lineage-B.1` with:
+
+```bash
+nextstrain build . --configfile config/hmpxv1_big/config.yaml
+```
+
+### Deploy
+
+⚠️ The below is outdated and needs to be adjusted for the new build names (mpxv instead of monkeypox, etc.)
+
+<details>
+
+Run the python script [`scripts/deploy.py`](scripts/deploy.py) to deploy the staging build to production.
+
+This will also automatically create a dated build where each node has a unique (random) ID so it can be targeted in shared links/narratives.
+
+```bash
+python scripts/deploy.py --build-names hmpxv1 mpxv
+```
+
+If a dated build already exists it is not overwritten by default. To overwrite, pass `-f`.
+
+To deploy a locally built build to staging, use the `--staging` flag.
+
+To not deploy a dated build to production, add the `--no-dated` flag.
+
+</details>
+
+### Visualize results
+
+View results with:
+
+```bash
+nextstrain view .
+```
+
+## Configuration
+
+Configuration takes place in `config/*/config.yaml` files for each build.
+The analysis pipeline is contained in `workflow/snakemake_rules/core.smk`.
+This can be read top-to-bottom: each rule specifies its file inputs and outputs and pulls its parameters from `config`.
+There is little redirection and each rule should be understandable on its own.
+
+## Update example data
+
+[Example data](./example_data/) is used by [CI](https://github.com/nextstrain/monkeypox/actions/workflows/ci.yaml). It can also be used as a small subset of real-world data.
+
+Example data should be updated every time the metadata schema changes or a new clade/lineage emerges. To update, run:
+
+```sh
+nextstrain build . update_example_data -F
+```
+
+## Data use
+
+We gratefully acknowledge the authors, originating and submitting laboratories of the genetic
+sequences and metadata for sharing their work. Please note that although data generators have
+generously shared data in an open fashion, that does not mean there should be free license to
+publish on this data. Data generators should be cited where possible and collaborations should be
+sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if
+uncertain.
diff --git a/Snakefile b/phylogenetic/Snakefile
diff --git a/bin/notify-on-deploy b/phylogenetic/bin/notify-on-deploy
diff --git a/bin/notify-on-error b/phylogenetic/bin/notify-on-error
diff --git a/bin/notify-on-start b/phylogenetic/bin/notify-on-start
diff --git a/bin/notify-on-success b/phylogenetic/bin/notify-on-success
diff --git a/bin/set-branch-ingest-config b/phylogenetic/bin/set-branch-ingest-config
diff --git a/config/clades.tsv b/phylogenetic/config/clades.tsv
diff --git a/config/color_ordering.tsv b/phylogenetic/config/color_ordering.tsv
diff --git a/config/color_schemes.tsv b/phylogenetic/config/color_schemes.tsv
diff --git a/config/description.md b/phylogenetic/config/description.md
diff --git a/config/exclude_accessions.txt b/phylogenetic/config/exclude_accessions.txt
diff --git a/config/genemap.gff b/phylogenetic/config/genemap.gff
diff --git a/config/hmpxv1/auspice_config.json b/...genetic/config/hmpxv1/auspice_config.json
diff --git a/config/hmpxv1/config.yaml b/phylogenetic/config/hmpxv1/config.yaml
diff --git a/config/hmpxv1/include.txt b/phylogenetic/config/hmpxv1/include.txt
diff --git a/config/hmpxv1_big/auspice_config.json b/...tic/config/hmpxv1_big/auspice_config.json
diff --git a/config/hmpxv1_big/config.yaml b/phylogenetic/config/hmpxv1_big/config.yaml
diff --git a/config/hmpxv1_big/include.txt b/phylogenetic/config/hmpxv1_big/include.txt
diff --git a/config/lat_longs.tsv b/phylogenetic/config/lat_longs.tsv
diff --git a/config/mask.bed b/phylogenetic/config/mask.bed
diff --git a/config/mask_overview.bed b/phylogenetic/config/mask_overview.bed
diff --git a/config/mpxv/auspice_config.json b/phylogenetic/config/mpxv/auspice_config.json
diff --git a/config/mpxv/config.yaml b/phylogenetic/config/mpxv/config.yaml
diff --git a/config/mpxv/include.txt b/phylogenetic/config/mpxv/include.txt
diff --git a/config/nextstrain_automation.yaml b/...genetic/config/nextstrain_automation.yaml
diff --git a/config/reference.fasta b/phylogenetic/config/reference.fasta
diff --git a/config/reference.gb b/phylogenetic/config/reference.gb
diff --git a/config/tree_mask.tsv b/phylogenetic/config/tree_mask.tsv
diff --git a/example_data/metadata.tsv b/phylogenetic/example_data/metadata.tsv
diff --git a/example_data/sequences.fasta b/phylogenetic/example_data/sequences.fasta
diff --git a/profiles/default/config.yaml b/phylogenetic/profiles/default/config.yaml
diff --git a/scripts/assign-colors.py b/phylogenetic/scripts/assign-colors.py
diff --git a/scripts/clades_renaming.py b/phylogenetic/scripts/clades_renaming.py
diff --git a/...construct-recency-from-submission-date.py b/...construct-recency-from-submission-date.py
diff --git a/scripts/deploy.py b/phylogenetic/scripts/deploy.py
diff --git a/scripts/fix_tree.py b/phylogenetic/scripts/fix_tree.py
diff --git a/scripts/mutation_context.py b/phylogenetic/scripts/mutation_context.py
diff --git a/scripts/remove_timeinfo.py b/phylogenetic/scripts/remove_timeinfo.py
diff --git a/scripts/reverse_reversed_sequences.py b/...tic/scripts/reverse_reversed_sequences.py
diff --git a/scripts/set_final_strain_name.py b/...ogenetic/scripts/set_final_strain_name.py
diff --git a/workflow/snakemake_rules/chores.smk b/...netic/workflow/snakemake_rules/chores.smk
diff --git a/workflow/snakemake_rules/core.smk b/...genetic/workflow/snakemake_rules/core.smk
diff --git a/...ow/snakemake_rules/download_via_lapis.smk b/...ow/snakemake_rules/download_via_lapis.smk
diff --git a/...snakemake_rules/nextstrain_automation.smk b/...snakemake_rules/nextstrain_automation.smk
diff --git a/workflow/snakemake_rules/prepare.smk b/...etic/workflow/snakemake_rules/prepare.smk
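
The renames above all follow one pattern: each previously top-level workflow file now sits under `phylogenetic/` at the same relative path. A minimal sketch of that mapping follows. The `new_path` helper and the `WORKFLOW_ROOT` variable are hypothetical names used only for illustration and are not part of the repository. The mapping applies only to the files listed as renamed, not to `README.md` or `.github/`.

```bash
# Hypothetical helper (not in the repository): map a pre-move, top-level
# workflow path to its location after this commit.
# WORKFLOW_ROOT mirrors the `workflow-root: phylogenetic` value set in ci.yaml.
WORKFLOW_ROOT="phylogenetic"

new_path() {
  # Prefix the old relative path with the new workflow root.
  printf '%s/%s\n' "$WORKFLOW_ROOT" "$1"
}

new_path Snakefile                 # prints phylogenetic/Snakefile
new_path config/mpxv/config.yaml   # prints phylogenetic/config/mpxv/config.yaml
```

This matches the change in the rebuild workflows, which now pass `phylogenetic` instead of `.` as the build directory while leaving the `--configfiles` arguments (for example `config/mpxv/config.yaml`) unchanged.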