Merge pull request #172 from nextstrain/trial-builds
Allow trial builds through github action
corneliusroemer authored Sep 20, 2023
2 parents 46a6891 + f2bed24 commit e4f2672
Showing 16 changed files with 262 additions and 131 deletions.
7 changes: 6 additions & 1 deletion .github/workflows/rebuild-hmpxv1-big.yaml
@@ -8,11 +8,15 @@ on:

workflow_dispatch:
inputs:
trial_name:
description: "If set, result will be at nextstrain.org/staging/trial/trial_name/monkeypox/mpxv"
required: false
image:
description: 'Specific container image to use for build (will override the default of "nextstrain build")'
required: false

env:
TRIAL_NAME: ${{ github.event.inputs.trial_name }}
NEXTSTRAIN_DOCKER_IMAGE: ${{ github.event.inputs.image }}

jobs:
@@ -45,7 +49,8 @@ jobs:
--env SLACK_CHANNELS \
. \
notify_on_deploy \
--configfiles config/config_hmpxv1_big.yaml config/nextstrain_automation.yaml
--configfiles config/config_hmpxv1_big.yaml config/nextstrain_automation.yaml \
--config auspice_prefix=trial_$TRIAL_NAME
env:
AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
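
The `trial_name` description above implies that the `trial_<name>` prefix ends up as extra path segments under `staging/`. A minimal sketch of that mapping follows, assuming the usual Nextstrain convention that underscores in a dataset filename are read as `/` in the URL. `staging_url` is a hypothetical helper for illustration and is not part of this repository.

```python
def staging_url(trial_name: str, auspice_name: str) -> str:
    """Return the staging URL a trial build would be expected to appear at.

    Assumption: underscores in the dataset filename become "/" in the URL.
    """
    # Mirrors `--config auspice_prefix=trial_$TRIAL_NAME` plus the build's auspice_name.
    filename = f"trial_{trial_name}_{auspice_name}"
    return "nextstrain.org/staging/" + filename.replace("_", "/")


print(staging_url("my-test", "mpox_clade-IIb"))
```

Under the same assumption, a trial name that itself contains an underscore would add a further path level, so hyphenated names are probably the safer choice.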
8 changes: 6 additions & 2 deletions .github/workflows/rebuild-hmpxv1.yaml
@@ -8,11 +8,15 @@ on:

workflow_dispatch:
inputs:
trial_name:
description: "If set, result will be at nextstrain.org/staging/trial/trial_name/monkeypox/mpxv"
required: false
image:
description: 'Specific container image to use for build (will override the default of "nextstrain build")'
required: false

env:
TRIAL_NAME: ${{ github.event.inputs.trial_name }}
NEXTSTRAIN_DOCKER_IMAGE: ${{ github.event.inputs.image }}

jobs:
@@ -45,7 +49,8 @@ jobs:
--env SLACK_CHANNELS \
. \
notify_on_deploy \
--configfiles config/config_hmpxv1.yaml config/nextstrain_automation.yaml
--configfiles config/config_hmpxv1.yaml config/nextstrain_automation.yaml \
--config auspice_prefix=trial_$TRIAL_NAME
env:
AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
@@ -54,4 +59,3 @@ jobs:
- name: notify_pipeline_failed
if: ${{ failure() }}
run: ./bin/notify-on-error

7 changes: 6 additions & 1 deletion .github/workflows/rebuild-mpxv.yaml
@@ -8,11 +8,15 @@ on:

workflow_dispatch:
inputs:
trial_name:
description: "If set, result will be at nextstrain.org/staging/trial/trial_name/monkeypox/mpxv"
required: false
image:
description: 'Specific container image to use for build (will override the default of "nextstrain build")'
required: false

env:
TRIAL_NAME: ${{ github.event.inputs.trial_name }}
NEXTSTRAIN_DOCKER_IMAGE: ${{ github.event.inputs.image }}

jobs:
@@ -45,7 +49,8 @@ jobs:
--env SLACK_CHANNELS \
. \
notify_on_deploy \
--configfiles config/config_mpxv.yaml config/nextstrain_automation.yaml
--configfiles config/config_mpxv.yaml config/nextstrain_automation.yaml \
--config auspice_prefix=trial_$TRIAL_NAME
env:
AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
36 changes: 23 additions & 13 deletions README.md
@@ -18,27 +18,35 @@ If you analyze and plan to publish using these data, please contact these labs f
Within the analysis pipeline, these data are fetched from data.nextstrain.org and written to `data/` with:

```bash
nextstrain build --docker . data/sequences.fasta data/metadata.tsv
nextstrain build . data/sequences.fasta data/metadata.tsv
```

### Run analysis pipeline

Run pipeline to produce "overview" tree for `/monkeypox/mpxv` with:
Run pipeline to produce the "overview" tree for `/mpox/all-clades` with:

```bash
nextstrain build --docker --cpus 1 . --configfile config/config_mpxv.yaml
nextstrain build . --configfile config/config_mpxv.yaml
```

Run pipeline to produce "outbreak" tree for `/monkeypox/hmpxv1` with:
Run pipeline to produce the "clade IIb" tree for `/mpox/clade-IIb` with:

```bash
nextstrain build --docker --cpus 1 . --configfile config/config_hmpxv1.yaml
nextstrain build . --configfile config/config_hmpxv1.yaml
```

Adjust the number of CPUs to what your machine has available if you want to perform alignment and tree building a bit faster.
Run pipeline to produce the "lineage B.1" tree for `/mpox/lineage-B.1` with:

```bash
nextstrain build . --configfile config/config_hmpxv1_big.yaml
```
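
The three commands above differ only in their config file. As a rough sketch, the mapping from config file to dataset path could be written down once and the commands generated from it. `BUILDS` and `build_command` are illustrative names and do not exist in this repository; the paths and command-line arguments are copied from the README text above.

```python
import shlex

# Config file -> dataset path, as listed in the README section above.
BUILDS = {
    "config/config_mpxv.yaml": "/mpox/all-clades",
    "config/config_hmpxv1.yaml": "/mpox/clade-IIb",
    "config/config_hmpxv1_big.yaml": "/mpox/lineage-B.1",
}


def build_command(configfile: str) -> list[str]:
    """Return the `nextstrain build` argument list for one config file."""
    return ["nextstrain", "build", ".", "--configfile", configfile]


for configfile, dataset in BUILDS.items():
    # Print rather than execute, so the sketch has no side effects.
    print(f"{dataset}: {shlex.join(build_command(configfile))}")
```

To actually run a build, the argument list could be passed to `subprocess.run(..., check=True)` from the repository root.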

### Deploying

⚠️ The instructions below are outdated and need to be adjusted for the new build names (mpxv instead of monkeypox, etc.)

<details>

Run the python script [`scripts/deploy.py`](scripts/deploy.py) to deploy the staging build to production.

This will also automatically create a dated build where each node has a unique (random) ID so it can be targeted in shared links/narratives.
@@ -53,17 +61,19 @@ To deploy a locally built build to staging, use the `--staging` flag.

To not deploy a dated build to production, add the `--no-dated` flag.

</details>

### Visualize results

View results with:

```bash
nextstrain view auspice/
nextstrain view .
```

## Configuration

Configuration takes place in `config/config.yml` by default.
Configuration takes place in `config/config_*.yaml` files for each build.
The analysis pipeline is contained in `workflow/snakemake_rule/core.smk`.
This can be read top-to-bottom: each rule specifies its file inputs and outputs and pulls its parameters from `config`.
There is little redirection, and each rule should be understandable on its own.
@@ -84,7 +94,7 @@ Follow the [standard installation instructions](https://docs.nextstrain.org/en/l
If you don't use the `nextstrain` CLI but a custom conda environment, make sure that you have `tsv-utils` and `seqkit` installed, e.g. using:

```sh
conda install -c bioconda tsv-utils seqkit
mamba install -c bioconda tsv-utils seqkit
```

### Nextstrain build vs Snakemake
@@ -93,11 +103,11 @@ The above commands use the Nextstrain CLI and `nextstrain build` along with Dock
Alternatively, if you [install Nextalign/Nextclade v2 locally](https://github.com/nextstrain/nextclade/releases) you can run the pipeline with:

```bash
snakemake -j 1 -p --configfile config/config_mpxv.yaml
snakemake -j 1 -p --configfile config/config_hmpxv1.yaml
snakemake --configfile config/config_mpxv.yaml
snakemake --configfile config/config_hmpxv1.yaml
snakemake --configfile config/config_hmpxv1_big.yaml
```


### Update colors to include new countries

Update `colors_hmpxv1.tsv` to group countries by region based on countries present in its `metadata.tsv`:
@@ -121,5 +131,5 @@ python3 scripts/update_colours.py --colors config/colors_mpxv.tsv \
Example data should be updated every time the metadata schema changes or a new clade/lineage emerges. To update, run:

```sh
nextstrain build --docker . update_example_data -F
nextstrain build . update_example_data -F
```
21 changes: 7 additions & 14 deletions Snakefile
@@ -11,31 +11,24 @@ if version.parse(augur_version) < version.parse(min_augur_version):
sys.exit(1)


# Use default configuration values. Override with Snakemake's --configfile/--config options.
configfile: "config/defaults.yaml"


build_dir = "results"


auspice_dir = "auspice"


rule all:
input:
auspice_json=auspice_dir + f"/{config.get('auspice_name','tree')}.json",
root_sequence_json=auspice_dir
+ f"/{config.get('auspice_name','')}_root-sequence.json",
AUSPICE_PREFIX = config.get("auspice_prefix", "")
AUSPICE_PREFIX = AUSPICE_PREFIX + "_" if AUSPICE_PREFIX else AUSPICE_PREFIX
AUSPICE_NAME = config.get("auspice_name", "tree")
AUSPICE_FILENAME = AUSPICE_PREFIX + AUSPICE_NAME


rule rename:
rule all:
input:
auspice_json=build_dir + f"/{config['build_name']}/tree.json",
root_sequence=build_dir + f"/{config['build_name']}/tree_root-sequence.json",
output:
auspice_json=auspice_dir + f"/{config.get('auspice_name','tree')}.json",
root_sequence_json=auspice_dir
+ f"/{config.get('auspice_name','')}_root-sequence.json",
auspice_json=f"{auspice_dir}/{AUSPICE_FILENAME}.json",
root_sequence_json=f"{auspice_dir}/{AUSPICE_FILENAME}_root-sequence.json",
shell:
"""
cp {input.auspice_json} {output.auspice_json}
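
The `AUSPICE_PREFIX` / `AUSPICE_FILENAME` lines in this hunk can be restated as a small standalone function to make the resulting filenames easier to check. This is a sketch that mirrors the logic shown in the diff; `auspice_filename` is a name introduced here for illustration, and `config` is a plain dict standing in for Snakemake's config object.

```python
def auspice_filename(config: dict) -> str:
    """Combine the optional prefix and the dataset name as the Snakefile hunk does."""
    prefix = config.get("auspice_prefix", "")
    # A non-empty prefix is joined to the name with an underscore.
    if prefix:
        prefix += "_"
    return prefix + config.get("auspice_name", "tree")


# No prefix configured: the plain dataset name is used.
print(auspice_filename({"auspice_name": "mpox_clade-IIb"}))
# Prefix as passed by the workflows when a trial name is given.
print(auspice_filename({"auspice_prefix": "trial_demo", "auspice_name": "mpox_clade-IIb"}))
# Prefix when TRIAL_NAME is empty, i.e. `trial_$TRIAL_NAME` expands to `trial_`.
print(auspice_filename({"auspice_prefix": "trial_", "auspice_name": "mpox_clade-IIb"}))
```

The last case suggests that a run without a trial name would still receive a non-empty `trial_` prefix, unless a part of the workflow not shown in this diff guards against it.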
65 changes: 61 additions & 4 deletions config/config_hmpxv1.yaml
@@ -1,22 +1,79 @@
exclude: "config/exclude_accessions_mpxv.txt"
reference: "config/reference.fasta"
genemap: "config/genemap.gff"
genbank_reference: "config/reference.gb"
colors: "config/colors_hmpxv1.tsv"
clades: "config/clades.tsv"
lat_longs: "config/lat_longs.tsv"
auspice_config: "config/auspice_config_hmpxv1.json"
description: "config/description.md"
tree_mask: "config/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/monkeypox/issues/33
strain_id_field: "accession"
display_strain_field: "strain"

build_name: "hmpxv1"
auspice_name: "monkeypox_hmpxv1"
auspice_name: "mpox_clade-IIb"

## filter
min_date: 2017
min_length: 100000
sequences_per_group: "--sequences-per-group 40"
group_by: "--group-by year month country"
filters: "--exclude-where outbreak!=hMPXV-1"


### Set 1: Non-B.1 sequences: use all
### Set 2: B.1 sequences: small sample across year/country, maybe month
filter:
non_b1:
group_by: "--group-by lineage year country"
sequences_per_group: "--sequences-per-group 50"
other_filters: "outbreak!=hMPXV-1 clade!=IIb"
exclude_lineages:
- B.1
- B.1.1
- B.1.2
- B.1.3
- B.1.4
- B.1.5
- B.1.6
- B.1.7
- B.1.8
- B.1.9
- B.1.10
- B.1.11
- B.1.12
- B.1.13
- B.1.14
- B.1.15
- B.1.16
- B.1.17
- B.1.18
- B.1.19
- B.1.20
- C.1
b1:
group_by: "--group-by country year"
sequences_per_group: "--subsample-max-sequences 100"
other_filters: "--exclude-where outbreak!=hMPXV-1 clade!=IIb"

## align
max_indel: 10000
seed_spacing: 1000

## treefix
fix_tree: true
treefix_root: "--root MK783032"

## refine
timetree: true
root: "MK783032 MK783030"
clock_rate: 5.7e-5
clock_std_dev: 2e-5

## recency
recency: true

mask:
from_beginning: 800
from_end: 6422
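
The `exclude_lineages` list under `non_b1` spells out B.1 and its numbered sublineages by hand. A small sketch, assuming the list is meant to cover exactly B.1, B.1.1 through B.1.20, and C.1 as written above, shows how the same list could be generated and turned into `column=value` conditions of the kind `augur filter --exclude-where` accepts. How the workflow actually consumes `exclude_lineages` is not visible in this diff, so both helper functions are hypothetical.

```python
def non_b1_excluded_lineages(max_sublineage: int = 20) -> list[str]:
    """Rebuild the hand-written list: B.1, B.1.1 .. B.1.<max_sublineage>, C.1."""
    sublineages = [f"B.1.{i}" for i in range(1, max_sublineage + 1)]
    return ["B.1"] + sublineages + ["C.1"]


def exclude_where_clauses(lineages: list[str], column: str = "lineage") -> list[str]:
    """Format each lineage as a `column=value` condition (illustrative only)."""
    return [f"{column}={name}" for name in lineages]


lineages = non_b1_excluded_lineages()
print(len(lineages))
print(" ".join(exclude_where_clauses(lineages[:3])))
```

Keeping the explicit list in YAML has the advantage that it is easy to read and review; the generated form mainly helps when checking that no sublineage has been skipped.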
49 changes: 42 additions & 7 deletions config/config_hmpxv1_big.yaml
@@ -1,22 +1,57 @@
exclude: "config/exclude_accessions_mpxv.txt"
reference: "config/reference.fasta"
genemap: "config/genemap.gff"
genbank_reference: "config/reference.gb"
colors: "config/colors_hmpxv1.tsv"
clades: "config/clades.tsv"
lat_longs: "config/lat_longs.tsv"
auspice_config: "config/auspice_config_hmpxv1_big.json"
description: "config/description.md"
tree_mask: "config/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/monkeypox/issues/33
strain_id_field: "accession"
display_strain_field: "strain"

build_name: "hmpxv1_big"
auspice_name: "monkeypox_hmpxv1_big"
auspice_name: "mpox_lineage-B.1"

## filter
min_date: 2017
min_length: 100000
sequences_per_group: ""
group_by: ""
filters: "--exclude-where outbreak!=hMPXV-1"
min_date: 2022
min_length: 180000
filter:
b1:
group_by: "--group-by year month country"
sequences_per_group: "--subsample-max-sequences 5000"
other_filters: "outbreak!=hMPXV-1 clade!=IIb"
exclude_lineages:
- A
- A.1
- A.1.1
- A.2
- A.2.1
- A.2.2
- A.2.3
- A.3

## align
max_indel: 10000
seed_spacing: 1000

## treefix
fix_tree: true
treefix_root: "--root OP890401"

## refine
timetree: true
root: "MK783032 MK783030"
root: "OP890401"
clock_rate: 5.7e-5
clock_std_dev: 2e-5

## recency
recency: true

mask:
from_beginning: 800
from_end: 6422