
Commit

Merge pull request #254 from nextstrain/james/clade-i
Add Clade-I build
jameshadfield authored Jun 21, 2024
2 parents 3315ff1 + 6f92ff9 commit e6bc293
Showing 7 changed files with 202 additions and 0 deletions.
59 changes: 59 additions & 0 deletions .github/workflows/rebuild-clade-i.yaml
@@ -0,0 +1,59 @@
name: Rebuild clade-I

on:
  repository_dispatch:
    types:
      - rebuild
      - rebuild_clade-i

  workflow_dispatch:
    inputs:
      trial_name:
        description: "If set, result will be at nextstrain.org/staging/trial/${trial_name}/${auspice_name}"
        required: false

jobs:
  set_config_overrides:
    runs-on: ubuntu-latest
    steps:
      - id: config
        name: Set config overrides
        env:
          TRIAL_NAME: ${{ inputs.trial_name }}
        run: |
          config=""
          if [[ "$TRIAL_NAME" ]]; then
            config+="--config"
            config+=" deploy_url='s3://nextstrain-staging/'"
            config+=" auspice_prefix='"$TRIAL_NAME"'"
          fi
          echo "config=$config" >> "$GITHUB_OUTPUT"
    outputs:
      config_overrides: ${{ steps.config.outputs.config }}

  rebuild_clade_i:
    needs: [set_config_overrides]
    permissions:
      id-token: write
    uses: nextstrain/.github/.github/workflows/pathogen-repo-build.yaml@master
    secrets: inherit
    with:
      # We can migrate to AWS Batch when/if we need to for more resources,
      # but at the time of writing the clade-I build is small & quick
      runtime: docker
      env: |
        CONFIG_OVERRIDES: ${{ needs.set_config_overrides.outputs.config_overrides }}
        GITHUB_RUN_ID: ${{ github.run_id }}
        SLACK_CHANNELS: ${{ inputs.trial_name && vars.TEST_SLACK_CHANNEL || vars.SLACK_CHANNELS }}
        BUILD_DIR: phylogenetic
        BUILD_NAME: clade-i
      run: |
        nextstrain build \
          --env GITHUB_RUN_ID \
          --env SLACK_TOKEN \
          --env SLACK_CHANNELS \
          . \
            notify_on_deploy \
            --configfiles $BUILD_DIR/defaults/$BUILD_NAME/config.yaml $BUILD_DIR/build-configs/nextstrain-automation/config.yaml \
            $CONFIG_OVERRIDES --directory $BUILD_DIR --snakefile $BUILD_DIR/Snakefile
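
The `set_config_overrides` job emits extra `--config` arguments only when a trial name is supplied; otherwise the output is an empty string and the build runs with its default deploy settings. As a rough illustration, the same string construction can be sketched in Python. The function name `build_config_overrides` is hypothetical and does not exist in the repository, where this logic lives in the Bash step.

```python
def build_config_overrides(trial_name: str = "") -> str:
    """Mirror the Bash step: return extra --config arguments, or "" for a normal run.

    Hypothetical helper for illustration only; the workflow itself does this in Bash.
    """
    if not trial_name:
        # No trial name: leave the build configuration untouched.
        return ""
    parts = [
        "--config",
        "deploy_url='s3://nextstrain-staging/'",
        f"auspice_prefix='{trial_name}'",
    ]
    return " ".join(parts)


if __name__ == "__main__":
    print(repr(build_config_overrides()))
    print(build_config_overrides("my-trial"))
```

In the workflow, the resulting string is passed to the build through the `CONFIG_OVERRIDES` environment variable and expanded unquoted at the end of the `nextstrain build` command.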
6 changes: 6 additions & 0 deletions phylogenetic/README.md
@@ -80,6 +80,12 @@ Run pipeline to produce the "lineage B.1" tree for `/mpox/lineage-B.1` with:
nextstrain build . --configfile defaults/hmpxv1_big/config.yaml
```

Run pipeline to produce the "clade I" tree for `/mpox/clade-I` with:

```bash
nextstrain build . --configfile defaults/clade-i/config.yaml
```

### Visualize results

View results with:
70 changes: 70 additions & 0 deletions phylogenetic/defaults/clade-i/auspice_config.json
@@ -0,0 +1,70 @@
{
  "title": "Genomic epidemiology of mpox clade I viruses",
  "maintainers": [
    {"name": "Nextstrain team", "url": "http://nextstrain.org"}
  ],
  "data_provenance": [
    {
      "name": "GenBank",
      "url": "https://www.ncbi.nlm.nih.gov/genbank/"
    }
  ],
  "build_url": "https://github.com/nextstrain/mpox",
  "colorings": [
    {
      "key": "region",
      "title": "Region",
      "type": "categorical"
    },
    {
      "key": "country",
      "title": "Country",
      "type": "categorical"
    },
    {
      "key": "host",
      "title": "Host",
      "type": "categorical"
    },
    {
      "key": "GA_CT_fraction",
      "title": "G→A or C→T fraction",
      "type": "continuous"
    },
    {
      "key": "dinuc_context_fraction",
      "title": "NGA/TCN context of G→A/C→T mutations",
      "type": "continuous"
    },
    {
      "key": "recency",
      "title": "Submission Recency",
      "type": "categorical"
    },
    {
      "key": "date_submitted",
      "title": "Release Date",
      "type": "categorical"
    },
    {
      "key": "date",
      "title": "Collection date",
      "type": "categorical"
    }
  ],
  "geo_resolutions": [
    "country"
  ],
  "display_defaults": {
    "color_by": "country",
    "map_triplicate": true,
    "distance_measure": "num_date",
    "transmission_lines": false
  },
  "filters": [
    "country",
    "region",
    "recency",
    "host"
  ]
}
58 changes: 58 additions & 0 deletions phylogenetic/defaults/clade-i/config.yaml
@@ -0,0 +1,58 @@
reference: "defaults/reference.fasta"
genome_annotation: "defaults/genome_annotation.gff3"
genbank_reference: "defaults/reference.gb"
include: "defaults/clade-i/include.txt"
clades: "defaults/clades.tsv"
lat_longs: "defaults/lat_longs.tsv"
auspice_config: "defaults/clade-i/auspice_config.json"
description: "defaults/description.md"
tree_mask: "defaults/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/mpox/issues/33
strain_id_field: "accession"
display_strain_field: "strain"

build_name: "clade-i"
auspice_name: "mpox_clade-I"

filter:
  min_date: 1900
  min_length: 100000
  exclude_where: 'clade!=I'


### We don't want to subsample, so specify a config which is essentially a no-op
subsample:
  everything:
    group_by: ""
    sequences_per_group: ""

## align
max_indel: 10000
seed_spacing: 1000

## treefix
fix_tree: true
treefix_root: "" # without a root we'll midpoint root which should work great for clade I

## refine
timetree: true
root: "best"
# Clock rate chosen via treetime inference on Clade-I data excluding Clade-Ib seqs (n=73)
# TODO: update this once more public data is available.
clock_rate: 1.465e-06
clock_std_dev: 6.7e-07
divergence_units: "mutations"

traits:
  columns: "country"
  sampling_bias_correction: 3

## recency
recency: true

mask:
  from_beginning: 800
  from_end: 6422
  maskfile: "defaults/mask.bed"
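
To get a feel for the fixed `clock_rate` and `clock_std_dev` above, the short calculation below converts them into an approximate number of mutations per genome per year. It is only a sketch under two assumptions that are not stated in the config: that the rate is expressed in substitutions per site per year (the usual TreeTime convention), and that the genome is roughly 197,000 bp long (an illustrative figure; masked regions are ignored).

```python
CLOCK_RATE = 1.465e-06     # clock_rate from config.yaml
CLOCK_STD_DEV = 6.7e-07    # clock_std_dev from config.yaml
GENOME_LENGTH = 197_000    # ASSUMPTION: illustrative genome length in bp, not from the config


def per_genome_rate(rate_per_site: float, genome_length: int) -> float:
    """Convert a per-site yearly rate into expected mutations per genome per year."""
    return rate_per_site * genome_length


mean = per_genome_rate(CLOCK_RATE, GENOME_LENGTH)
low = per_genome_rate(CLOCK_RATE - CLOCK_STD_DEV, GENOME_LENGTH)
high = per_genome_rate(CLOCK_RATE + CLOCK_STD_DEV, GENOME_LENGTH)

if __name__ == "__main__":
    print(f"{mean:.2f} mutations/year (one std dev: {low:.2f} to {high:.2f})")
```

Under these assumptions the rate corresponds to well under one mutation per genome per year, which is consistent with the config comment that the estimate should be revisited once more public data is available.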
Empty file.
3 changes: 3 additions & 0 deletions phylogenetic/defaults/clades.tsv
@@ -5,6 +5,9 @@ outgroup nuc 179226 T
clade I nuc 87560 T
clade I nuc 136015 A

clade Ib nuc 6014 G
clade Ib nuc 108966 T

clade II nuc 86502 G
clade II nuc 150970 A
clade II nuc 35352 C
6 changes: 6 additions & 0 deletions phylogenetic/rules/prepare_sequences.smk
@@ -65,6 +65,11 @@ rule filter:
        min_date=config["filter"]["min_date"],
        min_length=config["filter"]["min_length"],
        strain_id=config["strain_id_field"],
        exclude_where=lambda w: (
            f"--exclude-where {config['filter']['exclude_where']}"
            if "exclude_where" in config["filter"]
            else ""
        ),
    shell:
        """
        augur filter \
@@ -74,6 +79,7 @@ rule filter:
            --output-sequences {output.sequences} \
            --output-metadata {output.metadata} \
            --exclude {input.exclude} \
            {params.exclude_where} \
            --min-date {params.min_date} \
            --min-length {params.min_length} \
            --query "(QC_rare_mutations == 'good' | QC_rare_mutations == 'mediocre')" \
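
The new `exclude_where` param makes the `--exclude-where` flag optional: builds whose config has no `filter.exclude_where` key get an empty string, so their `augur filter` command is unchanged. The behaviour of that lambda can be sketched as a standalone Python function. The name `exclude_where_arg` and the two example dictionaries are hypothetical and exist only for this illustration.

```python
def exclude_where_arg(config: dict) -> str:
    """Return the augur filter flag if filter.exclude_where is configured, else ""."""
    filter_config = config["filter"]
    if "exclude_where" in filter_config:
        return f"--exclude-where {filter_config['exclude_where']}"
    return ""


# Hypothetical configs: one shaped like the clade-I build, one without the new key.
clade_i_config = {"filter": {"min_date": 1900, "min_length": 100000, "exclude_where": "clade!=I"}}
other_config = {"filter": {"min_date": 1900, "min_length": 100000}}

if __name__ == "__main__":
    print(repr(exclude_where_arg(clade_i_config)))
    print(repr(exclude_where_arg(other_config)))
```

Note that the configured value is interpolated into the shell command unquoted, so a value containing spaces would be split into separate arguments by the shell.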