Merge pull request #10 from nextstrain/add-nextclade-build-2
Add `nextclade` workflow [#2]
Showing 20 changed files with 6,569 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
{ | ||
// long lines are okay | ||
"MD013":{ | ||
"line_length": 100, | ||
"tables": false | ||
}, | ||
// don't require top-level heading on L1 | ||
"MD041": false | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Yellow Fever Virus Nextclade Dataset Tree | ||
|
||
This workflow creates a phylogenetic tree that can be used as part of | ||
a Nextclade dataset to assign genotypes to yellow fever virus samples | ||
based on [Mutebi et al.][] (J Virol. 2001 Aug;75(15):6999-7008) and | ||
[Bryant et al.][] (PLoS Pathog. 2007 May 18;3(5):e75). | ||
|
||
* Build a tree using samples from the `ingest` output, with the following | ||
  sampling criteria: | ||
  * Force-include the following samples: | ||
    * genotype reference strains from the 2 papers cited above | ||
* Assign genotypes to each sample and internal nodes of the tree with | ||
  `augur clades`, using clade-defining mutations in `defaults/clades.tsv` | ||
* Provide the following coloring options on the tree: | ||
  * Genotype assignment from `augur clades` | ||
|
||
## How to create a new tree | ||
|
||
* Run the workflow: `nextstrain build .` | ||
* Inspect the output tree by comparing genotype assignments from the following sources: | ||
  * `augur clades` output | ||
* If unwanted samples are present in the tree, add them to | ||
  `defaults/dropped_strains.tsv` and re-run the workflow | ||
* If any changes are needed to the clade-defining mutations, add | ||
  changes to `defaults/clades.tsv` and re-run the workflow | ||
* Repeat as needed | ||
|
||
[Mutebi et al.]: https://pubmed.ncbi.nlm.nih.gov/11435580/ | ||
[Bryant et al.]: https://pubmed.ncbi.nlm.nih.gov/17511518/ |
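When inspecting the genotype assignments described in the README above, a quick tally per genotype can help spot unexpected results. The following is a minimal sketch, assuming the node-data JSON written by `augur clades` stores each assignment under `nodes -> <node name> -> clade_membership`. The node names and assignments in the example are invented for illustration and are not taken from this workflow's output.

```python
import json
from collections import Counter


def tally_genotypes(node_data: dict) -> Counter:
    """Count nodes (tips and internal nodes) per genotype in a node-data dict."""
    nodes = node_data.get("nodes", {})
    return Counter(
        attrs.get("clade_membership", "unassigned") for attrs in nodes.values()
    )


# Hypothetical excerpt of a node-data file; names and genotypes are made up.
example = json.loads("""
{
  "nodes": {
    "sample_A": {"clade_membership": "West Africa II"},
    "sample_B": {"clade_membership": "Angola"},
    "NODE_0000001": {"clade_membership": "Angola"},
    "sample_C": {}
  }
}
""")

for genotype, count in sorted(tally_genotypes(example).items()):
    print(f"{genotype}\t{count}")
```

To use this on a real build, load the node-data file with `json.load` and pass the resulting dict to `tally_genotypes`; the file path depends on the workflow rules and is not shown in this diff.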
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
configfile: "defaults/config.yaml" | ||
|
||
rule all: | ||
    input: | ||
        auspice_json = config["files"]["auspice_json"], | ||
        nextclade_dataset = "dataset/tree.json", | ||
        test_dataset = "test_output", | ||
|
||
include: "rules/prepare_sequences.smk" | ||
include: "rules/construct_phylogeny.smk" | ||
include: "rules/annotate_phylogeny.smk" | ||
include: "rules/export.smk" | ||
include: "rules/assemble_dataset.smk" | ||
|
||
rule clean: | ||
    params: | ||
        targets = [ | ||
            ".snakemake", | ||
            "auspice", | ||
            "benchmarks", | ||
            "data", | ||
            "dataset", | ||
            "logs", | ||
            "results", | ||
            "test_output", | ||
        ] | ||
    shell: | ||
        """ | ||
        rm -rfv {params.targets} | ||
        """ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
{ | ||
"title": "Real-time tracking of yellow fever virus full genome virus evolution", | ||
"maintainers": [ | ||
{"name": "John SJ Anderson", "url": "https://bedford.io/team/john-sj-anderson/"}, | ||
{"name": "the Nextstrain team", "url": "https://nextstrain.org/team"} | ||
], | ||
"data_provenance": [ | ||
{ | ||
"name": "GenBank", | ||
"url": "https://www.ncbi.nlm.nih.gov/genbank/" | ||
} | ||
], | ||
"build_url": "https://github.com/nextstrain/yellow-fever", | ||
"colorings": [ | ||
{ | ||
"key": "gt", | ||
"title": "Genotype", | ||
"type": "categorical" | ||
}, | ||
{ | ||
"key": "region", | ||
"title": "Region", | ||
"type": "categorical" | ||
}, | ||
{ | ||
"key": "country", | ||
"title": "Country", | ||
"type": "categorical" | ||
} | ||
], | ||
"geo_resolutions": [ | ||
"country", | ||
"region" | ||
], | ||
"display_defaults": { | ||
"map_triplicate": true, | ||
"color_by": "clade_membership" | ||
}, | ||
"filters": [ | ||
"clade_membership", | ||
"region", | ||
"country", | ||
"author", | ||
"host" | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
clade gene site alt | ||
Angola nuc 111 G | ||
Angola nuc 219 T | ||
Angola nuc 240 C | ||
Angola nuc 246 A | ||
Angola nuc 252 A | ||
Angola nuc 255 A | ||
Angola nuc 291 G | ||
Angola nuc 294 A | ||
Angola nuc 300 A | ||
Angola nuc 315 G | ||
Angola nuc 327 G | ||
Angola nuc 372 A | ||
Angola nuc 420 A | ||
Angola nuc 432 A | ||
Angola nuc 453 T | ||
Angola nuc 492 G | ||
Angola nuc 651 T | ||
Angola nuc 72 A | ||
Angola nuc 81 G | ||
Angola nuc 88 C | ||
Angola nuc 90 A | ||
Angola nuc 99 T | ||
East Africa nuc 171 G | ||
East Africa nuc 438 G | ||
East Africa nuc 45 A | ||
East Africa nuc 468 T | ||
East/Central Africa nuc 228 G | ||
South America I nuc 219 A | ||
South America I nuc 532 A | ||
South America II nuc 114 C | ||
South America II nuc 193 T | ||
South America II nuc 249 A | ||
South America II nuc 639 G | ||
West Africa I nuc 183 G | ||
West Africa I nuc 255 C | ||
West Africa II nuc 270 A | ||
West Africa II nuc 321 T | ||
West Africa II nuc 477 A | ||
West Africa II nuc 93 T |
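To illustrate how the rows of the clade table above can be read, here is a simplified sketch that loads a tab-separated excerpt in the same four-column layout and lists the clades whose defining alleles are all present in an aligned sequence. This is not how `augur clades` works internally (it operates on mutations inferred along the tree); the function names and the toy sequence are hypothetical, and sites are assumed to be 1-based positions in the 672 nt reference.

```python
import csv
import io
from collections import defaultdict

# Excerpt in the four-column layout of the clade table (tab-separated).
CLADES_TSV = (
    "clade\tgene\tsite\talt\n"
    "South America I\tnuc\t219\tA\n"
    "South America I\tnuc\t532\tA\n"
    "West Africa I\tnuc\t183\tG\n"
    "West Africa I\tnuc\t255\tC\n"
)


def load_clade_mutations(handle):
    """Return {clade: [(site, alt), ...]} using 1-based nucleotide sites."""
    mutations = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["gene"] == "nuc":
            mutations[row["clade"]].append((int(row["site"]), row["alt"]))
    return dict(mutations)


def matching_clades(sequence, clade_mutations):
    """List clades whose defining alleles all occur in an aligned sequence."""
    hits = []
    for clade, muts in clade_mutations.items():
        if all(
            len(sequence) >= site and sequence[site - 1] == alt
            for site, alt in muts
        ):
            hits.append(clade)
    return hits


# Toy aligned sequence: all 'N' except the two South America I alleles.
toy = ["N"] * 672
toy[219 - 1] = "A"
toy[532 - 1] = "A"

clade_mutations = load_clade_mutations(io.StringIO(CLADES_TSV))
print(matching_clades("".join(toy), clade_mutations))
```

Requiring every listed allele is a deliberately strict choice for the sketch; a real sample with a single private mutation at one of these sites would match no clade here.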
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# genotypes assigned by augur clades | ||
clade_membership Angola #3F63CF | ||
clade_membership East Africa #529AB6 | ||
clade_membership East/Central Africa #75B681 | ||
clade_membership South America I #A6BE55 | ||
clade_membership South America II #D4B13F | ||
clade_membership West Africa I #E68133 | ||
clade_membership West Africa II #DC2F24 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
files: | ||
  auspice_config: "defaults/auspice_config.json" | ||
  auspice_json: "auspice/tree.json" | ||
  clades: "defaults/clades.tsv" | ||
  colors: "defaults/colors.tsv" | ||
  include: "defaults/include_strains.txt" | ||
  reference_prM-E_fasta: "defaults/reference.fasta" | ||
  reference_prM-E_gff: "defaults/genome_annotation.gff3" | ||
strain_id_field: "accession" | ||
align_and_extract_prM-E: | ||
  min_length: 500 | ||
  min_seed_cover: 0.01 | ||
ancestral: | ||
  inference: "joint" | ||
export: | ||
  metadata_columns: "strain" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
##sequence-region prM-E 1 672 | ||
NC_002031.1 feature source 1 672 . + . gene=nuc | ||
NC_002031.1 feature gene 1 333 . + . gene_name=prM | ||
NC_002031.1 feature gene 109 333 . + . gene_name=M | ||
NC_002031.1 feature gene 334 672 . + . gene_name=E |
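As a sanity check on the annotation above, the sketch below parses the three gene rows and confirms that each feature length is a multiple of three, which is what one would expect for coding features in a reference that forms a complete reading frame. The parsing is deliberately minimal and assumes tab-separated GFF3 columns with `gene_name=` in the attributes column; the function name is hypothetical.

```python
# Gene rows as they appear in the annotation, written with tab separators.
GFF_ROWS = (
    "NC_002031.1\tfeature\tgene\t1\t333\t.\t+\t.\tgene_name=prM\n"
    "NC_002031.1\tfeature\tgene\t109\t333\t.\t+\t.\tgene_name=M\n"
    "NC_002031.1\tfeature\tgene\t334\t672\t.\t+\t.\tgene_name=E\n"
)


def gene_lengths(gff_text):
    """Return {gene_name: length in nt} for rows whose type column is 'gene'."""
    lengths = {}
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.split("\t")
        if cols[2] != "gene":
            continue
        start, end = int(cols[3]), int(cols[4])  # 1-based, inclusive
        attrs = dict(field.split("=", 1) for field in cols[8].split(";") if field)
        lengths[attrs["gene_name"]] = end - start + 1
    return lengths


for name, length in gene_lengths(GFF_ROWS).items():
    codons, remainder = divmod(length, 3)
    print(f"{name}: {length} nt, {codons} codons, remainder {remainder}")
```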
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,136 @@ | ||
# Extracted from tables and figures in Mutebi et al. (J Virol. 2001 | ||
# Aug;75(15):6999-7008) and Bryant et al. (PLoS Pathog. 2007 May | ||
# 18;3(5):e75) | ||
AF369669 | ||
AF369670 | ||
AF369671 | ||
AY540431 | ||
AY540432 | ||
AY540433 | ||
AY540434 | ||
AY540435 | ||
U52390 | ||
AY540437 | ||
AY540438 | ||
AY540439 | ||
AY540440 | ||
AY540441 | ||
AY540442 | ||
AY540443 | ||
AY540444 | ||
AY540445 | ||
AY540446 | ||
AY540447 | ||
AY540448 | ||
AY540449 | ||
AY540450 | ||
AY540451 | ||
AY540452 | ||
AY540453 | ||
U23570 | ||
AY540454 | ||
AY540455 | ||
AY540456 | ||
AY540457 | ||
AY540458 | ||
AY540459 | ||
AY540460 | ||
AY540461 | ||
AY540462 | ||
AY540463 | ||
AY540464 | ||
AY540465 | ||
AY540466 | ||
AY540467 | ||
AY540468 | ||
AY540469 | ||
AY540470 | ||
AY540471 | ||
AY540472 | ||
AY540473 | ||
AY540436 | ||
U52392 | ||
U52395 | ||
AF369672 | ||
AF369673 | ||
AY540475 | ||
AY540476 | ||
AY540474 | ||
U52399 | ||
AY540477 | ||
AY540478 | ||
AF369674 | ||
AF369675 | ||
AY572535 | ||
AY640589 | ||
AF369686 | ||
U54798 | ||
AY603338 | ||
AF369676 | ||
U52403 | ||
AF369677 | ||
AF369678 | ||
AF368679 | ||
AF369680 | ||
AF369681 | ||
AF369682 | ||
AF369683 | ||
AF369684 | ||
AF369685 | ||
AY540479 | ||
AY540480 | ||
AY161927 | ||
AY161928 | ||
AY161929 | ||
AY161930 | ||
AY161931 | ||
U52411 | ||
AY161933 | ||
AY161934 | ||
AY161935 | ||
U52405 | ||
U52407 | ||
AY161938 | ||
AY161939 | ||
AY161940 | ||
AY161941 | ||
AY161942 | ||
AY161943 | ||
AY161944 | ||
AY161945 | ||
AY161946 | ||
AY161947 | ||
AY161948 | ||
AY161949 | ||
AY161950 | ||
AY161951 | ||
GI694115 | ||
U89338 | ||
AF369687 | ||
AF369688 | ||
U52413 | ||
AF369689 | ||
AF369690 | ||
AF369691 | ||
AF369692 | ||
AF369693 | ||
AY690831 | ||
AY690832 | ||
AY690833 | ||
DQ872411 | ||
DQ872412 | ||
AY540481 | ||
AY540482 | ||
AY540483 | ||
AY540484 | ||
AY540485 | ||
AY540486 | ||
AF369694 | ||
U52422 | ||
AF369695 | ||
AF369696 | ||
AY540487 | ||
AY540488 | ||
AY540489 | ||
AY540490 | ||
AF369697 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
## Unreleased | ||
|
||
Initial release of yellow fever virus dataset. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# Yellow fever virus dataset | ||
|
||
| Key | Value | | ||
| ----------------- | -----------------------------------------------------------------| | ||
| name | Yellow fever virus (YFV) prM-E region | | ||
| authors | [Nextstrain](https://nextstrain.org) | | ||
| reference | AY640589.1 | | ||
| workflow | <https://github.com/nextstrain/yellow-fever/tree/main/nextclade> | | ||
| path | `nextstrain/yellow-fever/prM-E` | | ||
|
||
## Scope of this dataset | ||
|
||
This dataset assigns genotypes to yellow fever virus samples based on | ||
strain and genotype information from [Mutebi et al.][] (J Virol. 2001 | ||
Aug;75(15):6999-7008) and [Bryant et al.][] (PLoS Pathog. 2007 May 18;3(5):e75). | ||
|
||
Together, these two papers define 7 distinct yellow fever virus | ||
genotypes based on a 670-nucleotide region of the yellow fever virus | ||
genome (bases 641-1310), called the prM-E region. This region | ||
comprises the 3' end of the pre-membrane protein (prM) gene, the | ||
entire membrane protein (M) gene, and the 5' end of the envelope | ||
protein (E) gene. | ||
|
||
(N.b., the reference sequence used in this dataset is actually 672 nt | ||
long, from bases 641-1312 of the genome reference. The 2 extra bases | ||
make the reference a complete open reading frame.) | ||
|
||
This dataset can be used to assign genotypes to any sequence that | ||
includes at least 500 bp of the prM-E region, including whole genome | ||
sequences. Sequence data beyond the prM-E region will be reported as an | ||
insertion in the Nextclade output. | ||
|
||
## Features | ||
|
||
This dataset supports: | ||
|
||
- Assignment of genotypes | ||
- Phylogenetic placement | ||
- Sequence quality control (QC) | ||
|
||
## What are Nextclade datasets | ||
|
||
Read more about Nextclade datasets in the Nextclade documentation: | ||
<https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html> | ||
|
||
[Mutebi et al.]: https://pubmed.ncbi.nlm.nih.gov/11435580/ | ||
[Bryant et al.]: https://pubmed.ncbi.nlm.nih.gov/17511518/ |
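The coordinate arithmetic in the scope section above can be checked with a few lines. This is a small sketch using only the numbers stated in the README (bases 641-1310 for the published region, 641-1312 for the dataset reference); the helper names are hypothetical.

```python
REGION_START = 641      # first genome base of the prM-E region, per the README
PUBLISHED_END = 1310    # last base of the 670 nt region used in the papers
REFERENCE_END = 1312    # last base of the 672 nt dataset reference


def region_length(start, end):
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


def genome_to_region(genome_pos, region_start=REGION_START):
    """Convert a 1-based genome position to a 1-based prM-E position."""
    return genome_pos - region_start + 1


published = region_length(REGION_START, PUBLISHED_END)
reference = region_length(REGION_START, REFERENCE_END)

# A range is a whole number of codons only if its length is divisible by 3.
print(published, published % 3)
print(reference, reference % 3)
print(genome_to_region(REGION_START), genome_to_region(REFERENCE_END))
```

The 670 nt range leaves a remainder of one base, while the 672 nt reference divides evenly into codons, which matches the README's note about the 2 extra bases.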