Segment builds for H5N1 cattle flu outbreak
Individual datasets for all 8 segments are added using NCBI data (consistent with the whole-genome view) and using the strains included in the genome tree to choose the appropriate monophyletic clade in each segment tree to include. There is an added Auspice colouring "genome_tree" which indicates whether the strain exists in the whole genome tree. See top-level README for instructions on how to run.

There are a number of important caveats with the current implementation, all of which are fixable but I think better done in future work.

* The input source must be NCBI for this build to work as expected and thus the `s3_src` config argument is essential. Snakemake will not reuse existing files in `./data` if they originated from a different source (e.g. GISAID), despite their filenames being identical as the params of the originating rule are different. I've added a note about this in the README.

* The filenames produced include "_all-time" as the current Snakemake workflow requires a "time" wildcard, however we want to remove this prior to uploading. E.g. `auspice/avian-flu_h5n1-cattle-outbreak_ha_all-time.json` should be uploaded to /avian-flu/h5n1-cattle-outbreak/ha to match the URL structure for the genome build. We can use the following bash one-liner prior to uploading while this remains unfixed:

  ```
  for i in auspice/avian-flu_h5n1-cattle-outbreak_*_all-time.json; do mv $i ${i%_all-time.json}.json; done
  ```

* The segment builds may include basal strains which aren't part of the cattle-flu outbreak but are included because there aren't mutations which distinguish them from others which should be. The "genome_tree" colouring is helpful here. (See the note in `restrict-via-common-ancestor.py` -- we may want to include more basal strains here).
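For anyone who would rather preview the renames before touching the files, the same suffix-stripping step can be sketched in Python. This is only an illustration of the one-liner in the commit message, not part of the commit: the names `upload_name` and `rename_all` and the `dry_run` flag are hypothetical, and the glob pattern is copied from the bash loop.

```python
from __future__ import annotations

from pathlib import Path

SUFFIX = "_all-time.json"
PATTERN = "avian-flu_h5n1-cattle-outbreak_*_all-time.json"  # same glob as the bash loop


def upload_name(filename: str) -> str:
    """Drop a trailing "_all-time" (before ".json"); return other names unchanged."""
    if filename.endswith(SUFFIX):
        return filename[: -len(SUFFIX)] + ".json"
    return filename


def rename_all(directory: str = "auspice", dry_run: bool = True) -> list[tuple[str, str]]:
    """Return the planned (old, new) pairs; only rename on disk when dry_run is False."""
    planned = []
    for path in sorted(Path(directory).glob(PATTERN)):
        target = path.with_name(upload_name(path.name))
        planned.append((str(path), str(target)))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling `rename_all()` with the defaults only reports what would change; `rename_all(dry_run=False)` performs the renames.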
1 parent 75e4139, commit 3d65a9b
Showing 5 changed files with 234 additions and 37 deletions.
    @@ -0,0 +1,38 @@

    rule download_tree:
        """
        Downloads the tree behind nextstrain.org/avian-flu/h5n1-cattle-outbreak/genome
        so that the segment-level builds can use it to restrict the strains used.
        TODO: if the whole-genome analysis has been run locally we should (optionally) use that tree.
        """
        output:
            tree = "results/tree_{subtype}_genome.json",
        params:
            dataset="https://data.nextstrain.org/avian-flu_h5n1-cattle-outbreak_genome.json"
        wildcard_constraints:
            subtype="h5n1-cattle-outbreak",
            time="all-time",
        shell:
            """
            curl --compressed {params.dataset} -o {output.tree}
            """


    rule prune_tree:
        input:
            tree = "results/tree_{subtype}_{segment}_{time}.nwk",
            strains = "results/tree_{subtype}_genome.json",
        output:
            tree = "results/tree_{subtype}_{segment}_{time}_outbreak-clade.nwk",
            node_data = "results/tree_{subtype}_{segment}_{time}_outbreak-clade.json",
        wildcard_constraints:
            subtype="h5n1-cattle-outbreak",
            time="all-time",
        shell:
            """
            python3 scripts/restrict-via-common-ancestor.py \
                --tree {input.tree} \
                --strains {input.strains} \
                --output-tree {output.tree} \
                --output-metadata {output.node_data}
            """
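The `prune_tree` rule hands a segment tree and the downloaded genome dataset to `scripts/restrict-via-common-ancestor.py`, which is not shown in this part of the diff. The following is a minimal sketch of the idea described in the commit message (pick the clade of the segment tree defined by the common ancestor of the genome-tree strains, and record a "genome_tree" flag per strain), not the script itself. It assumes both trees are nested dicts with `name` and `children` keys, as in an Auspice v2 dataset whose top-level `tree` is a single object; the real rule reads the segment tree from Newick. The function names, the "Yes"/"No" values and the example strain names are all hypothetical.

```python
def tip_names(node):
    """Collect the names of all terminal nodes below (and including) `node`."""
    children = node.get("children", [])
    if not children:
        return {node["name"]}
    names = set()
    for child in children:
        names |= tip_names(child)
    return names


def common_ancestor(node, targets):
    """Descend while a single child still contains every target; return the last such node."""
    for child in node.get("children", []):
        if targets <= tip_names(child):
            return common_ancestor(child, targets)
    return node


def restrict(segment_tree, genome_dataset):
    """Return (clade, node_data) for the segment-tree clade spanning the genome-tree strains."""
    genome_tips = tip_names(genome_dataset["tree"])  # assumes "tree" is one object, not a list
    shared = genome_tips & tip_names(segment_tree)
    if not shared:
        raise ValueError("no genome-tree strains found in the segment tree")
    clade = common_ancestor(segment_tree, shared)
    # Augur-style node data: {"nodes": {strain: {attribute: value}}}; values are an assumption.
    nodes = {
        name: {"genome_tree": "Yes" if name in genome_tips else "No"}
        for name in sorted(tip_names(clade))
    }
    return clade, {"nodes": nodes}


# Toy example: strain "basal_2" sits in a polytomy with the two outbreak strains.
segment_tree = {
    "name": "ROOT",
    "children": [
        {"name": "outgroup_1"},
        {"name": "NODE_1", "children": [
            {"name": "basal_2"},
            {"name": "cattle_3"},
            {"name": "cattle_4"},
        ]},
    ],
}
genome_dataset = {"tree": {"name": "G_ROOT", "children": [
    {"name": "cattle_3"},
    {"name": "cattle_4"},
]}}

clade, node_data = restrict(segment_tree, genome_dataset)
print(clade["name"])
print(node_data["nodes"])
```

In the toy example the selected clade is `NODE_1`, so `basal_2` is kept even though it is absent from the genome tree. That is the caveat raised in the commit message, and the per-strain flag is what lets a viewer tell such strains apart. The sketch recomputes tip sets at every level, which is fine for illustration but would be wasteful on a large tree.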