Use clade I reference KJ642613 for clade I build, mask correctly #292

Merged: 1 commit, Nov 20, 2024
40 changes: 40 additions & 0 deletions phylogenetic/defaults/clade-i/auspice_config.json
@@ -56,6 +56,46 @@
 "title": "Release Date",
 "type": "categorical"
 },
+{
+"key": "sra_accession",
+"title": "SRA Accession",
+"type": "categorical"
+},
+{
+"key": "coverage",
+"title": "Coverage",
+"type": "continuous"
+},
+{
+"key": "missing_data",
+"title": "Missing Data",
+"type": "continuous"
+},
+{
+"key": "nonACGTN",
+"title": "Non-ACGTN",
+"type": "continuous"
+},
+{
+"key": "institution",
+"title": "Institution",
+"type": "categorical"
+},
+{
+"key": "division",
+"title": "Division",
+"type": "categorical"
+},
+{
+"key": "location",
+"title": "Location",
+"type": "categorical"
+},
+{
+"key": "abbr_authors",
+"title": "Abbreviated Authors",
+"type": "categorical"
+},
 {
 "key": "date",
 "title": "Collection date",
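Each added coloring pairs a metadata column (`key`) with a display `title` and a `type`: the three numeric QC fields (`coverage`, `missing_data`, `nonACGTN`) are `continuous`, the other five are `categorical`. As a hedged illustration only, a hypothetical sanity check (`check_colorings` is not part of this PR or of Auspice) could flag missing keys, duplicated keys, or unexpected types. The set of allowed types is an assumption taken from Auspice's config conventions and should be checked against the Auspice documentation.

```python
import json

# Assumption: the coloring "type" values Auspice accepts. Verify against the
# Auspice config documentation before relying on this list.
ALLOWED_TYPES = {"categorical", "continuous", "ordinal", "temporal", "boolean"}


def check_colorings(colorings: list[dict]) -> list[str]:
    """Return human-readable problems found in a list of coloring entries."""
    problems = []
    seen = set()
    for i, entry in enumerate(colorings):
        key = entry.get("key")
        if not key:
            problems.append(f"entry {i}: missing 'key'")
            continue
        if key in seen:
            problems.append(f"entry {i}: duplicate key {key!r}")
        seen.add(key)
        if entry.get("type") not in ALLOWED_TYPES:
            problems.append(f"entry {i}: unexpected type {entry.get('type')!r} for {key!r}")
    return problems


# Four of the eight colorings added in this PR.
added = json.loads("""
[
  {"key": "sra_accession", "title": "SRA Accession", "type": "categorical"},
  {"key": "coverage", "title": "Coverage", "type": "continuous"},
  {"key": "missing_data", "title": "Missing Data", "type": "continuous"},
  {"key": "nonACGTN", "title": "Non-ACGTN", "type": "continuous"}
]
""")
print(check_colorings(added))  # []
```

A `continuous` coloring only renders as a gradient if the metadata column actually holds numbers, so a stricter check might also inspect the metadata values; that is left out here.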
12 changes: 6 additions & 6 deletions phylogenetic/defaults/clade-i/config.yaml
@@ -1,12 +1,12 @@
-reference: "defaults/reference.fasta"
-genome_annotation: "defaults/genome_annotation.gff3"
-genbank_reference: "defaults/reference.gb"
+reference: "defaults/clade-i/reference.fasta"
+genome_annotation: "defaults/clade-i/genome_annotation.gff3"
+genbank_reference: "defaults/clade-i/reference.gb"
 include: "defaults/clade-i/include.txt"
 clades: "defaults/clades.tsv"
 lat_longs: "defaults/lat_longs.tsv"
 auspice_config: "defaults/clade-i/auspice_config.json"
 description: "defaults/description.md"
-tree_mask: "defaults/tree_mask.tsv"
+tree_mask: "defaults/clade-i/tree_mask.tsv"

 # Use `accession` as the ID column since `strain` currently contains duplicates¹.
 # ¹ https://github.com/nextstrain/mpox/issues/33
@@ -18,7 +18,7 @@ auspice_name: "mpox_clade-I"

 filter:
   min_date: 1900
-  min_length: 100000
+  min_length: 170000


 ### Filter to only Clade I sequences
@@ -56,7 +56,7 @@ recency: true
 mask:
   from_beginning: 800
   from_end: 6422
-  maskfile: "defaults/mask.bed"
+  maskfile: "defaults/clade-i/mask.bed"

 colors:
   ignore_categories: "division location"
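The filter change raises `min_length` for the clade I build from 100000 to 170000, presumably to drop partial or poorly covered genomes. The sketch below shows what such a threshold does, assuming (as `augur filter --min-length` is documented to do in recent augur versions) that only A, C, G and T count toward length and that the comparison is inclusive. The function names are hypothetical, not the workflow's code.

```python
# Sketch of a minimum-length check. Assumption: only A/C/G/T count toward
# length (case-insensitive), and sequences at the threshold are kept.
MIN_LENGTH = 170_000  # new clade I value; previously 100_000


def acgt_length(sequence: str) -> int:
    """Count unambiguous nucleotides, ignoring N and other IUPAC codes."""
    upper = sequence.upper()
    return sum(upper.count(base) for base in "ACGT")


def passes_min_length(sequence: str, min_length: int = MIN_LENGTH) -> bool:
    """Keep sequences whose A/C/G/T count is at least `min_length`."""
    return acgt_length(sequence) >= min_length


print(acgt_length("acgtNNRY"))                          # 4
print(passes_min_length("A" * 170_000))                 # True
print(passes_min_length("A" * 160_000 + "N" * 30_000))  # False
```

The last case is the point of counting only A/C/G/T: a 190 kb record that is mostly Ns still fails the 170000 threshold.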
362 changes: 362 additions & 0 deletions phylogenetic/defaults/clade-i/genome_annotation.gff3

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions phylogenetic/defaults/clade-i/mask.bed
@@ -0,0 +1,25 @@
Chrom ChromStart ChromEnd locus tag Comment
chr 8340 8480 indel variation and long repetitive elements
chr 17960 17980 Next to stretch of Ns, suspicious
chr 18930 19050 termini of clade Ib deletion are sometimes incorrectly called
chr 19900 20100 termini of clade Ib deletion are sometimes incorrectly called
chr 20250 20280 triple mutation in Ib is inconsistently called
chr 22890 22920 homopolymer stretch
chr 31570 31610 Indel often incorrectly called
chr 77870 77880 Likely reversion to reference in some INRB sequences only
chr 109460 109470 Lots of ambiguous right next to start of Ns
chr 109730 109750 Indel often incorrectly called
chr 123370 123400 Right next to stretch of Ns in INRB sequences only
chr 138000 138300 indel variation and long repetitive elements
chr 141700 141800 indel variation and long repetitive elements
chr 144750 144830 indel variation and long repetitive elements
chr 148440 148660 indel variation and long repetitive elements
chr 149970 150020 indel variation and long repetitive elements
chr 152170 152300 indel variation and long repetitive elements
chr 157520 157570 homopolymer/tandem repeats
chr 158790 158800 Mutation right next to stretch of Ns, INRB sequences only
chr 162580 162610 indel variation and long repetitive elements
chr 169000 169350 indel variation and long repetitive elements
chr 177250 177350 indel variation and long repetitive elements
chr 178500 178900 indel variation and long repetitive elements
chr 180650 180710 Indel sometimes called messily
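The clade I `mask` settings combine this BED file with `from_beginning: 800` and `from_end: 6422`. Because BED coordinates only make sense relative to one reference, the intervals here presumably refer to the clade I reference KJ642613 named in the PR title, which would explain the new clade-specific `mask.bed`. The following is a minimal sketch of applying such a mask, not the workflow's own implementation. It assumes the file is tab-separated with the header row shown and an empty `locus tag` column, and that intervals follow standard BED semantics (0-based, end-exclusive). All function names are hypothetical.

```python
import csv
import io

# Assumption: tab-separated, header row as above, empty "locus tag" column.
# Only three of the 25 rows are reproduced here.
BED_TEXT = (
    "Chrom\tChromStart\tChromEnd\tlocus tag\tComment\n"
    "chr\t8340\t8480\t\tindel variation and long repetitive elements\n"
    "chr\t22890\t22920\t\thomopolymer stretch\n"
    "chr\t77870\t77880\t\tLikely reversion to reference in some INRB sequences only\n"
)


def read_mask_intervals(bed_text: str) -> list[tuple[int, int]]:
    """Parse (start, end) pairs from the ChromStart/ChromEnd columns."""
    reader = csv.DictReader(io.StringIO(bed_text), delimiter="\t")
    return [(int(row["ChromStart"]), int(row["ChromEnd"])) for row in reader]


def apply_mask(sequence, intervals, from_beginning=0, from_end=0, mask_char="N"):
    """Replace BED intervals and both termini of `sequence` with `mask_char`.

    Intervals are treated as 0-based and end-exclusive (standard BED).
    """
    chars = list(sequence)
    n = len(chars)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, n)):
            chars[i] = mask_char
    for i in range(min(from_beginning, n)):  # first `from_beginning` bases
        chars[i] = mask_char
    for i in range(max(n - from_end, 0), n):  # last `from_end` bases
        chars[i] = mask_char
    return "".join(chars)


def masked_site_count(intervals) -> int:
    """Count distinct positions covered by the intervals (overlaps counted once)."""
    sites = set()
    for start, end in intervals:
        sites.update(range(start, end))
    return len(sites)


intervals = read_mask_intervals(BED_TEXT)
print(intervals)                     # [(8340, 8480), (22890, 22920), (77870, 77880)]
print(masked_site_count(intervals))  # 180
print(apply_mask("ACGTACGTACGT", [(4, 6)], from_beginning=2, from_end=3))
# NNGTNNGTANNN
```

For the clade I build the corresponding call would pass `from_beginning=800` and `from_end=6422` together with all 25 intervals; whether the real tooling treats the end coordinate as exclusive should be confirmed in its documentation.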