Fine tuning the Nextclade all dataset #58

Merged: 12 commits, merged Jun 5, 2024
8 changes: 8 additions & 0 deletions nextclade/config/config_dengue.yaml
@@ -21,6 +21,14 @@ filter:
denv3: '36'
denv4: '36'

refine:
root_id:
all: "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome"
denv1: "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome"
denv2: "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome"
denv3: "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv3/genome"
denv4: "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv4/genome"

traits:
sampling_bias_correction: '3'
traits_columns:
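The new `refine.root_id` mapping gives each build (`all`, `denv1` to `denv4`) its own reconstructed root sequence name. A minimal sketch of how a workflow step might resolve it, assuming the YAML has already been parsed into a dict (for example with `yaml.safe_load`); the helper `root_id_for` is hypothetical and not part of the repository:

```python
# The `refine.root_id` block from config_dengue.yaml, as an already-parsed dict.
CONFIG = {
    "refine": {
        "root_id": {
            "all": "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome",
            "denv1": "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome",
            "denv2": "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome",
            "denv3": "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv3/genome",
            "denv4": "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv4/genome",
        }
    }
}


def root_id_for(config: dict, build: str) -> str:
    """Return the root sequence name configured for one build; raise KeyError if it is missing."""
    root_ids = config["refine"]["root_id"]
    if build not in root_ids:
        raise KeyError(f"no refine.root_id configured for build {build!r}")
    return root_ids[build]


print(root_id_for(CONFIG, "denv2"))  # Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome
```

Raising on an unknown build makes a typo in the build name fail early instead of silently rooting the tree on the wrong sequence.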
27 changes: 23 additions & 4 deletions nextclade/datasets/all/README.md
@@ -1,7 +1,26 @@
# Nextclade dataset for "Dengue Virus"
# De dataset

## Dataset attributes
| Key | Value |
| :-- | :-- |
| name | Dengue (serotype-level) |
| authors | [Nextstrain](https://nextstrain.org) |
| reference | NC_002640.1 |
| workflow | https://github.com/nextstrain/dengue/tree/main/nextclade |
| path | `nextstrain/dengue/all` |

Nextclade dataset

Read more about Nextclade datasets in Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
## Scope of this dataset

This dataset assigns a serotype to dengue samples based on [criteria outlined by the WHO](https://pubmed.ncbi.nlm.nih.gov/26868382/) and on tree placement nearest to one of the references [NC_001477.1 (DENV1)](https://www.ncbi.nlm.nih.gov/nuccore/NC_001477.1), [NC_001474.2 (DENV2)](https://www.ncbi.nlm.nih.gov/nuccore/NC_001474.2), [NC_001475.2 (DENV3)](https://www.ncbi.nlm.nih.gov/nuccore/NC_001475.2), or [NC_002640.1 (DENV4)](https://www.ncbi.nlm.nih.gov/nuccore/NC_002640.1).
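Nextclade does the actual placement on the dataset's reference tree. Purely to illustrate the "nearest reference" idea, the sketch below swaps in a much simpler technique: it labels a query by the smallest Hamming distance to pre-aligned reference sequences. The 8-nt sequences are made up, and this is not how Nextclade or the WHO criteria assign serotypes.

```python
def hamming(query: str, ref: str) -> int:
    """Count mismatches between two aligned, equal-length sequences, ignoring N and gaps in the query."""
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for q, r in zip(query, ref) if q not in "N-" and q != r)


def nearest_reference(query: str, references: dict[str, str]) -> str:
    """Return the label of the reference with the fewest mismatches (first one wins on ties)."""
    return min(references, key=lambda label: hamming(query, references[label]))


# Made-up toy alignment, NOT real dengue sequence.
toy_refs = {
    "DENV1": "ACGTACGT",
    "DENV2": "ACGTTTTT",
    "DENV3": "GGGTACGA",
    "DENV4": "TTTTACGT",
}

print(nearest_reference("ACGTTTAT", toy_refs))  # DENV2 (1 mismatch)
```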

## Features

This dataset supports:

- Assignment of serotypes
- Phylogenetic placement
- Sequence quality control (QC)

## What are Nextclade datasets

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
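The `path` attribute in the table above is the name under which the dataset is published. A sketch that assembles the command-line calls to download it and analyze sequences with it, assuming the Nextclade v3 CLI is installed as `nextclade`; the local directories and `sequences.fasta` input are placeholders, and the commands are only built and printed here, not executed:

```python
import shlex


def nextclade_commands(dataset: str, dataset_dir: str, sequences: str, out_dir: str) -> list[list[str]]:
    """Build (but do not run) two invocations: fetch the dataset, then run it on a FASTA file."""
    fetch = ["nextclade", "dataset", "get", "--name", dataset, "--output-dir", dataset_dir]
    run = ["nextclade", "run", "--input-dataset", dataset_dir, "--output-all", out_dir, sequences]
    return [fetch, run]


for cmd in nextclade_commands("nextstrain/dengue/all", "data/dengue_all", "sequences.fasta", "output/"):
    print(shlex.join(cmd))
```

To execute them, pass each list to `subprocess.run(cmd, check=True)`; keeping arguments as lists avoids shell-quoting problems.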
26 changes: 13 additions & 13 deletions nextclade/datasets/all/genome_annotation.gff3
@@ -1,14 +1,14 @@
##gff-version 3
##sequence-region NC_002640.1 1 10649
NC_002640.1 feature gene 102 440 . + . codon_start=1;gene=C;gene_name=C;
NC_002640.1 feature gene 441 713 . + . codon_start=1;gene=pr;gene_name=pr;
NC_002640.1 feature gene 441 938 . + . codon_start=1;gene=M;gene_name=M;
NC_002640.1 feature gene 939 2423 . + . codon_start=1;gene=E;gene_name=E;
NC_002640.1 feature gene 2424 3479 . + . codon_start=1;gene=NS1;gene_name=NS1;
NC_002640.1 feature gene 3480 4133 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
NC_002640.1 feature gene 4134 4523 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
NC_002640.1 feature gene 4524 6377 . + . codon_start=1;gene=NS3;gene_name=NS3;
NC_002640.1 feature gene 6378 6758 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
NC_002640.1 feature gene 6759 6827 . + . codon_start=1;gene=2K;gene_name=2K;
NC_002640.1 feature gene 6828 7562 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
NC_002640.1 feature gene 7563 10262 . + . codon_start=1;gene=NS5;gene_name=NS5;
##sequence-region Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome 1 10649
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 102 440 . + . codon_start=1;gene=C;gene_name=C;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 441 713 . + . codon_start=1;gene=pr;gene_name=pr;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 441 938 . + . codon_start=1;gene=M;gene_name=M;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 939 2423 . + . codon_start=1;gene=E;gene_name=E;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 2424 3479 . + . codon_start=1;gene=NS1;gene_name=NS1;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 3480 4133 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 4134 4523 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 4524 6377 . + . codon_start=1;gene=NS3;gene_name=NS3;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 6378 6758 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 6759 6827 . + . codon_start=1;gene=2K;gene_name=2K;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 6828 7562 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 7563 10262 . + . codon_start=1;gene=NS5;gene_name=NS5;
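After this change every feature row names the reconstructed root instead of NC_002640.1, so each row's seqid must match a `##sequence-region` directive. A hypothetical consistency check (not part of the repository) that flags rows with an unknown seqid or with coordinates outside their region. It splits rows on whitespace into at most 9 columns, which works because these seqids contain no spaces; real GFF3 is tab-delimited.

```python
def check_gff3(text: str) -> list[str]:
    """Return problems found: seqids without a ##sequence-region, or features outside their region."""
    regions: dict[str, tuple[int, int]] = {}
    problems: list[str] = []
    for line in text.splitlines():
        if line.startswith("##sequence-region"):
            _, seqid, start, end = line.split()
            regions[seqid] = (int(start), int(end))
        elif line and not line.startswith("#"):
            cols = line.split(None, 8)
            seqid, start, end = cols[0], int(cols[3]), int(cols[4])
            if seqid not in regions:
                problems.append(f"{seqid}: no ##sequence-region")
            elif not (regions[seqid][0] <= start <= end <= regions[seqid][1]):
                problems.append(f"{seqid}:{start}-{end} outside region")
    return problems


ROOT = "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome"
gff = "\n".join([
    "##gff-version 3",
    f"##sequence-region {ROOT} 1 10649",
    f"{ROOT}\tfeature\tgene\t102\t440\t.\t+\t.\tcodon_start=1;gene=C;gene_name=C;",
    # A stale row still using the old reference seqid:
    "NC_002640.1\tfeature\tgene\t441\t713\t.\t+\t.\tcodon_start=1;gene=pr;gene_name=pr;",
])
print(check_gff3(gff))  # ['NC_002640.1: no ##sequence-region']
```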
8 changes: 6 additions & 2 deletions nextclade/datasets/all/pathogen.json
@@ -1,5 +1,9 @@
{
"alignmentParams": {
"penaltyGapOpen": 8,
"penaltyGapOpenInFrame": 12,
"penaltyGapOpenOutOfFrame": 14,
"gapAlignmentSide": "left",
"minSeedCover": 0.01,
"minLength": 1000
},
@@ -30,7 +34,7 @@
},
"qc": {
"frameShifts": {
"enabled": false
"enabled": true
},
"missingData": {
"enabled": false,
@@ -56,7 +60,7 @@
"windowSize": 100
},
"stopCodons": {
"enabled": false
"enabled": true
}
},
"schemaVersion": "3.0.0",
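This file switches the `frameShifts` and `stopCodons` QC rules on, while `missingData` stays disabled. A sketch that lists which rules a `pathogen.json` enables, run on a trimmed, hypothetical excerpt that keeps only the keys visible in this diff (the real file contains more rules and fields that are collapsed above):

```python
import json

# Trimmed excerpt: only the QC keys shown in this diff, with their new values.
EXCERPT = """
{
  "qc": {
    "frameShifts": {"enabled": true},
    "missingData": {"enabled": false},
    "stopCodons": {"enabled": true}
  },
  "schemaVersion": "3.0.0"
}
"""


def enabled_qc_rules(pathogen: dict) -> list[str]:
    """Return the names of QC rules whose `enabled` flag is true, sorted alphabetically."""
    return sorted(name for name, rule in pathogen.get("qc", {}).items() if rule.get("enabled"))


print(enabled_qc_rules(json.loads(EXCERPT)))  # ['frameShifts', 'stopCodons']
```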
181 changes: 2 additions & 179 deletions nextclade/datasets/all/reference.fasta

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion nextclade/datasets/all/tree.json

Large diffs are not rendered by default.

26 changes: 13 additions & 13 deletions nextclade/datasets/denv1/genome_annotation.gff3
@@ -1,14 +1,14 @@
##gff-version 3
##sequence-region NC_001477.1 1 10735
NC_001477.1 feature gene 95 436 . + . codon_start=1;gene=C;gene_name=C;
NC_001477.1 feature gene 437 709 . + . codon_start=1;gene=pr;gene_name=pr;
NC_001477.1 feature gene 437 934 . + . codon_start=1;gene=M;gene_name=M;
NC_001477.1 feature gene 935 2419 . + . codon_start=1;gene=E;gene_name=E;
NC_001477.1 feature gene 2420 3475 . + . codon_start=1;gene=NS1;gene_name=NS1;
NC_001477.1 feature gene 3476 4129 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
NC_001477.1 feature gene 4130 4519 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
NC_001477.1 feature gene 4520 6376 . + . codon_start=1;gene=NS3;gene_name=NS3;
NC_001477.1 feature gene 6377 6757 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
NC_001477.1 feature gene 6758 6826 . + . codon_start=1;gene=2K;gene_name=2K;
NC_001477.1 feature gene 6827 7573 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
NC_001477.1 feature gene 7574 10270 . + . codon_start=1;gene=NS5;gene_name=NS5;
##sequence-region Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome 1 10735
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 95 436 . + . codon_start=1;gene=C;gene_name=C;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 437 709 . + . codon_start=1;gene=pr;gene_name=pr;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 437 934 . + . codon_start=1;gene=M;gene_name=M;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 935 2419 . + . codon_start=1;gene=E;gene_name=E;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 2420 3475 . + . codon_start=1;gene=NS1;gene_name=NS1;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 3476 4129 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 4130 4519 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 4520 6376 . + . codon_start=1;gene=NS3;gene_name=NS3;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 6377 6757 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 6758 6826 . + . codon_start=1;gene=2K;gene_name=2K;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 6827 7573 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 7574 10270 . + . codon_start=1;gene=NS5;gene_name=NS5;
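With frame-shift and stop-codon QC enabled, each annotated gene should span a whole number of codons from `codon_start=1`. A quick arithmetic check over the DENV1 coordinates above. GFF3 coordinates are 1-based and inclusive, so a feature's length is `end - start + 1`. The tuples are transcribed from this diff.

```python
# (gene, start, end) transcribed from the DENV1 genome_annotation.gff3 rows above.
DENV1_GENES = [
    ("C", 95, 436), ("pr", 437, 709), ("M", 437, 934), ("E", 935, 2419),
    ("NS1", 2420, 3475), ("NS2A", 3476, 4129), ("NS2B", 4130, 4519),
    ("NS3", 4520, 6376), ("NS4A", 6377, 6757), ("2K", 6758, 6826),
    ("NS4B", 6827, 7573), ("NS5", 7574, 10270),
]


def codon_counts(genes: list[tuple[str, int, int]]) -> dict[str, int]:
    """Map each gene to its codon count, raising ValueError if a span is not a multiple of 3 nt."""
    counts = {}
    for name, start, end in genes:
        length = end - start + 1  # inclusive, 1-based coordinates
        if length % 3:
            raise ValueError(f"{name}: {length} nt is not a whole number of codons")
        counts[name] = length // 3
    return counts


print(codon_counts(DENV1_GENES)["E"])  # 495
```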
9 changes: 7 additions & 2 deletions nextclade/datasets/denv1/pathogen.json
@@ -1,5 +1,9 @@
{
"alignmentParams": {
"penaltyGapOpen": 8,
"penaltyGapOpenInFrame": 12,
"penaltyGapOpenOutOfFrame": 14,
"gapAlignmentSide": "left",
"minSeedCover": 0.1,
"minLength": 1000
},
@@ -17,6 +21,7 @@
"experimental": true,
"files": {
"changelog": "CHANGELOG.md",
"examples": "sequences.fasta",
"genomeAnnotation": "genome_annotation.gff3",
"pathogenJson": "pathogen.json",
"readme": "README.md",
@@ -29,7 +34,7 @@
},
"qc": {
"frameShifts": {
"enabled": false
"enabled": true
},
"missingData": {
"enabled": false,
@@ -55,7 +60,7 @@
"windowSize": 100
},
"stopCodons": {
"enabled": false
"enabled": true
}
},
"schemaVersion": "3.0.0",
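This file also registers `sequences.fasta` as the dataset's `examples` file. A hypothetical pre-release check (not part of the repository) that every file named under `files` exists in the dataset directory. The demo builds a throwaway directory and deliberately leaves out `sequences.fasta`.

```python
import tempfile
from pathlib import Path


def missing_dataset_files(dataset_dir: Path, files: dict[str, str]) -> list[str]:
    """Return sorted `key=filename` entries whose file is absent from dataset_dir."""
    return sorted(f"{key}={name}" for key, name in files.items() if not (dataset_dir / name).is_file())


FILES = {  # subset of the `files` block shown in this diff
    "changelog": "CHANGELOG.md",
    "examples": "sequences.fasta",
    "genomeAnnotation": "genome_annotation.gff3",
    "pathogenJson": "pathogen.json",
    "readme": "README.md",
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("CHANGELOG.md", "genome_annotation.gff3", "pathogen.json", "README.md"):
        (root / name).write_text("")  # empty placeholder files
    print(missing_dataset_files(root, FILES))  # ['examples=sequences.fasta']
```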
182 changes: 2 additions & 180 deletions nextclade/datasets/denv1/reference.fasta

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion nextclade/datasets/denv1/tree.json

Large diffs are not rendered by default.

26 changes: 13 additions & 13 deletions nextclade/datasets/denv2/genome_annotation.gff3
@@ -1,14 +1,14 @@
##gff-version 3
##sequence-region NC_001474.2 1 10723
NC_001474.2 feature gene 97 438 . + . codon_start=1;gene=C;gene_name=C;
NC_001474.2 feature gene 439 711 . + . codon_start=1;gene=pr;gene_name=pr;
NC_001474.2 feature gene 439 936 . + . codon_start=1;gene=M;gene_name=M;
NC_001474.2 feature gene 937 2421 . + . codon_start=1;gene=E;gene_name=E;
NC_001474.2 feature gene 2422 3477 . + . codon_start=1;gene=NS1;gene_name=NS1;
NC_001474.2 feature gene 3478 4131 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
NC_001474.2 feature gene 4132 4521 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
NC_001474.2 feature gene 4522 6375 . + . codon_start=1;gene=NS3;gene_name=NS3;
NC_001474.2 feature gene 6376 6756 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
NC_001474.2 feature gene 6757 6825 . + . codon_start=1;gene=2K;gene_name=2K;
NC_001474.2 feature gene 6826 7569 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
NC_001474.2 feature gene 7570 10269 . + . codon_start=1;gene=NS5;gene_name=NS5;
##sequence-region Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome 1 10723
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 97 438 . + . codon_start=1;gene=C;gene_name=C;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 439 711 . + . codon_start=1;gene=pr;gene_name=pr;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 439 936 . + . codon_start=1;gene=M;gene_name=M;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 937 2421 . + . codon_start=1;gene=E;gene_name=E;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 2422 3477 . + . codon_start=1;gene=NS1;gene_name=NS1;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 3478 4131 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 4132 4521 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 4522 6375 . + . codon_start=1;gene=NS3;gene_name=NS3;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 6376 6756 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 6757 6825 . + . codon_start=1;gene=2K;gene_name=2K;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 6826 7569 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 7570 10269 . + . codon_start=1;gene=NS5;gene_name=NS5;
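In each annotation diff only the seqid changes: the reconstructed-root rows repeat the reference's coordinates and attributes exactly. A hypothetical check for that property compares old and new rows pairwise with the first column dropped. For brevity, it uses two DENV2 rows transcribed from above.

```python
def without_seqid(row: str) -> list[str]:
    """Split a GFF3 feature row on whitespace and drop the seqid (first) column."""
    return row.split()[1:]


def coords_unchanged(old_rows: list[str], new_rows: list[str]) -> bool:
    """True if both lists have the same length and match row by row once seqids are ignored."""
    return len(old_rows) == len(new_rows) and all(
        without_seqid(o) == without_seqid(n) for o, n in zip(old_rows, new_rows)
    )


ROOT = "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome"
OLD = [
    "NC_001474.2\tfeature\tgene\t97\t438\t.\t+\t.\tcodon_start=1;gene=C;gene_name=C;",
    "NC_001474.2\tfeature\tgene\t7570\t10269\t.\t+\t.\tcodon_start=1;gene=NS5;gene_name=NS5;",
]
NEW = [
    f"{ROOT}\tfeature\tgene\t97\t438\t.\t+\t.\tcodon_start=1;gene=C;gene_name=C;",
    f"{ROOT}\tfeature\tgene\t7570\t10269\t.\t+\t.\tcodon_start=1;gene=NS5;gene_name=NS5;",
]

print(coords_unchanged(OLD, NEW))  # True
```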
9 changes: 7 additions & 2 deletions nextclade/datasets/denv2/pathogen.json
@@ -1,5 +1,9 @@
{
"alignmentParams": {
"penaltyGapOpen": 8,
"penaltyGapOpenInFrame": 12,
"penaltyGapOpenOutOfFrame": 14,
"gapAlignmentSide": "left",
"minSeedCover": 0.1,
"minLength": 1000
},
@@ -17,6 +21,7 @@
"experimental": true,
"files": {
"changelog": "CHANGELOG.md",
"examples": "sequences.fasta",
"genomeAnnotation": "genome_annotation.gff3",
"pathogenJson": "pathogen.json",
"readme": "README.md",
@@ -29,7 +34,7 @@
},
"qc": {
"frameShifts": {
"enabled": false
"enabled": true
},
"missingData": {
"enabled": false,
@@ -55,7 +60,7 @@
"windowSize": 100
},
"stopCodons": {
"enabled": false
"enabled": true
}
},
"schemaVersion": "3.0.0",
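The serotype-level datasets use the same gap penalties and `gapAlignmentSide` as the `all` dataset, but a higher `minSeedCover` (0.1 versus 0.01). A small sketch that diffs two `alignmentParams` dicts, with values transcribed from the `all` and `denv2` hunks in this PR:

```python
ALL_PARAMS = {
    "penaltyGapOpen": 8, "penaltyGapOpenInFrame": 12, "penaltyGapOpenOutOfFrame": 14,
    "gapAlignmentSide": "left", "minSeedCover": 0.01, "minLength": 1000,
}
DENV2_PARAMS = {
    "penaltyGapOpen": 8, "penaltyGapOpenInFrame": 12, "penaltyGapOpenOutOfFrame": 14,
    "gapAlignmentSide": "left", "minSeedCover": 0.1, "minLength": 1000,
}


def diff_params(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for keys that differ or exist on only one side (missing -> None)."""
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


print(diff_params(ALL_PARAMS, DENV2_PARAMS))  # {'minSeedCover': (0.01, 0.1)}
```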