Merge pull request #222 from nextstrain/fix/flu-rbd-phenotype
fix: RBD position numbering
Showing 11 changed files with 108,491 additions and 10 deletions.
37 changes: 37 additions & 0 deletions
data_output/nextstrain/flu/h3n2/ha/EPI1857216/unreleased/CHANGELOG.md
@@ -0,0 +1,37 @@
## Unreleased

Fix numbering of RBD sites in the `pathogen.json`. The relevant positions were indexed 1-based when they should have been indexed 0-based.

## 2024-07-03T08:29:55Z

Added configuration of current and recent vaccine strains as 'reference nodes' on the reference tree, against which query sequences can be compared. This feature complements the new 'compare to clade founder' feature, which compares each query sequence to the most ancestral node of a clade or lineage.

The datasets themselves remain unchanged.

See the Nextclade documentation for more details about the 'relative mutations' functionality.

## 2024-04-19T07:50:39Z

Update of the datasets with more recent data. No new clades were added on this occasion.

## 2024-02-22T16:12:03Z

After discussion with various members of the seasonal influenza virus surveillance community, it was decided that subclade names starting with `H` could be confused with the names of major influenza hemagglutinin subtypes (such as H1 or H3). These subclades were therefore renamed to start with the alias `J`.

- `H` --> `J`
- `H.1` --> `J.1`
- `H.2` --> `J.2`
- `H.3` --> `J.3`
- `H.4` --> `J.4`

The subclades `H` and `H.*` were revoked, and a comment was added to explain the reason. No subclade definitions were changed.

## 2024-01-16T20:31:02Z

Initial release for Nextclade v3!

- Addition of subclades H.1, H.2, H.3, and H.4
- Aliasing of G.1.3.1.1 as subclade H

Read more about Nextclade datasets in the documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
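The renaming and aliasing recorded in this changelog amount to two lookups: first replace a revoked leading alias with its successor (`H` to `J`, per the 2024-02-22 entry), then expand the alias into the full name it abbreviates (`G.1.3.1.1`, per the 2024-01-16 entry). A minimal Python sketch, assuming tables that contain only these two entries; `normalize_subclade` is a hypothetical helper, not part of Nextclade or the nomenclature repository.

```python
# Only the mappings recorded in this CHANGELOG; real alias tables are larger.
REVOKED_ALIASES = {"H": "J"}  # 2024-02-22: H and H.* renamed to J and J.*
ALIASES = {"J": "G.1.3.1.1"}  # 2024-01-16: G.1.3.1.1 aliased as H (now J)


def normalize_subclade(name):
    """Return (current_name, full_name) for a subclade name (hypothetical helper).

    current_name swaps a revoked leading alias for its successor;
    full_name expands the leading alias into the name it abbreviates.
    Names whose leading part is not in either table pass through unchanged.
    """
    head, sep, rest = name.partition(".")
    current = REVOKED_ALIASES.get(head, head) + sep + rest
    head, sep, rest = current.partition(".")
    full = ALIASES[head] + sep + rest if head in ALIASES else current
    return current, full


for old in ["H", "H.3", "G.1.3.1"]:
    print(old, normalize_subclade(old))
```

This expands a single level only; if an alias table chained aliases (an alias pointing at a name that itself starts with an alias), the expansion step would need to be repeated.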
37 changes: 37 additions & 0 deletions
data_output/nextstrain/flu/h3n2/ha/EPI1857216/unreleased/README.md
@@ -0,0 +1,37 @@
# Influenza A(H3N2) HA dataset based on reference "A/Darwin/6/2021"

| Key | Value |
| -------------------- | -------------------- |
| authors | [Richard Neher](https://neherlab.org), [Nextstrain](https://nextstrain.org) |
| name | Influenza A H3N2 HA |
| reference | A/Darwin/6/2021 |
| dataset path | flu/h3n2/ha/EPI1857216 |
| reference accession | EPI1857216 |
| clade definitions | [github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/](https://github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/) |

## Scope of this dataset
This dataset uses a recent reference sequence (A/Darwin/6/2021) and is suitable for the analysis of circulating viruses.

## Features
This dataset supports:

* Assignment to clades and subclades based on the nomenclature defined in [github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/](https://github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/)
* Identification of glycosylation motifs
* Counting of mutations in the RBD
* Sequence QC
* Phylogenetic placement

## Clades of seasonal influenza viruses

The WHO Collaborating Centers define "clades" as genetic groups of viruses with signature mutations, to facilitate discussion of the circulating diversity of the viruses.
Clade demarcations do not always coincide with significantly different antigenic properties of the viruses.
Clade names are structured as _Number-Letter_ binomials (with exceptions) separated by periods, as in `3C.2a1b.2a.2a.1a`. These are sometimes shortened by omitting leading binomials, as in `2a.1`.

In addition to these clades, "subclades" are defined to break down diversity at higher resolution and to allow following the spread of different viral groups.
These follow a Pango-like nomenclature consisting of a letter followed by numbers separated by periods, as in `G.1.3.1`.
The leading letter is an alias for a previous, longer name.
Details of the nomenclature system can be found at [github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/](https://github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/).

## What is a Nextclade dataset?

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
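The glycosylation-motif identification listed under Features is driven by the regular expression `N[^P][ST]` defined in this dataset's `pathogen.json` (N-X-S/T with X any amino acid other than P). A minimal Python sketch of scanning an amino-acid sequence for that pattern; `find_glycosylation_motifs` is a hypothetical name, and treating a `{"begin", "end"}` range as 0-based and end-exclusive, with a motif reported only if it lies entirely inside the range, is an assumption rather than documented Nextclade behavior.

```python
import re

# Pattern from pathogen.json, wrapped in a zero-width lookahead so that
# overlapping motifs are all found (plain finditer would skip overlaps).
GLYCOSYLATION = re.compile(r"(?=N[^P][ST])")


def find_glycosylation_motifs(aa_seq, begin=0, end=None):
    """Return 0-based start positions of N-X-S/T motifs (X != P).

    Assumption: [begin, end) is a 0-based, end-exclusive range, and a motif
    is reported only if all three residues fall inside it.
    """
    window = aa_seq[begin:end]
    return [begin + m.start() for m in GLYCOSYLATION.finditer(window)]


print(find_glycosylation_motifs("QNNSTANPT"))  # [1, 2]
```

In the toy sequence, `NNS` and `NST` overlap and are both reported, while `NPT` is rejected because of the proline in the middle position.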
Binary file not shown.
5 changes: 5 additions & 0 deletions
data_output/nextstrain/flu/h3n2/ha/EPI1857216/unreleased/genome_annotation.gff3
@@ -0,0 +1,5 @@
##gff-version 3
##sequence-region EPI1857216 1 1718
EPI1857216 feature gene 1 48 . + . gene_name="SigPep"
EPI1857216 feature gene 49 1035 . + . gene_name="HA1"
EPI1857216 feature gene 1036 1698 . + . gene_name="HA2"
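GFF3 coordinates are 1-based and inclusive, so the three features above cover the signal peptide (1-48), HA1 (49-1035) and HA2 (1036-1698) back to back. A minimal Python sketch that reads the gene rows and converts each range into a codon count; it assumes the columns are tab-separated as the GFF3 format requires (the rendering above shows spaces) and that attributes are `key=value` pairs separated by `;`. `parse_gene_ranges` and `codon_count` are hypothetical helpers.

```python
GFF3_TEXT = (
    "##gff-version 3\n"
    "##sequence-region EPI1857216 1 1718\n"
    'EPI1857216\tfeature\tgene\t1\t48\t.\t+\t.\tgene_name="SigPep"\n'
    'EPI1857216\tfeature\tgene\t49\t1035\t.\t+\t.\tgene_name="HA1"\n'
    'EPI1857216\tfeature\tgene\t1036\t1698\t.\t+\t.\tgene_name="HA2"\n'
)


def parse_gene_ranges(text):
    """Map gene_name -> (start, end) for 'gene' rows; coordinates stay 1-based, inclusive."""
    genes = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and ## directives
        _, _, ftype, start, end, _, _, _, attrs = line.split("\t")
        if ftype != "gene":
            continue
        attributes = dict(item.split("=", 1) for item in attrs.split(";") if item)
        genes[attributes["gene_name"].strip('"')] = (int(start), int(end))
    return genes


def codon_count(start, end):
    """Number of codons in a 1-based, inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


for name, (start, end) in parse_gene_ranges(GFF3_TEXT).items():
    # The equivalent 0-based Python slice would be seq[start - 1:end].
    print(name, start, end, codon_count(start, end))
```

This prints 16 codons for SigPep, 329 for HA1 and 221 for HA2. The `start - 1` shift in the comment is the same 1-based to 0-based conversion at issue in this commit's RBD fix.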
174 changes: 174 additions & 0 deletions
data_output/nextstrain/flu/h3n2/ha/EPI1857216/unreleased/pathogen.json
@@ -0,0 +1,174 @@
{
  "schemaVersion": "3.0.0",
  "alignmentParams": {
    "excessBandwidth": 9,
    "terminalBandwidth": 100,
    "allowedMismatches": 4,
    "gapAlignmentSide": "right",
    "minSeedCover": 0.1
  },
  "compatibility": {
    "cli": "3.0.0-alpha.0",
    "web": "3.0.0-alpha.0"
  },
  "defaultCds": "HA1",
  "files": {
    "changelog": "CHANGELOG.md",
    "examples": "sequences.fasta",
    "genomeAnnotation": "genome_annotation.gff3",
    "pathogenJson": "pathogen.json",
    "readme": "README.md",
    "reference": "reference.fasta",
    "treeJson": "tree.json"
  },
  "qc": {
    "privateMutations": {
      "enabled": true,
      "typical": 5,
      "cutoff": 15,
      "weightLabeledSubstitutions": 2,
      "weightReversionSubstitutions": 1,
      "weightUnlabeledSubstitutions": 1
    },
    "missingData": {
      "enabled": false,
      "missingDataThreshold": 100,
      "scoreBias": 10
    },
    "snpClusters": {
      "enabled": false,
      "windowSize": 100,
      "clusterCutOff": 5,
      "scoreWeight": 50
    },
    "mixedSites": {
      "enabled": true,
      "mixedSitesThreshold": 4
    },
    "frameShifts": {
      "enabled": true
    },
    "stopCodons": {
      "enabled": true,
      "ignoredStopCodons": []
    }
  },
  "cdsOrderPreference": [
    "HA1",
    "HA2"
  ],
  "maintenance": {
    "website": [
      "https://nextstrain.org",
      "https://clades.nextstrain.org"
    ],
    "documentation": [
      "https://github.com/nextstrain/seasonal-flu"
    ],
    "source code": [
      "https://github.com/nextstrain/seasonal_flu"
    ],
    "issues": [
      "https://github.com/nextstrain/seasonal_flu/issues"
    ],
    "organizations": [
      "Nextstrain"
    ],
    "authors": [
      "Nextstrain team <https://nextstrain.org>"
    ]
  },
  "nucMutLabelMap": {},
  "nucMutLabelMapReverse": {},
  "shortcuts": [
    "flu_h3n2_ha",
    "nextstrain/flu/h3n2",
    "nextstrain/flu/h3n2/ha",
    "nextstrain/flu/h3n2/ha/darwin-6-2021"
  ],
  "phenotypeData": [
    {
      "name": "RBD",
      "nameFriendly": "RBD mutations",
      "description": "This column displays the number of differences between the sequence and the reference at positions identified by Koel et al. (145, 155, 156, 158, 159, 189, and 193 in HA1)",
      "cds": "HA1",
      "aaRange": {
        "begin": 100,
        "end": 200
      },
      "ignore": {
        "clades": [
          "outgroup"
        ]
      },
      "data": [
        {
          "name": "differences",
          "weight": 1,
          "locations": {
            "144": {
              "default": 1
            },
            "154": {
              "default": 1
            },
            "155": {
              "default": 1
            },
            "157": {
              "default": 1
            },
            "158": {
              "default": 1
            },
            "188": {
              "default": 1
            },
            "192": {
              "default": 1
            }
          }
        }
      ]
    }
  ],
  "aaMotifs": [
    {
      "name": "glycosylation",
      "nameShort": "Glyc.",
      "nameFriendly": "Glycosylation",
      "description": "N-linked glycosylation motifs (N-X-S/T with X any amino acid other than P)",
      "includeCdses": [
        {
          "cds": "HA1",
          "ranges": []
        },
        {
          "cds": "HA2",
          "ranges": [
            {
              "begin": 0,
              "end": 186
            }
          ]
        }
      ],
      "motifs": [
        "N[^P][ST]"
      ]
    }
  ],
  "attributes": {
    "name": "Influenza A H3N2 HA",
    "segment": "ha",
    "reference accession": "EPI1857216",
    "reference name": "A/Darwin/6/2021"
  },
  "version": {
    "tag": "unreleased",
    "compatibility": {
      "cli": "3.0.0-alpha.0",
      "web": "3.0.0-alpha.0"
    }
  }
}
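The RBD `description` lists the Koel et al. sites in conventional 1-based HA1 numbering (145, 155, ...), while the keys under `locations` are 0-based (144, 154, ...), which is the correction this commit makes. A minimal Python sketch of the count the phenotype column reports, assuming the reference and query HA1 amino-acid sequences are already aligned to the same coordinates and that every listed site contributes 1 (as `"default": 1` and `"weight": 1` suggest); `count_rbd_differences` is hypothetical, and the `aaRange` and `ignore` settings are not modeled.

```python
# Koel et al. sites in conventional 1-based HA1 numbering (from "description").
KOEL_SITES_1BASED = [145, 155, 156, 158, 159, 189, 193]
# The same sites as 0-based indices, matching the keys under "locations".
KOEL_SITES_0BASED = [site - 1 for site in KOEL_SITES_1BASED]


def count_rbd_differences(ref_aa, query_aa, sites=KOEL_SITES_0BASED):
    """Count listed sites where the query residue differs from the reference.

    Assumes aligned HA1 amino-acid strings and weight 1 per site;
    an illustration only, not Nextclade's implementation.
    """
    return sum(1 for i in sites if ref_aa[i] != query_aa[i])


# Toy data: a query that differs from the reference only at HA1 site 145.
ref = "A" * 200
query = ref[:144] + "K" + ref[145:]
print(KOEL_SITES_0BASED)  # [144, 154, 155, 157, 158, 188, 192]
print(count_rbd_differences(ref, query))  # 1
print(count_rbd_differences(ref, query, KOEL_SITES_1BASED))  # 0: off by one
```

The last line shows the symptom of the original bug: using the 1-based numbers as 0-based indices looks one residue too far along, so the change at site 145 is missed.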
23 changes: 23 additions & 0 deletions
data_output/nextstrain/flu/h3n2/ha/EPI1857216/unreleased/reference.fasta
@@ -0,0 +1,23 @@
>EPI_ISL_1563628 | A/Darwin/6/2021 | A / H3N2 | | 2021-03-16
ATGAAGACTATCATTGCTTTGAGCAACATTCTATGTCTTGTTTTCGCTCAAAAAATACCTGGAAATGACAATAGCACGGC
AACGCTGTGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTGAAAACAATCACAAATGACCGAATTGAAGTTACTA
ATGCTACTGAGTTGGTTCAGAATTCATCAATAGGTGAAATATGCGGCAGTCCTCATCAGATCCTTGATGGAGGGAACTGC
ACACTAATAGATGCTCTATTGGGGGACCCTCAGTGTGACGGCTTTCAAAATAAGGAATGGGACCTTTTTGTTGAAAGAAG
CAGAGCCAACAGCAACTGTTACCCTTATGATGTGCCGGATTATGCCTCCCTTAGGTCACTAGTTGCCTCATCCGGCACAC
TGGAGTTTAAAAATGAAAGCTTCAATTGGACTGGAGTCAAACAAAACGGAACAAGTTCTGCGTGCATAAGGGGATCTAGT
AGTAGTTTTTTTAGTAGATTAAATTGGTTGACCAGCTTAAACAACATATATCCAGCACAGAACGTGACTATGCCAAACAA
GGAACAATTTGACAAATTGTACATTTGGGGGGTTCACCACCCGGATACGGACAAGAACCAAATCTCCCTGTTTGCTCAAT
CATCAGGAAGAATCACAGTATCTACCAAAAGAAGCCAACAAGCTGTAATCCCAAATATCGGATCTAGACCCAGAATAAGG
GATATCCCTAGCAGAATAAGCATCTATTGGACAATAGTAAAACCGGGAGACATACTTTTGATTAACAGCACAGGGAATCT
AATTGCTCCTAGGGGTTACTTCAAAATACGAAGTGGGAAAAGCTCAATAATGAGATCAGATGCACCCATTGGCAAATGTA
AGTCTGAATGCATCACTCCAAATGGAAGCATTCCCAATGACAAACCGTTCCAAAATGTAAACAGGATCACATACGGGGCC
TGTCCCAGATATGTTAAGCAAAGCACCCTGAAATTGGCAACAGGAATGCGAAATGTACCAGAGAAACAAACCAGAGGCAT
ATTTGGCGCAATAGCGGGTTTCATAGAAAATGGATGGGAGGGAATGGTGGATGGTTGGTACGGTTTCAGGCATCAAAATT
CTGAGGGAAGAGGACAAGCAGCAGATCTCAAAAGCACTCAAGCAGCAATCGATCAAATCAATGGGAAGCTGAATCGATTG
ATCGGAAAAACCAACGAGAAATTCCATCAGATTGAAAAAGAATTCTCAGAAGTAGAAGGAAGAGTTCAAGACCTTGAGAA
ATATGTTGAGGACACTAAAATAGATCTCTGGTCATACAACGCGGAGCTTCTTGTTGCCCTGGAGAACCAACATACGATTG
ACCTAACTGACTCAGAAATGAACAAACTGTTTGAAAAAACAAAGAAGCAACTGAGGGAAAATGCTGAGGATATGGGAAAT
GGTTGTTTCAAAATATACCACAAATGTGACAATGCCTGCATAGGATCAATAAGAAATGAAACTTATGACCACAATGTGTA
CAGGGATGAAGCATTAAACAACCGGTTCCAGATCAAGGGAGTTGAGCTGAAGTCAGGGTACAAAGATTGGATCCTATGGA
TTTCCTTTGCCATGTCATGTTTTTTGCTTTGTATTGCTTTGTTGGGGTTCATCATGTGGGCCTGCCAAAAGGGCAACATT
AGATGCAACATTTGCATTTGAGTGCATTAATTAAAAAC
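The reference is stored as standard FASTA: one `>` header line followed by the sequence wrapped over several lines, with header fields separated by `|`. A minimal Python reader that joins the wrapped lines and splits the header; `read_fasta` is a hypothetical helper, and the fields are returned uninterpreted because their meaning is not documented in this dataset. For production use, an established parser such as Biopython's `SeqIO` is the usual choice.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header_fields, sequence) tuples.

    Wrapped sequence lines are concatenated and upper-cased; header fields
    are split on '|' and stripped of surrounding whitespace.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = [field.strip() for field in line[1:].split("|")]
            chunks = []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>EPI_ISL_1563628 | A/Darwin/6/2021 | A / H3N2 | | 2021-03-16
ATGAAGACTATCATTGCTTTGAGC
AACATTCTATGTCTT
"""
[(fields, seq)] = read_fasta(example)
print(fields)  # ['EPI_ISL_1563628', 'A/Darwin/6/2021', 'A / H3N2', '', '2021-03-16']
print(len(seq))  # 39
```

Applied to `reference.fasta`, the length of the joined sequence can serve as a consistency check against the `##sequence-region` length (1718) declared in `genome_annotation.gff3`.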