Relabel clades numerically, rather than geographically [#12] #13

Merged
merged 3 commits into from
Aug 22, 2024
18 changes: 17 additions & 1 deletion nextclade/README.md
@@ -1,7 +1,7 @@
# Yellow Fever Virus Nextclade Dataset Tree

This workflow creates a phylogenetic tree that can be used as part of
-a Nextclade dataset to assign genotypes to yellow fever virus samples
+a Nextclade dataset to assign clades to yellow fever virus samples
based on [Mutebi et al.][] (J Virol. 2001 Aug;75(15):6999-7008) and
[Bryant et al.][] (PLoS Pathog. 2007 May 18;3(5):e75).

@@ -14,6 +14,22 @@ based on [Mutebi et al.][] (J Virol. 2001 Aug;75(15):6999-7008) and
* Provide the following coloring options on the tree:
* Genotype assignment from `augur clades`

+The clades we annotate (Clade I-VII) are roughly equivalent to the
+following genotypes as described in the aforementioned two papers:
+
+| Clade     | Genotype            |
+|-----------|---------------------|
+| Clade I   | Angola              |
+| Clade II  | East Africa         |
+| Clade III | East/Central Africa |
+| Clade IV  | West Africa I       |
+| Clade V   | West Africa II      |
+| Clade VI  | South America I     |
+| Clade VII | South America II    |
genehack marked this conversation as resolved.

+(N.b., this table is available as a TSV in this repo, at
+`nextclade/defaults/clade-to-genotype.tsv`.)

## How to create a new tree

* Run the workflow: `nextstrain build .`
8 changes: 8 additions & 0 deletions nextclade/defaults/clade-to-genotype.tsv
@@ -0,0 +1,8 @@
+Clade Genotype
+Clade I Angola
+Clade II East Africa
+Clade III East/Central Africa
+Clade IV West Africa I
+Clade V West Africa II
+Clade VI South America I
+Clade VII South America II
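For anyone who wants to translate the numeric labels back to the published genotype names, the mapping file can be read into a dictionary. This is a minimal sketch, not part of the PR: it assumes the file is tab-delimited with the `Clade` and `Genotype` header shown above, and the function name `load_clade_to_genotype` is hypothetical. The sample uses a few rows copied from the table rather than reading the file from disk.

```python
import csv
import io

# A few rows copied from the mapping table; tab-delimited by assumption.
SAMPLE_TSV = (
    "Clade\tGenotype\n"
    "Clade I\tAngola\n"
    "Clade II\tEast Africa\n"
    "Clade VII\tSouth America II\n"
)


def load_clade_to_genotype(handle):
    """Return a {clade: genotype} dict from a tab-delimited file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Clade"]: row["Genotype"] for row in reader}


# Hypothetical usage; swap io.StringIO(...) for open(path, newline="").
mapping = load_clade_to_genotype(io.StringIO(SAMPLE_TSV))
print(mapping["Clade VII"])
```

With the real file, the same function could be called on `open("nextclade/defaults/clade-to-genotype.tsv", newline="")`.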
78 changes: 39 additions & 39 deletions nextclade/defaults/clades.tsv
@@ -1,40 +1,40 @@
clade gene site alt
-Angola nuc 111 G
-Angola nuc 219 T
-Angola nuc 240 C
-Angola nuc 246 A
-Angola nuc 252 A
-Angola nuc 255 A
-Angola nuc 291 G
-Angola nuc 294 A
-Angola nuc 300 A
-Angola nuc 315 G
-Angola nuc 327 G
-Angola nuc 372 A
-Angola nuc 420 A
-Angola nuc 432 A
-Angola nuc 453 T
-Angola nuc 492 G
-Angola nuc 651 T
-Angola nuc 72 A
-Angola nuc 81 G
-Angola nuc 88 C
-Angola nuc 90 A
-Angola nuc 99 T
-East Africa nuc 171 G
-East Africa nuc 438 G
-East Africa nuc 45 A
-East Africa nuc 468 T
-East/Central Africa nuc 228 G
-South America I nuc 219 A
-South America I nuc 532 A
-South America II nuc 114 C
-South America II nuc 193 T
-South America II nuc 249 A
-South America II nuc 639 G
-West Africa I nuc 183 G
-West Africa I nuc 255 C
-West Africa II nuc 270 A
-West Africa II nuc 321 T
-West Africa II nuc 477 A
-West Africa II nuc 93 T
+Clade I nuc 111 G
+Clade I nuc 219 T
+Clade I nuc 240 C
+Clade I nuc 246 A
+Clade I nuc 252 A
+Clade I nuc 255 A
+Clade I nuc 291 G
+Clade I nuc 294 A
+Clade I nuc 300 A
+Clade I nuc 315 G
+Clade I nuc 327 G
+Clade I nuc 372 A
+Clade I nuc 420 A
+Clade I nuc 432 A
+Clade I nuc 453 T
+Clade I nuc 492 G
+Clade I nuc 651 T
+Clade I nuc 72 A
+Clade I nuc 81 G
+Clade I nuc 88 C
+Clade I nuc 90 A
+Clade I nuc 99 T
+Clade II nuc 171 G
+Clade II nuc 438 G
+Clade II nuc 45 A
+Clade II nuc 468 T
+Clade III nuc 228 G
+Clade VI nuc 219 A
+Clade VI nuc 532 A
+Clade VII nuc 114 C
+Clade VII nuc 193 T
+Clade VII nuc 249 A
+Clade VII nuc 639 G
+Clade IV nuc 183 G
+Clade IV nuc 255 C
+Clade V nuc 270 A
+Clade V nuc 321 T
+Clade V nuc 477 A
+Clade V nuc 93 T
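To inspect which substitutions define each clade, the `clade`/`gene`/`site`/`alt` rows can be grouped per clade. The sketch below is illustrative only: it assumes tab-delimited input with the header shown in the diff, uses a handful of rows copied from it, and the function name `group_defining_mutations` is hypothetical. It sorts sites numerically, since the file itself lists them in string order (for example, 45 after 438).

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the diff; tab-delimited by assumption.
SAMPLE_CLADES_TSV = (
    "clade\tgene\tsite\talt\n"
    "Clade II\tnuc\t171\tG\n"
    "Clade II\tnuc\t438\tG\n"
    "Clade II\tnuc\t45\tA\n"
    "Clade II\tnuc\t468\tT\n"
    "Clade III\tnuc\t228\tG\n"
)


def group_defining_mutations(handle):
    """Group (gene, site, alt) tuples by clade, sorted by gene then site."""
    reader = csv.DictReader(handle, delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        # Convert the site to int so that 45 sorts before 171.
        grouped[row["clade"]].append((row["gene"], int(row["site"]), row["alt"]))
    return {
        clade: sorted(muts, key=lambda m: (m[0], m[1]))
        for clade, muts in grouped.items()
    }


definitions = group_defining_mutations(io.StringIO(SAMPLE_CLADES_TSV))
for clade, muts in definitions.items():
    print(clade, muts)
```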
14 changes: 7 additions & 7 deletions nextclade/defaults/colors.tsv
@@ -1,8 +1,8 @@
# genotypes assigned by augur clades
-clade_membership Angola #3F63CF
-clade_membership East Africa #529AB6
-clade_membership East/Central Africa #75B681
-clade_membership South America I #A6BE55
-clade_membership South America II #D4B13F
-clade_membership West Africa I #E68133
-clade_membership West Africa II #DC2F24
+clade_membership Clade I #3F63CF
+clade_membership Clade II #529AB6
+clade_membership Clade III #75B681
+clade_membership Clade IV #A6BE55
+clade_membership Clade V #DC2F24
+clade_membership Clade VI #E68133
+clade_membership Clade VII #D4B13F

I don't remember if we discussed in the meeting whether we want multiple clade systems in the Nextclade dataset, i.e. should we have these numeric clades as the default, but still include the geographical genotype labels as a separate color-by?

Contributor Author

> I don't remember if we discussed in the meeting whether we want multiple clade systems in the Nextclade dataset, i.e. should we have these numeric clades as the default, but still include the geographical genotype labels as a separate color-by?

The impression that I took away from the feedback in that meeting was that we wanted to de-emphasize the geographically oriented names, so I would lean away from allowing color-by for them.

15 changes: 14 additions & 1 deletion nextclade/defaults/nextclade-dataset/README.md
@@ -10,7 +10,7 @@

## Scope of this dataset

-This dataset assigns genotypes to yellow fever virus samples based on
+This dataset assigns clades to yellow fever virus samples based on
strain and genotype information from [Mutebi et al.][] (J Virol. 2001
Aug;75(15):6999-7008) and [Bryant et al.][] (PLoS Pathog. 2007 May 18;3(5):e75)

@@ -21,6 +21,19 @@ comprises the 3' end of the pre-membrane protein (prM) gene, the
entire membrane protein (M) gene, and the 5' end of the envelope
protein (E) gene.

+The clades we annotate (Clade I-VII) are roughly equivalent to the
+following genotypes as described in the aforementioned two papers:
+
+| Clade     | Genotype            |
+|-----------|---------------------|
+| Clade I   | Angola              |
+| Clade II  | East Africa         |
+| Clade III | East/Central Africa |
+| Clade IV  | West Africa I       |
+| Clade V   | West Africa II      |
+| Clade VI  | South America I     |
+| Clade VII | South America II    |

(N.b., the reference sequence used in this data set is actually 672nt
long, from bases 641-1312 of the genome reference. The 2 extra bases
make the reference a complete open reading frame.)
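The arithmetic in the note above can be checked quickly. This sketch assumes the coordinates are 1-based and inclusive, which the README does not state explicitly; the names `REGION_START`, `REGION_END`, and `region_length` are made up for illustration. Divisibility by three is a necessary condition for a whole number of codons, not proof that the region is an open reading frame.

```python
# Coordinates taken from the README note; 1-based inclusive is an assumption.
REGION_START = 641
REGION_END = 1312


def region_length(start, end):
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


length = region_length(REGION_START, REGION_END)
codons, remainder = divmod(length, 3)
print(length, codons, remainder)  # prints: 672 224 0
```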