Releases: nextstrain/nextclade_data
2023-04-02
New dataset version (tag 2023-04-02T12:00:00Z)
Influenza virus datasets
All Influenza virus datasets were updated with more recent sequences. The trees now include additional older reference viruses, which makes the designation of older clades more robust.
The B/Vic annotation of the HA segment was fixed - it was previously off by 3 nucleotides resulting in amino acid numbering being off by one.
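The link between the two offsets is plain codon arithmetic: each codon spans three nucleotides, so shifting the annotated start of a coding region by 3 shifts every amino acid number by exactly 1. A minimal sketch, assuming 1-based coordinates; the function name and the coordinates are hypothetical and are not the real B/Vic HA values:

```python
def aa_number(nuc_pos: int, cds_start: int) -> int:
    """Return the 1-based amino acid number of a 1-based nucleotide position
    inside a coding region that starts at `cds_start` (also 1-based)."""
    if nuc_pos < cds_start:
        raise ValueError("position lies upstream of the CDS start")
    return (nuc_pos - cds_start) // 3 + 1


# Hypothetical coordinates, chosen only for illustration.
correct_start = 10
shifted_start = correct_start + 3  # annotation off by 3 nucleotides

pos = 100
print(aa_number(pos, correct_start))  # 31
print(aa_number(pos, shifted_start))  # 30
```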
2023-03-28
Internal
- A typo in the default reference sequence configuration for the Flu H1N1pdm NA dataset prevented the dataset from being downloaded by Nextclade CLI. This is now fixed. The fix only affects dataset server infrastructure (the index file) and does not change dataset files, so no new version of any dataset is released. See #70.
2023-03-16
New dataset version (tag 2023-03-16T12:00:00Z)
SARS-CoV-2 datasets
- Placement priors: Every tree node is now annotated with a placement_prior, an approximate probability (on log10 scale) that a random sequence is attached to this node. For this dataset, the prior was calculated after placing 300k sequences on the tree. A value of -10 is chosen when no sequence in the sample attached to a node. The placement priors will improve placement accuracy of incomplete sequences (such as Spike only), but only with a recent version of Nextclade (probably 2.13.0 and above). In that release, we will introduce a new placement tie-breaking feature: when a query sequence can attach to multiple nodes with an equal number of mismatches, the sequence will be attached to the reference tree node with the highest prior. This is in contrast to the previous naive tie-breaking logic, which always chose the node with the fewest parent nodes and thereby led to a bias towards attaching to recombinants. See neherlab/nextclade_data_workflows#38 for the code calculating the placement priors, and nextstrain/nextclade#1119 to see how the priors are used in Nextclade.
- Pango lineages designated between 2023-02-24 and 2023-03-15 are now included and listed below:
Newly included lineages, with designation date in parentheses
- XBB.2.6 (2023-02-26)
- XBB.8 (2023-02-26)
- EM.1 (2023-02-26)
- XBB.1.5.15 (2023-02-26)
- EK.2 (2023-02-26)
- XBB.1.5.16 (2023-02-26)
- XBB.1.5.17 (2023-02-26)
- XBB.1.5.18 (2023-02-26)
- XBB.1.5.19 (2023-02-26)
- XBB.1.5.20 (2023-02-26)
- XBB.1.5.21 (2023-02-26)
- EN.1 (2023-02-26)
- EP.2 (2023-02-26)
- EP.1 (2023-02-26)
- XBC.1.5 (2023-02-26)
- EQ.1 (2023-02-28)
- CY.2 (2023-02-28)
- CP.7 (2023-03-02)
- BQ.1.1.71 (2023-03-03)
- XBB.1.16 (2023-03-05)
- ER.1 (2023-03-05)
- ER.1.1 (2023-03-07)
- ES.1 (2023-03-09)
- CH.1.1.15 (2023-03-09)
- BF.7.4.3 (2023-03-10)
- BQ.1.32 (2023-03-11)
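The placement prior and the tie-breaking rule described in the placement priors item above can be sketched as follows. This is an illustration under stated assumptions, not the code from neherlab/nextclade_data_workflows#38 or nextstrain/nextclade#1119: the function names, the node names, and the normalisation by the total number of sampled sequences are all hypothetical.

```python
import math
from collections import Counter

NO_SAMPLE_PRIOR = -10.0  # used when no sampled sequence attached to a node


def placement_priors(attachments, all_nodes):
    """Compute a log10 prior per node.

    `attachments` holds one node name per sampled sequence placed on the tree;
    `all_nodes` lists every node, including those that received no sequence.
    """
    counts = Counter(attachments)
    total = len(attachments)
    return {
        node: math.log10(counts[node] / total) if counts[node] else NO_SAMPLE_PRIOR
        for node in all_nodes
    }


def pick_node(mismatches, priors):
    """Choose the node with the fewest mismatches; break ties by highest prior."""
    best = min(mismatches.values())
    tied = [node for node, n in mismatches.items() if n == best]
    return max(tied, key=lambda node: priors[node])


# Hypothetical sample: 90 sequences attach to node A, 10 to node B, none to C.
priors = placement_priors(["A"] * 90 + ["B"] * 10, ["A", "B", "C"])

# B and C tie on mismatches, so the higher prior decides in favour of B.
print(pick_node({"A": 3, "B": 2, "C": 2}, priors))  # B
```

Ranking tied candidates by prior rather than by tree depth is the point of the change: a node that real sequences rarely attach to no longer wins merely because it has few parent nodes.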
Internal
Add robots.txt to prevent data endpoints from being indexed by search engines.
2023-02-25
New dataset version (tag 2023-02-25T12:00:00Z)
SARS-CoV-2 datasets
- Recombinant trees are now built using IQtree which results in shared mutations on internal nodes. In the past, recombinant trees were purely based on the Pango hierarchy. This change makes lineage assignment for recombinants more robust and reduces the number of private mutations. See e.g. the XBB tree here: https://next.nextstrain.org/staging/nextclade/sars-cov-2/21L?label=clade:22F
- Known stop codons and frame shifts were updated to reduce the number of false positive warnings.
- The set of labeled mutations was updated (in virus_properties.json), including the addition of characteristic 23A mutations. This will help identify recombinants.
- Pango lineages designated between 2023-02-01 and 2023-02-24 are now included and listed below:
Newly included lineages, with designation date in parentheses
- BN.1.3.5 (2023-02-02)
- XBB.1.9.2 (2023-02-03)
- XBB.2.4 (2023-02-03)
- DT.2 (2023-02-03)
- BA.4.1.11 (2023-02-03)
- BF.5.3 (2023-02-08)
- XBB.1.12 (2023-02-09)
- XBB.1.5.4 (2023-02-10)
- XBB.1.5.5 (2023-02-10)
- XBB.1.5.6 (2023-02-10)
- XBB.1.5.7 (2023-02-10)
- XBB.1.5.8 (2023-02-10)
- XBB.1.5.9 (2023-02-10)
- XBB.1.5.10 (2023-02-10)
- XBB.1.13 (2023-02-10)
- BQ.1.1.50 (2023-02-10)
- BF.5.4 (2023-02-10)
- XBB.1.14 (2023-02-10)
- BQ.1.1.51 (2023-02-10)
- XBF.1 (2023-02-11)
- XBF.2 (2023-02-11)
- XBF.3 (2023-02-11)
- XBC.1.3 (2023-02-11)
- XBC.1.4 (2023-02-11)
- XAY.2.1 (2023-02-11)
- XAY.2.2 (2023-02-11)
- XAY.1.1.1 (2023-02-11)
- BW.1.2 (2023-02-11)
- BW.1.1.1 (2023-02-11)
- BW.1.1.2 (2023-02-11)
- BQ.1.29 (2023-02-11)
- BQ.1.15.2 (2023-02-11)
- BQ.1.15.1 (2023-02-11)
- BQ.1.1.52 (2023-02-11)
- EA.1 (2023-02-11)
- EA.2 (2023-02-11)
- BQ.1.10.2 (2023-02-11)
- CL.1.2 (2023-02-11)
- CL.1.3 (2023-02-11)
- CL.1.1 (2023-02-11)
- BA.5.1.33 (2023-02-11)
- BA.5.1.34 (2023-02-11)
- BA.5.1.35 (2023-02-11)
- EB.1 (2023-02-11)
- BA.5.1.36 (2023-02-11)
- BA.5.1.37 (2023-02-11)
- BA.5.1.38 (2023-02-11)
- BF.7.16 (2023-02-11)
- BF.7.16.1 (2023-02-11)
- BF.7.17 (2023-02-11)
- BF.7.18 (2023-02-11)
- BF.35 (2023-02-11)
- BF.36 (2023-02-11)
- BF.37 (2023-02-11)
- BF.38 (2023-02-11)
- BF.38.1 (2023-02-11)
- BF.38.2 (2023-02-11)
- BF.38.3 (2023-02-11)
- BF.39 (2023-02-11)
- BF.39.1 (2023-02-11)
- BF.40 (2023-02-11)
- BF.41 (2023-02-11)
- BF.41.1 (2023-02-11)
- BA.5.2.51 (2023-02-11)
- BA.5.2.52 (2023-02-11)
- BA.5.2.53 (2023-02-11)
- BA.5.2.54 (2023-02-11)
- BA.5.2.55 (2023-02-11)
- BA.5.2.56 (2023-02-11)
- BA.5.2.57 (2023-02-11)
- BA.5.2.58 (2023-02-11)
- BA.5.2.59 (2023-02-11)
- BQ.1.30 (2023-02-11)
- EC.1 (2023-02-11)
- EC.1.1 (2023-02-11)
- BQ.1.31 (2023-02-11)
- BQ.1.10.3 (2023-02-11)
- BQ.1.1.53 (2023-02-11)
- BQ.1.1.54 (2023-02-11)
- BQ.1.1.55 (2023-02-11)
- BQ.1.1.56 (2023-02-11)
- BQ.1.1.57 (2023-02-11)
- BQ.1.1.58 (2023-02-11)
- BQ.1.1.59 (2023-02-11)
- BQ.1.1.60 (2023-02-11)
- BQ.1.1.61 (2023-02-11)
- BQ.1.1.62 (2023-02-11)
- BQ.1.1.63 (2023-02-11)
- BQ.1.1.64 (2023-02-11)
- BQ.1.1.65 (2023-02-11)
- BQ.1.1.66 (2023-02-11)
- BQ.1.1.67 (2023-02-11)
- DT.3 (2023-02-11)
- BQ.1.1.68 (2023-02-11)
- BQ.1.1.69 (2023-02-11)
- BQ.1.1.70 (2023-02-11)
- CZ.2 (2023-02-11)
- BA.5.2.60 (2023-02-12)
- BZ.2 (2023-02-12)
- BA.5.2.61 (2023-02-12)
- BA.5.2.62 (2023-02-12)
- BF.7.19 (2023-02-12)
- BF.7.20 (2023-02-12)
- BF.7.21 (2023-02-12)
- BF.7.22 (2023-02-12)
- BF.7.23 (2023-02-12)
- BF.7.24 (2023-02-12)
- BF.7.19.1 (2023-02-12)
- BF.7.26 (2023-02-12)
- ED.1 (2023-02-12)
- ED.2 (2023-02-12)
- ED.3 (2023-02-12)
- EE.1 (2023-02-12)
- EE.2 (2023-02-12)
- EE.3 (2023-02-12)
- EE.4 (2023-02-12)
- EE.5 (2023-02-12)
- EF.1 (2023-02-12)
- EF.2 (2023-02-12)
- EF.1.1 (2023-02-12)
- EF.1.1.1 (2023-02-12)
- EF.1.2 (2023-02-12)
- EF.1.3 (2023-02-12)
- BA.5.2.63 (2023-02-14)
- XBQ (2023-02-19)
- XBB.1.15 (2023-02-19)
- EG.1 (2023-02-19)
- XBB.1.9.3 (2023-02-19)
- XBB.2.5 (2023-02-19)
- XBF.4 (2023-02-19)
- XBK.1 (2023-02-19)
- XBR (2023-02-19)
- XBS (2023-02-19)
- XBT (2023-02-19)
- EH.1 (2023-02-19)
- XBF.5 (2023-02-19)
- XBF.6 (2023-02-19)
- XBF.7 (2023-02-19)
- XBF.7.1 (2023-02-19)
- XBF.8 (2023-02-19)
- DY.2 (2023-02-19)
- DY.3 (2023-02-19)
- DY.4 (2023-02-19)
- BF.7.14.4 (2023-02-19)
- BF.7.14.5 (2023-02-19)
- BF.7.14.6 (2023-02-19)
- BF.7.27 (2023-02-19)
- CH.1.1.10 (2023-02-19)
- CH.1.1.11 (2023-02-19)
- CH.1.1.12 (2023-02-19)
- CH.1.1.13 (2023-02-19)
- CH.1.1.14 (2023-02-19)
- BN.1.3.7 (2023-02-19)
- BN.1.3.6 (2023-02-19)
- BN.1.3.8 (2023-02-19)
- EJ.1 (2023-02-19)
- EJ.2 (2023-02-19)
- BN.1.2.2 (2023-02-19)
- BN.1.2.3 (2023-02-19)
- BN.1.2.4 (2023-02-19)
- BN.1.5.2 (2023-02-19)
- BN.1.4.2 (2023-02-19)
- BN.1.4.3 (2023-02-19)
- BN.1.4.4 (2023-02-19)
- BN.1.4.5 (2023-02-19)
- DS.3 (2023-02-19)
- XBB.1.5.11 (2023-02-19)
- XBB.1.5.12 (2023-02-19)
- XBB.1.5.13 (2023-02-19)
- EK.1 (2023-02-19)
- DV.4 (2023-02-19)
- XBB.1.5.14 (2023-02-21)
- EL.1 (2023-02-21)
- DB.3 (2023-02-23)
- DV.3.1 (2023-02-24)
- BF.7.14.7 (2023-02-24)
New dataset version (tag 2023-02-03T12:00:00Z)
RSV A and B data sets
- fix definition of some older clades
- include older sequences to make sure older clades are included.
2023-02-01
New dataset version (tag 2023-02-01T12:00:00Z)
SARS-CoV-2 datasets
- Change: Values output into the clade column of Nextclade csv/tsv files change from the composite type, 20H (Beta, V2), to the simple year-letter type, 20H. If you do not want this change, simply use the column clade_legacy that is also output into the csv/tsv files. Other clade types are unchanged: clade_nextstrain (now same as clade), clade_who and Nextclade_pango. This migration was first mentioned three months ago in release 2022-10-31.
- New Nextstrain clade 23A added, equivalent to Pango lineage XBB.1.5, see nextstrain/ncov#1043 for details.
- Data update: 55 new Pango lineages, with designation date between 2023-01-10 and 2023-01-31, are now included and listed below:
Newly included lineages, with designation date in parentheses
- DU.1 (2023-01-12)
- CH.1.1.5 (2023-01-13)
- BQ.1.1.35 (2023-01-14)
- DV.1 (2023-01-14)
- BQ.1.1.36 (2023-01-14)
- XBN (2023-01-14)
- BA.5.1.32 (2023-01-15)
- BA.5.2.50 (2023-01-21)
- DW.1 (2023-01-23)
- BQ.1.1.37 (2023-01-23)
- BQ.1.1.38 (2023-01-24)
- BF.7.14.1 (2023-01-25)
- BQ.1.11.1 (2023-01-26)
- BR.5 (2023-01-27)
- BQ.1.1.39 (2023-01-27)
- BQ.1.2.1 (2023-01-28)
- DV.2 (2023-01-28)
- DV.3 (2023-01-28)
- CH.1.1.6 (2023-01-28)
- CH.1.1.7 (2023-01-28)
- CH.1.1.8 (2023-01-28)
- CH.1.1.9 (2023-01-28)
- BN.1.10 (2023-01-28)
- BN.1.11 (2023-01-28)
- BQ.1.1.40 (2023-01-28)
- BQ.1.1.41 (2023-01-28)
- BQ.1.1.42 (2023-01-28)
- BQ.1.1.43 (2023-01-28)
- BQ.1.1.44 (2023-01-28)
- BQ.1.1.45 (2023-01-28)
- BQ.1.1.46 (2023-01-28)
- BQ.1.1.47 (2023-01-28)
- CM.8.1.2 (2023-01-28)
- CM.8.1.1 (2023-01-28)
- DS.2 (2023-01-28)
- XBB.1.10 (2023-01-28)
- DY.1 (2023-01-28)
- BF.7.14.2 (2023-01-28)
- BF.7.14.3 (2023-01-28)
- DZ.1 (2023-01-28)
- XBB.1.5.1 (2023-01-28)
- XBB.1.5.2 (2023-01-29)
- XBB.1.5.3 (2023-01-29)
- DN.1.1.1 (2023-01-30)
- DN.1.1.2 (2023-01-30)
- XBP (2023-01-30)
- BQ.1.1.48 (2023-01-30)
- BQ.1.1.49 (2023-01-30)
- DZ.2 (2023-01-31)
- XBB.7 (2023-01-31)
- XBB.2.3 (2023-01-31)
- XBB.1.11 (2023-01-31)
- XBB.1.11.1 (2023-01-31)
- DY.1.1 (2023-01-31)
- DJ.1.1.1 (2023-01-31)
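For downstream scripts that relied on the composite clade names, the clade column change in this release can be handled by reading clade_legacy instead of clade. A minimal sketch using only the Python standard library; the helper name and the one-row TSV are hypothetical stand-ins, and real Nextclade output has many more columns:

```python
import csv
import io

# Tiny stand-in for a Nextclade TSV; the two clade values are the ones
# quoted in this release note, the sample name is made up.
tsv_text = (
    "seqName\tclade\tclade_legacy\n"
    "sample1\t20H\t20H (Beta, V2)\n"
)


def read_clades(handle, column="clade_legacy"):
    """Map each sequence name to the value of the chosen clade column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["seqName"]: row[column] for row in reader}


print(read_clades(io.StringIO(tsv_text)))
# {'sample1': '20H (Beta, V2)'}
```

Passing column="clade" instead returns the new year-letter names, so a script can switch between the two conventions with a single argument.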
Seasonal flu datasets
- removes a synonymous mutation from the definition of A/H3N2 clade 2a.3b. Some viruses that should be in this clade didn't have this change.
- adds glycosylation to the remaining flu data sets
2023-01-27
Seasonal flu datasets
New dataset version (tag 2023-01-27T12:00:00Z)
- fixes the omitted A/H3N2 clade 2d (very rare, had dropped out)
- adds more contextual sequences to the trees
- adds NA datasets for A/H3N2, A/H1N1pdm, B/Vic
Monkeypox datasets
New dataset version (tag 2023-01-26T12:00:00Z)
- New monkeypox lineages B.1.15, B.1.16, and B.1.17 were added to the datasets, see mpxv-lineages/lineage-designation#31 for details on these lineages.
2023-01-19
Influenza datasets
New clade definitions for default influenza datasets (tag 2023-01-19T12:00:00Z)
The default influenza datasets were updated to include recent consensus on clade definitions and more recent sequences in their reference tree to better reflect current circulation. In addition, these datasets contain a short_clade column which omits the long prefix and definition of glycosylation motifs.
2023-01-09
All SARS-CoV-2 datasets
New dataset version (tag 2023-01-09T12:00:00Z)
- Data update: 71 new Pango lineages, with designation date between 2022-12-14 and 2023-01-09, are now included and listed below:
Newly included lineages, with designation date in parentheses
- CJ.1.1 (2022-12-14)
- CM.5.2 (2022-12-15)
- CM.4.1 (2022-12-15)
- CN.2 (2022-12-15)
- BE.10 (2022-12-15)
- XBK (2022-12-15)
- CH.3.1 (2022-12-15)
- CH.1.1.3 (2022-12-15)
- XBB.1.6 (2022-12-16)
- CR.1.3 (2022-12-16)
- BF.10.1 (2022-12-18)
- BQ.1.25.1 (2022-12-21)
- BN.1.3.2 (2022-12-21)
- BN.1.3.3 (2022-12-22)
- XBB.3.2 (2022-12-22)
- XBB.2.1 (2022-12-22)
- XBB.2.2 (2022-12-22)
- XBB.1.7 (2022-12-22)
- DN.1 (2022-12-22)
- BQ.1.1.29 (2022-12-22)
- BQ.1.27 (2022-12-22)
- BQ.1.1.30 (2022-12-22)
- DJ.1.3 (2022-12-23)
- BQ.1.1.31 (2022-12-24)
- DP.1 (2022-12-24)
- BN.1.3.4 (2022-12-24)
- XBB.3.3 (2022-12-24)
- DN.1.1 (2022-12-25)
- BQ.1.13.1 (2022-12-25)
- BF.5.1 (2022-12-27)
- BF.5.2 (2022-12-27)
- CK.1.1 (2022-12-29)
- BA.5.2.46 (2022-12-30)
- BQ.1.28 (2022-12-31)
- BQ.1.1.32 (2022-12-31)
- BQ.1.1.33 (2022-12-31)
- DF.1.1 (2023-01-01)
- BA.5.2.47 (2023-01-01)
- DQ.1 (2023-01-01)
- DR.1 (2023-01-03)
- BF.7.14 (2023-01-06)
- BA.5.2.48 (2023-01-06)
- DS.1 (2023-01-07)
- CM.10 (2023-01-07)
- XBC.1.1 (2023-01-07)
- XBC.1.1.1 (2023-01-07)
- XBC.1.2 (2023-01-07)
- XBC.1.2.1 (2023-01-07)
- XBB.1.8 (2023-01-07)
- BL.6 (2023-01-07)
- CH.1.1.4 (2023-01-07)
- BF.7.15 (2023-01-09)
- XBL (2023-01-09)
- CM.11 (2023-01-09)
- DT.1 (2023-01-09)
- BQ.1.1.34 (2023-01-09)
- CM.12 (2023-01-09)
- CK.1.2 (2023-01-09)
- BA.2.3.22 (2023-01-09)
- XBM (2023-01-09)
- BM.1.1.4 (2023-01-09)
- BM.1.1.5 (2023-01-09)
- BN.1.4.1 (2023-01-09)
- XBB.6 (2023-01-09)
- XBB.6.1 (2023-01-09)
- XBB.1.9 (2023-01-09)
- XBB.1.9.1 (2023-01-09)
- BA.5.2.49 (2023-01-09)
- XAY.3 (2023-01-09)
- XAY.1.2 (2023-01-09)
- BN.1.5.1 (2023-01-09)
2022-12-22
Addition of RSV A and RSV B datasets
New dataset version (tag 2022-12-20T22:00:12Z)
First release of RSV A and RSV B datasets by Laura Urbanska.
With permission of the authors, these datasets use the reference sequences hRSV/A/England/397/2017 for RSV-A and hRSV/B/Australia/VIC-RCH056/2019 for RSV-B.
The datasets implement two clade designations each.
One is primarily based on the G gene and was proposed by Goya et al.; the other is based on the entire genome and was proposed by Ramaekers et al.
2022-12-14
All SARS-CoV-2 datasets
New dataset version (tag 2022-12-14T12:00:00Z)
- Data update: 28 new Pango lineages, with designation date between 2022-11-14 and 2022-12-10, are now included and listed below:
28 new Pango lineages included in this release, with designation date in parentheses
- XBG (2022-11-14)
- BA.5.1.31 (2022-11-15)
- XBH (2022-11-16)
- BW.1.1 (2022-11-20)
- BN.1.8 (2022-11-22)
- BQ.1.1.25 (2022-11-22)
- CM.2.1 (2022-11-22)
- DJ.1 (2022-11-23)
- DJ.1.1 (2022-11-23)
- BA.5.2.42 (2022-11-23)
- XBB.1.4.1 (2022-11-25)
- BA.5.2.43 (2022-11-26)
- BN.1.9 (2022-11-28)
- CH.1.1.1 (2022-11-29)
- CH.1.1.2 (2022-11-29)
- BA.5.2.44 (2022-11-29)
- DK.1 (2022-11-30)
- BQ.1.1.26 (2022-12-01)
- XBJ (2022-12-01)
- CH.3 (2022-12-01)
- BQ.1.1.27 (2022-12-02)
- DL.1 (2022-12-03)
- BA.5.2.45 (2022-12-03)
- BQ.1.1.28 (2022-12-04)
- BQ.1.26.1 (2022-12-04)
- CV.2 (2022-12-06)
- DM.1 (2022-12-07)
- DJ.1.2 (2022-12-10)
- Added 5 new XBB.1.5 example sequences