
Releases: nextstrain/nextclade_data

2023-04-02

03 Apr 15:26
2ab8a4f

New dataset version (tag 2023-04-02T12:00:00Z)

Influenza virus datasets

All Influenza virus datasets were updated with more recent sequences. The trees now also include additional older reference viruses, which makes the designation of older clades more robust.
The B/Vic annotation of the HA segment was fixed: it was previously off by 3 nucleotides, which shifted amino acid numbering by one.
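
To see why a 3-nucleotide offset in a gene annotation shifts amino acid numbering by exactly one, here is a minimal sketch. The coordinates (31 and 34) are made up for illustration and are not the real B/Vic HA coordinates; the helper name `codon_number` is likewise hypothetical, and the direction of the original error is not stated in these notes.

```python
def codon_number(nuc_pos: int, cds_start: int) -> int:
    """Return the 1-based codon (amino acid) index of a 1-based
    nucleotide position, given the 1-based start of the coding sequence."""
    if nuc_pos < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    return (nuc_pos - cds_start) // 3 + 1


# Hypothetical coordinates: the two annotations differ by one codon (3 nt).
START_A = 31
START_B = 34

for pos in (100, 101, 102, 400):
    a = codon_number(pos, START_A)
    b = codon_number(pos, START_B)
    # Every position downstream of both starts is numbered one codon apart.
    print(pos, a, b, a - b)
```

Because a codon is three nucleotides long, the reading frame is preserved and translations stay identical; only the reported positions differ.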

2023-03-28

28 Mar 06:38

Internal

  • A typo in the default reference sequence configuration for the Flu H1N1pdm NA dataset prevented the dataset from being downloaded by Nextclade CLI. This is now fixed. The fix only affects dataset server infrastructure (the index file) and does not change dataset files, so no new version of any dataset is released. See #70.

2023-03-16

21 Mar 09:18
94cff46

New dataset version (tag 2023-03-16T12:00:00Z)

SARS-CoV-2 datasets

  • Placement priors: Every tree node is now annotated with a placement_prior, an approximate probability (on a log10 scale) that a random sequence attaches to this node. For this dataset, the prior was calculated after placing 300k sequences on the tree. A value of -10 is used when no sequence in the sample attached to a node. The placement priors will improve placement accuracy of incomplete sequences (such as Spike only), but only with a recent version of Nextclade (probably 2.13.0 and above). In that release, we will introduce a new placement tie-breaking feature: when a query sequence can attach to multiple nodes with an equal number of mismatches, the sequence will be attached to the reference tree node with the highest prior. This is in contrast to the previous naive tie-breaking logic, which always chose the node with the fewest parent nodes and thereby led to a bias towards attaching to recombinants. See neherlab/nextclade_data_workflows#38 for the code calculating the placement priors, and nextstrain/nextclade#1119 to see how the priors are used in Nextclade.
  • Pango lineages designated between 2023-02-24 and 2023-03-15 are now included; unfold below to see a list of them:
Newly included lineages, with designation date in parentheses
  • XBB.2.6 (2023-02-26)
  • XBB.8 (2023-02-26)
  • EM.1 (2023-02-26)
  • XBB.1.5.15 (2023-02-26)
  • EK.2 (2023-02-26)
  • XBB.1.5.16 (2023-02-26)
  • XBB.1.5.17 (2023-02-26)
  • XBB.1.5.18 (2023-02-26)
  • XBB.1.5.19 (2023-02-26)
  • XBB.1.5.20 (2023-02-26)
  • XBB.1.5.21 (2023-02-26)
  • EN.1 (2023-02-26)
  • EP.2 (2023-02-26)
  • EP.1 (2023-02-26)
  • XBC.1.5 (2023-02-26)
  • EQ.1 (2023-02-28)
  • CY.2 (2023-02-28)
  • CP.7 (2023-03-02)
  • BQ.1.1.71 (2023-03-03)
  • XBB.1.16 (2023-03-05)
  • ER.1 (2023-03-05)
  • ER.1.1 (2023-03-07)
  • ES.1 (2023-03-09)
  • CH.1.1.15 (2023-03-09)
  • BF.7.4.3 (2023-03-10)
  • BQ.1.32 (2023-03-11)
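
The tie-breaking rule described under "Placement priors" above can be sketched as follows. This is an illustrative sketch only, not Nextclade's actual implementation (see nextstrain/nextclade#1119 for that); the `Candidate` type, the `pick_node` function, and the node names are hypothetical. Only the field name placement_prior, the log10 scale, and the -10 floor come from the notes above.

```python
from dataclasses import dataclass

# Value used in the dataset when no sampled sequence attached to a node.
NO_OBSERVATION_PRIOR = -10.0


@dataclass(frozen=True)
class Candidate:
    """A reference tree node a query sequence could attach to (hypothetical type)."""
    name: str
    mismatches: int
    placement_prior: float  # log10 of the approximate attachment probability


def pick_node(candidates: list[Candidate]) -> Candidate:
    """Choose the node with the fewest mismatches; break ties by highest prior."""
    if not candidates:
        raise ValueError("no candidate nodes")
    fewest = min(c.mismatches for c in candidates)
    tied = [c for c in candidates if c.mismatches == fewest]
    return max(tied, key=lambda c: c.placement_prior)


candidates = [
    Candidate("rare_recombinant", 2, NO_OBSERVATION_PRIOR),
    Candidate("common_lineage", 2, -1.5),
    Candidate("distant_node", 5, -0.5),
]
print(pick_node(candidates).name)
```

The prior is consulted only among nodes that already tie on mismatches, so a high prior never overrides a better sequence match.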

Internal

Add robots.txt to prevent data endpoints from being indexed by search engines.

2023-02-25

27 Feb 18:48
8a56716

New dataset version (tag 2023-02-25T12:00:00Z)

SARS-CoV-2 datasets

  • Recombinant trees are now built using IQtree, which results in shared mutations on internal nodes. In the past, recombinant trees were purely based on the Pango hierarchy. This change makes lineage assignment for recombinants more robust and reduces the number of private mutations. See e.g. the XBB tree here: https://next.nextstrain.org/staging/nextclade/sars-cov-2/21L?label=clade:22F
  • Known stop codons and frame shifts were updated to reduce the number of false positive warnings.
  • The set of labeled mutations was updated (in virus_properties.json), including the addition of characteristic 23A mutations. This will help identify recombinants.
  • Pango lineages designated between 2023-02-01 and 2023-02-24 are now included; unfold below to see a list of them:
Newly included lineages, with designation date in parentheses
  • BN.1.3.5 (2023-02-02)
  • XBB.1.9.2 (2023-02-03)
  • XBB.2.4 (2023-02-03)
  • DT.2 (2023-02-03)
  • BA.4.1.11 (2023-02-03)
  • BF.5.3 (2023-02-08)
  • XBB.1.12 (2023-02-09)
  • XBB.1.5.4 (2023-02-10)
  • XBB.1.5.5 (2023-02-10)
  • XBB.1.5.6 (2023-02-10)
  • XBB.1.5.7 (2023-02-10)
  • XBB.1.5.8 (2023-02-10)
  • XBB.1.5.9 (2023-02-10)
  • XBB.1.5.10 (2023-02-10)
  • XBB.1.13 (2023-02-10)
  • BQ.1.1.50 (2023-02-10)
  • BF.5.4 (2023-02-10)
  • XBB.1.14 (2023-02-10)
  • BQ.1.1.51 (2023-02-10)
  • XBF.1 (2023-02-11)
  • XBF.2 (2023-02-11)
  • XBF.3 (2023-02-11)
  • XBC.1.3 (2023-02-11)
  • XBC.1.4 (2023-02-11)
  • XAY.2.1 (2023-02-11)
  • XAY.2.2 (2023-02-11)
  • XAY.1.1.1 (2023-02-11)
  • BW.1.2 (2023-02-11)
  • BW.1.1.1 (2023-02-11)
  • BW.1.1.2 (2023-02-11)
  • BQ.1.29 (2023-02-11)
  • BQ.1.15.2 (2023-02-11)
  • BQ.1.15.1 (2023-02-11)
  • BQ.1.1.52 (2023-02-11)
  • EA.1 (2023-02-11)
  • EA.2 (2023-02-11)
  • BQ.1.10.2 (2023-02-11)
  • CL.1.2 (2023-02-11)
  • CL.1.3 (2023-02-11)
  • CL.1.1 (2023-02-11)
  • BA.5.1.33 (2023-02-11)
  • BA.5.1.34 (2023-02-11)
  • BA.5.1.35 (2023-02-11)
  • EB.1 (2023-02-11)
  • BA.5.1.36 (2023-02-11)
  • BA.5.1.37 (2023-02-11)
  • BA.5.1.38 (2023-02-11)
  • BF.7.16 (2023-02-11)
  • BF.7.16.1 (2023-02-11)
  • BF.7.17 (2023-02-11)
  • BF.7.18 (2023-02-11)
  • BF.35 (2023-02-11)
  • BF.36 (2023-02-11)
  • BF.37 (2023-02-11)
  • BF.38 (2023-02-11)
  • BF.38.1 (2023-02-11)
  • BF.38.2 (2023-02-11)
  • BF.38.3 (2023-02-11)
  • BF.39 (2023-02-11)
  • BF.39.1 (2023-02-11)
  • BF.40 (2023-02-11)
  • BF.41 (2023-02-11)
  • BF.41.1 (2023-02-11)
  • BA.5.2.51 (2023-02-11)
  • BA.5.2.52 (2023-02-11)
  • BA.5.2.53 (2023-02-11)
  • BA.5.2.54 (2023-02-11)
  • BA.5.2.55 (2023-02-11)
  • BA.5.2.56 (2023-02-11)
  • BA.5.2.57 (2023-02-11)
  • BA.5.2.58 (2023-02-11)
  • BA.5.2.59 (2023-02-11)
  • BQ.1.30 (2023-02-11)
  • EC.1 (2023-02-11)
  • EC.1.1 (2023-02-11)
  • BQ.1.31 (2023-02-11)
  • BQ.1.10.3 (2023-02-11)
  • BQ.1.1.53 (2023-02-11)
  • BQ.1.1.54 (2023-02-11)
  • BQ.1.1.55 (2023-02-11)
  • BQ.1.1.56 (2023-02-11)
  • BQ.1.1.57 (2023-02-11)
  • BQ.1.1.58 (2023-02-11)
  • BQ.1.1.59 (2023-02-11)
  • BQ.1.1.60 (2023-02-11)
  • BQ.1.1.61 (2023-02-11)
  • BQ.1.1.62 (2023-02-11)
  • BQ.1.1.63 (2023-02-11)
  • BQ.1.1.64 (2023-02-11)
  • BQ.1.1.65 (2023-02-11)
  • BQ.1.1.66 (2023-02-11)
  • BQ.1.1.67 (2023-02-11)
  • DT.3 (2023-02-11)
  • BQ.1.1.68 (2023-02-11)
  • BQ.1.1.69 (2023-02-11)
  • BQ.1.1.70 (2023-02-11)
  • CZ.2 (2023-02-11)
  • BA.5.2.60 (2023-02-12)
  • BZ.2 (2023-02-12)
  • BA.5.2.61 (2023-02-12)
  • BA.5.2.62 (2023-02-12)
  • BF.7.19 (2023-02-12)
  • BF.7.20 (2023-02-12)
  • BF.7.21 (2023-02-12)
  • BF.7.22 (2023-02-12)
  • BF.7.23 (2023-02-12)
  • BF.7.24 (2023-02-12)
  • BF.7.19.1 (2023-02-12)
  • BF.7.26 (2023-02-12)
  • ED.1 (2023-02-12)
  • ED.2 (2023-02-12)
  • ED.3 (2023-02-12)
  • EE.1 (2023-02-12)
  • EE.2 (2023-02-12)
  • EE.3 (2023-02-12)
  • EE.4 (2023-02-12)
  • EE.5 (2023-02-12)
  • EF.1 (2023-02-12)
  • EF.2 (2023-02-12)
  • EF.1.1 (2023-02-12)
  • EF.1.1.1 (2023-02-12)
  • EF.1.2 (2023-02-12)
  • EF.1.3 (2023-02-12)
  • BA.5.2.63 (2023-02-14)
  • XBQ (2023-02-19)
  • XBB.1.15 (2023-02-19)
  • EG.1 (2023-02-19)
  • XBB.1.9.3 (2023-02-19)
  • XBB.2.5 (2023-02-19)
  • XBF.4 (2023-02-19)
  • XBK.1 (2023-02-19)
  • XBR (2023-02-19)
  • XBS (2023-02-19)
  • XBT (2023-02-19)
  • EH.1 (2023-02-19)
  • XBF.5 (2023-02-19)
  • XBF.6 (2023-02-19)
  • XBF.7 (2023-02-19)
  • XBF.7.1 (2023-02-19)
  • XBF.8 (2023-02-19)
  • DY.2 (2023-02-19)
  • DY.3 (2023-02-19)
  • DY.4 (2023-02-19)
  • BF.7.14.4 (2023-02-19)
  • BF.7.14.5 (2023-02-19)
  • BF.7.14.6 (2023-02-19)
  • BF.7.27 (2023-02-19)
  • CH.1.1.10 (2023-02-19)
  • CH.1.1.11 (2023-02-19)
  • CH.1.1.12 (2023-02-19)
  • CH.1.1.13 (2023-02-19)
  • CH.1.1.14 (2023-02-19)
  • BN.1.3.7 (2023-02-19)
  • BN.1.3.6 (2023-02-19)
  • BN.1.3.8 (2023-02-19)
  • EJ.1 (2023-02-19)
  • EJ.2 (2023-02-19)
  • BN.1.2.2 (2023-02-19)
  • BN.1.2.3 (2023-02-19)
  • BN.1.2.4 (2023-02-19)
  • BN.1.5.2 (2023-02-19)
  • BN.1.4.2 (2023-02-19)
  • BN.1.4.3 (2023-02-19)
  • BN.1.4.4 (2023-02-19)
  • BN.1.4.5 (2023-02-19)
  • DS.3 (2023-02-19)
  • XBB.1.5.11 (2023-02-19)
  • XBB.1.5.12 (2023-02-19)
  • XBB.1.5.13 (2023-02-19)
  • EK.1 (2023-02-19)
  • DV.4 (2023-02-19)
  • XBB.1.5.14 (2023-02-21)
  • EL.1 (2023-02-21)
  • DB.3 (2023-02-23)
  • DV.3.1 (2023-02-24)
  • BF.7.14.7 (2023-02-24)

New dataset version (tag 2023-02-03T12:00:00Z)

RSV A and B data sets

  • fix definition of some older clades
  • include older sequences to make sure older clades are included.

2023-02-01

01 Feb 17:52

New dataset version (tag 2023-02-01T12:00:00Z)

SARS-CoV-2 datasets

  • Change: Values written to the clade column of Nextclade csv/tsv files change from the composite form, e.g. 20H (Beta, V2), to the simple year-letter form, e.g. 20H. If you do not want this change, use the column clade_legacy, which is also written to the csv/tsv files. Other clade types are unchanged: clade_nextstrain (now the same as clade), clade_who and Nextclade_pango. This migration was first mentioned three months ago in release 2022-10-31.

  • New Nextstrain clade 23A added, equivalent to Pango lineage XBB.1.5, see nextstrain/ncov#1043 for details

  • Data update: 55 new Pango lineages, with designation date between 2023-01-10 and 2023-01-31, are now included, unfold below to see all the lineages:

    Newly included lineages, with designation date in parentheses
    • DU.1 (2023-01-12)
    • CH.1.1.5 (2023-01-13)
    • BQ.1.1.35 (2023-01-14)
    • DV.1 (2023-01-14)
    • BQ.1.1.36 (2023-01-14)
    • XBN (2023-01-14)
    • BA.5.1.32 (2023-01-15)
    • BA.5.2.50 (2023-01-21)
    • DW.1 (2023-01-23)
    • BQ.1.1.37 (2023-01-23)
    • BQ.1.1.38 (2023-01-24)
    • BF.7.14.1 (2023-01-25)
    • BQ.1.11.1 (2023-01-26)
    • BR.5 (2023-01-27)
    • BQ.1.1.39 (2023-01-27)
    • BQ.1.2.1 (2023-01-28)
    • DV.2 (2023-01-28)
    • DV.3 (2023-01-28)
    • CH.1.1.6 (2023-01-28)
    • CH.1.1.7 (2023-01-28)
    • CH.1.1.8 (2023-01-28)
    • CH.1.1.9 (2023-01-28)
    • BN.1.10 (2023-01-28)
    • BN.1.11 (2023-01-28)
    • BQ.1.1.40 (2023-01-28)
    • BQ.1.1.41 (2023-01-28)
    • BQ.1.1.42 (2023-01-28)
    • BQ.1.1.43 (2023-01-28)
    • BQ.1.1.44 (2023-01-28)
    • BQ.1.1.45 (2023-01-28)
    • BQ.1.1.46 (2023-01-28)
    • BQ.1.1.47 (2023-01-28)
    • CM.8.1.2 (2023-01-28)
    • CM.8.1.1 (2023-01-28)
    • DS.2 (2023-01-28)
    • XBB.1.10 (2023-01-28)
    • DY.1 (2023-01-28)
    • BF.7.14.2 (2023-01-28)
    • BF.7.14.3 (2023-01-28)
    • DZ.1 (2023-01-28)
    • XBB.1.5.1 (2023-01-28)
    • XBB.1.5.2 (2023-01-29)
    • XBB.1.5.3 (2023-01-29)
    • DN.1.1.1 (2023-01-30)
    • DN.1.1.2 (2023-01-30)
    • XBP (2023-01-30)
    • BQ.1.1.48 (2023-01-30)
    • BQ.1.1.49 (2023-01-30)
    • DZ.2 (2023-01-31)
    • XBB.7 (2023-01-31)
    • XBB.2.3 (2023-01-31)
    • XBB.1.11 (2023-01-31)
    • XBB.1.11.1 (2023-01-31)
    • DY.1.1 (2023-01-31)
    • DJ.1.1.1 (2023-01-31)
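
For downstream scripts affected by the clade column change described above, a minimal sketch of the two options follows: read clade_legacy to keep the old composite values, or reduce old composite values to the new year-letter form. The sample TSV content and the helper names are made up for illustration, and the use of seqName as the identifier column is an assumption; only the column names clade and clade_legacy and the example value 20H (Beta, V2) come from the note above.

```python
import csv
import io


def year_letter(clade: str) -> str:
    """Reduce a composite clade such as '20H (Beta, V2)' to '20H'."""
    return clade.split(" (", 1)[0].strip()


def read_clades(tsv_text: str, column: str = "clade") -> dict[str, str]:
    """Map sequence name to the value of the chosen clade column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["seqName"]: row[column] for row in reader}


# Hypothetical two-column excerpt of a Nextclade TSV output.
SAMPLE = (
    "seqName\tclade\tclade_legacy\n"
    "sample1\t20H\t20H (Beta, V2)\n"
)

print(read_clades(SAMPLE))                  # new simple values
print(read_clades(SAMPLE, "clade_legacy"))  # old composite values
print(year_letter("20H (Beta, V2)"))        # normalize values from older outputs
```

Normalizing with `year_letter` lets results produced before and after this release be compared on the same footing.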

Seasonal flu datasets

  • removes a synonymous mutation from the definition of A/H3N2 clade 2a.3b. Some viruses that should be in this clade didn't have this change.
  • adds glycosylation to the remaining flu data sets

2023-01-27

27 Jan 15:41

Seasonal flu datasets

New dataset version (tag 2023-01-27T12:00:00Z)

  • fixes the omitted A/H3N2 clade 2d (very rare, had dropped out)
  • adds more contextual sequences to the trees
  • adds NA datasets for A/H3N2, A/H1N1pdm, B/Vic

Monkeypox datasets

New dataset version (tag 2023-01-26T12:00:00Z)

2023-01-19

24 Jan 13:01
bc2974a

Influenza datasets

New clade definitions for default influenza datasets (tag 2023-01-19T12:00:00Z)

The default influenza datasets were updated to include recent consensus on clade definitions and more recent sequences in their reference tree to better reflect current circulation. In addition, these datasets contain a short_clade column which omits the long prefix and definition of glycosylation motifs.

2023-01-09

10 Jan 02:58
d355472

All SARS-CoV-2 datasets

New dataset version (tag 2023-01-09T12:00:00Z)

  • Data update: 71 new Pango lineages, with designation date between 2022-12-14 and 2023-01-09 are now included, unfold below to see all the lineages:

    Newly included lineages, with designation date in parentheses
    • CJ.1.1 (2022-12-14)
    • CM.5.2 (2022-12-15)
    • CM.4.1 (2022-12-15)
    • CN.2 (2022-12-15)
    • BE.10 (2022-12-15)
    • XBK (2022-12-15)
    • CH.3.1 (2022-12-15)
    • CH.1.1.3 (2022-12-15)
    • XBB.1.6 (2022-12-16)
    • CR.1.3 (2022-12-16)
    • BF.10.1 (2022-12-18)
    • BQ.1.25.1 (2022-12-21)
    • BN.1.3.2 (2022-12-21)
    • BN.1.3.3 (2022-12-22)
    • XBB.3.2 (2022-12-22)
    • XBB.2.1 (2022-12-22)
    • XBB.2.2 (2022-12-22)
    • XBB.1.7 (2022-12-22)
    • DN.1 (2022-12-22)
    • BQ.1.1.29 (2022-12-22)
    • BQ.1.27 (2022-12-22)
    • BQ.1.1.30 (2022-12-22)
    • DJ.1.3 (2022-12-23)
    • BQ.1.1.31 (2022-12-24)
    • DP.1 (2022-12-24)
    • BN.1.3.4 (2022-12-24)
    • XBB.3.3 (2022-12-24)
    • DN.1.1 (2022-12-25)
    • BQ.1.13.1 (2022-12-25)
    • BF.5.1 (2022-12-27)
    • BF.5.2 (2022-12-27)
    • CK.1.1 (2022-12-29)
    • BA.5.2.46 (2022-12-30)
    • BQ.1.28 (2022-12-31)
    • BQ.1.1.32 (2022-12-31)
    • BQ.1.1.33 (2022-12-31)
    • DF.1.1 (2023-01-01)
    • BA.5.2.47 (2023-01-01)
    • DQ.1 (2023-01-01)
    • DR.1 (2023-01-03)
    • BF.7.14 (2023-01-06)
    • BA.5.2.48 (2023-01-06)
    • DS.1 (2023-01-07)
    • CM.10 (2023-01-07)
    • XBC.1.1 (2023-01-07)
    • XBC.1.1.1 (2023-01-07)
    • XBC.1.2 (2023-01-07)
    • XBC.1.2.1 (2023-01-07)
    • XBB.1.8 (2023-01-07)
    • BL.6 (2023-01-07)
    • CH.1.1.4 (2023-01-07)
    • BF.7.15 (2023-01-09)
    • XBL (2023-01-09)
    • CM.11 (2023-01-09)
    • DT.1 (2023-01-09)
    • BQ.1.1.34 (2023-01-09)
    • CM.12 (2023-01-09)
    • CK.1.2 (2023-01-09)
    • BA.2.3.22 (2023-01-09)
    • XBM (2023-01-09)
    • BM.1.1.4 (2023-01-09)
    • BM.1.1.5 (2023-01-09)
    • BN.1.4.1 (2023-01-09)
    • XBB.6 (2023-01-09)
    • XBB.6.1 (2023-01-09)
    • XBB.1.9 (2023-01-09)
    • XBB.1.9.1 (2023-01-09)
    • BA.5.2.49 (2023-01-09)
    • XAY.3 (2023-01-09)
    • XAY.1.2 (2023-01-09)
    • BN.1.5.1 (2023-01-09)

2022-12-22

23 Dec 17:10

Addition of RSV A and RSV B datasets

New dataset version (tag 2022-12-20T22:00:12Z)

First release of RSV A and RSV B datasets by Laura Urbanska.

With permission of the authors, these datasets use the reference sequences hRSV/A/England/397/2017 for RSV-A and hRSV/B/Australia/VIC-RCH056/2019 for RSV-B.

The datasets implement two clade designations each.

One is primarily based on the G gene and was proposed by Goya et al.; the other is based on the entire genome and was proposed by Ramaekers et al.

2022-12-14

14 Dec 23:20
f807231

All SARS-CoV-2 datasets

New dataset version (tag 2022-12-14T12:00:00Z)

  • Data update: 28 new Pango lineages, with designation date between 2022-11-14 and 2022-12-10 are now included, unfold below to see all the lineages:

    28 new Pango lineages included in this release, with designation date in parentheses
    • XBG (2022-11-14)
    • BA.5.1.31 (2022-11-15)
    • XBH (2022-11-16)
    • BW.1.1 (2022-11-20)
    • BN.1.8 (2022-11-22)
    • BQ.1.1.25 (2022-11-22)
    • CM.2.1 (2022-11-22)
    • DJ.1 (2022-11-23)
    • DJ.1.1 (2022-11-23)
    • BA.5.2.42 (2022-11-23)
    • XBB.1.4.1 (2022-11-25)
    • BA.5.2.43 (2022-11-26)
    • BN.1.9 (2022-11-28)
    • CH.1.1.1 (2022-11-29)
    • CH.1.1.2 (2022-11-29)
    • BA.5.2.44 (2022-11-29)
    • DK.1 (2022-11-30)
    • BQ.1.1.26 (2022-12-01)
    • XBJ (2022-12-01)
    • CH.3 (2022-12-01)
    • BQ.1.1.27 (2022-12-02)
    • DL.1 (2022-12-03)
    • BA.5.2.45 (2022-12-03)
    • BQ.1.1.28 (2022-12-04)
    • BQ.1.26.1 (2022-12-04)
    • CV.2 (2022-12-06)
    • DM.1 (2022-12-07)
    • DJ.1.2 (2022-12-10)
  • Added 5 new XBB.1.5 example sequences