Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
87a0c76
commit 5bf6e4b
Showing
5 changed files
with
108 additions
and
1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
================================= | ||
Viewing previous versions of data | ||
================================= | ||
|
||
|
||
.. contents:: Sections in this document | ||
   :local: | ||
   :depth: 2 | ||
|
||
TODO | ||
==== | ||
* Change URLs to nextstrain.org once it's in production | ||
* Reduce size of screenshot png | ||
* Fill in TKTK sections | ||
|
||
Overview | ||
======== | ||
|
||
Analyses are a snapshot in time, and for most of our `core Nextstrain datasets | ||
<https://nextstrain.org/pathogens>`__ we update this snapshot frequently, often | ||
even daily. When you view a dataset such as the seasonal influenza build | ||
`flu/seasonal/h3n2/ha/6y <https://dev.nextstrain.org/flu/seasonal/h3n2/ha/6y>`__, | ||
the header of the page shows when it was last updated, usually within the | ||
last week or so. Because we update this dataset every week, we have a large | ||
archive of past snapshots that can still be viewed. To view the snapshot from | ||
midway through 2023, load `flu/seasonal/h3n2/ha/6y@2023-07-01 | ||
<https://dev.nextstrain.org/flu/seasonal/h3n2/ha/6y@2023-07-01>`__, which | ||
loads the latest snapshot available on July 1, 2023: a dataset updated on | ||
June 30. | ||
|
||
In general, appending a ``@YYYY-MM-DD`` string to a Nextstrain core dataset URL | ||
will load the dataset that was the latest available at that particular date. | ||
|
||
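As a minimal sketch of the URL pattern described above, the snippet below builds a versioned URL by appending ``@YYYY-MM-DD`` to a dataset path. The function name ``versioned_dataset_url`` and the ``BASE_URL`` default are illustrative assumptions, not part of any Nextstrain API; the base URL simply mirrors the links used in this document.

```python
from datetime import date

# Assumption: mirrors the dev URLs used in this document; swap in the
# production host once the feature is live there.
BASE_URL = "https://dev.nextstrain.org"


def versioned_dataset_url(dataset_path: str, snapshot: date,
                          base_url: str = BASE_URL) -> str:
    """Return the URL of a core dataset as it was on the given date.

    Hypothetical helper: it only formats the string; the server decides
    which stored version was the latest available on that date.
    """
    path = dataset_path.strip("/")  # tolerate leading/trailing slashes
    return f"{base_url}/{path}@{snapshot.isoformat()}"


print(versioned_dataset_url("flu/seasonal/h3n2/ha/6y", date(2023, 7, 1)))
```

``date.isoformat()`` always yields the zero-padded ``YYYY-MM-DD`` form, which avoids hand-formatting mistakes such as ``2023-7-1``.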
.. note:: | ||
|
||
   This functionality was introduced in 2024 and is currently only available | ||
   for core Nextstrain datasets. There is not yet a way to list or visualise | ||
   all the available datasets, but this is in the works. | ||
|
||
|
||
Tanglegrams to compare changes | ||
------------------------------ | ||
|
||
Tanglegrams let us view two different versions of the same dataset side by | ||
side. Using the example above, we can compare the latest dataset against the | ||
one from the middle of 2023 via the URL | ||
`flu/seasonal/h3n2/ha/6y:flu/seasonal/h3n2/ha/6y@2023-07-01 | ||
<https://dev.nextstrain.org/flu/seasonal/h3n2/ha/6y:flu/seasonal/h3n2/ha/6y@2023-07-01>`__. | ||
Here's a screenshot of this view taken in early January 2024, which shows the | ||
expansion of clade 2a.3a.1 over the preceding 6 months: | ||
|
||
.. image:: ../images/tanglegram-h3n2.png | ||
   :alt: Tanglegram of flu/seasonal/h3n2/ha/6y:flu/seasonal/h3n2/ha/6y@2023-07-01 | ||
|
||
Over time, the data shown at this URL will change as we update the dataset, but by versioning both datasets we can preserve this particular view of the data: | ||
`flu/seasonal/h3n2/ha/6y@2024-01-03:flu/seasonal/h3n2/ha/6y@2023-07-01 | ||
<https://dev.nextstrain.org/flu/seasonal/h3n2/ha/6y@2024-01-03:flu/seasonal/h3n2/ha/6y@2023-07-01>`__. | ||
|
||
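The tanglegram URLs above follow a simple shape: two dataset specifiers joined by ``:``, each optionally carrying its own ``@YYYY-MM-DD`` version. The sketch below assembles such a URL; ``dataset_spec`` and ``tanglegram_url`` are hypothetical helper names, and the base URL is an assumption that mirrors the links in this document.

```python
from datetime import date
from typing import Optional

# Assumption: mirrors the dev URLs used in this document.
BASE_URL = "https://dev.nextstrain.org"


def dataset_spec(path: str, version: Optional[date] = None) -> str:
    """Format one side of a tanglegram: a path plus an optional @date."""
    path = path.strip("/")
    return path if version is None else f"{path}@{version.isoformat()}"


def tanglegram_url(left_path: str, right_path: str,
                   left_version: Optional[date] = None,
                   right_version: Optional[date] = None,
                   base_url: str = BASE_URL) -> str:
    """Join two dataset specifiers with ':' to request a tanglegram."""
    left = dataset_spec(left_path, left_version)
    right = dataset_spec(right_path, right_version)
    return f"{base_url}/{left}:{right}"


# Pin both sides so the comparison does not drift as the dataset updates.
print(tanglegram_url("flu/seasonal/h3n2/ha/6y", "flu/seasonal/h3n2/ha/6y",
                     left_version=date(2024, 1, 3),
                     right_version=date(2023, 7, 1)))
```

Leaving ``left_version`` unset reproduces the "latest versus mid-2023" form, whose left-hand side keeps changing as new snapshots are published.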
|
||
|
||
Details for dataset maintainers | ||
=============================== | ||
|
||
This section is more technical and aimed primarily at those managing datasets. | ||
|
||
|
||
S3 Delete Markers | ||
----------------- | ||
Our core datasets are all stored in a versioned S3 bucket, which is how we are | ||
able to provide this functionality. When files are "deleted" from a versioned | ||
bucket, the normal behaviour is to preserve the file but add a `delete marker | ||
<https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html>`__. | ||
When looking back at versions over time, we interpret the intended behaviour of | ||
a delete marker as removing the then-latest file from history, so it won't be | ||
available via any ``@YYYY-MM-DD`` value. | ||
|
||
.. image:: ../images/delete-markers.png | ||
|
||
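One way to read the delete-marker rule above is sketched below. This is an illustration under stated assumptions, not the actual nextstrain.org implementation: the ``Version`` record, ``visible_history`` and ``resolve`` are hypothetical names, the version list is assumed to have been fetched from S3 already, a request date is assumed to include uploads made on that same UTC day, and consecutive delete markers are assumed to remove only one file.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Version:
    """Hypothetical record for one entry in an object's version listing."""
    version_id: str
    last_modified: datetime  # expected to be timezone-aware
    is_delete_marker: bool = False


def visible_history(versions: Iterable[Version]) -> List[Version]:
    """Return file versions, oldest first, after applying delete markers.

    Each delete marker removes the file that was latest at that moment.
    """
    history: List[Version] = []
    previous_was_marker = False
    for v in sorted(versions, key=lambda v: v.last_modified):
        if v.is_delete_marker:
            # Assumption: a marker directly after another marker has no
            # "then-latest file" left to remove, so it is a no-op.
            if history and not previous_was_marker:
                history.pop()
            previous_was_marker = True
        else:
            history.append(v)
            previous_was_marker = False
    return history


def resolve(versions: Iterable[Version], requested: date) -> Optional[Version]:
    """Pick the latest visible version uploaded on or before a UTC date."""
    candidates = [
        v for v in visible_history(versions)
        if v.last_modified.astimezone(timezone.utc).date() <= requested
    ]
    return candidates[-1] if candidates else None


def _utc(y: int, m: int, d: int) -> datetime:
    return datetime(y, m, d, 12, 0, tzinfo=timezone.utc)


example = [
    Version("a", _utc(2023, 6, 30)),
    Version("b", _utc(2023, 9, 1)),
    Version("marker", _utc(2023, 9, 2), is_delete_marker=True),
    Version("c", _utc(2023, 10, 1)),
]
# "b" was deleted on 2023-09-02, so a mid-September request falls back to "a".
print(resolve(example, date(2023, 9, 15)).version_id)
```

Under this reading, the deleted version ``b`` is unreachable for every requested date, including dates between its upload and its deletion.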
|
||
How far back does this go? | ||
-------------------------- | ||
This depends on the dataset, but the archive goes back to around August 2018. TKTK. | ||
The earliest version found so far is https://dev.nextstrain.org/flu/seasonal/h3n2/ha/3y@2018-08-01. | ||
|
||
What if the URL changed over time? | ||
---------------------------------- | ||
|
||
We don't currently track URL changes, but this could be implemented if and when we want it. TKTK | ||
|
||
SARS-CoV-2 datestamped datasets | ||
------------------------------- | ||
|
||
TKTK | ||
|
||
Multiple datasets uploaded on the same day | ||
------------------------------------------ | ||
|
||
Days are defined in UTC. When several datasets are uploaded on the same day, only the latest is used; earlier uploads are ignored. TKTK | ||
|
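The same-day rule can be illustrated with a short sketch: upload timestamps are converted to UTC, grouped by calendar day, and only the latest upload per day is kept. ``latest_per_utc_day`` is a hypothetical name, and the example timestamps are made up for illustration.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Dict, Iterable


def latest_per_utc_day(upload_times: Iterable[datetime]) -> Dict[date, datetime]:
    """Map each UTC calendar day to the latest upload made on that day.

    Timestamps should be timezone-aware; naive datetimes would be
    interpreted in the machine's local timezone by ``astimezone``.
    """
    latest: Dict[date, datetime] = {}
    for ts in upload_times:
        ts_utc = ts.astimezone(timezone.utc)
        day = ts_utc.date()
        if day not in latest or ts_utc > latest[day]:
            latest[day] = ts_utc
    return latest


pacific = timezone(timedelta(hours=-8))  # illustrative fixed offset
uploads = [
    datetime(2024, 1, 3, 9, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 3, 17, 30, tzinfo=timezone.utc),
    # 20:00 on Jan 2 at UTC-8 is 04:00 on Jan 3 in UTC.
    datetime(2024, 1, 2, 20, 0, tzinfo=pacific),
    datetime(2024, 1, 4, 0, 5, tzinfo=timezone.utc),
]
for day, ts in sorted(latest_per_utc_day(uploads).items()):
    print(day.isoformat(), ts.isoformat())
```

Note that the upload made on the evening of January 2 in a UTC-8 timezone counts towards January 3, because the day boundary is taken in UTC.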
||
|
||
Sidecar files | ||
------------- | ||
Sidecar files must be uploaded on the same day as the dataset they accompany. TKTK | ||
|
||
|