diff --git a/tutorial/part3/data_exploitability_pangeo.ipynb b/tutorial/part3/data_exploitability_pangeo.ipynb
index 9f02b7a..8109ec9 100644
--- a/tutorial/part3/data_exploitability_pangeo.ipynb
+++ b/tutorial/part3/data_exploitability_pangeo.ipynb
@@ -44,7 +44,8 @@
 "### Relevant resources\n",
 "\n",
 "* More information on Pangeo can be found here: https://pangeo.io/\n",
- "* More information on the STAC specification can be found here: https://stacspec.org/\n"
+ "* More information on the STAC specification can be found here: https://stacspec.org/\n",
+ "* More examples of how to use xarray can be found here: https://tutorial.xarray.dev/en/latest/\n"
 ]
 },
 {
@@ -59,9 +60,6 @@
 "execution_count": null,
 "metadata": {
 "collapsed": false,
- "jupyter": {
- "outputs_hidden": false
- },
 "tags": []
 },
 "outputs": [],
@@ -159,9 +157,6 @@
 "execution_count": null,
 "metadata": {
 "collapsed": false,
- "jupyter": {
- "outputs_hidden": false
- },
 "tags": []
 },
 "outputs": [],
@@ -183,9 +178,6 @@
 "execution_count": null,
 "metadata": {
 "collapsed": false,
- "jupyter": {
- "outputs_hidden": false
- },
 "tags": []
 },
 "outputs": [],
@@ -232,9 +224,6 @@
 "execution_count": null,
 "metadata": {
 "collapsed": false,
- "jupyter": {
- "outputs_hidden": false
- },
 "tags": []
 },
 "outputs": [],
@@ -264,9 +253,6 @@
 "execution_count": null,
 "metadata": {
 "collapsed": false,
- "jupyter": {
- "outputs_hidden": false
- },
 "tags": []
 },
 "outputs": [],
@@ -277,6 +263,7 @@
 " intersects=aoi_geojson,\n",
 " collections=[\"sentinel-2-l2a\"],\n",
- " datetime=\"2019-02-01/2019-06-10\"\n",
+ " datetime=\"2019-02-01/2019-06-10\",\n",
+ " # query={\"eo:cloud_cover\": {\"lt\": 60}}, # uncomment to filter by cloud cover\n",
 ").item_collection()\n",
 "len(items)"
 ]
 },
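+ {
+ "cell_type": "markdown",
+ "source": [
+ "As an optional sanity check (an illustrative sketch added here, assuming each returned item carries the standard `eo:cloud_cover` property, as Sentinel-2 L2A items do), we can see how cloudy the returned scenes are before deciding whether to enable the cloud-cover query above:\n"
+ ],
+ "metadata": {
+ "collapsed": false
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "# eo:cloud_cover is a per-item percentage reported in the STAC metadata.\n",
+ "cloud_cover = sorted(item.properties[\"eo:cloud_cover\"] for item in items)\n",
+ "cloud_cover[:5], cloud_cover[-5:] # least and most cloudy scenes\n"
+ ],
+ "metadata": {
+ "collapsed": false
+ }
+ },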
 {
@@ -293,10 +280,7 @@
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
- "collapsed": false,
- "jupyter": {
- "outputs_hidden": false
- }
+ "collapsed": false
 },
 "outputs": [],
 "source": [
@@ -312,7 +296,7 @@
 "source": [
 "#### Load data\n",
- "We will use the stackstac library to load the data. The stackstac library is a library that allows loading data from a STAC API into an xarray dataset.\n",
- "Here we will load the green and swir16 bands, which are the bands we will use to calculate the snow cover. We will also load the scl band, which is the scene classification layer, which we will use to mask out clouds.\n",
+ "We will use the stackstac library to load the data. Stackstac loads the items returned by a STAC API into a lazy xarray dataset.\n",
+ "Here we will load the green and swir16 bands (named B03 and B11 in the original dataset), which we will use to calculate the snow cover. We will also load the scl band, the scene classification layer, which we will use to mask out clouds.\n",
 "Spatial resolution of 20m is selected for the analysis. The data is loaded in chunks of 2048x2048 pixels.\n",
 "\n",
 "[Stackstac](https://stackstac.readthedocs.io/en/latest/) is not the only way to create a xarray dataset from a STAC API. Other libraries can be used, such as [xpystac](https://github.com/stac-utils/xpystac) or [odc.stac](https://github.com/opendatacube/odc-stac). The choice of the library depends on the use case and specific needs."
@@ -322,19 +306,43 @@
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
- "collapsed": false,
- "jupyter": {
- "outputs_hidden": false
- }
+ "collapsed": false
 },
 "outputs": [],
+ "source": [
+ "stackstac.stack(items)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "When the results of the STAC query are compiled into an xarray dataset, we obtain a four-dimensional dataset: time, band, x, and y. The 'band' dimension comprises the various spectral bands, while the 'x' and 'y' dimensions represent the spatial information. By examining the dataset's visual representation, we can quickly estimate its total size. Without any filtering, we expect the dataset to be around 5.42 terabytes.\n",
+ "\n",
+ "Since we require only certain bands and are focused on the Area of Interest (AOI), we will apply additional filters to the dataset to pare down the data volume to what is strictly necessary.\n",
+ "\n",
+ "- The 'bounds_latlon' parameter defines the Area of Interest with four values: the minimum and maximum longitudes and latitudes. We will input the catchment's boundaries to set our area of interest.\n",
+ "- The 'resolution' parameter determines the dataset's spatial resolution, requiring a single value. We will select a resolution of 20 meters.\n",
+ "- The 'chunksize' parameter sets the dimensions for data chunking, accepting one value to define chunk size. We will opt for chunks that are 2048 by 2048 pixels. GDAL will handle the data chunking during the loading process as per our specifications.\n",
+ "- Lastly, the 'assets' parameter selects the data bands to be loaded, requiring a list of the band names as strings. We will load the 'green' and 'swir16' bands for the snow cover analysis, along with the 'scl' band, the scene classification layer, to filter out clouds.\n"
+ ],
+ "metadata": {
+ "collapsed": false
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
 "source": [
 "ds = stackstac.stack(items,\n",
 " bounds_latlon=aoi.iloc[0].geometry.bounds,\n",
 " resolution=20,\n",
 " chunksize=2048,\n",
 " assets=['green', 'swir16', 'scl'])"
- ]
+ ],
+ "metadata": {
+ "collapsed": false
+ }
 },
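+ {
+ "cell_type": "markdown",
+ "source": [
+ "As a quick check (an illustrative sketch, assuming `ds` is the lazy array returned above), we can confirm how much the filters pared the dataset down compared to the ~5.42 terabyte unfiltered estimate. Nothing is downloaded here: `nbytes` is derived from the array's shape and dtype alone.\n"
+ ],
+ "metadata": {
+ "collapsed": false
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "# Size of the lazy array; no data is loaded by this call.\n",
+ "print(f\"Filtered dataset size: {ds.nbytes / 1e9:.2f} GB\")\n"
+ ],
+ "metadata": {
+ "collapsed": false
+ }
+ },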
"outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -567,7 +551,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Data aggregation is a very important step in the analysis. It allows to reduce the amount of data and to make the analysis more efficient. Moreover as in this case we are going to aggregate the date to daily values, this will allow use to compute statistic on the data at the basin scale later on.\n", + "Data aggregation is a very important step in the analysis. It allows to reduce the amount of data and to make the analysis more efficient. Moreover, as in this case, we are going to aggregate the date to daily values, this will allow use to compute statistic on the data at the basin scale later on.\n", "\n", "The `groupby` method allows to group the data by a specific dimension. We will group the data by the time dimension, aggregating to the date and removing the time information, once the group is obtained we will aggregate the data by taking the maximum value." ] @@ -576,10 +560,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -590,10 +571,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -611,10 +589,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -625,10 +600,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -649,10 +621,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -802,10 +771,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -817,10 +783,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -832,10 +795,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -847,10 +807,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -873,10 +830,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -887,10 +841,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -901,10 +852,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -916,10 +864,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - 
"outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -931,10 +876,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -945,10 +887,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -967,10 +906,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -988,10 +924,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [ @@ -1005,10 +938,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "outputs": [], "source": [