
complete restructure headings, check grammar and typos
acocac committed Nov 2, 2023
1 parent 5ef2bc9 commit 4e87cd0
Showing 1 changed file with 51 additions and 29 deletions.
80 changes: 51 additions & 29 deletions tutorial/part3/data_exploitability_pangeo.ipynb
@@ -147,6 +147,17 @@
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Load data\n",
"\n",
"### Catchment outline"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
@@ -172,10 +183,15 @@
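The code cells are collapsed in this diff view. As a rough sketch, loading the catchment outline referenced in the heading above might look like the following; the file name `catchment_outline.geojson` is an assumption for illustration, not confirmed by the diff:

```python
import geopandas as gpd

# Load the catchment outline as a GeoDataFrame (file name assumed)
catchment_outline = gpd.read_file("catchment_outline.geojson")

# Inspect the geometry and its coordinate reference system (typically EPSG:4326)
print(catchment_outline.crs)
print(catchment_outline.total_bounds)
```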
{
"cell_type": "markdown",
"source": [
"#### Load satellite collections\n",
"We will utilize the STAC API to search for satellite data in this exercise, specifically leveraging the API provided by AWS/Element84. The STAC API operates as a RESTful service, enabling the querying of satellite data with various filters such as spatial range, time period, and other specific metadata. This API is constructed based on the STAC specification, a collaborative, community-driven standard aimed at enhancing the discoverability and usability of satellite data. Numerous data providers, including AWS, Google Earth Engine, and Planet (Copernicus Data Space Ecosystem (CDSE) is coming soon **), among others, have implemented the STAC API, exemplifying its widespread adoption and utility in accessing diverse satellite datasets.\n",
"We will limit the serch to the Sentinel 2 L2A collection, which is a collection of Sentinel 2 data that has been processed to surface reflectance (Top Of Canopy).\n",
"### Satellite collections\n",
"\n",
"#### Search for satellite data using STAC\n",
"\n",
"We will utilize the STAC API to search for satellite data in this exercise, specifically leveraging the API provided by AWS/Element84. The STAC API operates as a RESTful service, enabling the querying of satellite data with various filters such as spatial range, time period, and other specific metadata. This API is constructed based on the STAC specification, a collaborative, community-driven standard aimed at enhancing the discoverability and usability of satellite data. Numerous data providers, including AWS, Google Earth Engine, and Planet (Copernicus Data Space Ecosystem (CDSE) is coming soon**), among others, have implemented the STAC API, exemplifying its widespread adoption and utility in accessing diverse satellite datasets.\n",
"\n",
"For the purposes of this exercise, we will limit the search to the Sentinel 2 L2A collection, which is a collection of Sentinel 2 data that has been processed to surface reflectance (Top Of Canopy).\n",
"We will also limit the search to the time period between 1st February 2019 and 10th June 2019 and to the extent of the catchment.\n",
"\n",
"** at the moment of writing the STAC catalog of the CDSE is not yet fully operational."
],
"metadata": {
@@ -254,9 +270,11 @@
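A minimal sketch of such a search with `pystac_client`, reusing the `catchment_outline` loaded above. The collection id, asset names, and the choice of `stackstac` as loader are assumptions of this sketch (the notebook may use a different loader, e.g. odc-stac); the endpoint is the AWS/Element84 one named in the text:

```python
import pystac_client
import stackstac

# Open the AWS/Element84 STAC API
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")

# Search Sentinel-2 L2A items over the catchment between 2019-02-01 and 2019-06-10
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=list(catchment_outline.total_bounds),
    datetime="2019-02-01/2019-06-10",
)
items = search.item_collection()
print(f"Found {len(items)} items")

# Stack the items into a lazy xarray data cube; band names follow the
# Element84 catalog, resolution is in metres of the UTM grid
s2 = stackstac.stack(items, assets=["green", "swir16", "scl"], resolution=20)
```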
{
"cell_type": "markdown",
"source": [
"#### Calculate snow cover\n",
"## Calculate snow cover\n",
"\n",
"We will calculate the Normalized Difference Snow Index (NDSI) to calculate the snow cover. The NDSI is calculated as the difference between the green and the swir16 bands divided by the sum of the green and the swir16 bands.\n",
"For a metter of clarity we will define the green and the swir16 bands as variables. Other approches can be used to manage the data, but this is the one we will use in this exercise."
"\n",
"For a matter of clarity we will define the green and the swir16 bands as variables. Other approaches can be used to manage the data, but this is the one we will use in this exercise."
],
"metadata": {
"collapsed": false
@@ -278,7 +296,7 @@
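Assuming the data cube `s2` from the sketch above, with a `band` dimension in stackstac style, the band variables might be defined like this:

```python
# Name the bands we need as variables, for clarity
green = s2.sel(band="green")
swir = s2.sel(band="swir16")
scl = s2.sel(band="scl")
```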
{
"cell_type": "markdown",
"source": [
"We will calculate the NDSI and we will mask out the clouds. We will use the scene classification layer (scl) to mask out the clouds. The scl is a layer that contains information about the type of land cover. We will mask out the clouds, which are identified by the values 8 and 9 in the scl layer."
"Let's compute the NDSI and mask out the clouds."
],
"metadata": {
"collapsed": false
@@ -318,8 +336,9 @@
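A sketch of the NDSI computation following the formula stated above. The 0.4 snow threshold is a commonly used value and an assumption of this sketch, not necessarily the one used in the notebook:

```python
import xarray as xr

# NDSI = (green - swir16) / (green + swir16)
ndsi = (green - swir) / (green + swir)

# Binarise: 1 = snow, 0 = no snow (threshold assumed);
# keep nodata pixels as NaN instead of counting them as snow-free
snowmap = xr.where(ndsi > 0.4, 1.0, 0.0).where(ndsi.notnull())
```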
{
"cell_type": "markdown",
"source": [
"We will mask out the clouds, which are identified by the values 8 and 9 in the scl layer.\n",
"More dettailed info can be found here: https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm-overview"
"We will mask out the clouds, which are identified by the values 8 and 9 in the scene classification layer (scl). The scl contains information about the type of land cover. We will mask out the clouds, which are identified by the values 8 and 9 in the scl layer.\n",
"\n",
"More detailed info can be found here: https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm-overview"
],
"metadata": {
"collapsed": false
@@ -343,9 +362,11 @@
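A possible masking step. Flagging clouds with the value 2, so they stay distinguishable from snow (1) and no snow (0), is an encoding assumed by this sketch:

```python
# SCL classes 8 and 9 are medium- and high-probability cloud pixels
cloud_mask = scl.isin([8, 9])

# Mark cloudy pixels with 2, keep the snow map everywhere else
snow_cloud = xr.where(cloud_mask, 2, snowmap)
```

The name `snow_cloud` anticipates the object that is clipped to the catchment further below.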
{
"cell_type": "markdown",
"source": [
"#### Mask data\n",
"As we are only interestd to the snow cover in the catchment, we will mask out the data outside the catchment.\n",
"To acheive it we need to convert the catchment geometry to the same coordinate reference system as the data. The data is in the UTM32N coordinate reference system (EPSG:32632)."
"## Process snow cover data\n",
"\n",
"### Mask data\n",
"\n",
"As we are only interested to the snow cover in the catchment, we will mask out the data outside the catchment. To achieve it we need to convert the catchment geometry to the same coordinate reference system as the data. The data is in the UTM32N coordinate reference system (EPSG:32632)."
],
"metadata": {
"collapsed": false
@@ -366,8 +387,9 @@
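With geopandas, the reprojection is a one-liner (a sketch, continuing from the `catchment_outline` object above):

```python
# Convert the catchment outline from lon/lat to the UTM32N grid of the data
catchment_utm = catchment_outline.to_crs(epsg=32632)
```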
{
"cell_type": "markdown",
"source": [
"As we are going to use the RioXarray library to mask out the data, we need to add some more information to the data. We need to specify the coordinate reference system and the nodata value. \n",
"Both informations can be found in the metadata of the data but we need to reinforce it so that RioXarray can use it. "
"As we are going to use the `RioXarray` library to mask out the data, we need to add some more information to the data. The RioXarray library is a library that allows to manipulate geospatial data in xarray datasets. Underneath it uses the rasterio library that is a library built on top of GDAL.\n",
"\n",
"We need first to specify the coordinate reference system and the nodata value. Both information can be found in the metadata of the data but we need to reinforce it so that `RioXarray` can use it."
],
"metadata": {
"collapsed": false
@@ -388,7 +410,7 @@
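A sketch of how the CRS and nodata value can be set explicitly with `rioxarray`:

```python
import numpy as np
import rioxarray  # noqa: F401  (registers the .rio accessor on xarray objects)

# Make CRS and nodata explicit so rioxarray can use them when clipping
snow_cloud = snow_cloud.rio.write_crs("EPSG:32632")
snow_cloud = snow_cloud.rio.write_nodata(np.nan)
```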
{
"cell_type": "markdown",
"source": [
"The clipping is done by the RioXarray library. The RioXarray library is a library that allows to manipulate geospatial data in xarray datasets. Underneath it uses the rasterio library that is a library built on top of GDAL."
"Let's clip the snow_cloud object using the catchment geometry in the UTM32N coordinate reference system."
],
"metadata": {
"collapsed": false
@@ -437,8 +459,9 @@
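The clipping step might look like this, using the reprojected catchment geometry from the earlier sketch:

```python
# Pixels outside the catchment polygon become nodata (NaN)
snow_cloud_clipped = snow_cloud.rio.clip(
    catchment_utm.geometry.values, crs=catchment_utm.crs
)
```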
{
"cell_type": "markdown",
"source": [
"Data aggregation is a very important step in the analysis. It allows to reduce the amount of data and to make the analysis more efficient. Moreover as in this case we are going to aggregate the date to daily values, this will allow use to compute statistic on the data at the basin scale later on. \n",
"The groupby method allows to group the data by a specific dimension. In this case we will group the data by the time dimension, aggregating to the date and removing the time information, once the group is obtained we will aggregate the data by taking the maximum value."
"Data aggregation is a very important step in the analysis. It allows to reduce the amount of data and to make the analysis more efficient. Moreover as in this case we are going to aggregate the date to daily values, this will allow use to compute statistic on the data at the basin scale later on.\n",
"\n",
"The `groupby` method allows to group the data by a specific dimension. We will group the data by the time dimension, aggregating to the date and removing the time information, once the group is obtained we will aggregate the data by taking the maximum value."
],
"metadata": {
"collapsed": false
@@ -469,7 +492,7 @@
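A sketch of the daily aggregation. Grouping by `time.dt.floor("D")` produces a new dimension named `floor`, which is what the renaming step below refers to:

```python
# One value per calendar day and pixel: the daily maximum, so a pixel counts
# as cloud (2) or snow (1) if any acquisition of that day says so
snow_cloud_daily = snow_cloud_clipped.groupby(
    snow_cloud_clipped.time.dt.floor("D")
).max(dim="time")
```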
{
"cell_type": "markdown",
"source": [
"as the data has been aggregated to daily values, we need to rename the floor method to something more meaningfull as date."
"As the data has been aggregated to daily values, we need to rename the floor method to something more meaningful as date."
],
"metadata": {
"collapsed": false
@@ -500,9 +523,9 @@
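The rename itself is a one-liner (sketch, continuing from above):

```python
# Rename the "floor" dimension created by the grouping to "date"
snow_cloud_daily = snow_cloud_daily.rename(floor="date")
```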
{
"cell_type": "markdown",
"source": [
"#### Visualize data\n",
"We will use the hvplot library to visualize the data. The hvplot library is a library that allows to visualize data in xarray datasets. It is based on the holoviews library, which is a library that allows to visualize multidimensional data.\n",
"As we are going to visualize the data on a map, we need to specify the coordinate reference system of the data. The data is in the UTM32N coordinate reference system (EPSG:32632). This will allow the library to project the data on a map.\n",
"### Visualize data\n",
"We will use the `hvplot` library to visualize the data. The library allows to visualize data in `xarray` datasets. It is based on the holoviews library, which is a library that allows to visualize multidimensional data.\n",
"To visualize the data on a map, we need to specify the coordinate reference system of the data. The data is in the UTM32N coordinate reference system (EPSG:32632). This will allow the library to project the data on a map.\n",
"More info on the hvplot library can be found here: https://hvplot.holoviz.org/"
],
"metadata": {
@@ -533,15 +556,15 @@
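A possible visualization call; passing the EPSG code tells hvplot how to project the UTM32N data onto a basemap. This sketch assumes `geoviews` is installed for the projection and tile support, and the styling options are illustrative:

```python
import hvplot.xarray  # noqa: F401  (adds the .hvplot accessor)

# Interactive map with a date slider over the remaining "date" dimension
snow_cloud_daily.hvplot.image(
    x="x", y="y", crs=32632, tiles="OSM", cmap="Blues", frame_width=500
)
```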
{
"cell_type": "markdown",
"source": [
"### Compute statistics\n",
"## Compute statistics\n",
"\n",
"Based on the original [notebook](https://github.com/EO-College/cubes-and-clouds/blob/main/lectures/3.1_data_processing/exercises/_alternatives/31_data_processing_stac.ipynb), see the Calculate Catchment Statistics section.\n",
"\n",
"from the orinal notebook:\n",
"Calculate Catchment Statistics\n",
"We are looking at a region over time. We need to make sure that the information content meets our expected quality. Therefore, we calculate the cloud percentage for the catchment for each timestep. We use this information to filter the timeseries. All timesteps that have a cloud coverage of over 25% will be discarded.\n",
"\n",
"Ultimately we are interested in the snow covered area (SCA) within the catchment. We count all snow covered pixels within the catchment for each time step. Multiplied by the pixel size that would be the snow covered area. Divided the pixel count by the total number of pixels in the catchment is the percentage of pixels covered with snow. We will use this number.\n",
"\n",
"Get number of pixels in catchment: total, clouds, snow."
"Let's get the number of pixels of total, clouds, snow across the catchment."
],
"metadata": {
"collapsed": false
@@ -635,7 +658,7 @@
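A sketch of these catchment statistics, using the 0/1/2 encoding assumed in the earlier sketches; the 25% cloud threshold comes from the text above, and the result matches the `snow_fraction` object plotted in the next cell:

```python
# Pixel counts per date inside the catchment (outside pixels are NaN)
total = snow_cloud_daily.notnull().sum(dim=["x", "y"])
clouds = (snow_cloud_daily == 2).sum(dim=["x", "y"])
snow = (snow_cloud_daily == 1).sum(dim=["x", "y"])

# Percentage of cloudy pixels per date; discard dates above 25% cloud cover
cloud_fraction = clouds / total * 100
snow_fraction = (snow / total * 100).where(cloud_fraction <= 25, drop=True)
```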
"execution_count": null,
"outputs": [],
"source": [
"# viaualize snow fraction\n",
"# visualize snow fraction\n",
"snow_fraction.hvplot.line(title='Snow cover area (%)', ylabel=\"%\")"
],
"metadata": {
@@ -700,7 +723,7 @@
{
"cell_type": "markdown",
"source": [
"Let's refine a little bit the data so that we can compare it with the snow cover data"
"Let's refine a little bit the data so that we can compare it with the snow cover data."
],
"metadata": {
"collapsed": false
@@ -734,16 +757,15 @@
{
"cell_type": "markdown",
"source": [
"### Conclusion\n",
"## Conclusion\n",
"\n",
"In this analysis, we have comprehensively examined the features, capabilities, and limitations of two prominent geospatial data processing frameworks: OpenEO and Pangeo. OpenEO offers a unified API that simplifies the process of accessing and processing earth observation data across various backends, allowing users to interact with different data sources seamlessly. Its standardized interface is a strong asset, making it accessible to a wide range of users, from researchers to application developers.\n",
"\n",
"On the other hand, Pangeo excels in facilitating big data geoscience. Its robust ecosystem, built around existing Python libraries like Dask and Xarray, makes it a powerful tool for large-scale data analysis and visualization. Pangeo’s community-driven approach and open-source nature foster collaboration and innovation, promoting a dynamic and adaptable framework.\n",
"\n",
"Each platform has its own set of advantages and constraints. OpenEO simplifies interoperability and enhances accessibility, making it particularly beneficial for users who wish to work with diverse data sources without delving deeply into the complexities of each backend. Pangeo, with its emphasis on leveraging existing Python tools and libraries, is particularly potent for those comfortable with Python and seeking to perform extensive, scalable analyses.\n",
"\n",
"Choosing between OpenEO and Pangeo ultimately depends on the specific requirements and constraints of a project. Considerations such as the user's familiarity with Python, the necessity for interoperability across various data backends, and the scale of data processing required should guide the decision-making process.\n",
"\n"
"Choosing between OpenEO and Pangeo ultimately depends on the specific requirements and constraints of a project. Considerations such as the user's familiarity with Python, the necessity for interoperability across various data backends, and the scale of data processing required should guide the decision-making process."
],
"metadata": {
"collapsed": false
