diff --git a/.binder/environment-python_and_r.yml b/.binder/environment-python_and_r.yml index 8db45885..931f04b6 100644 --- a/.binder/environment-python_and_r.yml +++ b/.binder/environment-python_and_r.yml @@ -10,6 +10,7 @@ dependencies: - cc-plugin-ncei - cf_xarray - cftime + - ckanapi - compliance-checker - cython - descartes diff --git a/.binder/environment.yml b/.binder/environment.yml index 78f9e7f9..abf5e3ca 100644 --- a/.binder/environment.yml +++ b/.binder/environment.yml @@ -10,6 +10,7 @@ dependencies: - cc-plugin-ncei - cf_xarray - cftime + - ckanapi - compliance-checker - descartes - easyargs diff --git a/jupyterbook/content/code_gallery/data_access_notebooks/2024-09-17-CKAN_API_Query.ipynb b/jupyterbook/content/code_gallery/data_access_notebooks/2024-09-17-CKAN_API_Query.ipynb new file mode 100644 index 00000000..58a9d2cf --- /dev/null +++ b/jupyterbook/content/code_gallery/data_access_notebooks/2024-09-17-CKAN_API_Query.ipynb @@ -0,0 +1,2030 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "AIX-_9o2P07V", + "outputId": "09d2de4a-1f34-4518-c51f-487e14b55b66" + }, + "outputs": [], + "source": [ + "# For Google Colab\n", + "\n", + "import subprocess\n", + "import sys\n", + "\n", + "COLAB = \"google.colab\" in sys.modules\n", + "\n", + "\n", + "def _install(package):\n", + " if COLAB:\n", + " ans = input(f\"Install { package }? 
[y/n]:\")\n", + " if ans.lower() in [\"y\", \"yes\"]:\n", + " subprocess.check_call(\n", + " [sys.executable, \"-m\", \"pip\", \"install\", \"--quiet\", package]\n", + " )\n", + " print(f\"{ package } installed!\")\n", + "\n", + "\n", + "def _colab_install_missing_deps(deps):\n", + " import importlib.util\n", + "\n", + " for dep in deps:\n", + " if importlib.util.find_spec(dep) is None:\n", + " if dep == \"iris\":\n", + " dep = \"scitools-iris\"\n", + " _install(dep)\n", + "\n", + "\n", + "deps = [\"ckanapi\", \"geopandas\"]\n", + "\n", + "\n", + "_colab_install_missing_deps(deps)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Programmatically query the IOOS Data Catalog for a specific observation type" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Created: 2024-09-17\n", + "\n", + "Updated: 2024-09-18\n", + "\n", + "Author: [Mathew Biddle](mailto:mathew.biddle@noaa.gov)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dl6UQcydrdtx" + }, + "source": [ + "In this notebook we show how to search the [IOOS Data Catalog](https://data.ioos.us/) for a specific subset of observations using the web-accessible Application Programming Interface (API) of [CKAN](https://ckan.org/).\n", + "\n", + "For this example, we look for observations of oxygen in the water column across the IOOS Data Catalog. Datasets published by the US IOOS community follow the [IOOS Metadata Profile](https://ioos.github.io/ioos-metadata/), so we expect each Regional Association and Data Assembly Center (DAC) to follow the [Climate and Forecast (CF) Conventions](http://cfconventions.org/) and to describe its datasets with CF `standard_names`. Under that assumption, we can search the IOOS Data Catalog for datasets whose CF standard names contain both `oxygen` and `sea_water`, and then build a simple map showing the geographical distribution of those datasets."
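The search strategy described above can be sketched in plain Python that runs offline. This is a minimal sketch under stated assumptions, not the notebook's own code. The names `matching_names`, `iter_package_search`, `fake_search` and `CATALOG` are hypothetical. `search` is assumed to behave like ckanapi's `package_search` action: it accepts CKAN's `fq`, `rows` and `start` parameters and returns a dict with `count` and `results` keys. The key point is that the `start` offset must advance with each request; otherwise every call returns the same first page.

```python
from typing import Callable, Iterable, Iterator, List


def matching_names(names: Iterable[str], *terms: str) -> List[str]:
    """Return the names that contain every one of the given substrings."""
    return [name for name in names if all(term in name for term in terms)]


def iter_package_search(search: Callable[..., dict], fq: str, rows: int = 500) -> Iterator[dict]:
    """Yield every dataset matching the filter query ``fq``, page by page.

    ``search`` is assumed to behave like ckanapi's
    ``RemoteCKAN(...).action.package_search``: it takes ``fq``, ``rows`` and
    ``start`` keyword arguments and returns a dict with ``count`` and
    ``results`` keys.
    """
    start = 0
    while True:
        page = search(fq=fq, rows=rows, start=start)
        results = page["results"]
        yield from results
        # Advance the offset so the next request returns the next page.
        start += len(results)
        # Stop when every match has been seen, or when the server returns an
        # empty page (which would otherwise loop forever).
        if not results or start >= page["count"]:
            break


# Hypothetical in-memory stand-in for the catalog, for illustration only.
CATALOG = [
    {"name": f"ds-{i}", "cf_standard_names": ["mass_concentration_of_oxygen_in_sea_water"]}
    for i in range(1234)
]


def fake_search(fq: str, rows: int, start: int) -> dict:
    """Mimic package_search for a '+cf_standard_names:<name>' filter query."""
    wanted = fq.split(":", 1)[1]
    hits = [d for d in CATALOG if wanted in d["cf_standard_names"]]
    return {"count": len(hits), "results": hits[start:start + rows]}


# Example names; only the first contains both 'oxygen' and 'sea_water'.
names = matching_names(
    ["mass_concentration_of_oxygen_in_sea_water", "sea_water_temperature", "mole_fraction_of_oxygen_in_air"],
    "oxygen",
    "sea_water",
)
fq = "+cf_standard_names:" + names[0]
found = list(iter_package_search(fake_search, fq))
print(names)  # ['mass_concentration_of_oxygen_in_sea_water']
print(len(found), len({d["name"] for d in found}))  # 1234 1234
```

To query the live catalog, you would pass the ckanapi client's `package_search` action (for example `ioos_catalog.action.package_search`) as `search`. The in-memory `fake_search` exists only so the sketch runs without network access.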
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fz_XVNHerUus" + }, + "source": [ + "## Build the CKAN API client\n", + "\n", + "This uses the [ckanapi](https://github.com/ckan/ckanapi) Python library." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "8ilaNW-tPtVy", + "outputId": "9bf22340-1404-48e6-b3e8-fdc2cd6a3d23" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from ckanapi import RemoteCKAN\n", + "ua = 'ckanapiioos/1.0 (+https://ioos.us/)'\n", + "\n", + "#ioos_catalog = RemoteCKAN('https://data.ioos.us', user_agent=ua, get_only=True)\n", + "ioos_catalog = RemoteCKAN('https://data.ioos.us', user_agent=ua)\n", + "ioos_catalog" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9DISgdWPrRd0" + }, + "source": [ + "## What organizations are in the catalog?\n", + "\n", + "List the organizations that publish datasets to the catalog." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "O4joF0z8Px-m", + "outputId": "95a91429-b9aa-4350-b67c-6d0f07e4a12c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['aoos', 'caricoos', 'cdip', 'cencoos', 'comt', 'gcoos', 'glider-dac', 'glos', 'hf-radar-dac', 'ioos', 'maracoos', 'nanoos', 'neracoos', 'noaa-co-ops', 'noaa-ndbc', 'oceansites', 'pacioos', 'sccoos', 'secoora', 'unidata', 'usgs', 'us-navy']\n" + ] + } + ], + "source": [ + "orgs = ioos_catalog.action.organization_list()\n", + "print(orgs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yN2ELDQZrNah" + }, + "source": [ + "## How many datasets are we searching across?\n", + "\n", + "Run an unfiltered search and return the total number of datasets it reports."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "_ov9sSwpP8VP", + "outputId": "4d06f0ca-0ad6-4260-ecd5-52f8019652be" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "43574" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#datasets = ioos_catalog.action.package_search(fq='+cf_standard_names:mass_concentration_of_oxygen_in_sea_water', rows=50)\n", + "datasets = ioos_catalog.action.package_search()\n", + "datasets['count']" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rkgC5oLfmGB1" + }, + "source": [ + "## Grab the most recent applicable CF standard names\n", + "\n", + "Collect [CF standard names](https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html) that contain `oxygen` and `sea_water` from the CF standard name list." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 394 + }, + "id": "enKjucgnXivM", + "outputId": "9d2c266f-755a-42dc-9b5e-40607552c56f" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CF Standard Name Table: 86\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
iddescription
469depth_at_shallowest_local_minimum_in_vertical_...Depth is the vertical distance below the surfa...
624fractional_saturation_of_oxygen_in_sea_waterFractional saturation is the ratio of some mea...
1357mass_concentration_of_oxygen_in_sea_waterMass concentration means mass per unit volume ...
1725mole_concentration_of_dissolved_molecular_oxyg...Mole concentration means number of moles per u...
1726mole_concentration_of_dissolved_molecular_oxyg...\"Mole concentration at saturation\" means the m...
1727mole_concentration_of_dissolved_molecular_oxyg...Mole concentration means number of moles per u...
1825mole_concentration_of_preformed_dissolved_mole...\"Mole concentration\" means the number of moles...
1996moles_of_oxygen_per_unit_mass_in_sea_watermoles_of_X_per_unit_mass_inY is also called \"m...
3203surface_molecular_oxygen_partial_pressure_diff...The surface called \"surface\" means the lower b...
3700temperature_of_sensor_for_oxygen_in_sea_waterTemperature_of_sensor_for_oxygen_in_sea_water ...
4776volume_fraction_of_oxygen_in_sea_water\"Volume fraction\" is used in the construction ...
4780volume_mixing_ratio_of_oxygen_at_stp_in_sea_water\"ratio_of_X_to_Y\" means X/Y. \"stp\" means stand...
\n", + "
" + ], + "text/plain": [ + " id \\\n", + "469 depth_at_shallowest_local_minimum_in_vertical_... \n", + "624 fractional_saturation_of_oxygen_in_sea_water \n", + "1357 mass_concentration_of_oxygen_in_sea_water \n", + "1725 mole_concentration_of_dissolved_molecular_oxyg... \n", + "1726 mole_concentration_of_dissolved_molecular_oxyg... \n", + "1727 mole_concentration_of_dissolved_molecular_oxyg... \n", + "1825 mole_concentration_of_preformed_dissolved_mole... \n", + "1996 moles_of_oxygen_per_unit_mass_in_sea_water \n", + "3203 surface_molecular_oxygen_partial_pressure_diff... \n", + "3700 temperature_of_sensor_for_oxygen_in_sea_water \n", + "4776 volume_fraction_of_oxygen_in_sea_water \n", + "4780 volume_mixing_ratio_of_oxygen_at_stp_in_sea_water \n", + "\n", + " description \n", + "469 Depth is the vertical distance below the surfa... \n", + "624 Fractional saturation is the ratio of some mea... \n", + "1357 Mass concentration means mass per unit volume ... \n", + "1725 Mole concentration means number of moles per u... \n", + "1726 \"Mole concentration at saturation\" means the m... \n", + "1727 Mole concentration means number of moles per u... \n", + "1825 \"Mole concentration\" means the number of moles... \n", + "1996 moles_of_X_per_unit_mass_inY is also called \"m... \n", + "3203 The surface called \"surface\" means the lower b... \n", + "3700 Temperature_of_sensor_for_oxygen_in_sea_water ... \n", + "4776 \"Volume fraction\" is used in the construction ... \n", + "4780 \"ratio_of_X_to_Y\" means X/Y. \"stp\" means stand... 
" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "url = \"https://cfconventions.org/Data/cf-standard-names/current/src/cf-standard-name-table.xml\"\n", + "\n", + "tbl_version = pd.read_xml(url, xpath=\"./*\")['version_number'][0].astype(int)\n", + "\n", + "df = pd.read_xml(url, xpath=\"entry\")\n", + "\n", + "std_names = df.loc[(df['id'].str.contains('oxygen') & df['id'].str.contains('sea_water'))]\n", + "\n", + "print('CF Standard Name Table: {}'.format(tbl_version))\n", + "\n", + "std_names[['id','description']]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G-KxN2RlpLOu" + }, + "source": [ + "## Search across IOOS Data Catalog using CKAN API\n", + "\n", + "Search the IOOS Data Catalog for CF standard names that match those above." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "xI6wTAPqXnt1", + "outputId": "4101e59a-dd91-45d4-95d9-0b8821b0013d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "depth_at_shallowest_local_minimum_in_vertical_profile_of_mole_concentration_of_dissolved_molecular_oxygen_in_sea_water\n", + "num_results: 0, result_count: 0\n", + "num_results: 0, result_count: 0\n", + "fractional_saturation_of_oxygen_in_sea_water\n", + "num_results: 987, result_count: 0\n", + "num_results: 987, result_count: 500\n", + "num_results: 987, result_count: 1000\n", + "mass_concentration_of_oxygen_in_sea_water\n", + "num_results: 2735, result_count: 0\n", + "num_results: 2735, result_count: 500\n", + "num_results: 2735, result_count: 1000\n", + "num_results: 2735, result_count: 1500\n", + "num_results: 2735, result_count: 2000\n", + "num_results: 2735, result_count: 2500\n", + "num_results: 2735, result_count: 3000\n", + "mole_concentration_of_dissolved_molecular_oxygen_in_sea_water\n", + 
"num_results: 300, result_count: 0\n", + "num_results: 300, result_count: 300\n", + "mole_concentration_of_dissolved_molecular_oxygen_in_sea_water_at_saturation\n", + "num_results: 0, result_count: 0\n", + "num_results: 0, result_count: 0\n", + "mole_concentration_of_dissolved_molecular_oxygen_in_sea_water_at_shallowest_local_minimum_in_vertical_profile\n", + "num_results: 0, result_count: 0\n", + "num_results: 0, result_count: 0\n", + "mole_concentration_of_preformed_dissolved_molecular_oxygen_in_sea_water\n", + "num_results: 0, result_count: 0\n", + "num_results: 0, result_count: 0\n", + "moles_of_oxygen_per_unit_mass_in_sea_water\n", + "num_results: 813, result_count: 0\n", + "num_results: 813, result_count: 500\n", + "num_results: 813, result_count: 1000\n", + "surface_molecular_oxygen_partial_pressure_difference_between_sea_water_and_air\n", + "num_results: 0, result_count: 0\n", + "num_results: 0, result_count: 0\n", + "temperature_of_sensor_for_oxygen_in_sea_water\n", + "num_results: 167, result_count: 0\n", + "num_results: 167, result_count: 167\n", + "volume_fraction_of_oxygen_in_sea_water\n", + "num_results: 18, result_count: 0\n", + "num_results: 18, result_count: 18\n", + "volume_mixing_ratio_of_oxygen_at_stp_in_sea_water\n", + "num_results: 0, result_count: 0\n", + "num_results: 0, result_count: 0\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleurlorgstd_name
0St. Lucie Estuary - South Fork 2 (SLE-SF2)https://erddap.secoora.org/erddap/tabledap/st-...SECOORAfractional_saturation_of_oxygen_in_sea_water
0Neuse River at Marker 15 (ModMon 70, AWS J8903...https://erddap.secoora.org/erddap/tabledap/neu...SECOORAfractional_saturation_of_oxygen_in_sea_water
0Pamlico Sound at PS9 (ModMon)https://erddap.secoora.org/erddap/tabledap/pam...SECOORAfractional_saturation_of_oxygen_in_sea_water
0Indian River Lagoon - Jensen Beach (IRL-JB)https://erddap.secoora.org/erddap/tabledap/ind...SECOORAfractional_saturation_of_oxygen_in_sea_water
0Indian River Lagoon - Sebastian (IRL-SB)https://erddap.secoora.org/erddap/tabledap/ind...SECOORAfractional_saturation_of_oxygen_in_sea_water
...............
0Walton-Smith CTD, WS22215, WS22215_2022_08_Wea...https://gcoos5.geos.tamu.edu/erddap/tabledap/W...GCOOSvolume_fraction_of_oxygen_in_sea_water
0Walton-Smith CTD, WS22215, WS22215_2022_08_Wea...https://gcoos5.geos.tamu.edu/erddap/tabledap/W...GCOOSvolume_fraction_of_oxygen_in_sea_water
0Walton-Smith CTD, WS22215, WS22215_2022_08_Wea...https://gcoos5.geos.tamu.edu/erddap/tabledap/W...GCOOSvolume_fraction_of_oxygen_in_sea_water
0Walton-Smith CTD, WS22215, WS22215_2022_08_Wea...https://gcoos5.geos.tamu.edu/erddap/tabledap/W...GCOOSvolume_fraction_of_oxygen_in_sea_water
0Walton-Smith CTD, WS22215, WS22215_2022_08_Wea...https://gcoos5.geos.tamu.edu/erddap/tabledap/W...GCOOSvolume_fraction_of_oxygen_in_sea_water
\n", + "

5485 rows × 4 columns

\n", + "
" + ], + "text/plain": [ + " title \\\n", + "0 St. Lucie Estuary - South Fork 2 (SLE-SF2) \n", + "0 Neuse River at Marker 15 (ModMon 70, AWS J8903... \n", + "0 Pamlico Sound at PS9 (ModMon) \n", + "0 Indian River Lagoon - Jensen Beach (IRL-JB) \n", + "0 Indian River Lagoon - Sebastian (IRL-SB) \n", + ".. ... \n", + "0 Walton-Smith CTD, WS22215, WS22215_2022_08_Wea... \n", + "0 Walton-Smith CTD, WS22215, WS22215_2022_08_Wea... \n", + "0 Walton-Smith CTD, WS22215, WS22215_2022_08_Wea... \n", + "0 Walton-Smith CTD, WS22215, WS22215_2022_08_Wea... \n", + "0 Walton-Smith CTD, WS22215, WS22215_2022_08_Wea... \n", + "\n", + " url org \\\n", + "0 https://erddap.secoora.org/erddap/tabledap/st-... SECOORA \n", + "0 https://erddap.secoora.org/erddap/tabledap/neu... SECOORA \n", + "0 https://erddap.secoora.org/erddap/tabledap/pam... SECOORA \n", + "0 https://erddap.secoora.org/erddap/tabledap/ind... SECOORA \n", + "0 https://erddap.secoora.org/erddap/tabledap/ind... SECOORA \n", + ".. ... ... \n", + "0 https://gcoos5.geos.tamu.edu/erddap/tabledap/W... GCOOS \n", + "0 https://gcoos5.geos.tamu.edu/erddap/tabledap/W... GCOOS \n", + "0 https://gcoos5.geos.tamu.edu/erddap/tabledap/W... GCOOS \n", + "0 https://gcoos5.geos.tamu.edu/erddap/tabledap/W... GCOOS \n", + "0 https://gcoos5.geos.tamu.edu/erddap/tabledap/W... GCOOS \n", + "\n", + " std_name \n", + "0 fractional_saturation_of_oxygen_in_sea_water \n", + "0 fractional_saturation_of_oxygen_in_sea_water \n", + "0 fractional_saturation_of_oxygen_in_sea_water \n", + "0 fractional_saturation_of_oxygen_in_sea_water \n", + "0 fractional_saturation_of_oxygen_in_sea_water \n", + ".. ... 
\n", + "0 volume_fraction_of_oxygen_in_sea_water \n", + "0 volume_fraction_of_oxygen_in_sea_water \n", + "0 volume_fraction_of_oxygen_in_sea_water \n", + "0 volume_fraction_of_oxygen_in_sea_water \n", + "0 volume_fraction_of_oxygen_in_sea_water \n", + "\n", + "[5485 rows x 4 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from ckanapi import RemoteCKAN\n", + "import time\n", + "ua = 'ckanapiioos/1.0 (+https://ioos.us/)'\n", + "\n", + "ioos_catalog = RemoteCKAN('https://data.ioos.us', user_agent=ua)\n", + "\n", + "df_out = pd.DataFrame()\n", + "\n", + "for std_name in std_names['id']:\n", + "\n", + "    print(std_name)\n", + "\n", + "    # Filter query: datasets that use this CF standard name.\n", + "    fq = '+cf_standard_names:{}'.format(std_name)\n", + "\n", + "    result_count = 0\n", + "\n", + "    while True:\n", + "        # Request the next page of up to 500 results, starting after those already\n", + "        # collected; without `start`, every request returns the same first page.\n", + "        datasets = ioos_catalog.action.package_search(fq=fq, rows=500, start=result_count)\n", + "\n", + "        num_results = datasets['count']\n", + "\n", + "        print(f\"num_results: {num_results}, result_count: {result_count}\")\n", + "\n", + "        for dataset in datasets['results']:\n", + "            df = pd.DataFrame({'title': [dataset['title']],\n", + "                               'url': [dataset['resources'][0]['url']],\n", + "                               'org': [dataset['organization']['title']],\n", + "                               'std_name': std_name})\n", + "\n", + "            df_out = pd.concat([df_out, df])\n", + "\n", + "            result_count = result_count + 1\n", + "\n", + "        # Pause between requests to go easy on the catalog server.\n", + "        time.sleep(1)\n", + "\n", + "        # Stop once all matches are collected, or if a page comes back empty.\n", + "        if not datasets['results'] or result_count >= num_results:\n", + "            print(f\"num_results: {num_results}, result_count: {result_count}\")\n", + "            break\n", + "\n", + "df_out" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A0atXi0kEnbr" + }, + "source": [ + "## Summarize the responses\n", + "\n", + "The DataFrame of matching datasets is quite large. How are those datasets distributed across organizations? Let's use [pandas.groupby()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html) to count how many matching datasets each organization provides." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 363 + }, + "id": "K3tcW2iyDpFd", + "outputId": "aacb7d85-9c9c-4fe8-ba2e-74d805a7deb5" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleurlstd_name
org
CeNCOOS222
GCOOS220822082208
Glider DAC262526252625
NERACOOS282828
PacIOOS161616
SECOORA606606606
\n", + "
" + ], + "text/plain": [ + " title url std_name\n", + "org \n", + "CeNCOOS 2 2 2\n", + "GCOOS 2208 2208 2208\n", + "Glider DAC 2625 2625 2625\n", + "NERACOOS 28 28 28\n", + "PacIOOS 16 16 16\n", + "SECOORA 606 606 606" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_out.groupby(by='org').count()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "t9PwQW_7Gj0a" + }, + "source": [ + "## Drop the Glider DAC data\n", + "\n", + "Glider DAC data are already making it to NCEI, so we can drop those entries." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 423 + }, + "id": "u0y8KMRYGhl9", + "outputId": "21f54f00-02e1-497b-a45e-83ee0004ca4b" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleurlstd_name
org
CeNCOOS222
GCOOS220822082208
NERACOOS282828
PacIOOS161616
SECOORA606606606
\n", + "
" ], + "text/plain": [ + " title url std_name\n", + "org \n", + "CeNCOOS 2 2 2\n", + "GCOOS 2208 2208 2208\n", + "NERACOOS 28 28 28\n", + "PacIOOS 16 16 16\n", + "SECOORA 606 606 606" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_out_no_glider = df_out.loc[~df_out['org'].str.contains('Glider DAC')]\n", + "df_out_no_glider.groupby(by='org').count()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yVz8vdNkEt6X" + }, + "source": [ + "## Digging into some of the nuances\n", + "\n", + "There are still quite a lot of datasets from each organization. Because the search above queried each CF standard name separately, a dataset that uses several matching standard names shows up once for each of them. For example, a single dataset might have both `mass_concentration_of_oxygen_in_sea_water` and `fractional_saturation_of_oxygen_in_sea_water`, but it is still only one dataset.\n", + "\n", + "Since we only care about unique datasets, let's group the records by dataset URL. Each group is one unique dataset, and its count shows how many records that dataset contributed." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 455 + }, + "id": "0wygCC8X5Tpc", + "outputId": "48be015f-7e8d-45af-b2cd-e7a37fa6bf86" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleorgstd_name
url
http://www.neracoos.org/erddap/tabledap/A01_optode_all333
http://www.neracoos.org/erddap/tabledap/GRBGBWQ_NERRS444
http://www.neracoos.org/erddap/tabledap/GRBLRWQ_NERRS444
http://www.neracoos.org/erddap/tabledap/LOBO_CSV_65111
http://www.neracoos.org/erddap/tabledap/URI_168-MV_BottomSonde161616
............
https://gcoos5.geos.tamu.edu/erddap/tabledap/deepwater_pe972250_ctd666
https://gcoos5.geos.tamu.edu/erddap/tabledap/deepwater_pe972274_ctd666
https://gcoos5.geos.tamu.edu/erddap/tabledap/deepwater_st093lay_ctd666
https://pae-paha.pacioos.hawaii.edu/erddap/tabledap/hui_water_quality888
https://pae-paha.pacioos.hawaii.edu/erddap/tabledap/nss_012888
\n", + "

422 rows × 3 columns

\n", + "
" ], + "text/plain": [ + " title org std_name\n", + "url \n", + "http://www.neracoos.org/erddap/tabledap/A01_opt... 3 3 3\n", + "http://www.neracoos.org/erddap/tabledap/GRBGBWQ... 4 4 4\n", + "http://www.neracoos.org/erddap/tabledap/GRBLRWQ... 4 4 4\n", + "http://www.neracoos.org/erddap/tabledap/LOBO_CS... 1 1 1\n", + "http://www.neracoos.org/erddap/tabledap/URI_168... 16 16 16\n", + "... ... ... ...\n", + "https://gcoos5.geos.tamu.edu/erddap/tabledap/de... 6 6 6\n", + "https://gcoos5.geos.tamu.edu/erddap/tabledap/de... 6 6 6\n", + "https://gcoos5.geos.tamu.edu/erddap/tabledap/de... 6 6 6\n", + "https://pae-paha.pacioos.hawaii.edu/erddap/tabl... 8 8 8\n", + "https://pae-paha.pacioos.hawaii.edu/erddap/tabl... 8 8 8\n", + "\n", + "[422 rows x 3 columns]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_out_no_glider.groupby(by='url').count()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Drop duplicate records\n", + "\n", + "As the counts above show, many dataset URLs appear more than once. Since the URL should be unique to each dataset, we treat rows that share a URL as duplicates and keep only one of them (the last occurrence)." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 423 + }, + "id": "H6rQj51d7cAm", + "outputId": "0942f042-834a-415c-e276-ad0e89e732ed" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleurlorgstd_name
0Neuse River near the south shore (ModMon 96)https://erddap.secoora.org/erddap/tabledap/neu...SECOORAfractional_saturation_of_oxygen_in_sea_water
0Great Bay,NH. Lamprey River WQ stationhttp://www.neracoos.org/erddap/tabledap/GRBLRW...NERACOOSfractional_saturation_of_oxygen_in_sea_water
0Great Bay,NH. Great Bay WQ stationhttp://www.neracoos.org/erddap/tabledap/GRBGBW...NERACOOSfractional_saturation_of_oxygen_in_sea_water
0WACCASASSA RIVER NR GULF HAMMOCK, FLA. (USGS 0...https://erddap.secoora.org/erddap/tabledap/gov...SECOORAmass_concentration_of_oxygen_in_sea_water
0LITTLE BACK RIVER AT HOG ISLAND, NEAR SAVANNAH...https://erddap.secoora.org/erddap/tabledap/gov...SECOORAmass_concentration_of_oxygen_in_sea_water
...............
0Walton-Smith CTD, WS22215, WS22215_2022_08_Wea...https://gcoos5.geos.tamu.edu/erddap/tabledap/W...GCOOSvolume_fraction_of_oxygen_in_sea_water
0Walton-Smith CTD, WS22215, WS22215_2022_08_Wea...https://gcoos5.geos.tamu.edu/erddap/tabledap/W...GCOOSvolume_fraction_of_oxygen_in_sea_water
0Walton-Smith CTD, WS22215, WS22215_2022_08_Wea...https://gcoos5.geos.tamu.edu/erddap/tabledap/W...GCOOSvolume_fraction_of_oxygen_in_sea_water
0Walton-Smith CTD, WS22215, WS22215_2022_08_Wea...https://gcoos5.geos.tamu.edu/erddap/tabledap/W...GCOOSvolume_fraction_of_oxygen_in_sea_water
0Walton-Smith CTD, WS22215, WS22215_2022_08_Wea...https://gcoos5.geos.tamu.edu/erddap/tabledap/W...GCOOSvolume_fraction_of_oxygen_in_sea_water
\n", + "

422 rows × 4 columns

\n", + "
" + ], + "text/plain": [ + " title \\\n", + "0 Neuse River near the south shore (ModMon 96) \n", + "0 Great Bay,NH. Lamprey River WQ station \n", + "0 Great Bay,NH. Great Bay WQ station \n", + "0 WACCASASSA RIVER NR GULF HAMMOCK, FLA. (USGS 0... \n", + "0 LITTLE BACK RIVER AT HOG ISLAND, NEAR SAVANNAH... \n", + ".. ... \n", + "0 Walton-Smith CTD, WS22215, WS22215_2022_08_Wea... \n", + "0 Walton-Smith CTD, WS22215, WS22215_2022_08_Wea... \n", + "0 Walton-Smith CTD, WS22215, WS22215_2022_08_Wea... \n", + "0 Walton-Smith CTD, WS22215, WS22215_2022_08_Wea... \n", + "0 Walton-Smith CTD, WS22215, WS22215_2022_08_Wea... \n", + "\n", + " url org \\\n", + "0 https://erddap.secoora.org/erddap/tabledap/neu... SECOORA \n", + "0 http://www.neracoos.org/erddap/tabledap/GRBLRW... NERACOOS \n", + "0 http://www.neracoos.org/erddap/tabledap/GRBGBW... NERACOOS \n", + "0 https://erddap.secoora.org/erddap/tabledap/gov... SECOORA \n", + "0 https://erddap.secoora.org/erddap/tabledap/gov... SECOORA \n", + ".. ... ... \n", + "0 https://gcoos5.geos.tamu.edu/erddap/tabledap/W... GCOOS \n", + "0 https://gcoos5.geos.tamu.edu/erddap/tabledap/W... GCOOS \n", + "0 https://gcoos5.geos.tamu.edu/erddap/tabledap/W... GCOOS \n", + "0 https://gcoos5.geos.tamu.edu/erddap/tabledap/W... GCOOS \n", + "0 https://gcoos5.geos.tamu.edu/erddap/tabledap/W... GCOOS \n", + "\n", + " std_name \n", + "0 fractional_saturation_of_oxygen_in_sea_water \n", + "0 fractional_saturation_of_oxygen_in_sea_water \n", + "0 fractional_saturation_of_oxygen_in_sea_water \n", + "0 mass_concentration_of_oxygen_in_sea_water \n", + "0 mass_concentration_of_oxygen_in_sea_water \n", + ".. ... 
\n", + "0 volume_fraction_of_oxygen_in_sea_water \n", + "0 volume_fraction_of_oxygen_in_sea_water \n", + "0 volume_fraction_of_oxygen_in_sea_water \n", + "0 volume_fraction_of_oxygen_in_sea_water \n", + "0 volume_fraction_of_oxygen_in_sea_water \n", + "\n", + "[422 rows x 4 columns]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_out_nodups_no_glider = df_out_no_glider.drop_duplicates(subset=['url'], keep='last')\n", + "\n", + "df_out_nodups_no_glider" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UDsBe8k9E-47" + }, + "source": [ + "## How many endpoints are not ERDDAP?\n", + "\n", + "Now we have a list of unique datasets that match our CF standard name criteria. Since we have some background in using [ERDDAP to query for data](https://ioos.github.io/ioos_code_lab/content/code_gallery/data_access_notebooks/2017-03-21-ERDDAP_IOOS_Sensor_Map.html), let's check whether any of these datasets are served from an endpoint other than ERDDAP.\n", + "\n", + "_Hint: ERDDAP servers typically have `erddap` in their URLs._" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 423 + }, + "id": "saDpOVP5778u", + "outputId": "f0a8f4bb-e77a-4020-b295-f66cfd57fb64" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleurlorgstd_name
\n", + "
" ], + "text/plain": [ + "Empty DataFrame\n", + "Columns: [title, url, org, std_name]\n", + "Index: []" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_out_nodups_no_glider.loc[~df_out_nodups_no_glider['url'].str.contains('erddap')]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f_PGG_wsHYwF" + }, + "source": [ + "## What's the remaining distribution?\n", + "\n", + "The empty result above shows that every remaining dataset URL contains `erddap`, so all of these datasets appear to be served by ERDDAP.\n", + "\n", + "Below is the distribution, by organization, of unique datasets in the IOOS Data Catalog that have a CF standard name containing the words `oxygen` and `sea_water`. We've dropped the Glider DAC datasets because, in principle, those are already in NCEI." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 331 + }, + "id": "fRHL-7lPGwOL", + "outputId": "b72e8a38-1e63-44b1-9644-5eaa7df9abc6" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleurlstd_name
org
CeNCOOS222
GCOOS378378378
NERACOOS555
PacIOOS222
SECOORA353535
\n", + "
" + ], + "text/plain": [ + " title url std_name\n", + "org \n", + "CeNCOOS 2 2 2\n", + "GCOOS 378 378 378\n", + "NERACOOS 5 5 5\n", + "PacIOOS 2 2 2\n", + "SECOORA 35 35 35" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_out_nodups_no_glider.groupby(by='org').count()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Ingest data\n", + "\n", + "Let's loop through all of the datasets, download each one as a table and store it in a dictionary keyed by dataset title. We request ERDDAP's `.csvp` response, which includes the units in parentheses in each column name. With 422 datasets, this takes a while." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "Wk9myBdgBUxH" + }, + "outputs": [], + "source": [ + "dict_out_final = {}\n", + "\n", + "for _, row in df_out_nodups_no_glider.iterrows():\n", + "    # ERDDAP's .csvp response is CSV with the units in parentheses in each column name\n", + "    dict_out_final[row['title']] = pd.read_csv(row['url'] + '.csvp', low_memory=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's take a quick look at one of the DataFrames.\n", + "\n", + "We transpose it when printing so that all the columns are visible." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01234
profileNaNNaNNaNNaNNaN
time (UTC)1968-01-20T03:14:07Z1968-01-20T03:14:07Z1968-01-20T03:14:07Z1968-01-20T03:14:07Z1968-01-20T03:14:07Z
latitude (degrees_north)29.247229.247229.247229.247229.2472
longitude (degrees_east)-87.888901-87.888901-87.888901-87.888901-87.888901
numberOfLevel351351351351351
depth (m)7.08.09.010.011.0
temperature (degree_C)20.82320.83200120.83720.83799920.854
salinity (PSU)35.32199935.33200135.33599935.33800135.345001
oxygen (milliliters per liter)4.594.624.634.634.75
nitrite (micromols)0.00.00.00.00.01
nitrate (micromols)0.080.080.110.110.1
phosphate (micromols)0.020.010.00.010.01
silicate (micromols)1.331.181.131.051.41
salinity2 (PSU)36.10300136.09299936.09400236.09600136.285999
qualityFlag0.00.00.00.00.0
instrumentNaNNaNNaNNaNNaN
instrument1NaNNaNNaNNaNNaN
instrument2NaNNaNNaNNaNNaN
instrument3NaNNaNNaNNaNNaN
instrument4NaNNaNNaNNaNNaN
instrument5NaNNaNNaNNaNNaN
\n", + "
" + ], + "text/plain": [ + " 0 1 \\\n", + "profile NaN NaN \n", + "time (UTC) 1968-01-20T03:14:07Z 1968-01-20T03:14:07Z \n", + "latitude (degrees_north) 29.2472 29.2472 \n", + "longitude (degrees_east) -87.888901 -87.888901 \n", + "numberOfLevel 351 351 \n", + "depth (m) 7.0 8.0 \n", + "temperature (degree_C) 20.823 20.832001 \n", + "salinity (PSU) 35.321999 35.332001 \n", + "oxygen (milliliters per liter) 4.59 4.62 \n", + "nitrite (micromols) 0.0 0.0 \n", + "nitrate (micromols) 0.08 0.08 \n", + "phosphate (micromols) 0.02 0.01 \n", + "silicate (micromols) 1.33 1.18 \n", + "salinity2 (PSU) 36.103001 36.092999 \n", + "qualityFlag 0.0 0.0 \n", + "instrument NaN NaN \n", + "instrument1 NaN NaN \n", + "instrument2 NaN NaN \n", + "instrument3 NaN NaN \n", + "instrument4 NaN NaN \n", + "instrument5 NaN NaN \n", + "\n", + " 2 3 \\\n", + "profile NaN NaN \n", + "time (UTC) 1968-01-20T03:14:07Z 1968-01-20T03:14:07Z \n", + "latitude (degrees_north) 29.2472 29.2472 \n", + "longitude (degrees_east) -87.888901 -87.888901 \n", + "numberOfLevel 351 351 \n", + "depth (m) 9.0 10.0 \n", + "temperature (degree_C) 20.837 20.837999 \n", + "salinity (PSU) 35.335999 35.338001 \n", + "oxygen (milliliters per liter) 4.63 4.63 \n", + "nitrite (micromols) 0.0 0.0 \n", + "nitrate (micromols) 0.11 0.11 \n", + "phosphate (micromols) 0.0 0.01 \n", + "silicate (micromols) 1.13 1.05 \n", + "salinity2 (PSU) 36.094002 36.096001 \n", + "qualityFlag 0.0 0.0 \n", + "instrument NaN NaN \n", + "instrument1 NaN NaN \n", + "instrument2 NaN NaN \n", + "instrument3 NaN NaN \n", + "instrument4 NaN NaN \n", + "instrument5 NaN NaN \n", + "\n", + " 4 \n", + "profile NaN \n", + "time (UTC) 1968-01-20T03:14:07Z \n", + "latitude (degrees_north) 29.2472 \n", + "longitude (degrees_east) -87.888901 \n", + "numberOfLevel 351 \n", + "depth (m) 11.0 \n", + "temperature (degree_C) 20.854 \n", + "salinity (PSU) 35.345001 \n", + "oxygen (milliliters per liter) 4.75 \n", + "nitrite (micromols) 0.01 \n", + "nitrate 
(micromols) 0.1 \n", + "phosphate (micromols) 0.01 \n", + "silicate (micromols) 1.41 \n", + "salinity2 (PSU) 36.285999 \n", + "qualityFlag 0.0 \n", + "instrument NaN \n", + "instrument1 NaN \n", + "instrument2 NaN \n", + "instrument3 NaN \n", + "instrument4 NaN \n", + "instrument5 NaN " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dict_out_final[\"\\\"Deepwater CTD - pe972218.ctd.nc - 29.25N, -87.89W - 1997-03-21\\\"\"].head(5).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Let's make a nice map of the distribution of observations\n", + "\n", + "Below we define a mapping function that plots the unique observation locations over an outline of the United States. Before applying it to our full response, we reorganize the data a little to build a single DataFrame with the coordinates from every dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "import geopandas as gpd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "\n", + "def make_map(df):\n", + "    # initialize a figure and axis\n", + "    _, ax = plt.subplots(figsize=(8, 6))\n", + "\n", + "    # plot the US outline from the Natural Earth low-resolution countries dataset\n", + "    # (note: geopandas.datasets was deprecated in geopandas 0.14 and removed in 1.0)\n", + "    countries = gpd.read_file(gpd.datasets.get_path(\"naturalearth_lowres\"))\n", + "    countries[countries[\"name\"] == \"United States of America\"].plot(color=\"lightgrey\", ax=ax)\n", + "\n", + "    # plot the observation locations as points on top\n", + "    df.plot(\n", + "        x=\"longitude (degrees_east)\",\n", + "        y=\"latitude (degrees_north)\",\n", + "        kind=\"scatter\",\n", + "        ax=ax,\n", + "    )\n", + "\n", + "    # add a grid\n", + "    ax.grid(visible=True, alpha=0.5)\n", + "\n", + "    return ax" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "coord_cols = ['latitude (degrees_north)', 'longitude (degrees_east)']\n", + "\n", + "# gather the coordinate columns from every dataset and combine them in a single concat\n", + "df_coords = pd.concat([df[coord_cols] for df in dict_out_final.values()])\n", + "\n", + "# drop duplicate locations\n", + "df_coords_clean = df_coords.drop_duplicates(ignore_index=True)\n", + "\n", + "# make the map\n", + "make_map(df_coords_clean)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Let's explore those points on an interactive map\n", + "\n", + "Just for fun, we can use [`GeoDataFrame.explore()`](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html) to plot these points on an interactive map that we can pan and zoom." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gdf = gpd.GeoDataFrame(\n", + "    df_coords_clean,\n", + "    geometry=gpd.points_from_xy(\n", + "        df_coords_clean['longitude (degrees_east)'],\n", + "        df_coords_clean['latitude (degrees_north)'],\n", + "    ),\n", + "    crs=\"EPSG:4326\",\n", + ")\n", + "\n", + "gdf.explore()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We hope this example demonstrates the flexibility of direct requests to the IOOS Data Catalog CKAN server and the possibilities they provide. In this notebook we:\n", + "\n", + "* Searched the IOOS Data Catalog CKAN API with keywords.\n", + "* Found datasets matching our specified criteria.\n", + "* Collected all the data from each of the datasets matching our criteria.\n", + "* Created a simple map of the distribution of datasets that match our criteria.\n", + "\n", + "To take this one step further, since we collected all the data from each of the datasets (in the dictionary `dict_out_final`), a user could integrate all of the oxygen observations and start to build a comprehensive dataset. \n", + "\n", + "Additionally, a user could modify the CKAN query to search for terms outside of the CF standard names to potentially gather more datasets. " + ] + } + ], + "metadata": { + "colab": { + "authorship_tag": "ABX9TyOq6Zm4CP25L4Z2jB+P61RB", + "include_colab_link": true, + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.0" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +}
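The notebook's closing paragraph suggests integrating all of the oxygen observations collected in `dict_out_final`. Below is a minimal sketch of one way to start, under stated assumptions; it is not part of the original notebook. The hypothetical `combine_oxygen` helper assumes ERDDAP `.csvp` column names such as `oxygen (milliliters per liter)`. It treats any column whose name contains `oxygen` as an oxygen variable and keeps the units beside each value rather than converting them, because the datasets may report oxygen in different units. `COORD_COLS`, `UNITS_PATTERN` and the toy frames are illustrative names and made-up values, not real observations.

```python
import pandas as pd

# Assumption: column names follow ERDDAP's .csvp convention, e.g. "time (UTC)".
COORD_COLS = ["time (UTC)", "latitude (degrees_north)", "longitude (degrees_east)"]

# Splits a .csvp column name such as "oxygen (milliliters per liter)" into name and units.
UNITS_PATTERN = r"^(?P<name>.*?)\s*\((?P<units>[^()]*)\)$"


def combine_oxygen(frames):
    """Stack every column whose name contains "oxygen" into one long table.

    `frames` maps dataset titles to DataFrames, like `dict_out_final`.
    Units are kept next to each value instead of being converted.
    """
    pieces = []
    for title, df in frames.items():
        oxygen_cols = [c for c in df.columns if "oxygen" in str(c).lower()]
        if not oxygen_cols:
            continue  # this dataset has no oxygen-like columns
        coords = [c for c in COORD_COLS if c in df.columns]
        # one row per (coordinates, oxygen column) pair
        tidy = df[coords + oxygen_cols].melt(
            id_vars=coords, var_name="column", value_name="value"
        )
        parts = tidy["column"].str.extract(UNITS_PATTERN)
        # columns without "(units)" keep their full name and get NaN units
        tidy["variable"] = parts["name"].fillna(tidy["column"])
        tidy["units"] = parts["units"]
        tidy["dataset"] = title
        pieces.append(tidy.drop(columns="column"))
    if not pieces:
        return pd.DataFrame()
    return pd.concat(pieces, ignore_index=True)


# Toy usage with two made-up, single-row frames (not real observations).
toy_frames = {
    "dataset A": pd.DataFrame({
        "time (UTC)": ["2020-01-01T00:00:00Z"],
        "latitude (degrees_north)": [29.0],
        "longitude (degrees_east)": [-88.0],
        "oxygen (milliliters per liter)": [4.5],
    }),
    "dataset B": pd.DataFrame({
        "time (UTC)": ["2020-01-02T00:00:00Z"],
        "latitude (degrees_north)": [30.0],
        "longitude (degrees_east)": [-80.0],
        "salinity (PSU)": [35.0],
    }),
}
combined = combine_oxygen(toy_frames)
print(combined[["dataset", "variable", "units", "value"]])
```

With the notebook's data, this would be called as `combine_oxygen(dict_out_final)`. A plain substring match could also pick up related variables such as oxygen saturation, and harmonizing the different `units` values would be a separate step.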