Update docs #670

Merged Aug 20, 2024 (5 commits)
68 changes: 40 additions & 28 deletions doc/sphinx/dev_git_intro.rst
.. _ref-git-intro:

Git-based development workflow
==============================
Steps for brand new users
------------------------------

#. Fork the MDTF-diagnostics branch to your GitHub account (:ref:`ref-fork-code`)
#. Clone (:ref:`ref-clone`) your fork of the MDTF-diagnostics repository (repo) to your local machine
   (if you are not using the web interface for development)
#. Check out a new branch from the local main branch (:ref:`ref-new-pod`)
#. Start coding
#. Commit the changes in your POD branch (:ref:`ref-new-pod`)
#. Push the changes to the copy of the POD branch on your remote fork (:ref:`ref-new-pod`)
#. Repeat steps 4--6 until you are finished working
#. Submit a pull request to the NOAA-GFDL repo for review (:ref:`ref-pull-request`).
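The branch-commit-push cycle above can be sketched as a self-contained shell session. A throwaway local bare repository stands in for your GitHub fork, and the branch and file names (``my_new_pod``, ``my_new_pod.py``) are made up for illustration; in real use you would clone your actual fork URL instead.

```shell
# Sketch of steps 2-6: a throwaway bare repo plays the role of your fork.
set -e
scratch=$(mktemp -d)
git init --bare "$scratch/fork.git"                 # stand-in for your GitHub fork
git clone "$scratch/fork.git" "$scratch/MDTF-diagnostics"
cd "$scratch/MDTF-diagnostics"
git config user.name "POD Developer"
git config user.email "dev@example.com"
git checkout -b my_new_pod                          # step 3: new branch for the POD
echo "print('hello POD')" > my_new_pod.py           # step 4: write code
git add my_new_pod.py                               # stage the new file
git commit -m "Add my_new_pod driver"               # step 5: commit
git push -u origin my_new_pod                       # step 6: push the branch to the fork
```

After the push, the branch exists on the fork and is ready for the pull request in step 8.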

Steps for users continuing work on an existing POD branch
-------------------------------------------------------------
#. Create a backup copy of the MDTF-Diagnostics repo on your local machine
#. Pull in updates from the NOAA-GFDL/main branch to the main branch in your remote repo (:ref:`ref-update-main`)
#. Pull in updates from the main branch in your remote fork into the main branch in your local repo
   (:ref:`ref-update-main`)
#. Sync your POD branch in your local repository with the local main branch using an interactive rebase
   (:ref:`ref-rebase`) or merge (:ref:`ref-merge`). Be sure to make a backup copy of your local *MDTF-diagnostics*
   repo first, and test your branch after rebasing/merging as described in the linked instructions before proceeding
   to the next step.
#. Continue working on your POD branch
#. Commit the changes in your POD branch
#. Push the changes to the copy of the POD branch in your remote fork (:ref:`ref-push`)
#. Submit a pull request (PR) to the NOAA-GFDL/main branch when your code is ready for review (:ref:`ref-pull-request`)

.. _ref-fork-code:


Working on a brand new POD
------------------------------
Developers can either clone the MDTF-diagnostics repo to their computer, or manage the MDTF package using the GitHub
webpage interface.
Whichever method you choose, remember to create your [POD branch name] branch from the *main* branch.
Since developers commonly work on their own machines, this manual provides command line instructions.

1. Check out a branch for your POD

::

    git checkout -b [POD branch name]

2. Write code, add files, etc...

3. Add the files you created and/or modified to the staging area

::

    git add [file 1]
    git add [file 2]
    ...

4. Commit your changes, including a brief description

::

    git commit -m "description of my changes"

5. Push the updates to your remote repository

::

    git push -u origin [POD branch name]

To submit a PR:

#. Click the *Contribute* link on the main page of your MDTF-diagnostics fork and click the *Open Pull Request* button

#. Verify that *NOAA-GFDL/MDTF-diagnostics* is set as the **base** repository and *main* is set as the **base** branch,
   and that your fork is set as the **head** repository and your POD branch is set as the **head** branch

#. Click the *Create Pull Request* button, add a brief description to the PR header, and go through the checklist to
   ensure that your code meets the baseline requirements for review

#. Click the *Create Pull Request* button (now in the lower left corner of the message box)

Note that you can submit a Draft Pull Request if you want to run the code through the CI, but are not ready
for a full review by the framework team. Starting from step 3 above:

#. Click the arrow on the right edge of the *Create Pull Request* button and select *Create draft pull request*
   from the dropdown menu.

#. Continue pushing changes to your POD branch until you are ready for a review (the PR will update automatically)

#. When you are ready for review, navigate to the NOAA-GFDL/MDTF-Diagnostics
   `Pull requests <https://github.com/NOAA-GFDL/MDTF-diagnostics/pulls>`__ page, and click on your PR

#. Scroll down to the header that states "this pull request is still a work in progress",
   and click the *ready for review* button to move the PR out of *draft* mode

.. _ref-update-main:
2. Update the local and remote main branches on your fork as described in :ref:`ref-update-main`.

3. Check out your POD branch, and merge the main branch into your POD branch

::

    git checkout [POD branch name]
    git merge main
4. Resolve any conflicts that occur from the merge

5. Add the updated files to the staging area

::

    git add file1
    git add file2
    ...

6. Push the branch updates to your remote fork

::

    git push -u origin [POD branch name]
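Steps 3 and 4 can be exercised end-to-end in a throwaway repository. Everything below is illustrative (invented branch and file names, no real remote); it shows an upstream change landing on main and then being merged cleanly into a POD branch.

```shell
# Throwaway repo: main receives an update while my_pod carries local work,
# then main is merged into my_pod (no conflict, since different files changed).
set -e
scratch=$(mktemp -d)
cd "$scratch" && git init -q repo && cd repo
git config user.name "POD Developer"
git config user.email "dev@example.com"
echo "v1" > framework.py
git add framework.py && git commit -qm "initial framework"
git branch -m main                         # ensure the trunk is named main
git checkout -q -b my_pod                  # POD branch with local work
echo "pod code" > my_pod.py
git add my_pod.py && git commit -qm "add POD driver"
git checkout -q main                       # an upstream update lands on main
echo "v2" > framework.py
git commit -qam "update framework"
git checkout -q my_pod                     # step 3: check out the POD branch
git merge -q main -m "merge main into my_pod"
```

After the merge, the POD branch contains both the local work and the upstream update, which is the state you would then push to your fork.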
If you want to revert to the commit(s) before you pulled in updates:

1. Find the commit hash(es) with the updates, in your git log

::

    git log

or consult the commit log in the web interface

2. Revert each commit in order from newest to oldest

::

    git revert <newer commit hash>
    git revert <older commit hash>

3. Push the updates to the remote branch

::

    git push origin [POD branch name]
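The newest-to-oldest revert order above can be sketched in a throwaway repository; here two hypothetical "update" commits are undone, restoring the file to its original contents.

```shell
# Throwaway repo: make an original commit plus two updates, then revert the
# updates newest-first so the file returns to its original contents.
set -e
scratch=$(mktemp -d)
cd "$scratch" && git init -q repo && cd repo
git config user.name "POD Developer"
git config user.email "dev@example.com"
echo "original" > settings.txt
git add settings.txt && git commit -qm "original settings"
echo "update 1" > settings.txt && git commit -qam "update 1"
echo "update 2" > settings.txt && git commit -qam "update 2"
git log --oneline                   # step 1: find the hashes of both updates
git revert --no-edit HEAD           # step 2: revert the newer commit first...
git revert --no-edit HEAD~2         # ...then the older update commit
```

Each `git revert` adds a new commit that undoes the named one, so history is preserved and the result can be pushed without rewriting the remote branch.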
4 changes: 3 additions & 1 deletion doc/sphinx/dev_start.rst
scripting languages, including `R <https://anaconda.org/conda-forge/r-base>`__,
`NCL <https://anaconda.org/conda-forge/ncl>`__, `Ruby <https://anaconda.org/conda-forge/ruby>`__, etc...


Python-based PODs should be written in Python 3.12 or newer. We provide a developer version of the
python3_base environment (described below) that includes Jupyter and other developer-specific tools.
This is not installed by default, and must be requested by passing the ``--all`` flag to the conda setup script:

If you are using Anaconda or miniconda to manage the conda environments, run:

81 changes: 79 additions & 2 deletions doc/sphinx/ref_catalogs.rst
ESM-intake catalogs
===================

The MDTF-diagnostics package uses `intake-ESM <https://intake-esm.readthedocs.io/en/stable/>`__ catalogs and APIs to
access model datasets and verify POD data requirements. Intake-ESM is a software package that uses
`intake <https://intake.readthedocs.io/en/latest/>`__ to load
catalog *assets* (netCDF or Zarr files and associated metadata) into a pandas DataFrame or an xarray dataset.
Users can query catalogs and access data subsets according to desired date ranges, variables, simulations, and
other criteria without having to walk the directory structure or open files to verify information beforehand, making
them convenient and, depending on the location of the reference dataset, faster than on-the-fly file search methods.

Intake-ESM catalogs are generated using information from standardized directory structures and/or
file metadata using custom tools and/or the ecgtools package following the intake-ESM recommendations. The final
output from the catalog generator is a csv file populated with file paths and metadata, and a json header file that
points to the location of the csv file and describes the column headers. Users pass the json
header file to the intake-ESM ``open_esm_datastore`` utility so that it can parse the information in the csv file
to perform catalog queries.

.. code-block:: python

   import intake

   # define a dictionary with catalog query info
   query_dict = {}
   query_dict['frequency'] = "day"
   query_dict['realm'] = "atmos"
   query_dict['standard_name'] = "air_temperature"

   # open the intake-ESM catalog
   cat = intake.open_esm_datastore("/path/to/data_catalog.json")

   # query the catalog for the data subset matching the query_dict info
   cat_subset = cat.search(**query_dict)

The MDTF-diagnostics package provides a basic :ref:`catalog builder tool <ref-catalog-builder>` built on top of
`ecgtools <https://github.com/ncar-xdev/ecgtools>`__ that has been tested with
CMIP6 and GFDL datasets. The catalog builder contains hooks for CESM datasets that will be implemented in later
versions of the MDTF-diagnostics package. GFDL also maintains a lightweight
`CatalogBuilder <https://github.com/NOAA-GFDL/CatalogBuilder>`__ that has been tested with GFDL and CMIP6 datasets.
Users may try both tools and select the one that works best for their dataset and system, or create their own builder script.

Required catalog information
----------------------------

The following intake-ESM catalog columns must be populated for each file for MDTF-diagnostics functionality; other
columns are optional at this time but may be used to refine query results in future releases:

* activity_id: (str) the dataset convention:

  * "CMIP"
  * "CESM"
  * "GFDL"

* file_path: (str) full path to the file
* frequency: (str) output frequency of the data; use the following CMIP definitions:

  * sampled hourly = "1hr"
  * monthly-mean diurnal cycle resolving each day into 1-hour means = "1hrCM"
  * sampled hourly at specified time point within an hour = "1hrPt"
  * 3 hourly mean samples = "3hr"
  * sampled 3 hourly at specified time point within the time period = "3hrPt"
  * 6 hourly mean samples = "6hr"
  * sampled 6 hourly at specified time point within the time period = "6hrPt"
  * daily mean samples = "day"
  * decadal mean samples = "dec"
  * fixed (time invariant) field = "fx"
  * monthly mean samples = "mon"
  * monthly climatology computed from monthly mean samples = "monC"
  * sampled monthly at specified time point within the time period = "monPt"
  * sampled sub-hourly at specified time point within an hour = "subhrPt"
  * annual mean samples = "yr"
  * sampled yearly at specified time point within the time period = "yrPt"

* realm | modeling_realm: (str) model realm for the variable; use the following CMIP definitions:

  * Aerosol = "aerosol"
  * Atmosphere = "atmos"
  * Atmospheric Chemistry = "atmosChem"
  * Land Surface = "land"
  * Land Ice = "landIce"
  * Ocean = "ocean"
  * Ocean Biogeochemistry = "ocnBgchem"
  * Sea Ice = "seaIce"

* standard_name: (str) if a standard_name is not defined for a variable in the target file, use the equivalent CMIP6
  standard_name or the long_name with underscores in place of spaces (e.g., air temperature -> air_temperature)
* time_range: (str or int) time range spanned by the file, with start_time and end_time separated by a '-', e.g.:

  * yyyy-mm-dd-yyyy-mm-dd
  * yyyymmdd:HHMMSS-yyyymmdd:HHMMSS

* units: (str) variable units
* variable_id: (str) variable name id (e.g., temp, precip, PSL, TAUX)
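As a sketch of what a builder's output looks like, the following writes a minimal one-file catalog with the required columns using only the Python standard library. The file path and metadata values are invented, and a real catalog would be produced by a catalog builder tool rather than written by hand; the json header follows the general shape intake-ESM reads, but consult the intake-ESM documentation for the authoritative schema.

```python
import csv
import json
import pathlib
import tempfile

# One hypothetical catalog row containing the required MDTF columns.
row = {
    "activity_id": "CMIP",
    "file_path": "/archive/exp1/atmos/day/tas_19800101-19841231.nc",  # made up
    "frequency": "day",
    "realm": "atmos",
    "standard_name": "air_temperature",
    "time_range": "19800101-19841231",
    "units": "K",
    "variable_id": "tas",
}

outdir = pathlib.Path(tempfile.mkdtemp())

# csv file populated with file paths and metadata
csv_path = outdir / "data_catalog.csv"
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)

# json header pointing at the csv and describing the columns
header = {
    "esmcat_version": "0.0.1",
    "id": "mdtf_example_catalog",
    "description": "minimal hand-built example catalog",
    "catalog_file": str(csv_path),
    "attributes": [{"column_name": c, "vocabulary": ""} for c in row],
    "assets": {"column_name": "file_path", "format": "netcdf"},
}
json_path = outdir / "data_catalog.json"
json_path.write_text(json.dumps(header, indent=2))
```

The resulting ``data_catalog.json`` is the file you would pass to ``intake.open_esm_datastore`` in the query example above.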
42 changes: 28 additions & 14 deletions doc/sphinx/ref_output.rst
with the run in our :code-rst:`wkdir`. The resulting tree should look like this:
├── CMIP_Synthetic_r1i1p1f1_gr1_19850101-19891231.log
├── config_save.json
├── example_multicase/
├── index.html
├── MDTF_CMIP_Synthetic_r1i1p1f1_gr1_19800101-19841231/
├── MDTF_CMIP_Synthetic_r1i1p1f1_gr1_19850101-19891231/
└── MDTF_postprocessed_data.json

To explain the contents within:

* :code-rst:`config_save.json` contains a copy of the runtime configuration
* :code-rst:`index.html` is the html page used to consolidate the MDTF run results for the end-user.
  Open this file in a web browser (e.g., :console:`% firefox index.html`) to view the figures and logs for each
  POD.
* :code-rst:`MDTF_postprocessed_data.csv` and :code-rst:`MDTF_postprocessed_data.json` are the ESM-intake catalog
  csv and json header files with information about the processed model data.
* The catalog points towards data that can be found in the folders :code-rst:`MDTF_CMIP_Synthetic_*`.
  To re-run the framework using the same processed dataset, set `DATA_CATALOG`
  to the path to the :code-rst:`MDTF_postprocessed_data.json` header file and set `run_pp` to `false` in the
  runtime configuration file.
* The `.log` files contain framework and case-specific logging information. Please include information from these
  logs in any issues related to running the framework that you submit to the MDTF-diagnostics team.

POD Output Directory
-------------------------------
This directory, :code-rst:`example_multicase`, contains all of the output for the POD:
├── example_multicase.data.log
├── example_multicase.html
├── example_multicase.log
├── index.html
├── model/
└── obs/

These files and folders are:

* :code-rst:`example_multicase.html` serves as the landing page for the POD and can be easily reached from
  :code-rst:`index.html`.
* :code-rst:`case_info.yml` provides environment variables for each case. Multirun PODs can read and set the
  environment variables from this file following the
  `example_multicase.py <https://github.com/NOAA-GFDL/MDTF-diagnostics/blob/main/diagnostics/example_multicase/example_multicase.py>`__
  template.
* :code-rst:`model/` and :code-rst:`obs/` contain plots and data for the model data and observation data,
  respectively. The framework appends a temporary :code-rst:`PS` subdirectory to the :code-rst:`model` and
  :code-rst:`obs` directories where PODs can write postscript files instead of png files. The framework converts
  any .(e)ps files in the :code-rst:`PS` subdirectories to .png files and moves them to the :code-rst:`model`
  and/or :code-rst:`obs` subdirectories, then deletes the :code-rst:`PS` subdirectories during the output
  generation stage. Users can retain the :code-rst:`PS` directories and files by setting `save_ps` to `true` in
  the runtime configuration file.
* :code-rst:`example_multicase.log` contains POD-specific logging information, in addition to some main logging
  messages, that is helpful when diagnosing issues.
* :code-rst:`example_multicase.data.log` has a list of processed data files that the POD read.
If multiple PODs are run, you will find a directory for each POD in the :code-rst:`MDTF_output` directory.
6 changes: 4 additions & 2 deletions doc/sphinx/ref_submodules.rst
The following block in your JSON or yml file is required for the submodule to launch:
}
},

Where ${MODULE_NAME} is the name of the package you want to launch a function from, ${FUNCTION_NAME} is the
function you want to call, and ${FUNCTION_ARGS} is the arguments to be passed to the function.

TempestExtremes Example
------------------------
As an example, we will build and run TempestExtremes (TE) from MDTF. First, clone the latest TE with a python wrapper.
See the `TempestExtremes fork <https://github.com/amberchen122/tempestextremes/>`__.
In the cloned directory, it can be built using the commands:

.. code-block::