Update docs #670

Merged Aug 20, 2024 (5 commits)
68 changes: 40 additions & 28 deletions doc/sphinx/dev_git_intro.rst
.. _ref-git-intro:

Git-based development workflow
==============================
Steps for brand new users
------------------------------

#. Fork the MDTF-diagnostics branch to your GitHub account (:ref:`ref-fork-code`)
#. Clone (:ref:`ref-clone`) your fork of the MDTF-diagnostics repository (repo) to your local machine
   (if you are not using the web interface for development)
#. Check out a new branch from the local main branch (:ref:`ref-new-pod`)
#. Start coding
#. Commit the changes in your POD branch (:ref:`ref-new-pod`)
#. Push the changes to the copy of the POD branch on your remote fork (:ref:`ref-new-pod`)
#. Repeat steps 4--6 until you are finished working
#. Submit a pull request to the NOAA-GFDL repo for review (:ref:`ref-pull-request`).
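The branch-commit-push cycle above can be sketched as a self-contained shell session. A throwaway local bare repository stands in for your GitHub fork, and the branch and file names (``my_new_pod``, ``my_new_pod.py``) are made up for illustration; in real use you would clone your actual fork URL instead.

```shell
# Sketch of steps 2-6: a throwaway bare repo plays the role of your fork.
set -e
scratch=$(mktemp -d)
git init --bare "$scratch/fork.git"                 # stand-in for your GitHub fork
git clone "$scratch/fork.git" "$scratch/MDTF-diagnostics"
cd "$scratch/MDTF-diagnostics"
git config user.name "POD Developer"
git config user.email "dev@example.com"
git checkout -b my_new_pod                          # step 3: new branch for the POD
echo "print('hello POD')" > my_new_pod.py           # step 4: write code
git add my_new_pod.py                               # stage the new file
git commit -m "Add my_new_pod driver"               # step 5: commit
git push -u origin my_new_pod                       # step 6: push the branch to the fork
```

After the push, the branch exists on the fork and is ready for the pull request in step 8.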

Steps for users continuing work on an existing POD branch
-------------------------------------------------------------
#. Create a backup copy of the MDTF-Diagnostics repo on your local machine
#. Pull in updates from the NOAA-GFDL/main branch to the main branch in your remote repo (:ref:`ref-update-main`)
#. Pull in updates from the main branch in your remote fork into the main branch in your local repo
   (:ref:`ref-update-main`)
#. Sync your POD branch in your local repository with the local main branch using an interactive rebase
   (:ref:`ref-rebase`) or merge (:ref:`ref-merge`). Be sure to make a backup copy of your local *MDTF-diagnostics*
   repo first, and test your branch after rebasing/merging as described in the linked instructions before proceeding
   to the next step.
#. Continue working on your POD branch
#. Commit the changes in your POD branch
#. Push the changes to the copy of the POD branch in your remote fork (:ref:`ref-push`)
#. Submit a pull request (PR) to the NOAA-GFDL/main branch when your code is ready for review (:ref:`ref-pull-request`)

.. _ref-fork-code:


Working on a brand new POD
------------------------------
Developers can either clone the MDTF-diagnostics repo to their computer, or manage the MDTF package using the GitHub
webpage interface.
Whichever method you choose, remember to create your [POD branch name] branch from the *main* branch.
Since developers commonly work on their own machines, this manual provides command line instructions.

1. Check out a branch for your POD

::

    git checkout -b [POD branch name]

2. Write code, add files, etc...

3. Add the files you created and/or modified to the staging area

::

    git add [file 1]
    git add [file 2]
    ...

4. Commit your changes, including a brief description

::

    git commit -m "description of my changes"

5. Push the updates to your remote repository

::

    git push -u origin [POD branch name]

To submit a PR:

#. Click the *Contribute* link on the main page of your MDTF-diagnostics fork and click the *Open Pull Request* button

#. Verify that *NOAA-GFDL/MDTF-diagnostics* is set as the **base** repository and *main* is set as the **base** branch,
   and that your fork is set as the **head** repository and your POD branch is set as the **head** branch

#. Click the *Create Pull Request* button, add a brief description to the PR header, and go through the checklist to
   ensure that your code meets the baseline requirements for review

#. Click the *Create Pull Request* button (now in the lower left corner of the message box)

Note that you can submit a Draft Pull Request if you want to run the code through the CI, but are not ready
for a full review by the framework team. Starting from step 3 above:

#. Click the arrow on the right edge of the *Create Pull Request* button and select *Create draft pull request*
   from the dropdown menu.

#. Continue pushing changes to your POD branch until you are ready for a review (the PR will update automatically)

#. When you are ready for review, navigate to the NOAA-GFDL/MDTF-Diagnostics
   `Pull requests <https://github.com/NOAA-GFDL/MDTF-diagnostics/pulls>`__ page, and click on your PR

#. Scroll down to the header that states "this pull request is still a work in progress",
   and click the *ready for review* button to move the PR out of *draft* mode

.. _ref-update-main:
2. Update the local and remote main branches on your fork as described in :ref:`ref-update-main`.

3. Check out your POD branch, and merge the main branch into your POD branch

::

    git checkout [POD branch name]
    git merge main
4. Resolve any conflicts that occur from the merge

5. Add the updated files to the staging area

::

    git add file1
    git add file2
    ...

6. Push the branch updates to your remote fork

::

    git push -u origin [POD branch name]
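Steps 3 and 4 can be exercised end-to-end in a throwaway repository. Everything below is illustrative (invented branch and file names, no real remote); it shows an upstream change landing on main and then being merged cleanly into a POD branch.

```shell
# Throwaway repo: main receives an update while my_pod carries local work,
# then main is merged into my_pod (no conflict, since different files changed).
set -e
scratch=$(mktemp -d)
cd "$scratch" && git init -q repo && cd repo
git config user.name "POD Developer"
git config user.email "dev@example.com"
echo "v1" > framework.py
git add framework.py && git commit -qm "initial framework"
git branch -m main                         # ensure the trunk is named main
git checkout -q -b my_pod                  # POD branch with local work
echo "pod code" > my_pod.py
git add my_pod.py && git commit -qm "add POD driver"
git checkout -q main                       # an upstream update lands on main
echo "v2" > framework.py
git commit -qam "update framework"
git checkout -q my_pod                     # step 3: check out the POD branch
git merge -q main -m "merge main into my_pod"
```

After the merge, the POD branch contains both the local work and the upstream update, which is the state you would then push to your fork.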
If you want to revert to the commit(s) before you pulled in updates:

1. Find the commit hash(es) with the updates, in your git log

::

    git log

or consult the commit log in the web interface

2. Revert each commit in order from newest to oldest

::

    git revert <newer commit hash>
    git revert <older commit hash>

3. Push the updates to the remote branch

::

    git push origin [POD branch name]
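The newest-to-oldest revert order above can be sketched in a throwaway repository; here two hypothetical "update" commits are undone, restoring the file to its original contents.

```shell
# Throwaway repo: make an original commit plus two updates, then revert the
# updates newest-first so the file returns to its original contents.
set -e
scratch=$(mktemp -d)
cd "$scratch" && git init -q repo && cd repo
git config user.name "POD Developer"
git config user.email "dev@example.com"
echo "original" > settings.txt
git add settings.txt && git commit -qm "original settings"
echo "update 1" > settings.txt && git commit -qam "update 1"
echo "update 2" > settings.txt && git commit -qam "update 2"
git log --oneline                   # step 1: find the hashes of both updates
git revert --no-edit HEAD           # step 2: revert the newer commit first...
git revert --no-edit HEAD~2         # ...then the older update commit
```

Each `git revert` adds a new commit that undoes the named one, so history is preserved and the result can be pushed without rewriting the remote branch.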
4 changes: 3 additions & 1 deletion doc/sphinx/dev_start.rst
scripting languages, including `R <https://anaconda.org/conda-forge/r-base>`__,
`NCL <https://anaconda.org/conda-forge/ncl>`__, `Ruby <https://anaconda.org/conda-forge/ruby>`__, etc...


Python-based PODs should be written in Python 3.12 or newer. We provide a developer version of the
python3_base environment (described below) that includes Jupyter and other developer-specific tools.
This is not installed by default, and must be requested by passing the ``--all`` flag to the conda setup script:

If you are using Anaconda or miniconda to manage the conda environments, run:

81 changes: 79 additions & 2 deletions doc/sphinx/ref_catalogs.rst
ESM-intake catalogs
===================

The MDTF-diagnostics package uses `intake-ESM <https://intake-esm.readthedocs.io/en/stable/>`__ catalogs and APIs to
access model datasets and verify POD data requirements. Intake-ESM is a software package that uses
`intake <https://intake.readthedocs.io/en/latest/>`__ to load
catalog *assets* (netCDF or Zarr files and associated metadata) into a pandas DataFrame or an xarray dataset.
Users can query catalogs and access data subsets according to desired date ranges, variables, simulations, and
other criteria without having to walk the directory structure or open files to verify information beforehand, making
them convenient and, depending on the location of the reference dataset, faster than on-the-fly file search methods.

Intake-ESM catalogs are generated using information from standardized directory structures and/or
file metadata using custom tools and/or the ecgtools package following the intake-ESM recommendations. The final
output from the catalog generator is a csv file populated with file paths and metadata, and a json header file that
points to the location of the csv file and describes the column headers. Users pass the json
header file to the intake-ESM ``open_esm_datastore`` utility so that it can parse the information in the csv file
to perform catalog queries.

.. code-block:: python

   import intake

   # define a dictionary with catalog query info
   query_dict = {}
   query_dict['frequency'] = "day"
   query_dict['realm'] = "atmos"
   query_dict['standard_name'] = "air_temperature"

   # open the intake-ESM catalog
   cat = intake.open_esm_datastore("/path/to/data_catalog.json")

   # query the catalog for the data subset matching the query_dict info
   cat_subset = cat.search(**query_dict)

The MDTF-diagnostics package provides a basic :ref:`catalog builder tool <ref-catalog-builder>` built on top of
`ecgtools <https://github.com/ncar-xdev/ecgtools>`__ that has been tested with
CMIP6 and GFDL datasets. The catalog builder contains hooks for CESM datasets that will be implemented in later
versions of the MDTF-diagnostics package. GFDL also maintains a lightweight
`CatalogBuilder <https://github.com/NOAA-GFDL/CatalogBuilder>`__ that has been tested with GFDL and CMIP6 datasets.
Users may try both tools and select the one that works best for their dataset and system, or create their own builder script.

Required catalog information
----------------------------

The following intake-ESM catalog columns must be populated for each file for MDTF-diagnostics functionality; other
columns are optional at this time but may be used to refine query results in future releases:

* activity_id: (str) the dataset convention:

  * "CMIP"
  * "CESM"
  * "GFDL"

* file_path: (str) full path to the file
* frequency: (str) output frequency of the data; use the following CMIP definitions:

  * sampled hourly = "1hr"
  * monthly-mean diurnal cycle resolving each day into 1-hour means = "1hrCM"
  * sampled hourly at specified time point within an hour = "1hrPt"
  * 3 hourly mean samples = "3hr"
  * sampled 3 hourly at specified time point within the time period = "3hrPt"
  * 6 hourly mean samples = "6hr"
  * sampled 6 hourly at specified time point within the time period = "6hrPt"
  * daily mean samples = "day"
  * decadal mean samples = "dec"
  * fixed (time invariant) field = "fx"
  * monthly mean samples = "mon"
  * monthly climatology computed from monthly mean samples = "monC"
  * sampled monthly at specified time point within the time period = "monPt"
  * sampled sub-hourly at specified time point within an hour = "subhrPt"
  * annual mean samples = "yr"
  * sampled yearly at specified time point within the time period = "yrPt"

* realm | modeling_realm: (str) model realm for the variable; use the following CMIP definitions:

  * Aerosol = "aerosol"
  * Atmosphere = "atmos"
  * Atmospheric Chemistry = "atmosChem"
  * Land Surface = "land"
  * Land Ice = "landIce"
  * Ocean = "ocean"
  * Ocean Biogeochemistry = "ocnBgchem"
  * Sea Ice = "seaIce"

* standard_name: (str) if a standard_name is not defined for a variable in the target file, use the equivalent CMIP6
  standard_name or the long_name with underscores in place of spaces (e.g., air temperature -> air_temperature)
* time_range: (str or int) time range spanned by the file, with start_time and end_time separated by a '-', e.g.:

  * yyyy-mm-dd-yyyy-mm-dd
  * yyyymmdd:HHMMSS-yyyymmdd:HHMMSS

* units: (str) variable units
* variable_id: (str) variable name id (e.g., temp, precip, PSL, TAUX)
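As a sketch of what a builder's output looks like, the following writes a minimal one-file catalog with the required columns using only the Python standard library. The file path and metadata values are invented, and a real catalog would be produced by a catalog builder tool rather than written by hand; the json header follows the general shape intake-ESM reads, but consult the intake-ESM documentation for the authoritative schema.

```python
import csv
import json
import pathlib
import tempfile

# One hypothetical catalog row containing the required MDTF columns.
row = {
    "activity_id": "CMIP",
    "file_path": "/archive/exp1/atmos/day/tas_19800101-19841231.nc",  # made up
    "frequency": "day",
    "realm": "atmos",
    "standard_name": "air_temperature",
    "time_range": "19800101-19841231",
    "units": "K",
    "variable_id": "tas",
}

outdir = pathlib.Path(tempfile.mkdtemp())

# csv file populated with file paths and metadata
csv_path = outdir / "data_catalog.csv"
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)

# json header pointing at the csv and describing the columns
header = {
    "esmcat_version": "0.0.1",
    "id": "mdtf_example_catalog",
    "description": "minimal hand-built example catalog",
    "catalog_file": str(csv_path),
    "attributes": [{"column_name": c, "vocabulary": ""} for c in row],
    "assets": {"column_name": "file_path", "format": "netcdf"},
}
json_path = outdir / "data_catalog.json"
json_path.write_text(json.dumps(header, indent=2))
```

The resulting ``data_catalog.json`` is the file you would pass to ``intake.open_esm_datastore`` in the query example above.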
42 changes: 28 additions & 14 deletions doc/sphinx/ref_output.rst
with the run in our :code-rst:`wkdir`. The resulting tree should look like this:
├── CMIP_Synthetic_r1i1p1f1_gr1_19850101-19891231.log
├── config_save.json
├── example_multicase/
├── index.html
├── MDTF_CMIP_Synthetic_r1i1p1f1_gr1_19800101-19841231/
├── MDTF_CMIP_Synthetic_r1i1p1f1_gr1_19850101-19891231/
└── MDTF_postprocessed_data.json

To explain the contents within:

* :code-rst:`config_save.json` contains a copy of the runtime configuration
* :code-rst:`index.html` is the html page used to consolidate the MDTF run results for the end-user.
  Open this file in a web browser (e.g., :console:`% firefox index.html`) to view the figures and logs for each
  POD.
* :code-rst:`MDTF_postprocessed_data.csv` and :code-rst:`MDTF_postprocessed_data.json` are the ESM-intake catalog
  csv and json header files with information about the processed model data.
* The catalog points towards data that can be found in the folders :code-rst:`MDTF_CMIP_Synthetic_*`.
  To re-run the framework using the same processed dataset, set `DATA_CATALOG`
  to the path to the :code-rst:`MDTF_postprocessed_data.json` header file and set `run_pp` to `false` in the
  runtime configuration file.
* The `.log` files contain framework and case-specific logging information. Please include information from these
  logs in any issues related to running the framework that you submit to the MDTF-diagnostics team.

POD Output Directory
-------------------------------
This directory, :code-rst:`example_multicase`, contains all of the output for the POD:
├── example_multicase.data.log
├── example_multicase.html
├── example_multicase.log
├── index.html
├── model/
└── obs/

These files and folders are:

* :code-rst:`example_multicase.html` serves as the landing page for the POD and can be easily reached from
  :code-rst:`index.html`.
* :code-rst:`case_info.yml` provides environment variables for each case. Multirun PODs can read and set the
  environment variables from this file following the
  `example_multicase.py <https://github.com/NOAA-GFDL/MDTF-diagnostics/blob/main/diagnostics/example_multicase/example_multicase.py>`__
  template.
* :code-rst:`model/` and :code-rst:`obs/` contain plots and data for the model data and observation data,
  respectively. The framework appends a temporary :code-rst:`PS` subdirectory to the :code-rst:`model` and
  :code-rst:`obs` directories where PODs can write postscript files instead of png files. The framework converts
  any .(e)ps files in the :code-rst:`PS` subdirectories to .png files and moves them to the :code-rst:`model`
  and/or :code-rst:`obs` subdirectories, then deletes the :code-rst:`PS` subdirectories during the output
  generation stage. Users can retain the :code-rst:`PS` directories and files by setting `save_ps` to `true` in
  the runtime configuration file.
* :code-rst:`example_multicase.log` contains POD-specific logging information, in addition to some main logging
  messages, that is helpful when diagnosing issues.
* :code-rst:`example_multicase.data.log` has a list of processed data files that the POD read.
If multiple PODs are run, you will find a directory for each POD in the :code-rst:`MDTF_output` directory.
6 changes: 4 additions & 2 deletions doc/sphinx/ref_submodules.rst
The following block in your JSON or yml file is required for the submodule to launch:
}
},

Where ${MODULE_NAME} is the name of the package you want to launch a function from, ${FUNCTION_NAME} is the
function you want to call, and ${FUNCTION_ARGS} is the arguments to be passed to the function.

TempestExtremes Example
------------------------
As an example, we will build and run TempestExtremes (TE) from MDTF. First, clone the latest TE with a python wrapper.
See the `TempestExtremes fork <https://github.com/amberchen122/tempestextremes/>`__.
In the cloned directory, it can be built using the commands:

.. code-block::