
MDTF-diagnostics preprocessor update (#509) #510

Merged
347 commits merged into NOAA-GFDL:refactor_pp on Mar 8, 2024

Conversation

wrongkindofdoctor
Collaborator

  • fix logging classes and procedure for the DataSourceBase class; add a routine to check the conda env specified by the POD for the required packages

  • Finalize the POD python package check in setup_pod

  • Change ref from util.Singleton to metaclass=util.Singleton, since this is the correct way to inherit from the modified Singleton definition (see the metaclass sketch after this list)

  • modify the pod_setup conda package check to just verify the packages in PODs that require the python environment, since packages for NCL are not explicitly imported in the driver scripts

  • fix the ref to the coordinate table in the CMIP fieldlist

  • add calls to instantiate and populate translator object to the driver script

  • modify the VariableTranslator read_convention method and dependencies to handle objects in the Fieldlist files

  • clean up formatting; add calls to register methods for abstract atts back to data_model.py

  • rework the register_coords method to accommodate multiple entries for a single standard name; continue working on the call to Fieldlist.from_struct

  • add RegexDict function to basic module and util init module

  • change HorizontalCoord classes back to separate X and Y classes; add a try/except block to catch duplicates in the _DMDimensionsMixin build_axes method

  • update data model horizontal coord class refs in the translation module; move the VariableTranslator class and modifier check from data_model to the translation.FieldlistEntry __post_init__ method to avoid a circular import

  • remove atmos_realm and ocean_realm from the modifiers table and fieldlists; change modeling_realm to realm in fieldlists

  • make realm a non-mandatory string attribute that defaults to an empty string in DMDependentVariable, because this class is used to define both coordinates, which do not have realm atts, and variables, which do

  • Refine the _process_var and _process_coord methods to properly handle modifier and realm entries when defining fieldlist lookup tables; remove the mandatory modifier and ndim units from the fieldlist class atts since they are not needed there

  • change HorizontalCoordinate refs to X and Y Coordinate refs to match the change in class names in varlist_util.py

  • add type hints and placeholder methods to DataSourceBase class

  • remove the varlist_util.varlistCoordinateMixin class and move the need_bounds attribute to the data_model._dmcoordinatshared class, which is inherited by the same child classes in varlist_util

  • add units defs to lat and lon dims in pod settings files

  • add axis defs to the lat and lon dimensions in the pod settings files

  • pass the pod object to get_varlist instead of just the pod_vars in pod_setup; define pod_dims using the _pod_dims_from_struct function in the Varlist from_struct call

  • add varlist_util with changes from prior commit

  • clean up dataclass.py formatting

  • replace var_dict parameter with parent in get_varlist method definition

  • minor cleanup of fieldlist and core modules

  • remove a commented-out line from varlist_util.py; comment out the convention match check for varlist setup development in pod_setup

  • check for time and set axis to T in data_model.coordinate_from_struct

  • add type hints to translation methods; add a date_range attribute to DataSourceBase and a method to set the date range; add a call to set_date_range to pod_setup;
    change varlist_util.setup_var to accept date_range and convention parameters to pass to called methods

  • fix the dest_path definition in varlist_util; add dummy attributes required by abstract methods in data_model; clean up formatting;
    add realm to the from_CF method in translation

  • remove the unnecessary pod_convention parameter from translation calls; change the _NO_TRANSLATION_FIELDLIST definition from 'None' to 'no_translation'; refine the logic for choosing whether to translate based on the convention match in pod_setup

  • start re-working preprocessor module to make previous multirun classes the default

  • rename my_scripts to user_pp_scripts in template files

  • fix pod_setup status checks; add a user_pp_scripts att and a routine to set the att on the pod object

  • add call to add user-defined pp scripts to workflow to mdtf_framework.py

  • clean up pp module some more and add placeholder class for user-defined pp

  • add calls to instantiate MODEL_WORK_DIR and preprocessor objects to the driver script; refactor path_utils to divide MODEL_WORK_DIR and POD_WORK_DIR into separate objects, since the model work dir is used by all PODs in a run and does not need to be copied to each pod directory; anticipatory cleanup of the preprocessor module;
    add todos to the query_fetch_preprocess module

  • add a translate_data option to the runtime config files; refactor pod_setup to configure paths using PodPathManager and run translation based on the new translate_data cli flag

  • refine path definitions in the reconfigured path_utils setup; update path object dependencies

  • move main log to framework run subdirectory

  • change PodObject._children to return case list values instead of None

  • edit preprocessor init methods and make DaskMultiFilePreprocessor the default pp class; reorder the preprocessor init and pod setup calls in the driver script; edit the runtime config template to point to the oar.gfdl.mdtf conda installation

  • start defining the preprocessor.query_catalog function and call

  • refine preprocessor catalog search criteria

  • move the deactivate routine from core to util/basic; move ObjectStatus from the log module to the basic module;
    update the src init module to reflect the util mods

  • update deactivate calls to reflect routine changes in pod_setup and varlist_util

  • fix logic in translation.translate_coord and add assertion error message

  • define standard_name for esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.csv entries

  • fix standard_name defs in esm_catalog_CMIP_synthetic; fix the catalog path def in runtime_config.jsonc;
    keep working on the catalog query in the preprocessor

  • rearrange calls in driver script

  • fix data path regex pattern and create aggregate catalog to return in preprocessor.query_catalog

  • rework the catalog query and function parameters; refactor edit_request calls in the preprocessing routines

  • fix preprocessor edit_request interfaces

  • update edit_request and execute calls for each pp func; remove refs to edit_request_wrapper, which will probably be deleted since it can be replaced with something less confusing to handle alternates; begin figuring out the whole dataset open-read procedure

  • add arguments for catalog_subset to the preprocessor read_file functions; remove extraneous classes from xr_parser and preprocessor; add type hints to translation functions;
    update runtime_config_template for local testing

  • notebook added

  • add supporting configs that were used for a test

  • compare 2 cases, with a catalog from Ciheim generated by CatalogBuilder

  • fix formatting in output_manager and processes

  • add micromamba_exe parm to runtime_config templates

  • update demo: remove the old demo and update the new demo with 2 cases

  • remove the call to the conda check from mdtf_framework; add support for micromamba to the pod_setup module
    add micromamba_exe parms to config templates
    add temporary comments to config jsonc template

  • fix typos in util modules; add a routine to append a row to a pandas dataframe to util.basic

  • add a procedure to create a dataframe from the preliminary intake catalog query; work on modifying check_group_range to create a DateRange object from the catalog start_time and end_time and append it to the existing dataframe

  • update _parse_input_string in datelabel module to accept colon delimiter, and extend accepted date string format description

  • add check_date_format routine to cli.py with additional accepted date string formats for startdate and enddate input data

  • fix the catalog dataframe update procedure in query_catalog; work on passing the xarray dataset from the catalog query to the preprocessing functions

  • update the init method for xr_parser DefaultDataParser and calls; refine the preprocessing routine;
    start updating xr_parser methods;
    remove unused preprocessor load and read methods

  • add handling for microseconds to datelabel DatePrecision

  • reorganize the xr_parser parse method to handle the catalog xarray dataset; comment out calls to methods that may no longer be needed; add logging to the DefaultDataParser class

  • remove the unused SingleVarFilePreprocessor class and read_dataset methods; refactor the preprocessor parse method to only call xr_parser.parse; increase the precision of the datestring returned by the CropDateRangeFunction execute logger; comment out AssociatedVariablesFunction since it is not yet implemented and lacks a use case at this time

  • fix cmip6.py formatting

  • set the output_to_ncl preprocessor attribute; reorganize pp routines and remove unused methods
    start refactoring write routines

  • add pod runtime settings attribute to pod object

  • update mdtf_framework function calls

  • update config parameter defs and calls in output_manager; fix calls to the preprocess method
    refactor assocVariablesFunction
    replace args with kwargs parm in preprocessor execute methods and pass keyword args to function calls

  • add preliminary calls to environment and runtime managers to the mdtf driver; add failed and active properties to pod_setup

  • update environment and output manager modules to use multirun config base classes; add preliminary calls and routines to handle multirun html template generation to the output manager

  • refactor example_multicase html templates into separate header and plot files

  • Remove unused modules

  • move tempdirmanager to util/filesystem.py

  • update calls to tempdirmanager methods with config parameter

  • clean up the preprocessor and data_sources modules; add an assoc_files attribute to the varlist_util.Varlist class

  • refactor environment_manager SubprocessRuntimeManager methods; update calls in mdtf_framework.py

  • update modules in the toc rst doc files; start updating fmwk_cli.rst
    update dev_start.rst

  • comment out calls to attributes that are not set in the logs module; add a case_dict parm to the data_sources init in pod setup; set new_work_dir to True in the paths init in pod setup; add an iter_vars_only method to data_model.py;
    define the iter_vars_only attribute in the data_sources DataSourceBase class; update environment setup and subprocess spawn calls in environment_manager; start refactoring the output_manager module;
    update method calls in mdtf_framework.py

  • rename example_multicase_header.html to example_multicase.html

  • remove unused code from pod_setup; continue refactoring output_manager;
    remove the dry_run parameter from subprocess methods and calls; update output_manager calls in mdtf_framework

  • add type hints to cli.read_config_files and make parms lowercase

  • change WK_DIR to WORK_DIR in environment_manager

  • continue refactoring output_manager to work with ctx.config information

  • refactor tempdir class to work with ctx.config information

  • add TEMP_DIR_ROOT, unit_test, and _configs attributes to ctx.config; move the backup_config method and ConfigTuple defs from the core module to mdtf_framework

  • remove the config parm from tempdir_cleanup_handler and calls; add a keep_temp attribute to TempDirManager
    define attributes in TempDirManager
    clean up logs.py formatting

  • fix formatting in filesystem, path_utils, and environment_manager; fix the pod data output dir definition so that it doesn't doubly append dates to case directories in path_utils

  • add placeholder method for pp catalog creation to preprocessor.py

  • add catalog module to src/util with methods for postprocessed data catalog creation

  • clean up datelabel module

  • modify the find_json method to accept a full filepath and do a simple search for a file in a directory; consolidate the read_config_file and read_config_files methods to parse a json using the MDTF root directory, subdirectory tree, and file name

  • refine output file catalog attributes and assets definitions

  • clean up path_utils and date_label modules

  • add methods to parse output file directories and split file name parts to define attributes in catalog.py; refine calls to catalog methods in the preprocessor

  • set new_workdir option to False in pod_setup pathutils initialization

  • remove unused imports from cli.py

  • work on defining regex to isolate time_range in file name in catalog module

  • refine the write_pp_catalog method; move define_pp_catalog to the catalog module

  • update columns in pp catalog

  • fix order of imports in util init file to avoid circular import error

  • refine catalog assets setup; add the output file path to the catalog asset definition method

  • add calls to update ctx.config WORK_DIR and OUTPUT_DIR with values defined by model_paths atts

  • fix csv file name in os.path.join call in catalog.define_pp_catalog_assets

  • add logic to PathManager to check for existing MDTF_output subdirectory before appending it to a directory attribute

  • change find mindepth to 1; remove duplicate entries from the filelist before returning it in catalog.get_file_list

  • add calls to validate catalog to preprocessor

  • add a hacked version of the save function to catalog.py to try and work around the output file name issue

  • add call to new save method and debug

  • move the case object from the pod to its own dictionary in the main program and update usage in data_sources and pod_setup; finalize the catalog save method and update comments

  • add logging and error handling to preprocessor write_pp_catalog method

  • refactor environment manager routines to use a separate case dictionary; add a CATALOG_FILE environment variable to case_info.yaml

  • update intake-esm versions in base and python3 base envs

  • replace the call to the custom catalog save util with the intake-esm serialize method in the preprocessor, since the version update fixed the fsspec issue (see the catalog sketch after this list)

  • remove save_catalog from catalog.py since it is not needed with the intake-esm version update

  • fix data_sources formatting

  • get rid of the VarlistEntryMixin class and consolidate it with the VarlistEntry class; make env_vars a class attribute instead of a property to avoid a collision with the attribute defined in the data_sources parent class; create a new set_env_vars method to define VarlistEntry env_vars and add the call to the Varlist.setup_var method

  • resolve merge conflicts

  • fix the CASE_LIST key in the case_info.yaml creation; fix the micromamba_exe parameter call in the subprocess command definition

  • delete cli_plugins and template jsons

  • remove member_id from groupby_attrs def in catalog module

  • remove debugging lines and fix file close and memory cleanup in example_multicase POD

  • replace case_info.yml with updated example_case_info_output.yml in example_multicase directory

  • change file name from case_info.yaml to case_info.yml in environment_manager

  • update the information in the example_multicase html template

  • update env vars in albedofb.py

  • update env vars and clean up formatting in blocking_neale.py

  • update env vars and fix formatting in convective_transition_diag scripts

  • update env vars in enso mse and enso rws drivers

  • update env variables in eulerian_storm_track scripts, and clean up formatting in POD files

  • update env vars and fix formatting in example POD files

  • add realm to varlist env_vars

  • update mixed_layer_depth env vars and clean up formatting in driver and html files

  • clean up formatting in MJO_prop_amp driver and html files

  • update env_vars and fix formatting in MJO_prop_amp NCL scripts

  • update env_vars and clean up ENSO_RWS scripts

  • update blocking_neale scripts

  • update and clean up ENSO_MSE scripts

  • update MJO suite env vars

  • update and clean up MJO_teleconnection scripts

  • update ocn_surf_flux_diag env vars and clean up formatting

  • update precip_diurnal_cycle env_vars and fix file formatting

  • update seaice_suite env_vars and clean up file formatting

  • update SM_ET_coupling env dirs and fix formatting in files

  • update stc_annular_modes env vars and fix file formatting

  • update stc_eddy_heat_fluxes env vars and clean up file formatting

  • clean up and update env vars in stc_eddy_heat_fluxes scripts

  • clean up and update env vars in stc_spv_extremes scripts

  • clean up and update env vars in stc_vert_wave_coupling scripts

  • clean up and update env vars in TC_MSE scripts

  • clean up and update env vars in TC_Rain scripts

  • clean up and update env vars in temp_extremes_distshape scripts

  • clean up and update env vars in top_heaviness_metric scripts

  • clean up and update env vars in tropical_pacific_sea_level scripts

  • clean up and update env vars in Wheeler-Kiladis scripts

  • update env_vars and clean up formatting in stc_qbo_enso scripts

  • change WK_DIR to WORK_DIR in forcing_feedback python files

  • bump intake_esm to v 2024.2.6 in base and python3_base conda env files

  • update precip_buoy_diag env vars and clean up formatting in html and python files

  • replace _query_error_handler calls with generic error logging; add a check for an empty dataframe returned by the initial catalog search; add comments with more advanced regex queries to test whether intake-esm fixes issues in the _search method

  • remove core.py

  • add POD env vars to the subprocess env for single-case PODs; make the case_name env var CASENAME for consistency with the existing definition

  • fix type checks in write_data_log_file and fix file cleanup logic in output_manager

  • bump netcdf4, h5py, matplotlib, pip, dask, and xarray versions in base and python3_base env files

  • fix output path for netcdf file and add checks to example_diag.py

  • close case_info.yml file after writing

  • add print_summary routine to mdtf_framework.py

  • clean up verify_links formatting

  • replace PodObject.cases with PodObject.multi_case_dict attribute to match the att that contains the necessary case info in environment_manager

  • remove redundant checks in output_manager

  • add log argument and pass main log file to print_summary in driver script

  • add calls to close varlist loggers to driver script

  • add env vars from pod settings to PodObject pod_env_vars object in pod_setup module

  • comment out log string with file path since it is causing issues in the io stream

  • formatting cleanup in environment_manager

  • add routine to append case list atts to html template dict for single-case config, and placeholder for multicase config to output_manager

  • clean up formatting in logs and exceptions modules

  • add comments to mdtf_framework.py

  • clean up lines that were too long in environment_manager

  • Create Test_Notebook.ipynb

  • add block to read in case_info yaml generated by framework to the test notebook

  • update figure name convention in example_multicase html and driver scripts

  • add _DoubleBraceTemplate to util/__init__.py and clean up filesystem.py formatting

  • refine generate_html_file_case_loop procedure, add file name parm to make_html, add error message to assertion statement in output_manager

  • update comments in example_multicase.html template

  • update multirun_config_template.jsonc

  • add flag to append html code for 1 figure per case to runtime_config.yml

  • add missing parenthesis to multirun_config_template.jsonc

  • update runtime_config.jsonc

  • update multirun_config_template.jsonc

  • move case information template generation to a separate method; add a boolean attribute to determine whether to run the case loop template generator; clean up formatting in output_manager

  • clean up units.py formatting

  • change the file arg to the io stream of the open handler and open the output file in append mode before writing the case info to the html output file in output_manager

  • remove old travis tests

  • remove unused test script

  • clean up pod_setup.py formatting

  • clean up and remove unused class from dataclass.py

  • remove unused class from and clean up basic.py

  • clean up util test scripts

  • rename src/tests/test_core.py and start refactoring

  • start refactoring unit tests

  • delete old tests and refactor the data_manager, diagnostic, and units tests

  • move the user_pp_scripts attribute from pod_setup to the preprocessor; add a module loader procedure to import custom preprocessing python scripts in the DaskMultiFilePreprocessor init method (see the module loader sketch after this list)

  • create an example custom preprocessing script

  • update comments in example pp script and refine the main routine loop

  • move json routines from filesystem to json_utils module

  • remove unused MDTFEnumInt class and fix _MDTFMixin init method

  • update util __init__.py to reflect module modifications

  • remove unused MDTFIntEnum test

  • remove unused VarlistEntryStage attribute and calls from varlist_utils remove unused deactivate_data_key method from VarlistEntry

  • add __init__.py to user_scripts

  • add json_utils methods to read in config file to example_pp_script

  • change example_pp_script.py to work on daily data and finish debugging main routine

  • set progressbar to false in the to_dataset_dict call in the example pp script

  • set progressbar to false in the to_dataset_dict call in the preprocessor; debug the custom module load method

  • add check for unit_test attribute in config param to filesystem tempdirmanager

  • stop tracking the example_multicase catalog; update the example_multicase config and environment yamls

  • rearrange the user_pp_scripts call; still need to debug calling the custom module on the xr dataset and variable

  • add realm parameter to from_CF_name calls in translation and test/test_translation modules

  • add init module to main repo directory

  • remove unused add_row function from basic module

  • add test routine to example_pp_script.py

  • finalize the custom script import procedure in the preprocessor; change the user_pp_scripts preprocessor attribute to just be a list of script names and not the full paths

  • fix edit_request and execute functions and calls so that they return VarlistEntries or xarray datasets whether they perform operations or act as dummy functions

  • update example_multirun_demo notebook

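The metaclass sketch referenced above: the switch from subclassing util.Singleton to declaring metaclass=util.Singleton follows the standard metaclass singleton pattern. Below is a minimal, generic sketch of that pattern; the Singleton metaclass and VariableTranslator class here are stand-ins, not the framework's actual util.Singleton implementation.

```python
class Singleton(type):
    """Generic singleton metaclass: each class that declares it gets exactly one instance."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Create the instance on first call, return the cached one thereafter.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


# Declaring the metaclass (rather than subclassing a Singleton base class) is what
# the "metaclass=util.Singleton" change above refers to; VariableTranslator here is
# only an illustrative stand-in for the framework class.
class VariableTranslator(metaclass=Singleton):
    def __init__(self):
        self.conventions = {}


assert VariableTranslator() is VariableTranslator()
```

Routing construction through the metaclass's __call__ is what guarantees a single shared instance; subclassing a plain Singleton base class would not intercept instantiation the same way.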
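The module loader sketch referenced above: the user_pp_scripts attribute holds bare script names, and the preprocessor imports those scripts at runtime. The sketch below uses importlib under assumed names (load_user_pp_scripts and a user_scripts directory); the actual hook in DaskMultiFilePreprocessor may differ.

```python
import importlib.util
import os


def load_user_pp_scripts(script_names, scripts_dir="user_scripts"):
    """Import user preprocessing scripts given a list of bare script names.

    Hypothetical helper: mirrors the described behavior of resolving script
    names (not full paths) against a known scripts directory.
    """
    modules = []
    for name in script_names:
        filename = name if name.endswith(".py") else name + ".py"
        path = os.path.join(scripts_dir, filename)
        spec = importlib.util.spec_from_file_location(os.path.splitext(filename)[0], path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the script's top-level code
        modules.append(module)
    return modules


# e.g. load_user_pp_scripts(["example_pp_script"]) would import
# user_scripts/example_pp_script.py and return the loaded module.
```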
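The catalog sketch referenced above: several commits concern the intake-esm workflow of querying an ESM datastore, loading the matching assets with progressbar=False, and writing the post-processed catalog with the library's own serialize method instead of a custom save routine. A rough sketch of that sequence follows; the catalog path and the search columns (standard_name, frequency) are assumptions loosely based on the synthetic CMIP catalog mentioned above, not the framework's exact query.

```python
import intake  # requires intake-esm to be installed for open_esm_datastore

# Open the ESM datastore (path is illustrative, following the synthetic CMIP catalog).
cat = intake.open_esm_datastore("esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.json")

# Subset the catalog on the attributes the preprocessor would query.
cat_subset = cat.search(standard_name="air_temperature", frequency="day")

# Load matching assets into xarray datasets; progressbar=False matches the change above.
dset_dict = cat_subset.to_dataset_dict(progressbar=False)

# Write the catalog back out with intake-esm's serializer, which replaced the custom
# save routine once the upstream fsspec issue was fixed.
cat_subset.serialize(name="MDTF_postprocessed_data", directory=".", catalog_type="file")
```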

Description
Include a summary of the change, and link the associated issue (if applicable).
List any dependencies that are required for this change, including libraries and variables,
and their metadata (units, frequencies, etc...). Be sure to separate PRs and issues for new PODs and PODs currently under development.

Associated issue # (replace this phrase and parentheses with the issue number)

How Has This Been Tested?
Please describe the tests that you ran to verify your changes in enough detail that
someone can reproduce them. Include any relevant details for your test configuration
such as the Python version, package versions, expected POD wallclock time, and the
operating system(s) you ran your tests on.

Checklist:

  • My branch is up-to-date with the NOAA-GFDL main branch, and all merge conflicts are resolved
  • The scripts are written in Python 3.11 or above (preferred; required if funded by a CPO grant), NCL, or R
  • All of my scripts are in the diagnostics/[POD short name] subdirectory, and include a main_driver script, template html, and settings.jsonc file
  • I have made corresponding changes to the documentation in the POD's doc/ subdirectory
  • I have requested that the framework developers add packages required by my POD to the python3, NCL, or R environment yaml file if necessary, and my environment builds with conda_env_setup.sh
  • I have added any necessary data to input_data/obs_data/[pod short name] and/or input_data/model/[pod short name]
  • My code is portable; it uses MDTF environment variables, and does not contain hard-coded file or directory paths
  • I have provided the code to generate digested data files from raw data files
  • Each digested data file generated by the script contains numerical data (no figures), and is 3 GB or less in size
  • I have included copies of the figures generated by the POD in the pull request
  • The repository contains no extra test scripts or data files

Co-authored-by: Aparna Radhakrishnan <[email protected]>
add refactor_pp branch to push block in mdtf_tests
@wrongkindofdoctor self-assigned this Mar 8, 2024
@wrongkindofdoctor marked this pull request as draft March 8, 2024 16:35
@wrongkindofdoctor added the framework (Issue pertains to the framework code) label Mar 8, 2024
@wrongkindofdoctor marked this pull request as ready for review March 8, 2024 16:42
@wrongkindofdoctor merged commit ea5d840 into NOAA-GFDL:refactor_pp Mar 8, 2024
0 of 2 checks passed