
MDTF-diagnostics preprocessor update (#509) #510

Merged
347 commits merged into NOAA-GFDL:refactor_pp on Mar 8, 2024

Conversation

wrongkindofdoctor
Collaborator

  • fix logging classes and procedure for the DataSourceBase class; add a routine to check the conda env specified by the POD for the required packages

  • Finalize the POD python package check in setup_pod

  • Change ref from util.Singleton to metaclass=util.Singleton, since this is the correct way to inherit from the modified Singleton definition (see the metaclass sketch after this list)

  • modify the pod_setup conda package check to just verify the packages in PODs that require the python environment, since packages for NCL are not explicitly imported in the driver scripts

  • fix the ref to the coordinate table in the CMIP fieldlist

  • add calls to instantiate and populate translator object to the driver script

  • modify the VariableTranslator read_convention method and dependencies to handle objects in the Fieldlist files

  • clean up formatting; add calls to register methods for abstract atts back to data_model.py

  • rework the register_coords method to accommodate multiple entries for a single standard name; continue working on the call to Fieldlist.from_struct

  • add RegexDict function to basic module and util init module

  • change HorizontalCoord classes back to separate X and Y classes; add a try/except block to catch duplicates in the _DMDimensionsMixin build_axes method

  • update data model horizontal coord class refs in the translation module; move the VariableTranslator class and modifier check from data_model to the translation.FieldlistEntry __post_init__ method to avoid a circular import

  • remove atmos_realm and ocean_realm from the modifiers table and fieldlists; change modeling_realm to realm in fieldlists

  • make realm a non-mandatory string attribute that defaults to an empty string in DMDependentVariable, because this class is used to define both coordinates, which do not have realm atts, and variables, which do

  • Refine the _process_var and _process_coord methods to properly handle modifier and realm entries when defining fieldlist lookup tables; remove the mandatory modifier and ndim units from the fieldlist class atts since they are not needed there

  • change HorizontalCoordinate refs to X and Y Coordinate refs to match the change in class names in varlist_util.py

  • add type hints and placeholder methods to DataSourceBase class

  • remove the varlist_util.varlistCoordinateMixin class and move the need_bounds attribute to the data_model._dmcoordinatshared class, which is inherited by the same child classes in varlist_util

  • add units defs to lat and lon dims in pod settings files

  • add axis defs to the lat and lon dimensions in the pod settings files

  • pass the pod object to get_varlist instead of just the pod_vars in pod_setup; define pod_dims using the _pod_dims_from_struct function in the Varlist from_struct call

  • add varlist_util with changes from prior commit

  • clean up dataclass.py formatting

  • replace var_dict parameter with parent in get_varlist method definition

  • minor cleanup of fieldlist and core modules

  • remove a commented-out line from varlist_util.py; comment out the convention match check for varlist setup development in pod_setup

  • check for time and set axis to T in data_model.coordinate_from_struct

  • add type hints to translation methods; add a date_range attribute to DataSourceBase and a method to set the date range; add a call to set_date_range to pod_setup;
    change varlist_util.setup_var to accept date_range and convention parameters to pass to called methods

  • fix the dest_path definition in varlist_util; add dummy attributes required by abstract methods in data_model; clean up formatting;
    add realm to the from_CF method in translation

  • remove the unnecessary pod_convention parameter from translation calls; change the _NO_TRANSLATION_FIELDLIST definition from 'None' to 'no_translation'; refine the logic for choosing whether to translate based on the convention match in pod_setup

  • start re-working preprocessor module to make previous multirun classes the default

  • rename my_scripts to user_pp_scripts in template files

  • fix pod_setup status checks; add a user_pp_scripts att and a routine to set the att on the pod object

  • add call to add user-defined pp scripts to workflow to mdtf_framework.py

  • clean up pp module some more and add placeholder class for user-defined pp

  • add calls to instantiate MODEL_WORK_DIR and preprocessor objects to the driver script; refactor path_utils to divide MODEL_WORK_DIR and POD_WORK_DIR into separate objects, since the model work dir is used by all PODs in a run and does not need to be copied to each pod directory; anticipatory cleanup of the preprocessor module;
    add todos to the query_fetch_preprocess module

  • add a translate_data option to the runtime config files; refactor pod_setup to configure paths using PodPathManager and run translation based on the new translate_data cli flag

  • refine path definitions in the reconfigured path_utils setup; update path object dependencies

  • move main log to framework run subdirectory

  • change PodObject._children to return case list values instead of None

  • edit preprocessor init methods and make DaskMultiFilePreprocessor the default pp class; reorder the preprocessor init and pod setup calls in the driver script; edit the runtime config template to point to the oar.gfdl.mdtf conda installation

  • start defining the preprocessor.query_catalog function and call

  • refine preprocessor catalog search criteria

  • move the deactivate routine from core to util/basic; move ObjectStatus from the log module to the basic module;
    update the src init module to reflect the util mods

  • update deactivate calls to reflect routine changes in pod_setup and varlist_util

  • fix logic in translation.translate_coord and add assertion error message

  • define standard_name for esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.csv entries

  • fix standard_name defs in esm_catalog_CMIP_synthetic; fix the catalog path def in runtime_config.jsonc;
    keep working on the catalog query in the preprocessor

  • rearrange calls in driver script

  • fix data path regex pattern and create aggregate catalog to return in preprocessor.query_catalog

  • rework the catalog query and function parameters; refactor edit_request calls in the preprocessing routines

  • fix preprocessor edit_request interfaces

  • update edit_request and execute calls for each pp func; remove refs to edit_request_wrapper, which will probably be deleted since it can be replaced with something less confusing to handle alternates; begin figuring out the whole dataset open-read procedure

  • add arguments for catalog_subset to the preprocessor read_file functions; remove extraneous classes from xr_parser and preprocessor; add type hints to translation functions;
    update runtime_config_template for local testing

  • notebook added

  • add supporting configs that were used for a test

  • compare 2 cases, with a catalog from Ciheim generated by CatalogBuilder

  • fix formatting in output_manager and processes

  • add micromamba_exe parm to runtime_config templates

  • update demo: remove the old demo and update the new demo with 2 cases

  • remove the call to the conda check from mdtf_framework; add support for micromamba to the pod_setup module
    add micromamba_exe parms to config templates
    add temporary comments to config jsonc template

  • fix typos in util modules; add a routine to append a row to a pandas dataframe to util.basic

  • add a procedure to create a dataframe from the preliminary intake catalog query; work on modifying check_group_range to create a DateRange object from the catalog start_time and end_time and append it to the existing dataframe

  • update _parse_input_string in datelabel module to accept colon delimiter, and extend accepted date string format description

  • add check_date_format routine to cli.py with additional accepted date string formats for startdate and enddate input data

  • fix the catalog dataframe update procedure in query_catalog; work on passing the xarray dataset from the catalog query to the preprocessing functions

  • update the init method for xr_parser DefaultDataParser and calls; refine the preprocessing routine;
    start updating xr_parser methods;
    remove unused preprocessor load and read methods

  • add handling for microseconds to datelabel DatePrecision

  • reorganize the xr_parser parse method to handle the catalog xarray dataset; comment out calls to methods that may no longer be needed; add logging to the DefaultDataParser class

  • remove the unused SingleVarFilePreprocessor class and read_dataset methods; refactor the preprocessor parse method to only call xr_parser.parse; increase the precision of the datestring returned by the CropDateRangeFunction execute logger; comment out AssociatedVariablesFunction since it is not yet implemented and lacks a use case at this time

  • fix cmip6.py formatting

  • set the output_to_ncl preprocessor attribute; reorganize pp routines and remove unused methods
    start refactoring write routines

  • add pod runtime settings attribute to pod object

  • update mdtf_framework function calls

  • update config parameter defs and calls in output_manager; fix calls to the preprocess method
    refactor assocVariablesFunction
    replace args with kwargs parm in preprocessor execute methods and pass keyword args to function calls

  • add preliminary calls to environment and runtime managers to the mdtf driver; add failed and active properties to pod_setup

  • update environment and output manager modules to use multirun config base classes; add preliminary calls and routines to handle multirun html template generation to the output manager

  • refactor example_multicase html templates into separate header and plot files

  • Remove unused modules

  • move tempdirmanager to util/filesystem.py

  • update calls to tempdirmanager methods with config parameter

  • clean up the preprocessor and data_sources modules; add an assoc_files attribute to the varlist_util.Varlist class

  • refactor environment_manager SubprocessRuntimeManager methods; update calls in mdtf_framework.py

  • update modules in the toc rst doc files; start updating fmwk_cli.rst
    update dev_start.rst

  • comment out calls to attributes that are not set in the logs module; add a case_dict parm to the data_sources init in pod setup; set new_work_dir to True in the paths init in pod setup; add an iter_vars_only method to data_model.py;
    define the iter_vars_only attribute in the data_sources DataSourceBase class; update environment setup and subprocess spawn calls in environment_manager; start refactoring the output_manager module;
    update method calls in mdtf_framework.py

  • rename example_multicase_header.html to example_multicase.html

  • remove unused code from pod_setup; continue refactoring output_manager;
    remove the dry_run parameter from subprocess methods and calls; update output_manager calls in mdtf_framework

  • add type hints to cli.read_config_files and make parms lowercase

  • change WK_DIR to WORK_DIR in environment_manager

  • continue refactoring output_manager to work with ctx.config information

  • refactor tempdir class to work with ctx.config information

  • add TEMP_DIR_ROOT, unit_test, and _configs attributes to ctx.config; move the backup_config method and ConfigTuple defs from the core module to mdtf_framework

  • remove the config parm from tempdir_cleanup_handler and calls; add a keep_temp attribute to TempDirManager
    define attributes in TempDirManager
    clean up logs.py formatting

  • fix formatting in filesystem, path_utils, and environment_manager; fix the pod data output dir definition so that it doesn't doubly append dates to case directories in path_utils

  • add placeholder method for pp catalog creation to preprocessor.py

  • add catalog module to src/util with methods for postprocessed data catalog creation

  • clean up datelabel module

  • modify the find_json method to accept a full filepath and do a simple search for a file in a directory; consolidate the read_config_file and read_config_files methods to parse a json using the MDTF root directory, subdirectory tree, and file name

  • refine output file catalog attributes and assets definitions

  • clean up path_utils and date_label modules

  • add methods to parse output file directories and split file name parts to define attributes in catalog.py; refine calls to catalog methods in the preprocessor

  • set new_workdir option to False in pod_setup pathutils initialization

  • remove unused imports from cli.py

  • work on defining regex to isolate time_range in file name in catalog module

  • refine the write_pp_catalog method; move define_pp_catalog to the catalog module

  • update columns in pp catalog

  • fix order of imports in util init file to avoid circular import error

  • refine catalog assets setup; add the output file path to the catalog asset definition method

  • add calls to update ctx.config WORK_DIR and OUTPUT_DIR with values defined by model_paths atts

  • fix csv file name in os.path.join call in catalog.define_pp_catalog_assets

  • add logic to PathManager to check for existing MDTF_output subdirectory before appending it to a directory attribute

  • change find mindepth to 1; remove duplicate entries from the filelist before returning it in catalog.get_file_list

  • add calls to validate catalog to preprocessor

  • add a hacked version of the save function to catalog.py to try and work around the output file name issue

  • add call to new save method and debug

  • move the case object from the pod to its own dictionary in the main program and update usage in data_sources and pod_setup; finalize the catalog save method and update comments

  • add logging and error handling to preprocessor write_pp_catalog method

  • refactor environment manager routines to use a separate case dictionary; add a CATALOG_FILE environment variable to case_info.yaml

  • update intake-esm versions in base and python3 base envs

  • replace the call to the custom catalog save util with the intake-esm serialize method in the preprocessor, since the version update fixed the fsspec issue (see the catalog sketch after this list)

  • remove save_catalog from catalog.py since it is not needed with the intake-esm version update

  • fix data_sources formatting

  • get rid of the VarlistEntryMixin class and consolidate it with the VarlistEntry class; make env_vars a class attribute instead of a property to avoid a collision with the attribute defined in the data_sources parent class; create a new set_env_vars method to define VarlistEntry env_vars and add the call to the Varlist.setup_var method

  • resolve merge conflicts

  • fix the CASE_LIST key in the case_info.yaml creation; fix the micromamba_exe parameter call in the subprocess command definition

  • delete cli_plugins and template jsons

  • remove member_id from groupby_attrs def in catalog module

  • remove debugging lines and fix file close and memory cleanup in example_multicase POD

  • replace case_info.yml with updated example_case_info_output.yml in example_multicase directory

  • change file name from case_info.yaml to case_info.yml in environment_manager

  • update the information in the example_multicase html template

  • update env vars in albedofb.py

  • update env vars and clean up formatting in blocking_neale.py

  • update env vars and fix formatting in convective_transition_diag scripts

  • update env vars in enso mse and enso rws drivers

  • update env variables in eulerian_storm_track scripts, and clean up formatting in POD files

  • update env vars and fix formatting in example POD files

  • add realm to varlist env_vars

  • update mixed_layer_depth env vars and clean up formatting in driver and html files

  • clean up formatting in MJO_prop_amp driver and html files

  • update env_vars and fix formatting in MJO_prop_amp NCL scripts

  • update env_vars and clean up ENSO_RWS scripts

  • update blocking_neale scripts

  • update and clean up ENSO_MSE scripts

  • update MJO suite env vars

  • update and clean up MJO_teleconnection scripts

  • update ocn_surf_flux_diag env vars and clean up formatting

  • update precip_diurnal_cycle env_vars and fix file formatting

  • update seaice_suite env_vars and clean up file formatting

  • update SM_ET_coupling env dirs and fix formatting in files

  • update stc_annular_modes env vars and fix file formatting

  • update stc_eddy_heat_fluxes env vars and clean up file formatting

  • clean up and update env vars in stc_eddy_heat_fluxes scripts

  • clean up and update env vars in stc_spv_extremes scripts

  • clean up and update env vars in stc_vert_wave_coupling scripts

  • clean up and update env vars in TC_MSE scripts

  • clean up and update env vars in TC_Rain scripts

  • clean up and update env vars in temp_extremes_distshape scripts

  • clean up and update env vars in top_heaviness_metric scripts

  • clean up and update env vars in tropical_pacific_sea_level scripts

  • clean up and update env vars in Wheeler-Kiladis scripts

  • update env_vars and clean up formatting in stc_qbo_enso scripts

  • change WK_DIR to WORK_DIR in forcing_feedback python files

  • bump intake_esm to v 2024.2.6 in base and python3_base conda env files

  • update precip_buoy_diag env vars and clean up formatting in html and python files

  • replace _query_error_handler calls with generic error logging; add a check for an empty dataframe returned by the initial catalog search; add comments with more advanced regex queries to test whether intake-esm fixes issues in the _search method

  • remove core.py

  • add POD env vars to the subprocess env for single-case PODs; make the case_name env var CASENAME for consistency with the existing definition

  • fix type checks in write_data_log_file and fix file cleanup logic in output_manager

  • bump netcdf4, h5py, matplotlib, pip, dask, and xarray versions in base and python3_base env files

  • fix output path for netcdf file and add checks to example_diag.py

  • close case_info.yml file after writing

  • add print_summary routine to mdtf_framework.py

  • clean up verify_links formatting

  • replace PodObject.cases with PodObject.multi_case_dict attribute to match the att that contains the necessary case info in environment_manager

  • remove redundant checks in output_manager

  • add log argument and pass main log file to print_summary in driver script

  • add calls to close varlist loggers to driver script

  • add env vars from pod settings to PodObject pod_env_vars object in pod_setup module

  • comment out log string with file path since it is causing issues in the io stream

  • formatting cleanup in environment_manager

  • add routine to append case list atts to html template dict for single-case config, and placeholder for multicase config to output_manager

  • clean up formatting in logs and exceptions modules

  • add comments to mdtf_framework.py

  • clean up lines that were too long in environment_manager

  • Create Test_Notebook.ipynb

  • add block to read in case_info yaml generated by framework to the test notebook

  • update figure name convention in example_multicase html and driver scripts

  • add _DoubleBraceTemplate to util/__init__.py and clean up filesystem.py formatting

  • refine generate_html_file_case_loop procedure, add file name parm to make_html, add error message to assertion statement in output_manager

  • update comments in example_multicase.html template

  • update multirun_config_template.jsonc

  • add flag to append html code for 1 figure per case to runtime_config.yml

  • add missing parenthesis to multirun_config_template.jsonc

  • update runtime_config.jsonc

  • update multirun_config_template.jsonc

  • move case information template generation to a separate method; add a boolean attribute to determine whether to run the case loop template generator; clean up formatting in output_manager

  • clean up units.py formatting

  • change the file arg to the io stream of the open handler and open the output file in append mode before writing the case info to the html output file in output_manager

  • remove old travis tests

  • remove unused test script

  • clean up pod_setup.py formatting

  • clean up and remove unused class from dataclass.py

  • remove unused class from and clean up basic.py

  • clean up util test scripts

  • rename src/tests/test_core.py and start refactoring

  • start refactoring unit tests

  • delete old tests and refactor the data_manager, diagnostic, and units tests

  • move the user_pp_scripts attribute from pod_setup to the preprocessor; add a module loader procedure to import custom preprocessing python scripts in the DaskMultiFilePreprocessor init method (see the module loader sketch after this list)

  • create an example custom preprocessing script

  • update comments in example pp script and refine the main routine loop

  • move json routines from filesystem to json_utils module

  • remove unused MDTFEnumInt class and fix _MDTFMixin init method

  • update util __init__.py to reflect module modifications

  • remove unused MDTFIntEnum test

  • remove unused VarlistEntryStage attribute and calls from varlist_utils remove unused deactivate_data_key method from VarlistEntry

  • add __init__.py to user_scripts

  • add json_utils methods to read in config file to example_pp_script

  • change example_pp_script.py to work on daily data and finish debugging main routine

  • set progressbar to false in the to_dataset_dict call in the example pp script

  • set progressbar to false in the to_dataset_dict call in the preprocessor; debug the custom module load method

  • add check for unit_test attribute in config param to filesystem tempdirmanager

  • stop tracking the example_multicase catalog; update the example_multicase config and environment yamls

  • rearrange the user_pp_scripts call; still need to debug calling the custom module on the xr dataset and variable

  • add realm parameter to from_CF_name calls in translation and test/test_translation modules

  • add init module to main repo directory

  • remove unused add_row function from basic module

  • add test routine to example_pp_script.py

  • finalize the custom script import procedure in the preprocessor; change the user_pp_scripts preprocessor attribute to just be a list of script names and not the full paths

  • fix edit_request and execute functions and calls so that they return VarlistEntries or xarray datasets whether they perform operations or act as dummy functions

  • update example_multirun_demo notebook

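The metaclass sketch referenced above: the switch from subclassing util.Singleton to declaring metaclass=util.Singleton follows the standard metaclass singleton pattern. Below is a minimal, generic sketch of that pattern; the Singleton metaclass and VariableTranslator class here are stand-ins, not the framework's actual util.Singleton implementation.

```python
class Singleton(type):
    """Generic singleton metaclass: each class that declares it gets exactly one instance."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Create the instance on first call, return the cached one thereafter.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


# Declaring the metaclass (rather than subclassing a Singleton base class) is what
# the "metaclass=util.Singleton" change above refers to; VariableTranslator here is
# only an illustrative stand-in for the framework class.
class VariableTranslator(metaclass=Singleton):
    def __init__(self):
        self.conventions = {}


assert VariableTranslator() is VariableTranslator()
```

Routing construction through the metaclass's __call__ is what guarantees a single shared instance; subclassing a plain Singleton base class would not intercept instantiation the same way.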
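The module loader sketch referenced above: the user_pp_scripts attribute holds bare script names, and the preprocessor imports those scripts at runtime. The sketch below uses importlib under assumed names (load_user_pp_scripts and a user_scripts directory); the actual hook in DaskMultiFilePreprocessor may differ.

```python
import importlib.util
import os


def load_user_pp_scripts(script_names, scripts_dir="user_scripts"):
    """Import user preprocessing scripts given a list of bare script names.

    Hypothetical helper: mirrors the described behavior of resolving script
    names (not full paths) against a known scripts directory.
    """
    modules = []
    for name in script_names:
        filename = name if name.endswith(".py") else name + ".py"
        path = os.path.join(scripts_dir, filename)
        spec = importlib.util.spec_from_file_location(os.path.splitext(filename)[0], path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the script's top-level code
        modules.append(module)
    return modules


# e.g. load_user_pp_scripts(["example_pp_script"]) would import
# user_scripts/example_pp_script.py and return the loaded module.
```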
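The catalog sketch referenced above: several commits concern the intake-esm workflow of querying an ESM datastore, loading the matching assets with progressbar=False, and writing the post-processed catalog with the library's own serialize method instead of a custom save routine. A rough sketch of that sequence follows; the catalog path and the search columns (standard_name, frequency) are assumptions loosely based on the synthetic CMIP catalog mentioned above, not the framework's exact query.

```python
import intake  # requires intake-esm to be installed for open_esm_datastore

# Open the ESM datastore (path is illustrative, following the synthetic CMIP catalog).
cat = intake.open_esm_datastore("esm_catalog_CMIP_synthetic_r1i1p1f1_gr1.json")

# Subset the catalog on the attributes the preprocessor would query.
cat_subset = cat.search(standard_name="air_temperature", frequency="day")

# Load matching assets into xarray datasets; progressbar=False matches the change above.
dset_dict = cat_subset.to_dataset_dict(progressbar=False)

# Write the catalog back out with intake-esm's serializer, which replaced the custom
# save routine once the upstream fsspec issue was fixed.
cat_subset.serialize(name="MDTF_postprocessed_data", directory=".", catalog_type="file")
```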

Description
Include a summary of the change, and link the associated issue (if applicable).
List any dependencies that are required for this change, including libraries and variables,
and their metadata (units, frequencies, etc...). Be sure to separate PRs and issues for new PODs and PODs currently under development.

Associated issue # (replace this phrase and parentheses with the issue number)

How Has This Been Tested?
Please describe the tests that you ran to verify your changes in enough detail that
someone can reproduce them. Include any relevant details for your test configuration
such as the Python version, package versions, expected POD wallclock time, and the
operating system(s) you ran your tests on.

Checklist:

  • My branch is up-to-date with the NOAA-GFDL main branch, and all merge conflicts are resolved
  • The scripts are written in Python 3.11 or above (preferred; required if funded by a CPO grant), NCL, or R
  • All of my scripts are in the diagnostics/[POD short name] subdirectory, and include a main_driver script, template html, and settings.jsonc file
  • I have made corresponding changes to the documentation in the POD's doc/ subdirectory
  • I have requested that the framework developers add packages required by my POD to the python3, NCL, or R environment yaml file if necessary, and my environment builds with conda_env_setup.sh
  • I have added any necessary data to input_data/obs_data/[pod short name] and/or input_data/model/[pod short name]
  • My code is portable; it uses MDTF environment variables, and does not contain hard-coded file or directory paths
  • I have provided the code to generate digested data files from raw data files
  • Each digested data file generated by the script contains numerical data (no figures), and is 3 GB or less in size
  • I have included copies of the figures generated by the POD in the pull request
  • The repository contains no extra test scripts or data files

Co-authored-by: Aparna Radhakrishnan <[email protected]>
add refactor_pp branch to push block in mdtf_tests
@wrongkindofdoctor self-assigned this Mar 8, 2024
@wrongkindofdoctor marked this pull request as draft March 8, 2024 16:35
@wrongkindofdoctor added the framework (Issue pertains to the framework code) label Mar 8, 2024
@wrongkindofdoctor marked this pull request as ready for review March 8, 2024 16:42
@wrongkindofdoctor merged commit ea5d840 into NOAA-GFDL:refactor_pp Mar 8, 2024
0 of 2 checks passed