Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate ADF config file #142

Merged
merged 23 commits into from
Nov 18, 2024
Merged

generate ADF config file #142

merged 23 commits into from
Nov 18, 2024

Conversation

brianpm
Copy link
Collaborator

@brianpm brianpm commented Oct 14, 2024

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you lint your code locally prior to submission?

Not sure about tests... but yes, linted code locally.

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you successfully tested your changes locally?

This adds a script that is a first attempt to take the CUPID config file to generate an ADF config file. The script works locally.

The idea is to be able to run the ADF without the user needing to manually edit the ADF config file.

I think we need to discuss what are the requirements for such an input file. It seems like ADF is expecting more information than CUPID right now... which is fine, but might mean that this script needs to make quite a few assumptions along the way.

@brianpm
Copy link
Collaborator Author

brianpm commented Oct 23, 2024

I just pushed some updates (mostly untested). I think this gets us pretty close to a version that will run ADF.

This script would be run as (for example):
python generate_adf_config_file.py --cupid_file ~/Code/CUPiD/examples/key_metrics/config.yml --adf_template ~/Code/CAM_diagnostics/config_amwg_default_plots.yaml --out_file ~/Desktop/testfile2.yaml

Then, in principle, ADF would be run like:

> cd $ADFSOURCE
> ./run_adf_diag ~/Desktop/testfile2.yaml

Note that I am pretty sure the path for ADF output won't work yet. I set it to CUPiD's "sname", but I think it needs to be expanded to be a full path.

@TeaganKing
Copy link
Collaborator

TeaganKing commented Nov 4, 2024

These changes were made:

  • a few minor formatting tweaks
  • adjusted the path to ADF output based on Create a new example that will run full ADF package (not from notebook) #141
  • updated the file paths to include the base or test case name after the CESM output dir
  • specified a full path based on the cupid_file path for outputting plots (otherwise this directory may not be found from the ADF root)
  • updated the directory structure in the config file to casename/atm/proc/tseries instead of casename/proc/tseries (we could alternatively update how we are doing this in CUPiD)

A few notes:

  • I was able to create an ADF config file with a similar command as the one you shared above (just using my own files) and started running ADF in the cupid-analysis environment.
  • While this did start running and generated timeseries and climo files after hist_str was manually updated, it appears to be hanging before plots finish generating and ADF runs to completion.

A few questions:

  • ADF does not seem to find all of the files needed for input because hist_str is set to cam.h0 and the adf lib expects to find data with the following format (note the period after h0!), but only finds cam.h0i files. We should probably update this, but I'm not sure if there's a consistent format we'd expect?
    /glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing/b.e23_alpha17f.BLT1850.ne30_t232.092/atm/hist/*cam.h0.*
  • ADF expects a top level 'basic diag info' hist_str (which matches the current cupid config setup), but there are case-specific hist_str included in the adf config file (it seems like theses may not be used). Can we remove these?
  • Do we also want to include just a subset of variables to get started, or is it best to run timeseries, etc, for all of them?

@TeaganKing
Copy link
Collaborator

TeaganKing commented Nov 4, 2024

@justin-richling or @brianpm if either of you have suggestions regarding what our default hist_str should be and whether this should be global or case-specific (see comment above), that would be helpful! FYI I'll continue working on this on Wednesday afternoon and could implement any relevant suggestions at that time.

@TeaganKing
Copy link
Collaborator

Some updates on plot generation: the hanging after generating ts and climo files is related to processing the 3D variables. There also is a KeyError when looking for lat while generating zonal mean plots.

@TeaganKing
Copy link
Collaborator

TeaganKing commented Nov 8, 2024

The default config file for ADF has been updated (thanks to Justin!) and now the hist_str is just set to h0a to use averaged fields.

When running just PRECT and using an updated case for comparison (b.e30_beta02.BLT1850.ne30_t232.104 vs f.e23_alpha17f.FLTHIST_ne30.roughtopo.099 -- this is updated in the config file in #141 ), this runs fine and doesn't hang. It looks like that was a grid-related issue.

@TeaganKing
Copy link
Collaborator

A variable list could be added to the cupid config file that overwrites the default ADF variables.

@TeaganKing
Copy link
Collaborator

The external_diag_packages config.yml in #141 has been updated to include same-grid cases and a list of adf_vars which now can be used by this helper script (but is not necessary so that the script is compatible with other yaml files).

Copy link
Collaborator

@mnlevy1981 mnlevy1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I'm still putting together a test case to try it out myself, but it sounds like @TeaganKing has already done that so we are confident the script works as advertised :)

Two last consistency points:

  1. generate_cupid_config_for_cesm_case.py is executable, and my inclination is that generate_adf_config_file.py should be as well (but I'm open to the idea of requiring python generate_cupid_config_for_cesm_case.py instead of allowing ./generate_cupid_config_for_cesm_case.py if there is a good reason to avoid the latter)
  2. The scripts in cupid/ all use click instead of argparse and the scripts in this directory probably should as well. That's best handled in a separate PR, I just wanted to make a note here while I'm thinking about it.

helper_scripts/generate_adf_config_file.py Outdated Show resolved Hide resolved
helper_scripts/generate_adf_config_file.py Outdated Show resolved Hide resolved

# Set case names for ADF config
a_dict["diag_cam_climo"]["cam_case_name"] = test_case_name
a_dict["diag_cam_baseline_climo"]["cam_case_name"] = base_case_name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay for this value to be None? Do we need something like

if base_case_name:
  a_dict["diag_cam_baseline_climo"]["cam_case_name"] = base_case_name
elif "cam_case_name" in "cam_case_name":
  del a_dict["diag_cam_baseline_climo"]["cam_case_name"]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up because I think the github formatting made it hard to see the context of my comment. This is specifically referring to a_dict["diag_cam_baseline_climo"]["cam_case_name"] = base_case_name, since base_case_name = c_dict["global_params"].get("base_case_name") -- what is the expected behavior if base_case_name is not in global_params?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the ADF config file, compare_obs determines whether the model run is compared to observations or another model run. If this variable is false in the ADF config file, then diag_cam_baseline_climo is used and is a required config file section. If not present, ADF fails.

I'm curious if we would want to support comparison to obs in addition to model-model comparison-- @brianpm and @justin-richling do you have thoughts on this?

With all that said, I think that we could just use base_case_name = c_dict["global_params"]["base_case_name"] instead of the .get() since it is an expected value for model-model comparisons. I'll update this now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should allow for the model vs obs comparisons. In adf_info.py there is the variable data_name that gets set for the baseline case name if model vs model, but data_name gets set to "Obs" if its a model vs obs case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit hesitant to include this feature within this PR if we want to get this in before Wednesday. I think it will require an additional default configuration file from ADF as well as a number of additional parameters in the cupid configuration file (including compare_obs) as well as parameters that are added to the script to generate the ADF config file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See issue #150

helper_scripts/generate_adf_config_file.py Show resolved Hide resolved
base_case_cupid_ts_index = (
ts_case_names.index(base_case_name) if base_case_name in ts_case_names else None
)
if base_case_name is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if base_case_name: would be consistent with the ts_case_names check above:

    ts_case_names = c_ts.get("case_name")
    if not ts_case_names:
        raise ValueError("CUPiD file does not have timeseries case_name array.")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if statement is not necessary if we expect base_case_name to always be in the CUPiD config

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just updated this

helper_scripts/generate_adf_config_file.py Outdated Show resolved Hide resolved
helper_scripts/generate_adf_config_file.py Outdated Show resolved Hide resolved
helper_scripts/generate_adf_config_file.py Outdated Show resolved Hide resolved
helper_scripts/generate_adf_config_file.py Outdated Show resolved Hide resolved
@TeaganKing
Copy link
Collaborator

Thanks for these suggestions, @mnlevy1981 ! I think I have addressed everything (with a few notes in response to your direct in-line comments); let me know if you have any further requests.

Instead of looking for a variable list in

compute_notebooks.atm.link_to_ADF.parameter_groups.none.adf_vars

Loop through all compute_notebooks.${component}.${notebook} and look for

compute_notebooks.${component}.${notebook}.external_tool

if external_tool.tool_name is 'ADF', append everything in external_tool.vars to
diag_var_list. Also append everything in external_tool.plotting_scripts to
plotting_scripts (to make it easier to control what plots get made).

for both cases, if variable isn't found in any external_tool then the default
list values from ADF will be used.
Didn't realize pre-commit hadn't run on my last commit
Comment on lines 86 to 90
if base_case_name:
a_dict["diag_cam_baseline_climo"]["cam_case_name"] = base_case_name

elif "cam_case_name" in "cam_case_name":
del a_dict["diag_cam_baseline_climo"]["cam_case_name"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you dropped the .get() for base_case_name, I think we want to drop the if/else and just keep line 87

Copy link
Collaborator

@mnlevy1981 mnlevy1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I think all that's left is to hear @brianpm and @justin-richling's thoughts on #142 (comment) and then update a few spots accordingly.

Comment on lines 88 to 89
if "cam_case_name" in "cam_case_name":
del a_dict["diag_cam_baseline_climo"]["cam_case_name"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to remove these two lines. The if statement isn't posed very well (it should probably be if "cam_case_name" in a_dict["diag_cam_baseline_climo"]:), but it was originally meant to catch the case where a_dict["diag_cam_baseline_climo"]["cam_case_name"] was read in from the template file and we wanted to remove it because base_case_name wasn't defined. Now we are always setting a_dict["diag_cam_baseline_climo"]["cam_case_name"] = base_case_name so we don't want to remove the dictionary entry.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@TeaganKing TeaganKing merged commit b9e0964 into NCAR:main Nov 18, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants