Run one executable for soca2cice (instead of two) #3118

Merged

Conversation

shlyaeva
Contributor

@shlyaeva shlyaeva commented Nov 21, 2024

Description

Run a single executable to add SOCA increments to CICE restart files, processing the Arctic and Antarctic simultaneously to reduce runtime and I/O.
Resolves NOAA-EMC/GDASApp#1367

Type of change

  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? YES
    • GDAS TBD

How has this been tested?

  • marine gdasapp tests on orion with intel

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@RussTreadon-NOAA RussTreadon-NOAA marked this pull request as ready for review November 21, 2024 22:34
Contributor

@guillaumevernieres guillaumevernieres left a comment

Thanks @shlyaeva !

@RussTreadon-NOAA
Contributor

@WalterKolczynski-NOAA , the GDASApp hash has been updated. This PR is ready for CI testing and review.

@RussTreadon-NOAA RussTreadon-NOAA added the CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera label Nov 22, 2024
@RussTreadon-NOAA
Contributor

GDASApp ctests, some of which include g-w jobs, passed on Hera. See GDASApp PR #1380

Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

Looks good to me. Approve.

@RussTreadon-NOAA
Contributor

Since automated CI is not working for me, I launched the following CI cases under my username on Hera:

  • C96C48_hybatmDA as prgsi_pr3118
  • C96C48_ufs_hybatmDA as prjedi_pr3118
  • C96C48_hybatmaerosnowDA as praero_pr3118
  • C48mx500_3DVarAOWCDA as prwcda_pr3118
  • C96_atm3DVar as pratm3dvar_pr3118

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera labels Nov 22, 2024
@RussTreadon-NOAA
Contributor

Hera g-w CI results

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prgsi_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201800        Done    Nov 22 2024 01:50:31    Nov 22 2024 02:15:22
202112210000        Done    Nov 22 2024 01:50:31    Nov 22 2024 05:25:15
202112210600        Done    Nov 22 2024 01:50:31    Nov 22 2024 04:50:15

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prjedi_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Nov 22 2024 01:50:39    Nov 22 2024 02:25:23
202402240000        Done    Nov 22 2024 01:50:39    Nov 22 2024 06:05:22
202402240600        Done    Nov 22 2024 01:50:39    Nov 22 2024 06:10:20

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/praero_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201200      Active    Nov 22 2024 01:50:52             -
202112201800      Active    Nov 22 2024 01:50:52             -
202112210000      Active    Nov 22 2024 01:50:52             -

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prwcda_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202103241200        Done    Nov 22 2024 01:50:54    Nov 22 2024 02:30:29
202103241800        Done    Nov 22 2024 01:50:54    Nov 22 2024 04:00:31

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/pratm3dvar_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201800        Done    Nov 22 2024 01:50:59    Nov 22 2024 02:30:30
202112210000        Done    Nov 22 2024 01:50:59    Nov 22 2024 05:45:26
202112210600        Done    Nov 22 2024 01:50:59    Nov 22 2024 04:55:20

All DA streams successfully completed except praero_pr3118.

Job gdas_aeroanlgenb failed. GDASApp issue #1381 has been opened to report this failure and document its resolution.
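
For anyone scanning several of these summaries at once, the check above ("every cycle Done except one stream") can be sketched in Python. This is only an illustration that parses the text as pasted in this comment. The function name unfinished_cycles and the convention of taking the experiment name from the last path component are assumptions, not part of global-workflow or Rocoto.

```python
import re

# Matches one cycle row of the summary tables pasted above, e.g.
# "202112201200      Active    Nov 22 2024 01:50:52             -"
ROW = re.compile(r"^\s*(\d{12})\s+(\w+)\s+")


def unfinished_cycles(report: str) -> dict[str, list[str]]:
    """Map each experiment to the cycles whose STATE is not 'Done'.

    Hypothetical helper: it only understands the pasted text layout,
    where each table is preceded by a line starting with 'rocotostat'.
    """
    pending: dict[str, list[str]] = {}
    experiment = None
    for line in report.splitlines():
        if line.startswith("rocotostat "):
            # Assumption: the experiment name is the last path component.
            experiment = line.split()[-1].rstrip("/").split("/")[-1]
            pending[experiment] = []
            continue
        match = ROW.match(line)
        if match and experiment is not None and match.group(2) != "Done":
            pending[experiment].append(match.group(1))
    # Keep only experiments that still have unfinished cycles.
    return {name: cycles for name, cycles in pending.items() if cycles}
```

Applied to the five tables above, a helper like this would be expected to report only praero_pr3118, since its three cycles are still Active.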

@RussTreadon-NOAA RussTreadon-NOAA removed the CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera label Nov 22, 2024
@emcbot emcbot added the CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera label Nov 22, 2024
@RussTreadon-NOAA
Contributor

GDASApp PR #1382 commented out the section of GDASApp parm/aero/berror/aero_diagb.yaml.j2 which caused the aeroanlgenb failure.

@CoryMartin-NOAA commented

It will take a nontrivial amount of time to stage the rescaling file and make workflow changes such that it just made more sense to comment this out for now

I updated sorc/gdas.cd in a working copy of shlyaeva:feature/refactor_soca2cice to bring in the updated GDASApp hash. aeroanlgenb now runs to completion. I resumed running C96C48_hybatmaerosnowDA on Hera under my username.

@DavidHuber-NOAA DavidHuber-NOAA added CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera and removed CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera labels Nov 22, 2024
@RussTreadon-NOAA RussTreadon-NOAA dismissed stale reviews from WalterKolczynski-NOAA and themself via b116bf8 November 22, 2024 13:55
@emcbot

emcbot commented Nov 22, 2024

Experiment C96C48_hybatmDA FAILED on Hera in Build# 3 in
/scratch1/NCEPDEV/global/CI/3118/RUNTESTS/EXPDIR/C96C48_hybatmDA_bcc7ece1

@emcbot

emcbot commented Nov 22, 2024

Experiment C96C48_ufs_hybatmDA FAILED on Hera in Build# 3 in
/scratch1/NCEPDEV/global/CI/3118/RUNTESTS/EXPDIR/C96C48_ufs_hybatmDA_bcc7ece1

@emcbot emcbot added CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed and removed CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed labels Nov 22, 2024
@emcbot

emcbot commented Nov 22, 2024

CI Failed on Hera in Build# 3
Built and ran in directory /scratch1/NCEPDEV/global/CI/3118


Experiment C96_S2SWA_gefs_replay_ics_bcc7ece1 Completed 1 Cycles: *SUCCESS* at Fri Nov 22 19:37:11 UTC 2024
Experiment C48mx500_3DVarAOWCDA_bcc7ece1 Completed 2 Cycles: *SUCCESS* at Fri Nov 22 20:01:46 UTC 2024
Experiment C48_ATM_bcc7ece1 Completed 2 Cycles: *SUCCESS* at Fri Nov 22 20:03:16 UTC 2024

@WalterKolczynski-NOAA
Contributor

Ignore the failure, I'm fighting Jenkins this afternoon.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera and removed CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed labels Nov 22, 2024
@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera labels Nov 22, 2024
@emcbot

emcbot commented Nov 22, 2024

Build FAILED on Hera in Build# 4 with error logs:

/scratch1/NCEPDEV/global/CI/3118/gfs/sorc/logs/build_gdas.log

Follow link here to view the contents of the above file(s): (link)

@emcbot emcbot added CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed and removed CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera labels Nov 22, 2024
@RussTreadon-NOAA
Contributor

@WalterKolczynski-NOAA : Where do we go from here?

I repeated my build of this PR on Hera in /scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/pr3118. The build ran to completion, as build_all.log shows:

Hera(hfe11):/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/pr3118/sorc$ cat build_all.log
Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: None
Building gsi_enkf, ufs, gfs_utils, gdas, ww3prepost, ufs_utils, gsi_utils, gsi_monitor, upp
Starting build_gsi_enkf.sh
Starting build_ufs.sh
Starting build_gfs_utils.sh
Starting build_gdas.sh
Starting build_ww3prepost.sh
Starting build_ufs_utils.sh
Starting build_gsi_utils.sh
Starting build_gsi_monitor.sh
Starting build_upp.sh
build_gfs_utils.sh completed successfully!
build_ufs_utils.sh completed successfully!
build_gsi_utils.sh completed successfully!
build_gsi_monitor.sh completed successfully!
build_ww3prepost.sh completed successfully!
build_upp.sh completed successfully!
build_gsi_enkf.sh completed successfully!
build_ufs.sh completed successfully!
build_gdas.sh completed successfully!

 .... Build system finished ....

I do not know the intricacies of Jenkins, but it seems to have frequent hiccups when running g-w CI. What can we do to make Jenkins-based CI more robust?

I have had greater success manually running ./workflow/create_experiment.py to set up CI streams and then enabling a cron job that calls rocotorun until each stream completes. The following g-w CI configurations

  • C96C48_hybatmDA as prgsi_pr3118
  • C96C48_ufs_hybatmDA as prjedi_pr3118
  • C96C48_hybatmaerosnowDA as praero_pr3118
  • C48mx500_3DVarAOWCDA as prwcda_pr3118
  • C96_atm3DVar as pratm3dvar_pr3118

successfully ran for this PR via this approach.
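
As a rough illustration of the cron part of this approach, the sketch below prints one crontab entry per stream. The rocotorun options -w (workflow XML) and -d (database file) are standard Rocoto options. The <pslot>.xml and <pslot>.db file names, the five-minute interval, and the /path/to/EXPDIR root are assumptions made for the example, not values taken from this PR.

```python
from pathlib import PurePosixPath


def crontab_line(expdir: str, pslot: str, minutes: int = 5) -> str:
    """Build one crontab entry that advances a Rocoto workflow periodically.

    Assumption (hypothetical layout): the workflow XML and database are
    named <pslot>.xml and <pslot>.db inside <expdir>/<pslot>.
    """
    exp = PurePosixPath(expdir) / pslot
    xml = exp / f"{pslot}.xml"
    db = exp / f"{pslot}.db"
    return f"*/{minutes} * * * * rocotorun -w {xml} -d {db}"


# Stream names are the ones listed above; the EXPDIR root is a placeholder.
STREAMS = [
    "prgsi_pr3118",
    "prjedi_pr3118",
    "praero_pr3118",
    "prwcda_pr3118",
    "pratm3dvar_pr3118",
]

if __name__ == "__main__":
    for pslot in STREAMS:
        print(crontab_line("/path/to/EXPDIR", pslot))
```

Each printed line could be added to a user crontab. Once a stream finishes, its entry would need to be removed by hand.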

The following CI cases remain to be run on Hera

  • C48_ATM
  • C48_S2SW

I assume we do not need to run the GEFS CI cases for this PR:

  • C48_S2SWA_gefs
  • C96_S2SWA_gefs_replay_ics

I went ahead and manually set up C48_ATM and C48_S2SW for this PR. Both streams are now running on Hera via cron. Assuming these two streams pass, is their success along with success of the other five streams sufficient to allow this PR to be merged into develop?

Does g-w CI need to be run on other platforms?

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera and removed CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed labels Nov 23, 2024
@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera labels Nov 23, 2024
@RussTreadon-NOAA
Contributor

C48_ATM and C48_S2SW are complete.

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/pratm_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202103231200        Done    Nov 23 2024 12:29:20    Nov 23 2024 13:50:16
202103231800        Done    Nov 23 2024 12:29:20    Nov 23 2024 14:00:20

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prs2sw_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202103231200        Done    Nov 23 2024 12:29:18    Nov 23 2024 15:30:27
202103231800        Done    Nov 23 2024 12:29:18    Nov 23 2024 15:36:20

All jobs in both streams successfully ran to completion.

@RussTreadon-NOAA
Contributor

Hera g-w CI
The following g-w CI cases have been run and passed on Hera:

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prgsi_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201800        Done    Nov 22 2024 01:50:31    Nov 22 2024 02:15:22
202112210000        Done    Nov 22 2024 01:50:31    Nov 22 2024 05:25:15
202112210600        Done    Nov 22 2024 01:50:31    Nov 22 2024 04:50:15

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prjedi_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Nov 22 2024 01:50:39    Nov 22 2024 02:25:23
202402240000        Done    Nov 22 2024 01:50:39    Nov 22 2024 06:05:22
202402240600        Done    Nov 22 2024 01:50:39    Nov 22 2024 06:10:20

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/praero_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201200        Done    Nov 22 2024 01:50:52    Nov 22 2024 13:50:21
202112201800        Done    Nov 22 2024 01:50:52    Nov 22 2024 17:10:24
202112210000        Done    Nov 22 2024 01:50:52    Nov 22 2024 18:30:25

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prwcda_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202103241200        Done    Nov 22 2024 01:50:54    Nov 22 2024 02:30:29
202103241800        Done    Nov 22 2024 01:50:54    Nov 22 2024 04:00:31

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/pratm3dvar_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201800        Done    Nov 22 2024 01:50:59    Nov 22 2024 02:30:30
202112210000        Done    Nov 22 2024 01:50:59    Nov 22 2024 05:45:26
202112210600        Done    Nov 22 2024 01:50:59    Nov 22 2024 04:55:20

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prs2sw_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202103231200        Done    Nov 23 2024 12:29:18    Nov 23 2024 15:30:27
202103231800        Done    Nov 23 2024 12:29:18    Nov 23 2024 15:36:20

rocotostat /scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/pratm_pr3118
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202103231200        Done    Nov 23 2024 12:29:20    Nov 23 2024 13:50:16
202103231800        Done    Nov 23 2024 12:29:20    Nov 23 2024 14:00:20

@emcbot emcbot added CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully and removed CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress labels Nov 23, 2024
@emcbot

emcbot commented Nov 23, 2024

CI Passed on Hera in Build# 5
Built and ran in directory /scratch1/NCEPDEV/global/CI/3118


Experiment C96_S2SWA_gefs_replay_ics_bcc7ece1 Completed 1 Cycles: *SUCCESS* at Sat Nov 23 15:27:17 UTC 2024
Experiment C48_ATM_bcc7ece1 Completed 2 Cycles: *SUCCESS* at Sat Nov 23 15:45:13 UTC 2024
Experiment C48mx500_3DVarAOWCDA_bcc7ece1 Completed 2 Cycles: *SUCCESS* at Sat Nov 23 15:45:46 UTC 2024
Experiment C48_S2SWA_gefs_bcc7ece1 Completed 1 Cycles: *SUCCESS* at Sat Nov 23 16:59:52 UTC 2024
Experiment C96C48_hybatmaerosnowDA_bcc7ece1 Completed 3 Cycles: *SUCCESS* at Sat Nov 23 17:11:53 UTC 2024
Experiment C96_atm3DVar_bcc7ece1 Completed 3 Cycles: *SUCCESS* at Sat Nov 23 17:17:59 UTC 2024
Experiment C96C48_hybatmDA_bcc7ece1 Completed 3 Cycles: *SUCCESS* at Sat Nov 23 17:17:59 UTC 2024
Experiment C48_S2SW_bcc7ece1 Completed 2 Cycles: *SUCCESS* at Sat Nov 23 17:29:36 UTC 2024
Experiment C96C48_ufs_hybatmDA_bcc7ece1 Completed 3 Cycles: *SUCCESS* at Sat Nov 23 18:31:37 UTC 2024
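
A bot summary like the one above can be compared against an expected case list with a few lines of Python. The sketch below is illustrative only: it handles just the "Completed N Cycles: *SUCCESS*" line format shown in this thread, and it assumes the trailing _bcc7ece1 on each experiment name is an eight-character hexadecimal suffix that can be stripped. The function names succeeded and missing are hypothetical.

```python
import re

# One success line as posted by the bot in this thread, e.g.
# "Experiment C48_ATM_bcc7ece1 Completed 2 Cycles: *SUCCESS* at Sat Nov 23 ..."
LINE = re.compile(
    r"^Experiment (?P<name>\S+) Completed (?P<cycles>\d+) Cycles: \*SUCCESS\* at "
)


def succeeded(comment: str) -> dict[str, int]:
    """Return {case name: completed cycles} for every SUCCESS line."""
    results: dict[str, int] = {}
    for line in comment.splitlines():
        match = LINE.match(line.strip())
        if match:
            # Assumption: names end in "_<8 hex characters>", as in the lines above.
            case = re.sub(r"_[0-9a-f]{8}$", "", match.group("name"))
            results[case] = int(match.group("cycles"))
    return results


def missing(expected: list[str], comment: str) -> list[str]:
    """List the expected cases that have no SUCCESS line in the comment."""
    done = succeeded(comment)
    return [case for case in expected if case not in done]
```

Run against the Build# 3 comment earlier in this thread, missing(...) would flag every case other than the three that reported success there.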

@RussTreadon-NOAA
Contributor

Automated CI passed on Hera. Yeah!

@RussTreadon-NOAA
Contributor

@WalterKolczynski-NOAA : Is the Hera success sufficient? If not, on what other machines should I run?

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 1ba8985 into NOAA-EMC:develop Nov 23, 2024
10 of 11 checks passed
@RussTreadon-NOAA
Contributor

Thank you @WalterKolczynski-NOAA !

@shlyaeva
Contributor Author

Thank you all so much for the very quick reviews, testing, and turnaround!

Labels
CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consolidate soca -> cice increments into one run
7 participants