Consolidate soca -> cice increments into one run #1367

Closed
shlyaeva opened this issue Nov 12, 2024 · 10 comments · Fixed by NOAA-EMC/global-workflow#3118 or #1380

@shlyaeva
Collaborator

Consolidate converting soca increments + backgrounds -> cice restarts into one task (both for arctic and antarctic).

@shlyaeva shlyaeva self-assigned this Nov 12, 2024
@shlyaeva shlyaeva added the soca label Nov 12, 2024
@shlyaeva
Collaborator Author

Update: I have soca, jcb, gdasapp and global workflow branches that run soca convertstate once with a global yaml (instead of two runs with separate arctic and antarctic yamls). A low-res test gives the same results in the cice restart files as the two-run setup. Next: testing at higher res, cleaning up the branches and issuing PRs.
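For anyone wanting to repeat the "same results" check, here is a minimal sketch that compares two directories of restart files by checksum. It assumes bitwise-identical files are the criterion and that the restarts match a `*.nc` pattern; both are assumptions, not something stated in this thread. A NetCDF file can differ in metadata while holding identical fields, so a "differs" result only means the files need a closer field-by-field look.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_dirs(reference: Path, candidate: Path, pattern: str = "*.nc") -> dict:
    """Map each file matching `pattern` in `reference` to 'same', 'differs' or 'missing'."""
    report = {}
    for ref_file in sorted(reference.glob(pattern)):
        new_file = candidate / ref_file.name
        if not new_file.is_file():
            report[ref_file.name] = "missing"
        elif sha256_of(ref_file) == sha256_of(new_file):
            report[ref_file.name] = "same"
        else:
            report[ref_file.name] = "differs"
    return report
```

Calling `compare_dirs(Path("two_run_output"), Path("global_output"))` (hypothetical directory names) returns one entry per reference file.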

@shlyaeva
Collaborator Author

soca PR was merged; we'll need to update soca to at least this hash JCSDA-internal/soca@44c4d6e

@RussTreadon-NOAA
Contributor

Updated working copy of feature/nightly-stable with soca @ 44c4d6e5. Build GDASApp in g-w develop at 1563594.

test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 failed with

0:   Y-AXIS =    9   8
0: NOTE from PE     0: close_param_file: MOM_input has been closed successfully.
0: Exception: Bad parameter: /: Mandatory parameter 'arctic' not found  (/scratch1/NCEPDEV/da/Russ.Treadon/CI/GDASApp/stable/20241121/global-workflow/sorc/gdas.cd/bundle/oops/src/oops/util/parameters/RequiredParameter.h +135 deserialize)
2: Exception: Bad parameter: /: Mandatory parameter 'arctic' not found  (/scratch1/NCEPDEV/da/Russ.Treadon/CI/GDASApp/stable/20241121/global-workflow/sorc/gdas.cd/bundle/oops/src/oops/util/parameters/RequiredParameter.h +135 deserialize)

Full log file is /scratch1/NCEPDEV/da/Russ.Treadon/CI/GDASApp/stable/20241121/global-workflow/sorc/gdas.cd/build/gdas/test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/logs/2021032418/gdas_marineanlchkpt.log
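Since every failing task prints the same exception, a small filter makes the log easier to read. This is a sketch only: it assumes the leading `0:` / `2:` on each line is a per-task label and that the lines of interest start with `Exception:` after that label. The function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "2: Exception: Bad parameter: ...", where the
# leading integer is assumed to be the task label.
RANK_LINE = re.compile(r"^\s*(\d+):\s+(Exception: .*)$")


def group_exceptions(log_lines):
    """Return {exception message: sorted list of task labels that reported it}."""
    ranks_by_message = defaultdict(set)
    for line in log_lines:
        match = RANK_LINE.match(line)
        if match:
            ranks_by_message[match.group(2).strip()].add(int(match.group(1)))
    return {msg: sorted(ranks) for msg, ranks in ranks_by_message.items()}


if __name__ == "__main__":
    import sys

    # Usage: python group_exceptions.py gdas_marineanlchkpt.log
    with open(sys.argv[1], errors="replace") as fh:
        for message, ranks in group_exceptions(fh).items():
            print(f"tasks {ranks}: {message}")
```

On the excerpt above this would collapse the two `Mandatory parameter 'arctic' not found` lines into one entry reported by tasks 0 and 2.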

The remaining 63 of 64 test_gdasapp tests passed with the updated feature/nightly-stable.

Test project /scratch1/NCEPDEV/da/Russ.Treadon/CI/GDASApp/stable/20241121/global-workflow/sorc/gdas.cd/build
      Start 1580: test_gdasapp_util_coding_norms
 1/64 Test #1580: test_gdasapp_util_coding_norms ......................................   Passed    2.85 sec
      Start 1581: test_gdasapp_util_ioda_example
 2/64 Test #1581: test_gdasapp_util_ioda_example ......................................   Passed    3.03 sec
      Start 1582: test_gdasapp_util_prepdata
 3/64 Test #1582: test_gdasapp_util_prepdata ..........................................   Passed    3.12 sec

...

      Start 1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800
30/64 Test #1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 .....   Passed  152.29 sec
      Start 1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800
31/64 Test #1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 ...***Failed  230.87 sec
booting gdas_marineanlchkpt for cycle 202103241800
task 'gdas_marineanlchkpt' for cycle '202103241800' has been booted
The task is in state: QUEUED
The task is in state: QUEUED
The task is in state: QUEUED
The task is in state: RUNNING
The task is in state: RUNNING
The task is in state: RUNNING
The task is in state: RUNNING
The task is in state: SUBMITTING
The task is in state: QUEUED
The task is in state: QUEUED
The task is in state: QUEUED
The task is in state: RUNNING
The task is in state: RUNNING
The task is in state: RUNNING
The task is in state: RUNNING
11/21/24 01:55:05 UTC :: WCDA-3DVAR-C48mx500.xml :: Cycle 202103241800, Task gdas_marineanlchkpt, jobid=2831423, in state DEAD (FAILED), ran for 56.0 seconds, exit status=1, try=2 (of 2)
The task is dead.

      Start 1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
32/64 Test #1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 ...   Passed  122.47 sec

...

      Start 1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd
62/64 Test #1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd ........................   Passed    1.79 sec
      Start 1999: test_gdasapp_bufr2ioda_insitu_surface_drifter
63/64 Test #1999: test_gdasapp_bufr2ioda_insitu_surface_drifter .......................   Passed    1.50 sec
      Start 2000: test_gdasapp_bufr2ioda_insitu_surface_trkob
64/64 Test #2000: test_gdasapp_bufr2ioda_insitu_surface_trkob .........................   Passed    1.55 sec

98% tests passed, 1 tests failed out of 64

Label Time Summary:
gdas-utils    =  12.71 sec*proc (14 tests)
manual        = 4412.59 sec*proc (18 tests)
script        =  12.71 sec*proc (14 tests)

Total Test time (real) = 5762.60 sec

The following tests FAILED:
        1967 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 (Failed)
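With 64 tests and a lot of task-state chatter in between, the failures are easier to pull out with a short parser. The sketch below is based only on the result-line layout visible in the output above (`N/M Test #ID: name ....   Passed  T sec`, with `***` before a failing status). It handles single-word statuses only, and the names `CTestResult` and `parse_ctest` are made up for illustration.

```python
import re
from typing import NamedTuple


class CTestResult(NamedTuple):
    number: int      # the ctest test number after "#"
    name: str
    status: str      # e.g. "Passed" or "Failed"
    seconds: float


# One result line: index/total, "Test #ID:", name, dot padding, optional "***", status, time.
RESULT_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#(\d+):\s+(\S+)\s+\.+\s*(?:\*\*\*)?(\w+)\s+([\d.]+)\s+sec"
)


def parse_ctest(lines):
    """Return a CTestResult for every per-test result line found in `lines`."""
    results = []
    for line in lines:
        match = RESULT_LINE.match(line)
        if match:
            number, name, status, seconds = match.groups()
            results.append(CTestResult(int(number), name, status, float(seconds)))
    return results


def failed_tests(lines):
    """Return only the results whose status is not 'Passed'."""
    return [r for r in parse_ctest(lines) if r.status != "Passed"]
```

Feeding the saved ctest output to `failed_tests` should list only test 1967 for the run above.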

@shlyaeva
Collaborator Author

shlyaeva commented Nov 21, 2024

@RussTreadon-NOAA thank you so much! I'll issue PRs in jcb-gdas, gdasapp and global workflow to update yamls and fix this tomorrow am!

@RussTreadon-NOAA
Contributor

Look at changes to test/testinput/convertstate_soca2cice.yml in soca @ 44c4d6e. Add these changes to

        modified:   soca_2cice_antarctic.yaml.j2
        modified:   soca_2cice_arctic.yaml.j2

in working copy of gdas.cd/parm/jcb-gdas/algorithm/marine. Rerun WCDA-3DVAR-C48mx500 marineanl init, var, and chkpt for 202103241800. The chkpt job passed.

    Start 1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800
1/1 Test #1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 ...   Passed   76.82 sec

100% tests passed, 0 tests failed out of 1

Label Time Summary:
manual    =  76.82 sec*proc (1 test)

Total Test time (real) =  77.31 sec

Modified files are on Hera in /scratch1/NCEPDEV/da/Russ.Treadon/CI/GDASApp/stable/20241121/global-workflow/sorc/gdas.cd/parm/jcb-gdas/algorithm/marine

@RussTreadon-NOAA
Contributor

@shlyaeva: @guillaumevernieres explained that while my test results are encouraging, this is not the solution. You combined the Arctic and Antarctic functions into a single yaml file. As you mentioned, changes are required in jcb-gdas, gdasapp, and g-w. Let me know when the pieces are ready to be tested. I can test & review.

@shlyaeva
Collaborator Author

@RussTreadon-NOAA will do! I'm running one test now and should be able to push the changes soon.
Thank you so much for tracking failures and finding fixes, and also so quickly! I know the tremendous amount of effort and overall knowledge of the system that goes into this and I really appreciate it.
I didn't think through the sequence of PRs and the consequences of the merges well this time. If that's OK, I'll ping you in the next few weeks to chat so I can understand the process and do better next time.

@shlyaeva
Collaborator Author

I pushed some changes:
jcb-gdas: NOAA-EMC/jcb-gdas#47
gdasapp: PR still missing; it needs to update the hashes for jcb-gdas (above) and soca (JCSDA-internal/soca@44c4d6e)
global-workflow: NOAA-EMC/global-workflow#3118; it is still missing an update to the gdasapp hash.

@RussTreadon-NOAA
Contributor

Create GDASApp branch feature/refactor_soca2cice in which to update jcb-gdas and soca hashes.

@RussTreadon-NOAA
Contributor

Hera test

Do the following

All three tests passed.

Confirm that chkpt job ran with soca_2cice_global.yaml

$ grep "Executing srun -l" gdas_marineanlchkpt.log |grep INFO
2024-11-21 17:53:02,205 - INFO     - marine_da_utils: Executing srun -l --export=ALL --hint=nomultithread -n 8 /scratch1/NCEPDEV/stmp2/Russ.Treadon/RUNDIRS/WCDA-3DVAR-C48mx500/gdas.2021032418/gdasmarineanalysis.2021032418/marinevariational/gdas_incr_handler.x socaincr2mom6.yaml
2024-11-21 17:53:11,188 - INFO     - marine_da_utils: Executing srun -l --export=ALL --hint=nomultithread -n 8 /scratch1/NCEPDEV/stmp2/Russ.Treadon/RUNDIRS/WCDA-3DVAR-C48mx500/gdas.2021032418/gdasmarineanalysis.2021032418/marinevariational/gdas.x soca convertstate soca_2cice_global.yaml
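The same check can be scripted so it reports just the executable and the yaml it was given. This is a sketch under two assumptions taken from the log lines above: the launch lines contain `Executing srun`, and the first token containing a `/` is the executable path. The function name is made up for illustration.

```python
import shlex
from pathlib import PurePosixPath

MARKER = "Executing "


def launched_commands(log_lines):
    """Yield (executable name, [yaml arguments]) for each 'Executing srun ...' INFO line."""
    for line in log_lines:
        if " - INFO " not in line or MARKER + "srun" not in line:
            continue
        tokens = shlex.split(line.split(MARKER, 1)[1])
        # Assumption: the first token containing "/" is the executable path;
        # everything before it is launcher options.
        exe_index = next((i for i, tok in enumerate(tokens) if "/" in tok), None)
        if exe_index is None:
            continue
        exe = PurePosixPath(tokens[exe_index]).name
        yamls = [tok for tok in tokens[exe_index + 1:] if tok.endswith((".yaml", ".yml"))]
        yield exe, yamls


if __name__ == "__main__":
    import sys

    # Usage: python launched_commands.py gdas_marineanlchkpt.log
    with open(sys.argv[1], errors="replace") as fh:
        for exe, yamls in launched_commands(fh):
            print(exe, " ".join(yamls))
```

For the two lines above it should pair `gdas_incr_handler.x` with `socaincr2mom6.yaml` and `gdas.x` with `soca_2cice_global.yaml`.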

Rerun test_gdasapp. 64/64 tests pass.

Test project /scratch1/NCEPDEV/da/Russ.Treadon/CI/GDASApp/stable/20241121/global-workflow/sorc/gdas.cd/build
      Start 1580: test_gdasapp_util_coding_norms
 1/64 Test #1580: test_gdasapp_util_coding_norms ......................................   Passed    2.94 sec
      Start 1581: test_gdasapp_util_ioda_example
 2/64 Test #1581: test_gdasapp_util_ioda_example ......................................   Passed    6.08 sec

...

      Start 1999: test_gdasapp_bufr2ioda_insitu_surface_drifter
63/64 Test #1999: test_gdasapp_bufr2ioda_insitu_surface_drifter .......................   Passed    1.32 sec
      Start 2000: test_gdasapp_bufr2ioda_insitu_surface_trkob
64/64 Test #2000: test_gdasapp_bufr2ioda_insitu_surface_trkob .........................   Passed    1.18 sec

100% tests passed, 0 tests failed out of 64

Label Time Summary:
gdas-utils    =  16.43 sec*proc (14 tests)
manual        = 3432.03 sec*proc (18 tests)
script        =  16.43 sec*proc (14 tests)

Total Test time (real) = 4708.59 sec

WalterKolczynski-NOAA pushed a commit to NOAA-EMC/global-workflow that referenced this issue Nov 23, 2024
Run a single executable to add soca increments to cice restart files,
processing arctic and antarctic simultaneously to save on runtime and
I/O.
Resolves NOAA-EMC/GDASApp#1367

---------

Co-authored-by: shlyaeva <[email protected]>
Co-authored-by: RussTreadon-NOAA <[email protected]>
Co-authored-by: RussTreadon-NOAA <[email protected]>