Adds IOStream capabilities for omega input/output #132

philipwjones · 2024-09-26T15:17:12Z

Here, finally, is the long-promised IOStream capability. With this addition, users define IOStreams in the input configuration file to specify the input and output files, which fields to include, and how often to read or write them (including one-time reads/writes). See the included documentation for details.

This has been tested successfully on Chrysalis, but not yet on any GPU-enabled machines. I will begin testing that today. I would also like to add more checks to the unit test to better verify correctness.

philipwjones · 2024-09-26T17:43:18Z

Tests on Frontier CPU/GPU pass after a minor fix for some missing return statements

philipwjones · 2024-09-26T17:49:57Z

Well, actually, I lied above. The IOStreams unit test is passing, but for some reason TEND_PLANE_TEST and TEND_SPHERE_TEST are both failing now. Since the new mods did not touch that code and shouldn't impact it, I thought the results might just be at the edge of the error tolerance, so I tried increasing the tolerance. That didn't help, and indeed the errors are huge, so something is really wrong. @brian-oneill @sbrus89 - any idea why the mods in this PR are changing answers in TendencyTermsTest? All other tests are passing.

brian-oneill · 2024-09-26T18:13:35Z

@philipwjones Your branch is missing #130. When we moved to just one default yaml file, those tests broke and needed to be altered to use the viscosity coeffs from the config file.

philipwjones · 2024-09-26T18:52:12Z

Doh...I thought I had rebased, but must not have fetched the latest...

philipwjones · 2024-09-26T19:52:26Z

After rebasing, all tests now pass on both Chrysalis and Frontier

xylar

I want to test more with my associated Polaris changes (E3SM-Project/polaris#231) but ran into #137 (unrelated to these changes) when I tried.

In the meantime, I have a few questions and comments.

xylar · 2024-09-30T09:22:09Z

components/omega/configs/Default.yml

+  IOStreams:
+    InitialState:
+      UsePointerFile: false
+      Filename: OmegaMesh.nc


I know it is the intention to keep the horizontal mesh separate from the initial conditions and restart files.
It doesn't seem like there is currently a stream for reading the horizontal mesh and the name of the mesh file is still hard coded to OmegaMesh.nc:
https://github.com/philipwjones/E3SM/blob/omega/iostream/components/omega/src/base/Decomp.h#L243
Is there the intention of adding some sort of Mesh group and providing a stream for reading it (so that a filename other than OmegaMesh.nc can be used)?

philipwjones · 2024-09-30

Yes, that's correct. This reflects the current status, but we do intend to add a separate Mesh stream and Mesh group.

philipwjones · 2024-09-30

Also, the initial decomposition (Decomp) can't use streams because parallel IO can't be set up until after Decomp. But I might still be able to at least read the mesh file name from the config.

xylar · 2024-09-30

Okay, it sounds like having the mesh filename be read from a stream is silly because that would not be a stream you could modify (except for the filename). So it sounds like the mesh filename should be a normal config option somewhere earlier in the yaml file. It still would be convenient for Polaris if the mesh didn't always have to be named OmegaMesh.nc because it requires otherwise unnecessary logic specific to Omega. (Obviously, this isn't a good name for the mesh/initial condition for MPAS-Ocean.)

philipwjones · 2024-10-01

Not a problem - the important routines take the filename as an argument, assuming we were going that route eventually. Early on, it was just easy to go with the hardwire-soft link route.

hyungyukang · 2024-09-30T17:46:18Z

All tests passed on PM-CPU.

@philipwjones , I noticed several NetCDF files generated after running ctest, each for a separate time and lacking time information. I understand these files are created specifically by ctest, but I am wondering whether it is possible to stack the history output over time within the current IOStream implementation. If so, could you please provide brief guidance on how to do this? It would be very helpful for my testing. If this is not planned at this time, I can proceed with my temporary IO code.

philipwjones · 2024-09-30T20:30:09Z

@hyungyukang Yes, I was mostly testing basic read/write. We do want to support this and, in principle, the pieces are in place for both the unlimited time dimension and appending multiple time slices to a file. But I have not done that before, so I didn't know how to implement it without doing some research first. In terms of IOStreams options, it basically means using the IfExists: append option, defining the time dimension appropriately and making sure the correct metadata are there. So I think the design does not rule it out, but I don't think it will work currently (I may have even added a "not supported" message). I can certainly take a look at how you've done it and see if I can implement it in a future PR. Or if you want to take a shot at it...

hyungyukang · 2024-09-30T20:53:11Z

@philipwjones , Thanks a lot. I understand. My implementation is very simple and temporary. But I'll try something similar in this PR and let you know how it goes! I believe it will be much simpler and more intuitive thanks to your great work on this PR!

hyungyukang · 2024-10-02T01:39:11Z

@philipwjones , I have been testing this IOStreams PR by writing some fields and have identified a bug in the Offset. It appears that the linear address of the offset vector Add was not computed correctly.

Before the fix, the written tracer1 field (a cosine bell) looks like this:

[Image: tracer1 field before the fix]

After the fix:

[Image: tracer1 field after the fix]

I will suggest the fix soon.

philipwjones · 2024-10-02T03:53:01Z

Thanks @hyungyukang. I was worried about that - none of the unit tests actually tested correctness, only read/write consistency, and the same index calculation is used for both, so an indexing error would not show up. Happy to see your fix.
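
For readers following the indexing discussion, here is a minimal sketch in plain C++ of a row-major linear offset calculation of the kind at issue. The function name linearOffset and its arguments are hypothetical and this is not the Omega implementation; it assumes the last dimension varies fastest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the Omega implementation): row-major linear
// offset of a multi-dimensional index, with the last index varying fastest.
std::size_t linearOffset(const std::vector<std::size_t> &Index,
                         const std::vector<std::size_t> &DimLengths) {
   // One index is required per dimension
   assert(Index.size() == DimLengths.size());

   std::size_t Offset = 0;
   for (std::size_t IDim = 0; IDim < DimLengths.size(); ++IDim) {
      // Horner form: ((i0 * n1) + i1) * n2 + i2 ...
      Offset = Offset * DimLengths[IDim] + Index[IDim];
   }
   return Offset;
}
```

Filling a field with an index-based value such as linearOffset({Cell, Level}, {NCells, NVertLevels}) and inspecting the written file is one way to check the ordering independently of the read path, since a write-then-read round trip that shares a single offset routine cannot detect a wrong ordering.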

sbrus89 · 2024-10-02T14:08:56Z

components/omega/doc/userGuide/IOStreams.md

+- **PointerFilename:** Only required if UsePointerFile is true and should
+be set to the full filename (with path) for the pointer file. Each stream
+using a pointer file must define a unique pointer file name.
+- **Filename:** Required in all cases except input streams using a pointer


One question I have from reading the user guide is whether there's a way to specify the frequency of creating a new file separately from the output frequency. It would be helpful to clarify when output gets written to an existing file and when it is written to a new file.

philipwjones · 2024-10-02

Yeah, clearly I haven't really thought through the multiple time slice in a file case. The more I look into it, the more changes I need to make. Suspect I'll need to push this to a subsequent PR so we can get this base capability in this week.

xylar · 2024-10-02

That sounds like a reasonable assessment to me.

mwarusz

Unit tests passed on chrysalis, pm-cpu, pm-gpu, frontier-cpu and frontier-gpu. I have some minor comments and questions.

mwarusz · 2024-10-02T21:58:01Z

components/omega/src/infra/IOStream.cpp

+   // Extract and write the array of data based on the type, dimension and
+   // memory location. The IO routines require a contiguous data pointer on
+   // the host. Kokkos array types do not guarantee contigous memory for
+   // multi-dimensional arrays. Here we create a contiguous space and perform
+   // any other transformations (host-device data transfer, reduce precision).


I know that this PR is critical for the work to progress, so I am not requesting any changes right now, but I would like to suggest some potential improvements to this code. The packing into a contiguous buffer could be separated into a function or a function template. There is a way to check if a Kokkos array is contiguous (using array.span_is_contiguous()), which could be used to create a faster code path. Lastly, the packing should probably be done on the device using parallelFor, with the buffer copied to the host afterwards.

philipwjones · 2024-10-02

Yes, we can do this on a later pass and in a separate PR once we have the working version merged.
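
The fast-path idea in the suggestion above can be sketched without Kokkos. The code below is only an illustration under stated assumptions: StridedView2D and packContiguous are hypothetical names, the struct is a host-only stand-in for an array whose rows may be padded, and isContiguous() plays the role that array.span_is_contiguous() would play for a Kokkos array. It does not show the device-side packing with parallelFor.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D array whose rows may be padded in memory
struct StridedView2D {
   const double *Data;    // first element of the underlying storage
   std::size_t NRows;     // number of rows
   std::size_t NCols;     // number of valid entries per row
   std::size_t RowStride; // elements between the starts of consecutive rows

   // True when rows are stored back to back with no padding
   bool isContiguous() const { return RowStride == NCols; }

   double operator()(std::size_t I, std::size_t J) const {
      return Data[I * RowStride + J];
   }
};

// Pack the valid entries into a contiguous row-major buffer
template <typename ViewT>
std::vector<double> packContiguous(const ViewT &View) {
   const std::size_t Size = View.NRows * View.NCols;
   std::vector<double> Buffer;
   Buffer.reserve(Size);

   if (View.isContiguous()) {
      // Fast path: the storage is already contiguous, so copy it in bulk
      Buffer.assign(View.Data, View.Data + Size);
   } else {
      // Slow path: gather element by element, skipping the row padding
      for (std::size_t I = 0; I < View.NRows; ++I)
         for (std::size_t J = 0; J < View.NCols; ++J)
            Buffer.push_back(View(I, J));
   }
   return Buffer;
}
```

Keeping the packing in one function template means the contiguity check and any later device-side version live in a single place rather than being repeated for each type and rank.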

hyungyukang

I left my suggestion for fixing the bug in writing order yesterday, but I forgot to submit my review.

hyungyukang · 2024-10-03T19:41:40Z

@philipwjones , I think I found a way to implement time-stackable output streams:

netcdf ocn.hist.0001-01-01_01\:00\:00 {
dimensions:
        MaxCellsOnEdge = 2 ;
        MaxEdges = 6 ;
        NCells = 2562 ;
        NEdges = 7680 ;
        NVertLevels = 3 ;
        NVertices = 5120 ;
        Time = UNLIMITED ; // (10 currently)
        VertexDegree = 3 ;
variables:
        double tracer1(Time, NCells, NVertLevels) ;
                tracer1:Description = "tracer1 at cell centers" ;
                tracer1:FillValue = -1.2345e-30 ;
                tracer1:Name = "tracer1" ;
                tracer1:StdName = "none" ;
                tracer1:Units = "" ;
                tracer1:ValidMax = 100. ;
                tracer1:ValidMin = 0. ;
                tracer1:_FillValue = -1.2345e-30 ;
                tracer1:long_name = "tracer1 at cell centers" ;
                tracer1:name = "tracer1" ;
                tracer1:standard_name = "none" ;
                tracer1:units = "" ;
                tracer1:valid_max = 100. ;
                tracer1:valid_min = 0. ;

// global attributes:
                :SimStartTime = "0001-01-01_00:00:00.00" ;
                :SimulationTime = "0001-01-01_01:00:00" ;
}

But the code needs to be discussed and organized further with you, so for now, let's leave it for the next IOStream PR. I wanted to implement this feature for Polaris testing. For now, I believe I can use this toy code for Polaris after modifying it a bit more.

mark-petersen · 2024-10-04T18:58:33Z

The current head, 7516da1, fails the iostreams test on Frontier cpu using crayclang:

./omega_ctest.sh -R IOSTREAM_TEST

Test project /lustre/orion/cli115/scratch/mpetersen/runs/241001_omega/build
    Start 18: IOSTREAM_TEST
1/1 Test #18: IOSTREAM_TEST ....................***Failed    5.35 sec
srun: error: frontier10341: tasks 0-7: Exited with exit code 1
srun: Terminating StepId=2640357.29

Do I need to copy a different file for this test to run successfully? Right now I copy the files

ls -lh /ccs/home/mpetersen/meshes/omega/O*nc
lrwxrwxrwx 1 mpetersen mpetersen 24 Jul 17 16:41 /ccs/home/mpetersen/meshes/omega/OmegaMesh.nc -> ocean.QU.240km.151209.nc
lrwxrwxrwx 1 mpetersen mpetersen 22 Jul 17 16:41 /ccs/home/mpetersen/meshes/omega/OmegaPlanarMesh.nc -> PlanarPeriodic48x48.nc
lrwxrwxrwx 1 mpetersen mpetersen 43 Jul 17 16:41 /ccs/home/mpetersen/meshes/omega/OmegaSphereMesh.nc -> cosine_bell_icos480_initial_state.230220.nc

Here are my steps:


######### Frontier cpu ############
CODE_DIR=opr
RUNDIR=241001_omega
mkdir /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR
cd !$

cd /ccs/home/mpetersen/repos/E3SM/${CODE_DIR}
git submodule update --init --recursive externals/YAKL externals/ekat externals/scorpio cime
cd /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR

module load cmake
rm -rf build
mkdir build
cd build
export PARMETIS_ROOT=$PROJWORK/cli115/pwjones/frontierlibs-cray/parmetis
cmake \
   -DOMEGA_CIME_COMPILER=crayclang \
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=frontier \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT}\
   -DOMEGA_BUILD_TEST=ON \
   -Wno-dev \
   -S /ccs/home/mpetersen/repos/E3SM/${CODE_DIR}/components/omega -B .
./omega_build.sh

# linking:
cd test
ln -isf /ccs/home/mpetersen/meshes/omega/O*nc .
cp /ccs/home/mpetersen/repos/E3SM/${CODE_DIR}/components/omega/configs/Default.yml omega.yml


salloc -A cli115 -J inter -t 2:00:00 -q debug -N 1 -S 0
RUNDIR=241001_omega
cd /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR/build
./omega_ctest.sh

philipwjones · 2024-10-04T20:05:43Z

@mark-petersen Sorry - I pushed a test to capture the indexing error that Hyun saw, but have not yet fixed the indexing problem itself so the new unit test is correctly failing. Hopefully have a fix for the indexing soon...

philipwjones · 2024-10-07T23:44:35Z

The index offset calculation has been fixed, and I added a read test to make sure the read/write is consistent and returns a correct field. I also overwrote the salinity field with an index-based value and manually checked the output to make sure the indexing is correct.

Passes new unit tests on Chrysalis/Intel and Frontier cpu/gpu, so please feel free to re-review.

hyungyukang · 2024-10-08T15:13:06Z

@philipwjones , I just applied your fix to my test code, and it worked like a charm. Thanks for fixing it!

philipwjones · 2024-10-08T16:27:57Z

Rebased to pick up new tracer infra and modified the formatting of a couple of time entries per @hyungyukang suggestion. Still passes all tests.

mwarusz

My comments have been addressed. I retested on pm-cpu and pm-gpu and everything passed. Thanks for your work on this @philipwjones.

hyungyukang

@philipwjones , thank you very much for your dedicated efforts on this PR. I have tested this PR together with #133 and #140 by running a cosine bell advection test case. Through this PR, I was able to write tracer history output files at different intervals, and it worked correctly after @philipwjones 's fix for index offset calculations (#132 (comment)).

I'm approving this PR based on my testing and the testing and visual inspection of others. All of my comments have been addressed, and @philipwjones and I will open a subsequent IOStream PR to introduce additional functionality, including time-stackable output streams and the generation of new output files at specified frequencies (or intervals). Thanks again @philipwjones !

xylar

My main review has been of the conceptual design and the interactions with the YAML file and Polaris. From that perspective, I think this is in good shape. As @philipwjones and I have discussed on Slack, there are some loose ends that need follow-up in future PRs to make full testing of this capability and its integration with Polaris possible. For example, for now no streams are written out without explicit modification of the Omega code.

philipwjones added the Omega label Sep 26, 2024

philipwjones requested review from sbrus89 and mark-petersen September 26, 2024 15:17

philipwjones self-assigned this Sep 26, 2024

mark-petersen requested review from brian-oneill and mwarusz September 26, 2024 16:29

xylar self-requested a review September 26, 2024 17:25

philipwjones force-pushed the omega/iostream branch from 60000e0 to 3628310 Compare September 26, 2024 19:30

xylar mentioned this pull request Sep 27, 2024

Updates related to Omega IOStreams E3SM-Project/polaris#231

Draft

1 task

xylar reviewed Sep 30, 2024

brian-oneill reviewed Sep 30, 2024

brian-oneill reviewed Sep 30, 2024

hyungyukang self-requested a review October 2, 2024 00:11

sbrus89 reviewed Oct 2, 2024

mwarusz reviewed Oct 2, 2024

hyungyukang reviewed Oct 3, 2024

hyungyukang mentioned this pull request Oct 4, 2024

Add Tracers infrastructure #133

Merged

6 tasks

hyungyukang reviewed Oct 7, 2024

philipwjones added 12 commits October 8, 2024 10:54

Add proper treatment of string literals in MetaData

8e241b2

Because the Metadata uses std::any, when a string literal is passed as an argument to the Metadata add routines, they are stored as static char* rather than std::string. This adds changes to treat this case properly.
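
The behavior described in this commit message can be reproduced with the standard library alone. When a string literal is placed in a std::any, the stored type is the decayed pointer type const char *, so an any_cast to std::string fails. The sketch below shows one way to handle both cases on retrieval; metaAsString is a hypothetical helper for illustration and is not the Omega Metadata API or necessarily the approach taken in the commit.

```cpp
#include <any>
#include <string>

// Hypothetical helper (not the Omega Metadata API): return the string held
// in a std::any whether it was stored as std::string or as a C string.
std::string metaAsString(const std::any &Value) {
   // Case 1: the value was stored as a std::string
   if (const auto *Str = std::any_cast<std::string>(&Value))
      return *Str;

   // Case 2: the value came from a string literal, which std::any stores
   // as the decayed type const char *
   if (const auto *CStr = std::any_cast<const char *>(&Value))
      return std::string(*CStr);

   // Anything else is not a string
   throw std::bad_any_cast();
}
```

An alternative is to convert C strings to std::string when the value is added, so that only one stored type needs to be handled on retrieval.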

Adds IOStreams capability

995a23b

This adds a new IOStreams capability in which different input and output streams can be defined to read or write a custom list of fields and related metadata at desired time frequencies.

Add IOStream documentation and fix linting issues

4320d57

fix missing return statements in IOStream

a8fccd2

added documentation of IOStream never option

f3d9287

removed commented code in IOStreams test

13448be

modified Default config file streams for initializing state

8fd8c8d

- and added comments to better describe use of initialization streams

removed unnecessary code

09bf010

added stream correctness test and force read/write option

2753115

- also fixed one dimension read bug

fix linting errors

1cca91e

fix index offset calculation in IOStream

bec9e2d

modified some time entries for more consistent formatting

785766e

philipwjones force-pushed the omega/iostream branch from 7e738f3 to 785766e Compare October 8, 2024 16:26

mwarusz approved these changes Oct 9, 2024

hyungyukang approved these changes Oct 9, 2024

xylar approved these changes Oct 9, 2024

philipwjones merged commit aa4fe34 into E3SM-Project:develop Oct 10, 2024
2 checks passed

mark-petersen removed their request for review October 14, 2024 13:59

philipwjones deleted the omega/iostream branch October 30, 2024 15:24
