Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds IOStream capabilities for omega input/output #132

Merged
merged 12 commits into from
Oct 10, 2024

Conversation

philipwjones
Copy link

@philipwjones philipwjones commented Sep 26, 2024

Here is finally the long-promised IOStream capability. With this addition, users define IOStreams in the input configuration file to specify input and output files, which fields should be included and at what time frequency (including one-time read/writes). See the included documentation for details.

This has been tested successfully on Chrysalis, but not yet on any GPU-enabled machines. I will begin testing that today. Also, I would like to add some additional tests to the unit test to better verify correctness.

Checklist

  • Documentation:
    • Design document has been generated and added to the docs
    • User's Guide has been updated
    • Developer's Guide has been updated
    • Documentation has been built locally and changes look as expected
  • Testing
    • A comment in the PR documents testing used to verify the changes including any tests that are added/modified/impacted.
    • CTest unit tests for new features have been added per the approved design.
    • Unit tests have passed. Please provide a relevant CDash build entry for verification.
    • Performance related PRs: Please include a relevant PACE experiment link documenting performance before and after.

@philipwjones
Copy link
Author

Tests on Frontier CPU/GPU pass after a minor fix for some missing return statements

@philipwjones
Copy link
Author

Well, actually, I lied above. The IOStreams unit test is passing, but for some reason the TEND_PLANE_TEST and TEND_SPHERE_TEST are both failing now. Since the new mods did not touch these codes and shouldn't impact them, I thought maybe it was just at the edge of the error tolerance, so tried increasing the error tolerance but that didn't help and indeed the errors are huge so something is really wrong. @brian-oneill @sbrus89 - any idea why the mods in this PR are changing answers in TendencyTermsTest? All other tests are passing.

@brian-oneill
Copy link

@philipwjones Your branch is missing #130. When we moved to just one default yaml file, those tests got broke and needed to be altered to use the viscosity coeffs from the config file.

@philipwjones
Copy link
Author

Doh...I thought I had rebased, but must not have fetched the latest...

@philipwjones
Copy link
Author

philipwjones commented Sep 26, 2024

After rebasing, all tests now pass on both Chrysalis and Frontier

Copy link

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wanting to test more with my associated Polaris changes E3SM-Project/polaris#231 but ran into #137 (unrelated to these changes) when I tried.

In the meantime, I have a few questions and comments.

components/omega/configs/Default.yml Outdated Show resolved Hide resolved
IOStreams:
InitialState:
UsePointerFile: false
Filename: OmegaMesh.nc
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it is the intention to keep the horizontal mesh separate from the initial conditions and restart files.
It doesn't seem like there is currently a stream for reading the horizontal mesh and the name of the mesh file is still hard coded to OmegaMesh.nc:
https://github.com/philipwjones/E3SM/blob/omega/iostream/components/omega/src/base/Decomp.h#L243
Is there the intention of adding some sort of Mesh group and providing a stream for reading it (so that a filename other than OmegaMesh.nc can be used)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. This reflects the current status, but we do intend to separate Mesh stream and Mesh group.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the initial decomposition (Decomp) can't use streams because parallel IO can't be set up until after Decomp. But I might still be able to at least read the mesh file name from the config.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, it sounds like having the mesh filename be read from a stream is silly because that would not be a stream you could modify (except for the filename). So it sounds like the mesh filename should be a normal config option somewhere earlier in the yaml file. It still would be convenient for Polaris if the mesh didn't always have to be named OmegaMesh.nc because it requires otherwise unnecessary logic specific to Omega. (Obviously, this isn't a good name for the mesh/initial condition for MPAS-Ocean.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem - the important routines take the filename as an argument assuming we were going that route eventually. Early on, it was just easy to go with the hardwire-soft link route.

components/omega/src/infra/IOStream.cpp Show resolved Hide resolved
components/omega/doc/userGuide/IOStreams.md Outdated Show resolved Hide resolved
@hyungyukang
Copy link

hyungyukang commented Sep 30, 2024

All tests passed on PM-CPU.

@philipwjones , I noticed several NetCDF files generated after running ctest, all of which are separated in time and lack time information. I understand these files are created specifically by ctest, but I am wondering if it is possible to stack the history output over time within the current IOStream implementation. If so, could you please provide a short and brief guidance on how to do this? It would be greatly helpful for my testing. If this is not a planned implementation at this time, I can also proceed with my temporary IO codes.

@philipwjones
Copy link
Author

@hyungyukang Yes, I was mostly testing basic read/write. We do want to support this and there is, in principle, the pieces in place for both the unlimited time dimension and appending multiple time slices in a file. But I have not done that before so didn't know how to implement without doing some research first. The IOStreams options are basically using the IfExists: append option, defining the time dimension appropriately and making sure the correct metadata are there. So I think I have not designed it out, but don't think it will work currently (may have even added a "not supported" message). I can certainly take a look at how you've done it and see if I can implement in a future PR. Or if you want to take a shot at it...

@hyungyukang
Copy link

@hyungyukang Yes, I was mostly testing basic read/write. We do want to support this and there is, in principle, the pieces in place for both the unlimited time dimension and appending multiple time slices in a file. But I have not done that before so didn't know how to implement without doing some research first. The IOStreams options are basically using the IfExists: append option, defining the time dimension appropriately and making sure the correct metadata are there. So I think I have not designed it out, but don't think it will work currently (may have even added a "not supported" message). I can certainly take a look at how you've done it and see if I can implement in a future PR. Or if you want to take a shot at it...

@philipwjones , Thanks a lot. I understand. My implementation is very simple and temporary. But I'll try something similar in this PR and let you know how it goes! I believe it will be much simpler and more intuitive thanks to your great work on this PR!

@hyungyukang hyungyukang self-requested a review October 2, 2024 00:11
@hyungyukang
Copy link

@philipwjones , I have been testing this IOStreams PR by writing some fields and have identified a bug in the Offset. It appears that the linear address of the offset vector Add was not computed correctly.

Before the fix, the written tracer1 field (a cosine bell) looks like this:
image

After the fix,
image

I will suggest the fix soon.

@philipwjones
Copy link
Author

Thanks @hyungyukang I was worried about that - none of the unit tests actually tested correctness, only successful read/write consistency and the same index calculation is used for both. Happy to see your fix.

- **PointerFilename:** Only required if UsePointerFile is true and should
be set to the full filename (with path) for the pointer file. Each stream
using a pointer file must define a unique pointer file name.
- **Filename:** Required in all cases except input streams using a pointer
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question I have from reading the user guide is if there's a way to specify the frequency of creating a new file vs. the output frequency. It would be helpful to clarify when output gets written to an existing file and when it is written to a new file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, clearly I haven't really thought through the multiple time slice in a file case. The more I look into it, the more changes I need to make. Suspect I'll need to push this to a subsequent PR so we can get this base capability in this week.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a reasonable assessment to me.

Copy link
Member

@mwarusz mwarusz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests passed on chrysalis, pm-cpu, pm-gpu, frontier-cpu and frontier-gpu. I have some minor comments and questions.

components/omega/src/base/IO.h Outdated Show resolved Hide resolved
components/omega/src/infra/IOStream.cpp Outdated Show resolved Hide resolved
components/omega/src/infra/IOStream.cpp Show resolved Hide resolved
Comment on lines +915 to +938
// Extract and write the array of data based on the type, dimension and
// memory location. The IO routines require a contiguous data pointer on
// the host. Kokkos array types do not guarantee contigous memory for
// multi-dimensional arrays. Here we create a contiguous space and perform
// any other transformations (host-device data transfer, reduce precision).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this PR this is critical for the work to progress, so I am not requesting any changes right now, but I would like to suggest some potential improvements to this code. The packing into a contiguous buffer could be separated into a function or a function template. There is a way to check if a Kokkos array is contiguous (using array.span_is_contiguous()), which could be used to create a faster code path. Lastly, the packing should probably be done on the device using parallelFor and the buffer copied afterwards to the host.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can do this on a later pass and in a separate PR once we have the working version merged.

Copy link

@hyungyukang hyungyukang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left my suggestion for fixing the bug in writing order yesterday, but I forgot to submit my review.

components/omega/src/infra/IOStream.cpp Outdated Show resolved Hide resolved
components/omega/src/infra/IOStream.cpp Outdated Show resolved Hide resolved
components/omega/src/infra/IOStream.cpp Outdated Show resolved Hide resolved
components/omega/src/infra/IOStream.cpp Outdated Show resolved Hide resolved
components/omega/src/infra/IOStream.cpp Outdated Show resolved Hide resolved
@hyungyukang
Copy link

@philipwjones , I think I found a way to implement time-stackable output streams:

netcdf ocn.hist.0001-01-01_01\:00\:00 {
dimensions:
        MaxCellsOnEdge = 2 ;
        MaxEdges = 6 ;
        NCells = 2562 ;
        NEdges = 7680 ;
        NVertLevels = 3 ;
        NVertices = 5120 ;
        Time = UNLIMITED ; // (10 currently)
        VertexDegree = 3 ;
variables:
        double tracer1(Time, NCells, NVertLevels) ;
                tracer1:Description = "tracer1 at cell centers" ;
                tracer1:FillValue = -1.2345e-30 ;
                tracer1:Name = "tracer1" ;
                tracer1:StdName = "none" ;
                tracer1:Units = "" ;
                tracer1:ValidMax = 100. ;
                tracer1:ValidMin = 0. ;
                tracer1:_FillValue = -1.2345e-30 ;
                tracer1:long_name = "tracer1 at cell centers" ;
                tracer1:name = "tracer1" ;
                tracer1:standard_name = "none" ;
                tracer1:units = "" ;
                tracer1:valid_max = 100. ;
                tracer1:valid_min = 0. ;

// global attributes:
                :SimStartTime = "0001-01-01_00:00:00.00" ;
                :SimulationTime = "0001-01-01_01:00:00" ;
}

But the code needs to be discussed and organized further with you, so for now, let's leave it for the next IOStream PR. I wanted to implement this feature for Polaris testing. For now, I believe I can use this toy code for Polaris after modifying a bit more.

@hyungyukang hyungyukang mentioned this pull request Oct 4, 2024
6 tasks
@mark-petersen
Copy link

mark-petersen commented Oct 4, 2024

The current head, 7516da1, fails the iostreams test on Frontier cpu using crayclang:

./omega_ctest.sh -R IOSTREAM_TEST

Test project /lustre/orion/cli115/scratch/mpetersen/runs/241001_omega/build
    Start 18: IOSTREAM_TEST
1/1 Test #18: IOSTREAM_TEST ....................***Failed    5.35 sec
srun: error: frontier10341: tasks 0-7: Exited with exit code 1
srun: Terminating StepId=2640357.29

Do I need to copy a different file for this test to run successfully? Right now I copy the files

ls -lh /ccs/home/mpetersen/meshes/omega/O*nc
lrwxrwxrwx 1 mpetersen mpetersen 24 Jul 17 16:41 /ccs/home/mpetersen/meshes/omega/OmegaMesh.nc -> ocean.QU.240km.151209.nc
lrwxrwxrwx 1 mpetersen mpetersen 22 Jul 17 16:41 /ccs/home/mpetersen/meshes/omega/OmegaPlanarMesh.nc -> PlanarPeriodic48x48.nc
lrwxrwxrwx 1 mpetersen mpetersen 43 Jul 17 16:41 /ccs/home/mpetersen/meshes/omega/OmegaSphereMesh.nc -> cosine_bell_icos480_initial_state.230220.nc
Here are my steps:


######### Frontier cpu ############
CODE_DIR=opr
RUNDIR=241001_omega
mkdir /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR
cd !$

cd /ccs/home/mpetersen/repos/E3SM/${CODE_DIR}
git submodule update --init --recursive externals/YAKL externals/ekat externals/scorpio cime
cd /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR

module load cmake
rm -rf build
mkdir build
cd build
export PARMETIS_ROOT=$PROJWORK/cli115/pwjones/frontierlibs-cray/parmetis
cmake \
   -DOMEGA_CIME_COMPILER=crayclang \
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=frontier \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT}\
   -DOMEGA_BUILD_TEST=ON \
   -Wno-dev \
   -S /ccs/home/mpetersen/repos/E3SM/${CODE_DIR}/components/omega -B .
./omega_build.sh

# linking:
cd test
ln -isf /ccs/home/mpetersen/meshes/omega/O*nc .
cp /ccs/home/mpetersen/repos/E3SM/${CODE_DIR}/components/omega/configs/Default.yml omega.yml


salloc -A cli115 -J inter -t 2:00:00 -q debug -N 1 -S 0
RUNDIR=241001_omega
cd /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR/build
./omega_ctest.sh

@philipwjones
Copy link
Author

@mark-petersen Sorry - I pushed a test to capture the indexing error that Hyun saw, but have not yet fixed the indexing problem itself so the new unit test is correctly failing. Hopefully have a fix for the indexing soon...

@philipwjones
Copy link
Author

The index offset calculation has been fixed and I added a read test to make sure the read/write is consistent and returns a correct field. I also over-wrote the salinity field with an index-based value and manually checked the output to make sure the indexing is correct.

Passes new unit tests on Chrysalis/Intel and Frontier cpu/gpu so please feel free to re-review

@hyungyukang
Copy link

@philipwjones , I just applied your fix to my test code, and it worked like a charm. Thanks for fixing it!

image

  Because the Metadata uses std::any, when a string literal is passed
  as an argument to the Metadata add routines, they are stored as
  static char* rather than std::string. This adds changes to treat
  this case properly.
  This adds a new IOStreams capability in which different input
  and output streams can be defined to read or write a custom list
  of fields and related metadata at desired time frequencies.
  - and added comments to better describe use of initialization streams
@philipwjones
Copy link
Author

Rebased to pick up new tracer infra and modified the formatting of a couple of time entries per @hyungyukang suggestion. Still passes all tests.

Copy link
Member

@mwarusz mwarusz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments have been addressed. I retested on pm-cpu and pm-gpu and everything passed. Thanks for your work on this @philipwjones.

Copy link

@hyungyukang hyungyukang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipwjones , thank you very much for your dedicated efforts on this PR. I have tested this PR in conjunction with #133 and #140 simultaneously by conducting a cosine bell advection test case. Through this PR, I was able to write tracer history output files at different intervals, and it worked correctly after @philipwjones 's fix for index offset calculations (#132 (comment)).

I'm approving this PR based on my testing and the testing and visual inspection of others. All of my comments have been addressed, and @philipwjones and I will open a subsequent IOStream PR to introduce additional functionality, including time-stackable output streams and the generation of new output files at specified frequencies (or interval). Thanks again @philipwjones !

Copy link

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main review has been of the conceptual design and the interactions with the YAML file and Polaris. From that perspective, I think this is in good shape. As @philipwjones and I have discussed on Slack, there are some loose ends that need follow-up in future PRs to make full testing of this capability and its integration with Polaris possible. For example for now, no streams are being written out without explicit modification of the Omega code.

@philipwjones philipwjones merged commit aa4fe34 into E3SM-Project:develop Oct 10, 2024
2 checks passed
@mark-petersen mark-petersen removed their request for review October 14, 2024 13:59
@philipwjones philipwjones deleted the omega/iostream branch October 30, 2024 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants