Adds IOStream capabilities for omega input/output #132

Merged: 12 commits merged on Oct 10, 2024
66 changes: 66 additions & 0 deletions components/omega/configs/Default.yml
@@ -28,3 +28,69 @@ Omega:
Tracers:
Base: [Temp, Salt]
Debug: [Debug1, Debug2, Debug3]
IOStreams:
# InitialState should only be used when starting from scratch.
# After the simulation's initial start, FreqUnits should be
# changed to Never so that the initial state file is not read.
InitialState:
UsePointerFile: false
Filename: OmegaMesh.nc

I know it is the intention to keep the horizontal mesh separate from the initial conditions and restart files.
It doesn't seem like there is currently a stream for reading the horizontal mesh, and the name of the mesh file is still hard-coded to OmegaMesh.nc:
https://github.com/philipwjones/E3SM/blob/omega/iostream/components/omega/src/base/Decomp.h#L243
Is there the intention of adding some sort of Mesh group and providing a stream for reading it (so that a filename other than OmegaMesh.nc can be used)?

Author

Yes, that's correct. This reflects the current status, but we do intend to add a separate Mesh stream and Mesh group.

Author

Also, the initial decomposition (Decomp) can't use streams because parallel IO can't be set up until after Decomp. But I might still be able to at least read the mesh file name from the config.

Okay, it sounds like having the mesh filename be read from a stream is silly because that would not be a stream you could modify (except for the filename). So it sounds like the mesh filename should be a normal config option somewhere earlier in the yaml file. It still would be convenient for Polaris if the mesh didn't always have to be named OmegaMesh.nc because it requires otherwise unnecessary logic specific to Omega. (Obviously, this isn't a good name for the mesh/initial condition for MPAS-Ocean.)

Author

Not a problem: the important routines already take the filename as an argument, on the assumption that we would eventually go that route. Early on, it was just easier to go with the hardwired soft-link route.

Mode: read
Precision: double
Freq: 1
FreqUnits: OnStartup
UseStartEnd: false
Contents:
- Restart
# Restarts are used to initialize for all job submissions after the very
# first startup job. We use UseStartEnd with a start time just after the
# simulation start time so that Omega does not attempt to use a restart
# for the first startup job.
RestartRead:
UsePointerFile: true
PointerFilename: ocn.pointer
Mode: read
Precision: double
Freq: 1
FreqUnits: OnStartup
UseStartEnd: true
StartTime: 0001-01-01_00:00:01
EndTime: 99999-12-31_00:00:00
Contents:
- Restart
RestartWrite:
UsePointerFile: true
PointerFilename: ocn.pointer
Filename: ocn.restart.$Y-$M-$D_$h.$m.$s
Mode: write
IfExists: replace
Precision: double
Freq: 6
FreqUnits: months
UseStartEnd: false
Contents:
- Restart
History:
UsePointerFile: false
Filename: ocn.hist.$SimTime
Mode: write
IfExists: replace
Precision: double
Freq: 1
FreqUnits: months
UseStartEnd: false
Contents:
- Tracers
Highfreq:
UsePointerFile: false
Filename: ocn.hifreq.$Y-$M-$D_$h.$m.$s
Mode: write
IfExists: replace
Precision: single
Freq: 10
FreqUnits: days
UseStartEnd: true
StartTime: 0001-06-01_00:00:00
EndTime: 0001-06-30_00:00:00
Contents:
- Tracers
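The RestartRead comment above relies on StartTime being inclusive and EndTime exclusive, as the user guide describes. The following is a minimal C++ sketch of that window test under stated assumptions: `parseTime` and `inWindow` are hypothetical helpers (not Omega's API), times are compared field by field rather than through Omega's time manager, and any run of non-digit characters is treated as a delimiter so both `00:00:01` and `00.00.00` styles parse.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Six integer fields: year, month, day, hour, minute, second.
using TimeFields = std::array<long, 6>;

// Parse a time string such as "0001-01-01_00:00:01" or "0001-06-01_00.00.00".
// Any run of non-digit characters acts as a delimiter, and the year may have
// four or more digits. Returns false if fewer than six fields are found.
bool parseTime(const std::string &Str, TimeFields &Fields) {
   std::size_t NFields = 0;
   std::size_t I = 0;
   while (I < Str.size() && NFields < Fields.size()) {
      if (std::isdigit(static_cast<unsigned char>(Str[I]))) {
         long Value = 0;
         while (I < Str.size() &&
                std::isdigit(static_cast<unsigned char>(Str[I]))) {
            Value = Value * 10 + (Str[I] - '0');
            ++I;
         }
         Fields[NFields++] = Value;
      } else {
         ++I; // skip a delimiter character
      }
   }
   return NFields == Fields.size();
}

// True if Now lies in [Start, End): StartTime inclusive, EndTime exclusive.
bool inWindow(const std::string &Now, const std::string &Start,
              const std::string &End) {
   TimeFields N{}, S{}, E{};
   if (!parseTime(Now, N) || !parseTime(Start, S) || !parseTime(End, E))
      return false;
   // std::array comparison is lexicographic, i.e. year first, then month...
   return N >= S && N < E;
}
```

Under these assumptions, the first startup job at 0001-01-01_00:00:00 falls one second before the RestartRead window, so no restart is read, while any later job falls inside it.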
89 changes: 89 additions & 0 deletions components/omega/doc/devGuide/IOStreams.md
@@ -0,0 +1,89 @@
(omega-dev-iostreams)=

## IO Streams (IOStream)

Most input and output for Omega occurs through IOStreams. Each stream
defines a file, the contents to be read/written and the time frequency
for reading and writing. Defining streams via the input configuration
file is described in the [User Guide](#omega-user-iostreams). IOStreams
are built on top of the parallel IO infrastructure described in the
[IO Section](#omega-dev-IO) and the Field and Metadata described in the
[Field Section](#omega-dev-Field). Here we describe the classes and functions
used to implement IOStreams. Any module accessing an IOStream instance
or related functions must include the ``IOStream.h`` header file.

All IOStreams are initialized in a two-step process. The first step is a
call to the init routine, which should take place early in the Omega
initialization, after the ModelClock has been initialized:
```c++
int Err = IOStream::init(ModelClock);
```
This routine extracts all the stream definitions from the input configuration
file and creates all the Streams. This initialization also defines the
contents of each Stream but does not yet validate those contents against all
the defined Fields. The contents of all streams should be validated at the
end of initialization (when all Fields have been defined) using the call:
```c++
bool AllValidate = IOStream::validateAll();
```
However, if a stream is needed (e.g., a read stream) during initialization
before the validateAll call, a single stream can be validated using
```c++
bool Validated = MyStream.validate();
```
and the validation status can be checked with
```c++
bool Validate = MyStream.isValidated();
```
All streams must be validated before use to make sure the Fields have
been defined and the relevant data arrays have been attached to Fields and
are available to access. At the end of a simulation, IOStreams must be
finalized using
```c++
int Err = IOStream::finalize(ModelClock);
```
so that any final writes for OnShutdown streams can take place and all
defined streams and arrays are deallocated. If a stream needs to be removed
before that time, an erase function is provided:
```c++
IOStream::erase(StreamName);
```
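To illustrate why validation is a separate step from ``IOStream::init``, here is a hypothetical C++ sketch (not the Omega implementation) in which a stream only validates once every name in its Contents list has been registered as a defined Field. `DefinedFields`, `SketchStream`, `AllStreams` and this `validateAll` are illustrative stand-ins; real streams also accept FieldGroup names and check that data arrays are attached.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the Field registry: names of Fields that have
// been defined (and, in Omega, had their data arrays attached).
std::set<std::string> DefinedFields;

// Hypothetical, simplified stream: only Contents and a validation flag.
struct SketchStream {
   std::vector<std::string> Contents; // Field names to read or write
   bool Validated = false;

   // Valid only if every Contents entry names an already-defined Field.
   bool validate() {
      for (const auto &Name : Contents) {
         if (DefinedFields.count(Name) == 0)
            return false;
      }
      Validated = true;
      return true;
   }

   bool isValidated() const { return Validated; }
};

// Hypothetical registry filled by the init step (stream name -> stream).
std::map<std::string, SketchStream> AllStreams;

// Validate every stream; true only if all streams validate. Each stream's
// validate() is still called after an earlier stream fails.
bool validateAll() {
   bool AllValid = true;
   for (auto &Entry : AllStreams)
      AllValid = Entry.second.validate() && AllValid;
   return AllValid;
}
```

In this sketch, calling `validateAll()` before the Fields exist returns false; once `Temp` and `Salt` are registered, the same call succeeds, which mirrors the ordering the text asks for.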

For most output streams, we provide a writeAll interface that should be placed
at an appropriate time during the time step loop:
```c++
int Err = IOStream::writeAll(ModelClock);
```
This function checks each write stream and writes the file if it is time to
do so. The check uses a time manager alarm that is defined for each stream
during initialization from the time frequency in that stream's configuration.
After the file is written, the alarm is reset for the next write time. If a
file must be written outside of this routine, a single-stream write can take
place using:
```c++
int Err = IOStream::write(StreamName, ModelClock);
```
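A rough sketch of the alarm-driven check in ``writeAll``, assuming time is tracked as an integer number of seconds and each stream's alarm is simply its next due time. Omega uses the time manager alarms described above, with calendar-aware intervals (months and years are not fixed lengths), so `AlarmStream`, `WriteStreams` and this `writeAll` are hypothetical simplifications.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified write stream with a fixed interval in seconds.
struct AlarmStream {
   long Interval = 0;  // seconds between writes
   long NextAlarm = 0; // next time (seconds) at which the stream is due
   int WriteCount = 0; // number of writes performed so far
};

// Hypothetical registry of write streams (stream name -> stream).
std::map<std::string, AlarmStream> WriteStreams;

// Check every write stream; if its alarm has rung, "write" the file and
// reset the alarm to the next write time. Returns the number of writes.
int writeAll(long CurrentTime) {
   int NWritten = 0;
   for (auto &Entry : WriteStreams) {
      AlarmStream &S = Entry.second;
      if (CurrentTime >= S.NextAlarm) {
         ++S.WriteCount;            // stand-in for the actual file write
         S.NextAlarm += S.Interval; // reset the alarm for the next write
         ++NWritten;
      }
   }
   return NWritten;
}
```

Stepping a one-hour stream through two hours of 600 s time steps, for example, writes twice (at 3600 s and 7200 s) and leaves the alarm set for 10800 s.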

Reading files (e.g., for initialization, restart or forcing) does not usually
take place all at once, so no readAll interface is provided. Instead, each
input stream is read using:
```c++
int Err = IOStream::read(StreamName, ModelClock, ReqMetadata);
```
where ReqMetadata is a variable of type Metadata (defined in Field but
essentially a ``std::map<std::string, std::any>`` of name/value pairs).
This variable should include the names of global metadata that are desired
from the input file. For example, if a time string is needed to verify the
input file corresponds to a desired time, the required metadata can be
initialized with
```c++
Metadata ReqMetadata;
ReqMetadata["ForcingTime"] = "";
```
The Metadata corresponding to ForcingTime will then be read from the file
and inserted as the Metadata value. If no metadata is to be read from the
file, then an empty ReqMetadata variable can be passed.
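The request/fill pattern for ReqMetadata can be sketched in standalone C++. `fillRequestedMeta` is a hypothetical helper standing in for the read step, and the file's global attributes are modeled as a second map; only the ``std::map<std::string, std::any>`` shape comes from the text above.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <string>

// Assumption: Metadata behaves like a map of name/value pairs, as described
// above ("essentially a std::map<std::string, std::any>").
using Metadata = std::map<std::string, std::any>;

// Hypothetical read step: for each requested name, copy the matching global
// attribute from the file. Unrequested attributes are ignored. Returns false
// if any requested name is missing from the file.
bool fillRequestedMeta(Metadata &ReqMetadata, const Metadata &FileAttributes) {
   bool AllFound = true;
   for (auto &Entry : ReqMetadata) {
      auto It = FileAttributes.find(Entry.first);
      if (It == FileAttributes.end())
         AllFound = false;
      else
         Entry.second = It->second;
   }
   return AllFound;
}
```

One detail worth noting: a bare string literal such as `""` is stored in a `std::any` as `const char *`, so a later `std::any_cast<std::string>` on an entry that was never filled would throw `std::bad_any_cast`; initializing with `std::string("")` avoids that. Whether this matters depends on how Omega's read replaces the value, which this sketch does not model.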

As described in the [User Guide](#omega-user-iostreams), all streams are
defined in the input configuration file, and most other IOStream functions
either support that initialization or support the read/write functions above.
2 changes: 2 additions & 0 deletions components/omega/doc/index.md
@@ -32,6 +32,7 @@ userGuide/Decomp
userGuide/Dimension
userGuide/Field
userGuide/IO
userGuide/IOStreams
userGuide/Halo
userGuide/HorzMesh
userGuide/HorzOperators
@@ -66,6 +67,7 @@ devGuide/Decomp
devGuide/Dimension
devGuide/Field
devGuide/IO
devGuide/IOStreams
devGuide/Halo
devGuide/HorzMesh
devGuide/HorzOperators
142 changes: 142 additions & 0 deletions components/omega/doc/userGuide/IOStreams.md
@@ -0,0 +1,142 @@
(omega-user-iostreams)=

## IO Streams (IOStream)

IO Streams are the primary mechanism for users to specify input and output
for Omega. An IOStream can be defined for any number of fields and at desired
time frequencies (including one-time or at startup/shutdown). IOStreams are
defined in the Omega input configuration file in an IOStreams section:

```yaml
Omega:
  # other config options removed for brevity
  IOStreams:
    InitialState:
      UsePointerFile: false
      Filename: OmegaMesh.nc
      Mode: read
      Precision: double
      Freq: 1
      FreqUnits: OnStartup
      UseStartEnd: false
      Contents:
        - Restart
    RestartWrite:
      UsePointerFile: true
      PointerFilename: ocn.pointer
      Filename: ocn.restart.$Y-$M-$D_$h.$m.$s
      Mode: write
      IfExists: replace
      Precision: double
      Freq: 6
      FreqUnits: months
      UseStartEnd: false
      Contents:
        - Restart
    History:
      UsePointerFile: false
      Filename: ocn.hist.$SimTime
      Mode: write
      IfExists: replace
      Precision: double
      Freq: 1
      FreqUnits: months
      UseStartEnd: false
      Contents:
        - Tracers
    Highfreq:
      UsePointerFile: false
      Filename: ocn.hifreq.$Y-$M-$D_$h.$m.$s
      Mode: write
      IfExists: replace
      Precision: single
      Freq: 10
      FreqUnits: days
      UseStartEnd: true
      StartTime: 0001-06-01_00.00.00
      EndTime: 0001-06-30_00.00.00
      Contents:
        - Tracers
```

Each stream has a number of required and optional parameters for customizing
input and output. These options are indented below the stream name as shown
in the sample YAML entries above. They include:
- **UsePointerFile:** A required flag that is either true or false. A pointer
file is used for cases like restart files where the last file written can
be stored for the next job submission so that the configuration file does
not need to be edited between job submissions.
- **PointerFilename:** Only required if UsePointerFile is true and should
be set to the full filename (with path) for the pointer file. Each stream
using a pointer file must define a unique pointer file name.

One question I have from reading the user guide is whether there's a way to specify the frequency of creating a new file vs. the output frequency. It would be helpful to clarify when output gets written to an existing file and when it is written to a new file.

Author

Yeah, clearly I haven't really thought through the multiple time slice in a file case. The more I look into it, the more changes I need to make. Suspect I'll need to push this to a subsequent PR so we can get this base capability in this week.

That sounds like a reasonable assessment to me.

- **Filename:** Required in all cases except input streams using a pointer
  file. This is the complete name (with path) of the file to be read or written.
  A filename template is also supported in which simulation (or real) time
  can be used in the file name. Accepted template keys, several of which
  appear in the examples above, are:
  - $SimTime for the current simulation time in a standard time string (note
    that this time string may include colon separators, which can be a
    problem in filenames, so using the individual keys below is preferred)
  - $Y for the current simulation year
  - $M for the current simulation month
  - $D for the current simulation day
  - $h for the current simulation hour
  - $m for the current simulation minute
  - $s for the current simulation second
  - $WallTime for the actual (real-world) wall-clock time, for use when you
    need it for a debug time stamp
- **Mode:** A required field that is either read or write. There is no
  readwrite option (e.g., for restarts), so a separate stream should be
  defined for such cases, as in the examples above.
- **IfExists:** A required field for write streams that determines behavior
  if the file already exists. Acceptable options are:
  - Fail if you want the code to exit with an error
  - Replace if you want to replace the existing file with the new file
  - Append if you want to append (e.g., multiple time slices) to the existing
    file (this option is not currently supported)
- **Precision:** A field that determines whether floating point numbers are
written in full (double) precision or reduced (single). Acceptable values
are double or single. If not present, double is assumed, but a warning
message will be generated so it is best to explicitly include it.
- **Freq:** A required integer field that determines the frequency of
  input/output, in the units given by the FreqUnits entry.
- **FreqUnits:** A required field that, combined with the integer frequency,
  determines the frequency of input/output. Acceptable values include:
  - OnStartup for files read/written once on startup
  - OnShutdown for files read/written only once on model exit
  - AtTime or OnTime or Time or TimeInstant for a one-time read or write
    at the time specified in the StartTime entry
  - Never if the stream should not be used but you wish to retain the
    entry in the config file (a warning will be written to the log file)
  - Years for a frequency every Freq years (*not* Freq times per year)
  - Months for a frequency every Freq months (*not* Freq times per month)
  - Days for a frequency every Freq days (*not* Freq times per day)
  - Hours for a frequency every Freq hours (*not* Freq times per hour)
  - Minutes for a frequency every Freq minutes (*not* Freq times per minute)
  - Seconds for a frequency every Freq seconds (*not* Freq times per second)
- **UseStartEnd:** A required entry that is true or false and is used if the
I/O is desired only within a certain time interval. An example might be
for specifying high-frequency output within a certain period of a simulation.
- **StartTime:** A field only required when UseStartEnd is true or if
the FreqUnits request a one-time read/write. The StartTime must be a time
string of the format YYYY-MM-DD_hh.mm.ss (though the delimiters can be
any non-numeric character). The year entry is the integer year and can be
four or more digits. The StartTime is inclusive - the I/O will occur at or
after that date/time.
- **EndTime:** A field that is only required when UseStartEnd is true. It
  requires the same format as StartTime but, unlike StartTime, the EndTime
  is not inclusive: I/O only occurs for times before the EndTime. If a
  file is desired at the EndTime, the user should specify an EndTime slightly
  later than the desired end time (by less than one time step).
- **Contents:** This is a required field that contains an itemized list of
  each Field or FieldGroup to be read or written by the stream. Each name must
  match the name of a defined Field or Group within Omega. Group names are
  preferred because they keep the list of fields short; Omega will define
  convenient FieldGroups such as Restart, State and Tracers that include all
  members of the group. If only a subset of Fields from a Group is desired,
  the individual Field names should be specified instead of the Group name.
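The key substitution in a Filename template can be sketched as plain string replacement. This C++ sketch is illustrative only: `SimTimeParts`, `pad`, `replaceAll` and `expandTemplate` are hypothetical names, the zero-padding widths (a four-digit minimum year, two digits for the other fields) are assumptions rather than documented Omega behavior, and `$SimTime`/`$WallTime` are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical simulation time broken into components.
struct SimTimeParts {
   long Year;
   int Month, Day, Hour, Minute, Second;
};

// Zero-pad an integer to at least Width digits (wider values are kept whole).
std::string pad(long Value, int Width) {
   char Buf[32];
   std::snprintf(Buf, sizeof(Buf), "%0*ld", Width, Value);
   return Buf;
}

// Replace every occurrence of Key in Str with Value.
void replaceAll(std::string &Str, const std::string &Key,
                const std::string &Value) {
   for (std::size_t Pos = Str.find(Key); Pos != std::string::npos;
        Pos = Str.find(Key, Pos + Value.size())) {
      Str.replace(Pos, Key.size(), Value);
   }
}

// Expand the individual time keys; keys are case-sensitive, so $M (month)
// and $m (minute) are distinct.
std::string expandTemplate(std::string Name, const SimTimeParts &T) {
   replaceAll(Name, "$Y", pad(T.Year, 4));
   replaceAll(Name, "$M", pad(T.Month, 2));
   replaceAll(Name, "$D", pad(T.Day, 2));
   replaceAll(Name, "$h", pad(T.Hour, 2));
   replaceAll(Name, "$m", pad(T.Minute, 2));
   replaceAll(Name, "$s", pad(T.Second, 2));
   return Name;
}
```

With these assumptions, the RestartWrite template `ocn.restart.$Y-$M-$D_$h.$m.$s` at 0001-06-01 00:00:00 expands to `ocn.restart.0001-06-01_00.00.00`, which avoids the colon separators mentioned for $SimTime.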

This streams configuration should be sufficient to define all input and output
from the model and provides a relatively simple interface for a typical user.
However, if necessary (e.g., before streams have been defined), the specific
interfaces in the lower-level [IO](#omega-user-IO) module can be used.
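The Contents rule described above (a group name brings in all of its member Fields, while an individual Field name brings in only that Field) can be sketched as follows. The `GroupTable` type and `expandContents` function are hypothetical, and the Tracers membership used in the example is illustrative rather than Omega's actual group definition.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical FieldGroup table: group name -> member Field names.
using GroupTable = std::map<std::string, std::vector<std::string>>;

// Expand a stream's Contents list into the set of Field names to read or
// write. An entry naming a group contributes all of the group's members;
// any other entry is treated as an individual Field name.
std::set<std::string> expandContents(const std::vector<std::string> &Contents,
                                     const GroupTable &Groups) {
   std::set<std::string> Fields;
   for (const auto &Item : Contents) {
      auto It = Groups.find(Item);
      if (It != Groups.end())
         Fields.insert(It->second.begin(), It->second.end());
      else
         Fields.insert(Item);
   }
   return Fields;
}
```

Listing `Tracers` therefore pulls in every member, while listing only `Temp` gives the subset behavior the Contents entry recommends.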
37 changes: 31 additions & 6 deletions components/omega/src/base/IO.cpp
@@ -297,8 +297,18 @@ int openFile(
 // Closes an open file using the fileID, returns an error code
 int closeFile(int &FileID /// [in] ID of the file to be closed
 ) {
-   // Just calls the PIO close routine
-   int Err = PIOc_closefile(FileID);
+   int Err = 0;
+
+   // Make sure all operations completed before closing
+   Err = PIOc_sync(FileID);
+   if (Err != PIO_NOERR)
+      LOG_ERROR("Error syncing file before closing");
+
+   // Call the PIO close routine
+   Err = PIOc_closefile(FileID);
+   if (Err != PIO_NOERR)
+      LOG_ERROR("Error closing file {} in PIO", FileID);
+
    return Err;
 
 } // End closeFile
@@ -454,9 +464,9 @@ int writeMeta(const std::string &MetaName, // [in] name of metadata
 
 } // End writeMeta (R8)
 
-int writeMeta(const std::string &MetaName, // [in] name of metadata
-              std::string MetaValue, // [in] value of metadata
-              int FileID, // [in] ID of the file for writing
+int writeMeta(const std::string &MetaName, // [in] name of metadata
+              const std::string &MetaValue, // [in] value of metadata
+              int FileID, // [in] ID of the file for writing
               int VarID // [in] ID for variable associated with metadata
 ) {
 
@@ -690,8 +700,10 @@ int readArray(void *Array, // [out] array to be read
 
    // Find variable ID from file
    Err = PIOc_inq_varid(FileID, VarName.c_str(), &VarID);
-   if (Err != PIO_NOERR)
+   if (Err != PIO_NOERR) {
       LOG_ERROR("IO::readArray: Error finding varid for variable {}", VarName);
+      return Err;
+   }
 
    // PIO Read array call to read the distributed array
    PIO_Offset ASize = Size;
@@ -721,6 +733,19 @@ int writeArray(void *Array, // [in] array to be written
    PIO_Offset Asize = Size;
 
    Err = PIOc_write_darray(FileID, VarID, DecompID, Asize, Array, FillValue);
+   if (Err != PIO_NOERR) {
+      LOG_ERROR("Error in PIO writing distributed array");
+      return Err;
+   }
+
+   // Make sure write is complete before returning
+   // We may be able to remove this for efficiency later but it was
+   // needed during testing
+   Err = PIOc_sync(FileID);
+   if (Err != PIO_NOERR) {
+      LOG_ERROR("Error in PIO sychronizing file after write");
+      return Err;
+   }
 
    return Err;
 
6 changes: 3 additions & 3 deletions components/omega/src/base/IO.h
@@ -194,9 +194,9 @@ int writeMeta(const std::string &MetaName, ///< [in] name of metadata
               int FileID, ///< [in] ID of the file for writing
               int VarID ///< [in] ID for variable associated with metadata
 );
-int writeMeta(const std::string &MetaName, ///< [in] name of metadata
-              std::string MetaValue, ///< [in] value of metadata
-              int FileID, ///< [in] ID of the file for writing
+int writeMeta(const std::string &MetaName, ///< [in] name of metadata
+              const std::string &MetaValue, ///< [in] value of metadata
+              int FileID, ///< [in] ID of the file for writing
               int VarID ///< [in] ID for variable associated with metadata
 );
 