Adds IOStream capabilities for omega input/output #132
Changes from all commits
8e241b2
995a23b
4320d57
a8fccd2
f3d9287
13448be
8fd8c8d
09bf010
2753115
1cca91e
bec9e2d
785766e
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
(omega-dev-iostreams)= | ||
|
||
## IO Streams (IOStream) | ||
|
||
Most input and output for Omega occurs through IOStreams. Each stream | ||
defines a file, the contents to be read/written and the time frequency | ||
for reading and writing. Defining streams via the input configuration | ||
file is described in the [User Guide](#omega-user-iostreams). IOStreams | ||
are built on top of the parallel IO infrastructure described in the | ||
[IO Section](#omega-dev-IO) and the Field and Metadata described in the | ||
[Field Section](#omega-dev-Field). Here we describe the classes and functions | ||
used to implement IOStreams. Any module accessing an IOStream instance | ||
or related functions must include the ``IOStream.h`` header file. | ||
|
||
All IOStreams are initialized in a two-step process. First, the init routine | ||
should be called early in the Omega initialization, after the ModelClock has | ||
been initialized: | ||
```c++ | ||
int Err = IOStream::init(ModelClock); | ||
``` | ||
This routine extracts all the stream definitions from the input configuration | ||
file and creates all the Streams. This initialization also defines the | ||
contents of each Stream but does not yet validate those contents against all | ||
the defined Fields. The contents of all streams should be validated at the | ||
end of initialization (when all Fields have been defined) using the call: | ||
```c++ | ||
bool AllValidate = IOStream::validateAll(); | ||
``` | ||
However, if a stream is needed (e.g., a read stream) during initialization | ||
before the validateAll call, a single stream can be validated using | ||
```c++ | ||
bool Validated = MyStream.validate(); | ||
``` | ||
and the validation status can be checked with | ||
```c++ | ||
bool Validate = MyStream.isValidated(); | ||
``` | ||
All streams must be validated before use to make sure the Fields have | ||
been defined and the relevant data arrays have been attached to Fields and | ||
are available to access. At the end of a simulation, IOStreams must be | ||
finalized using | ||
```c++ | ||
int Err = IOStream::finalize(ModelClock); | ||
``` | ||
so that any final writes can take place for the OnShutdown streams and to | ||
deallocate all defined streams and arrays. If a stream needs to be removed | ||
before that time, an erase function is provided: | ||
```c++ | ||
IOStream::erase(StreamName); | ||
``` | ||
|
||
For most output streams, we provide a writeAll interface that should be placed | ||
at an appropriate time during the time step loop: | ||
```c++ | ||
int Err = IOStream::writeAll(ModelClock); | ||
``` | ||
This function checks each write stream and writes the file if it is time to | ||
do so, as determined by a time manager alarm. This alarm is defined for each | ||
stream during initialization from the time frequency in the streams | ||
configuration. After writing the file, the alarm is reset for the next write | ||
time. If a file must be written outside of this routine, a single-stream write | ||
can take place using: | ||
```c++ | ||
int Err = IOStream::write(StreamName, ModelClock); | ||
``` | ||
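
The alarm-driven behavior of writeAll described above can be illustrated with a small, self-contained sketch. This is not the Omega implementation: `PeriodicAlarm`, `WriteStream` and `writeAllSketch` are hypothetical names, time is reduced to integer seconds instead of a ModelClock, and the file write is replaced by a counter.

```c++
#include <string>
#include <vector>

// Hypothetical stand-in for a time manager alarm, using integer seconds.
// Interval must be positive, otherwise reset() would never terminate.
struct PeriodicAlarm {
   long long Interval; // seconds between writes
   long long NextRing; // next time at which the alarm rings

   bool isRinging(long long Now) const { return Now >= NextRing; }

   // Advance the ring time past the current time
   void reset(long long Now) {
      while (NextRing <= Now)
         NextRing += Interval;
   }
};

// Hypothetical write stream: a name, its alarm and a write counter
struct WriteStream {
   std::string Name;
   PeriodicAlarm Alarm;
   int NumWrites = 0;
};

// Check every stream, "write" the ones whose alarm is ringing and reset
// their alarms. Returns the number of streams written at this time.
int writeAllSketch(std::vector<WriteStream> &Streams, long long Now) {
   int NumWritten = 0;
   for (auto &Stream : Streams) {
      if (!Stream.Alarm.isRinging(Now))
         continue;
      ++Stream.NumWrites; // placeholder for the actual file write
      Stream.Alarm.reset(Now);
      ++NumWritten;
   }
   return NumWritten;
}
```

Calling `writeAllSketch` once per time step mirrors the placement of writeAll in the time step loop: streams that are not due are skipped at negligible cost.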
|
||
Reading files (e.g., for initialization, restart or forcing) does not often | ||
take place all at once, so no readAll interface is provided. Instead, each | ||
input stream is read using: | ||
```c++ | ||
int Err = IOStream::read(StreamName, ModelClock, ReqMetadata); | ||
``` | ||
where ReqMetadata is a variable of type Metadata (defined in Field but | ||
essentially a ``std::map<std::string, std::any>`` for the name/value pair). | ||
This variable should include the names of global metadata that are desired | ||
from the input file. For example, if a time string is needed to verify the | ||
input file corresponds to a desired time, the required metadata can be | ||
initialized with | ||
```c++ | ||
Metadata ReqMetadata; | ||
ReqMetadata["ForcingTime"] = ""; | ||
``` | ||
The Metadata corresponding to ForcingTime will then be read from the file | ||
and inserted as the Metadata value. If no metadata is to be read from the | ||
file, then an empty ReqMetadata variable can be passed. | ||
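
As a rough illustration of the request-and-fill pattern described above, the sketch below models Metadata as a plain ``std::map<std::string, std::any>``. The helpers `fillRequestedMetadata` and `getStringMeta` are hypothetical and are not part of IOStream; the `FileAttrs` argument merely stands in for the global metadata stored in a file. Note that with a plain `std::any`, a string literal such as `""` is stored as a `const char*`, so this sketch assumes values are assigned as `std::string`.

```c++
#include <any>
#include <map>
#include <string>

// Stand-in for the Metadata type; the real definition lives in Field
using Metadata = std::map<std::string, std::any>;

// Hypothetical helper: for every requested name, replace the placeholder
// value with the value found in FileAttrs. Returns the number of
// requested names that were not found.
int fillRequestedMetadata(Metadata &ReqMetadata, const Metadata &FileAttrs) {
   int NumMissing = 0;
   for (auto &[Name, Value] : ReqMetadata) {
      auto It = FileAttrs.find(Name);
      if (It == FileAttrs.end()) {
         ++NumMissing; // requested name is not present in the file
         continue;
      }
      Value = It->second;
   }
   return NumMissing;
}

// Hypothetical helper: retrieve a string-valued entry, returning Default
// if the name is missing or the stored value is not a std::string
std::string getStringMeta(const Metadata &Meta, const std::string &Name,
                          const std::string &Default = "") {
   auto It = Meta.find(Name);
   if (It == Meta.end())
      return Default;
   const auto *Str = std::any_cast<std::string>(&It->second);
   return Str ? *Str : Default;
}
```

With this sketch, a caller would set `ReqMetadata["ForcingTime"] = std::string("");` before the read and retrieve the result afterwards with `getStringMeta(ReqMetadata, "ForcingTime")`.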
|
||
As described in the [User Guide](#omega-user-iostreams), all streams are | ||
defined in the input configuration file. Most other IOStream functions | ||
either support that initialization or support the read/write functions | ||
above. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,142 @@ | ||
(omega-user-iostreams)= | ||
|
||
## IO Streams (IOStream) | ||
|
||
IO Streams are the primary mechanism for users to specify input and output | ||
for Omega. An IOStream can be defined for any number of fields and at desired | ||
time frequencies (including one-time or at startup/shutdown). IOStreams are | ||
defined in the Omega input configuration file in an IOStreams section: | ||
|
||
```yaml | ||
Omega: | ||
  # other config options removed for brevity | ||
  IOStreams: | ||
    InitialState: | ||
      UsePointerFile: false | ||
      Filename: OmegaMesh.nc | ||
      Mode: read | ||
      Precision: double | ||
      Freq: 1 | ||
      FreqUnits: OnStartup | ||
      UseStartEnd: false | ||
      Contents: | ||
        - Restart | ||
    RestartWrite: | ||
      UsePointerFile: true | ||
      PointerFilename: ocn.pointer | ||
      Filename: ocn.restart.$Y-$M-$D_$h.$m.$s | ||
      Mode: write | ||
      IfExists: replace | ||
      Precision: double | ||
      Freq: 6 | ||
      FreqUnits: months | ||
      UseStartEnd: false | ||
      Contents: | ||
        - Restart | ||
    History: | ||
      UsePointerFile: false | ||
      Filename: ocn.hist.$SimTime | ||
      Mode: write | ||
      IfExists: replace | ||
      Precision: double | ||
      Freq: 1 | ||
      FreqUnits: months | ||
      UseStartEnd: false | ||
      Contents: | ||
        - Tracers | ||
    Highfreq: | ||
      UsePointerFile: false | ||
      Filename: ocn.hifreq.$Y-$M-$D_$h.$m.$s | ||
      Mode: write | ||
      IfExists: replace | ||
      Precision: single | ||
      Freq: 10 | ||
      FreqUnits: days | ||
      UseStartEnd: true | ||
      StartTime: 0001-06-01_00.00.00 | ||
      EndTime: 0001-06-30_00.00.00 | ||
      Contents: | ||
        - Tracers | ||
``` | ||
|
||
Each stream has a number of required and optional parameters for customizing | ||
input and output. These options are indented below the stream name as shown | ||
in the sample YAML entries above. They include: | ||
- **UsePointerFile:** A required flag that is either true or false. A pointer | ||
file is used for cases like restart files where the last file written can | ||
be stored for the next job submission so that the configuration file does | ||
not need to be edited between job submissions. | ||
- **PointerFilename:** Only required if UsePointerFile is true and should | ||
be set to the full filename (with path) for the pointer file. Each stream | ||
using a pointer file must define a unique pointer file name. | ||
- **Filename:** Required in all cases except input streams using a pointer | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One question I have from reading the user guide is if there's a way to specify the frequency of creating a new file vs. the output frequency. It would be helpful to clarify when output gets written to an existing file and when it is written to a new file. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, clearly I haven't really thought through the multiple time slice in a file case. The more I look into it, the more changes I need to make. Suspect I'll need to push this to a subsequent PR so we can get this base capability in this week. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That sounds like a reasonable assessment to me. |
||
  file. This is the complete name (with path) of the file to be read or written. | ||
  A filename template is also supported in which simulation (or real) time | ||
  can be used in the file name. As the examples above show, the accepted keys | ||
  for a template are: | ||
  - $SimTime for the current simulation time in a standard time string (note | ||
    that this time string may include colon separators that can be a problem | ||
    for filenames so using the individual keys below is preferred). | ||
  - $Y for the current simulation year | ||
  - $M for the current simulation month | ||
  - $D for the current simulation day | ||
  - $h for the current simulation hour | ||
  - $m for the current simulation minute | ||
  - $s for the current simulation second | ||
  - $WallTime for the actual wall-clock time, for use when you might need it | ||
    for a debug time stamp | ||
- **Mode:** A required field that is either read or write. There is no | ||
  readwrite option (e.g., for restarts) so a separate stream should be | ||
  defined for such cases as in the examples above. | ||
- **IfExists:** A required field for write streams that determines behavior | ||
  if the file already exists. Acceptable options are: | ||
  - Fail if you want the code to exit with an error | ||
  - Replace if you want to replace the existing file with the new file | ||
  - Append if you want to append (e.g., multiple time slices) to the existing | ||
    file (this option is not currently supported). | ||
- **Precision:** A field that determines whether floating point numbers are | ||
written in full (double) precision or reduced (single). Acceptable values | ||
are double or single. If not present, double is assumed, but a warning | ||
message will be generated so it is best to explicitly include it. | ||
- **Freq:** A required integer field that determines the frequency of | ||
input/output in units determined by the next FreqUnits entry. | ||
- **FreqUnits:** A required field that, combined with the integer frequency, | ||
  determines the frequency of input/output. Acceptable values include: | ||
  - OnStartup for files read/written once on startup | ||
  - OnShutdown for files read/written only once on model exit | ||
  - AtTime or OnTime or Time or TimeInstant for a one-time read or write | ||
    at the time specified in the StartTime entry | ||
  - Never if the stream should not be used but you wish to retain the | ||
    entry in the config file (a warning will be output to log file) | ||
  - Years for a frequency every Freq years (*not* Freq times per year) | ||
  - Months for a frequency every Freq months (*not* Freq times per month) | ||
  - Days for a frequency every Freq days (*not* Freq times per day) | ||
  - Hours for a frequency every Freq hours (*not* Freq times per hour) | ||
  - Minutes for a frequency every Freq minutes (*not* Freq times per minute) | ||
  - Seconds for a frequency every Freq seconds (*not* Freq times per second) | ||
- **UseStartEnd:** A required entry that is true or false and is used if the | ||
I/O is desired only within a certain time interval. An example might be | ||
for specifying high-frequency output within a certain period of a simulation. | ||
- **StartTime:** A field only required when UseStartEnd is true or if | ||
the FreqUnits request a one-time read/write. The StartTime must be a time | ||
string of the format YYYY-MM-DD_hh.mm.ss (though the delimiters can be | ||
any non-numeric character). The year entry is the integer year and can be | ||
four or more digits. The StartTime is inclusive - the I/O will occur at or | ||
after that date/time. | ||
- **EndTime:** A field that is only required when UseStartEnd is true. It | ||
requires the same format as StartTime but unlike StartTime, the EndTime | ||
is not inclusive and I/O only occurs for times before the EndTime. If a | ||
file is desired at the EndTime, the user should specify an EndTime slightly | ||
later (less than a time step) than the desired end time. | ||
- **Contents:** This is a required field that contains an itemized list of | ||
each Field or FieldGroup that is desired in the output. The name must | ||
match a name of a defined Field or Group within Omega. Group names are | ||
preferred to keep the list of fields short so Omega will define convenient | ||
FieldGroups like Restart, State, Tracers that will include all members | ||
of the group. If only a subset of Fields from a Group is desired, the | ||
individual Field names should be specified and not the Group name. | ||
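
To make the filename template keys in the list above concrete, here is a minimal sketch of how the single-letter keys could be expanded. It is an illustration only, not the Omega implementation: `SimTimeParts`, `padInt` and `expandTemplate` are hypothetical names, the zero-padding widths (four digits for the year, two for the other fields) are an assumption based on the time string format shown for StartTime, and the longer keys $SimTime and $WallTime are deliberately left unexpanded.

```c++
#include <cstdio>
#include <string>

// Hypothetical container for the components of the simulation time
struct SimTimeParts {
   long long Year;
   int Month, Day, Hour, Minute, Second;
};

// Format an integer with zero padding to at least Width digits
std::string padInt(long long Value, int Width) {
   char Buf[32];
   std::snprintf(Buf, sizeof(Buf), "%0*lld", Width, Value);
   return std::string(Buf);
}

// Replace $Y, $M, $D, $h, $m and $s in Template with time components.
// Any other text, including $SimTime and $WallTime, is copied unchanged.
std::string expandTemplate(const std::string &Template,
                           const SimTimeParts &T) {
   std::string Result;
   std::size_t I = 0;
   while (I < Template.size()) {
      if (Template[I] == '$' && I + 1 < Template.size()) {
         bool Matched = true;
         switch (Template[I + 1]) {
         case 'Y': Result += padInt(T.Year, 4); break;
         case 'M': Result += padInt(T.Month, 2); break;
         case 'D': Result += padInt(T.Day, 2); break;
         case 'h': Result += padInt(T.Hour, 2); break;
         case 'm': Result += padInt(T.Minute, 2); break;
         case 's': Result += padInt(T.Second, 2); break;
         default: Matched = false;
         }
         if (Matched) {
            I += 2; // skip the '$' and the key character
            continue;
         }
      }
      Result += Template[I];
      ++I;
   }
   return Result;
}
```

Under these assumptions, the RestartWrite template `ocn.restart.$Y-$M-$D_$h.$m.$s` evaluated at year 1, June 30, midnight would give `ocn.restart.0001-06-30_00.00.00`.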
|
||
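
The StartTime and EndTime rules in the parameter list (six numeric fields separated by arbitrary non-numeric delimiters, an inclusive start and an exclusive end) can be sketched as follows. The names `TimeFields`, `parseTimeString` and `inWindow` are hypothetical, and the sketch does no calendar validation; it only shows the parsing and the half-open comparison.

```c++
#include <array>
#include <cctype>
#include <optional>
#include <string>

// Year, month, day, hour, minute, second
using TimeFields = std::array<long long, 6>;

// Parse a string like YYYY-MM-DD_hh.mm.ss. Any non-digit character acts
// as a delimiter and the year may have four or more digits. Returns
// std::nullopt unless exactly six numeric fields are found.
std::optional<TimeFields> parseTimeString(const std::string &Str) {
   TimeFields Fields{};
   std::size_t NumFields = 0;
   std::size_t I = 0;
   while (I < Str.size()) {
      if (!std::isdigit(static_cast<unsigned char>(Str[I]))) {
         ++I; // skip delimiter
         continue;
      }
      if (NumFields == Fields.size())
         return std::nullopt; // more than six fields
      long long Value = 0;
      while (I < Str.size() &&
             std::isdigit(static_cast<unsigned char>(Str[I]))) {
         Value = Value * 10 + (Str[I] - '0');
         ++I;
      }
      Fields[NumFields++] = Value;
   }
   if (NumFields != Fields.size())
      return std::nullopt;
   return Fields;
}

// Half-open window test: Start is inclusive, End is exclusive.
// std::array compares lexicographically, which matches time ordering
// for fields stored from most to least significant.
bool inWindow(const TimeFields &Now, const TimeFields &Start,
              const TimeFields &End) {
   return Start <= Now && Now < End;
}
```

With the Highfreq example above, a time equal to StartTime is inside the window while a time equal to EndTime is not, which is why an EndTime slightly later than the last desired output is recommended.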
This streams configuration should be sufficient to define all input and output | ||
from the model and provides a relatively simple interface for a typical user. | ||
However, if necessary (e.g., before streams have been defined), the specific | ||
interfaces in the lower level [IO](#omega-user-IO) module can be used. |
I know it is the intention to keep the horizontal mesh separate from the initial conditions and restart files.
It doesn't seem like there is currently a stream for reading the horizontal mesh and the name of the mesh file is still hard coded to `OmegaMesh.nc`: https://github.com/philipwjones/E3SM/blob/omega/iostream/components/omega/src/base/Decomp.h#L243
Is there the intention of adding some sort of `Mesh` group and providing a stream for reading it (so that a filename other than `OmegaMesh.nc` can be used)?
Yes, that's correct. This reflects the current status, but we do intend to separate Mesh stream and Mesh group.
Also, the initial decomposition (Decomp) can't use streams because parallel IO can't be set up until after Decomp. But I might still be able to at least read the mesh file name from the config.
Okay, it sounds like having the mesh filename be read from a stream is silly because that would not be a stream you could modify (except for the filename). So it sounds like the mesh filename should be a normal config option somewhere earlier in the yaml file. It still would be convenient for Polaris if the mesh didn't always have to be named `OmegaMesh.nc` because it requires otherwise unnecessary logic specific to Omega. (Obviously, this isn't a good name for the mesh/initial condition for MPAS-Ocean.)
Not a problem - the important routines take the filename as an argument assuming we were going that route eventually. Early on, it was just easy to go with the hardwire-soft link route.