-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automated Runtime Land Block Elimination #263
Conversation
- Add sum_across_PEs_int4_2d to the sum_across_PEs interface - Allow mask_table file to be placed in run directory (now, the first dir that is looked at).
- Determine masked blocks. - Evenly distribute eliminated cells. - Fill ESMF gindex array accordingly. - During Export phase, set fields of eliminated cells to zero.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## dev/ncar #263 +/- ##
============================================
- Coverage 37.90% 37.85% -0.06%
============================================
Files 269 269
Lines 77176 77302 +126
Branches 14170 14194 +24
============================================
+ Hits 29255 29263 +8
- Misses 42641 42754 +113
- Partials 5280 5285 +5 ☔ View full report in Codecov by Sentry. |
2ac14d6
to
ee17218
Compare
This PR is fully tested and ready to be merged, but if @marshallward or others have comments, particularly regarding the to-do items listed in the PR description, please let us know. |
Thanks for this @alperaltuntas, this looks potentially very useful in contexts where one is not constraints to a particular layout. I added some specific comments in the code, and have some general thoughts below.
Unfortunately I am short on time right now, but these are my thoughts for the moment. Feel free to act on them as you wish 😄. Also, this has absolutely no bearing on the PR, but I like using |
Thanks, @marshallward! I couldn't locate your inline comments, but here are my quick responses to your bulletpoints above.
Right,
Right,
Indeed,
Sounds good!
It refers to the reduction of runtime when compared to a run with no (static or automated) masking. So, it just shows the impact of masking on the wallclock runtime.
Correct.
Unfortunately, I can't see your inline comments for some reason.
I agree! |
src/framework/MOM_domains.F90
Outdated
logical :: tripolar_N !< A flag indicating whether there is n. tripolar connectivity | ||
integer, intent(in) :: npes !< The desired number of active PEs. | ||
type(param_file_type), intent(in) :: param_file !< A structure to parse for run-time parameters | ||
character(len=128), intent(in) :: inputdir !< INPUTDIR parameter> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>
typo on the end?
"example of mask_table masks out 2 processors, (1,2) and (3,6), out of the 24 "//& | ||
"in a 4x6 layout: \n 2\n 4,6\n 1,2\n 3,6\n", default="MOM_mask_table", & | ||
layoutParam=.true.) | ||
endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description string is repeated, could it be saved to a variable?
call get_param(param_file, mdl, 'AUTO_MASKTABLE', auto_mask_table, & | ||
"Turn on automatic mask table generation to eliminate land blocks.", & | ||
default=.false., layoutParam=.true.) | ||
endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be called every time? Could the generated mask_table
potentially be saved and reused on subsequent runs?
@@ -1936,7 +1936,7 @@ subroutine get_global_shape(domain, niglobal, njglobal) | |||
njglobal = domain%njglobal | |||
end subroutine get_global_shape | |||
|
|||
!> Get the array ranges in one dimension for the divisions of a global index space | |||
!> Get the array ranges in one dimension for the divisions of a global index space (alternative to compute_extent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but if you can be bothered...
A better description of compute_block_extent
vs compute_extent
would be nice here, although I would struggle to write one myself. From what I could tell, compute_extent
is much more complex (and presumably safer).
src/framework/MOM_domains.F90
Outdated
enddo | ||
call close_file(file_ascii) | ||
|
||
call MOM_error(NOTE, "Wrote an auto-generated mask table at "//trim(filename)//".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might end up becoming more spam to stdout, which we are trying to reduce. Maybe only print this in debug mode?
src/framework/MOM_domains.F90
Outdated
|
||
! Read in bathymetric depth. | ||
D(:,:) = -9.0e30 ! Initializing to a very large negative depth (tall mountains) everywhere. | ||
call read_field(topo_filepath, trim(topo_varname), D, start=(/1, 1/), nread=n_global, no_domain=.true.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use modern array syntax? start=[1,1]
Also see comments above WRT scale=...
.
src/framework/MOM_domains.F90
Outdated
D(:,:) = -9.0e30 ! Initializing to a very large negative depth (tall mountains) everywhere. | ||
call read_field(topo_filepath, trim(topo_varname), D, start=(/1, 1/), nread=n_global, no_domain=.true.) | ||
|
||
allocate( mask(nx+2*ibuf, ny+2*jbuf), source=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after allocate(
src/framework/MOM_domains.F90
Outdated
character(len=:), allocatable, intent(in) :: filename !< Mask table file path (to be auto-generated.) | ||
integer, dimension(2), intent(out) :: layout !< The generated layout of PEs (incl. masked blocks) | ||
!local | ||
real, dimension(n_global(1), n_global(2)) :: D ! Bathymetric depth (to be read in from TOPO_FILE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably needs dimensions ([m]
, or maybe [Z]
; see below)
real :: min_depth ! The minimum ocean depth in the same units as D | ||
real :: mask_depth ! The depth shallower than which to mask a point as land. | ||
real :: glob_ocn_frac ! ratio of ocean points to total number of points | ||
real :: r_p ! aspect ratio for division count p. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should also have dimensions ([m]
or [nondim]
).
src/framework/MOM_domains.F90
Outdated
ny = n_global(2) | ||
|
||
! Read in bathymetric depth. | ||
D(:,:) = -9.0e30 ! Initializing to a very large negative depth (tall mountains) everywhere. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this current form, I believe this should not be redimensionalized from m to Z
, since TOPO_FILE
will also be in meters.
However, I suspect that the "correct" way to do this is to use read_field(..., scale=GV%m_to_Z...)
(or something like that) so that topo_field
comes in with [Z]
units, and the whole calculation is done in rescaled units.
I also reckon that MOM_read_data
should be used in place of read_field
, to avoid the explicit reference to MOM_IO_infra
. This might also help sort out any potential issues around rotational testing.
If the output from read_field
/MOM_read_data
is rescaled, then the D
max value should also be rescaled. (Bob can set his 2**N
to very high values in his testing.)
Sorry about that, I thought my overall comment was in the report. I've just submitted it.
If the runtime is this small, then perhaps it's not too important. (I believe I misinterpreted a comment somewhere.) It might be meaningful to not enable the automasking if a Aside from the inline comments (which have now been submitted), I think this looks good. |
- Dimensionalize topographic depth variables used to determine cell masks in auto masktable routine. - Raise error if the user provided PE layout is inconsistent with auto masktable generation. - Save the masktable parameter description to a string variable to avoid repetition. - Fix typos, whitespaces, use modern array syntax.
@gustavo-marques This PR is ready to be reviewed and merged. I believe @marshallward is working on a fix for the failing macOS tests, so I suppose we can ignore those CI failures for now. |
Due to poor handling of floating point in HDF5 1.14.3, it is currently not possible to use floating point exceptions (FPEs) whenever this version is present. The GitHub Actions CI nodes would randomly select either 1.14.2 or 1.14.3, and would raise an FPE error if 1.14.3 was selected. Additionally, the homebrew installation does not provide a clean method for selecting a different version of HDF5. Thus, for now we disable FPEs in the MacOS testing, and hope to catch any legitimate FP errors in the Ubuntu version. We will restore these tests as soon as this has been fixed in an easily-accessible version of HDF5. As part of this PR, I have also moved the FCFLAGS configuration to the platform specific Actions files, allowing for independent compiler configuration for each platform.
This PR introduces two enhancements:
AUTO_MASKTABLE
parameter to True. Relevant commits:More on how automated land block elimination works:
npes
(set by the user), calculate an upper limit on the potential number of new domain divisions:npes/glob_ocn_frac
, where glob_ocn_frac is the ratio of total ocean cells to total cells. While this upper limit may be overly optimistic for realistic domains, it can be achievable for idealistic domains.p
to determine the layout and identify the maximum number of blocks that can be eliminated to meet the targetnpes
. Once this condition is met, finalize the iteration and adopt the determinedp
as the new number of divisions (wherep
-npes
correspond to the number of masked blocks.)This entire iteration is quite fast, taking less than 0.1 seconds for our 0.66-degree workhorse grid.
Performance results:
When auto land block elimination is turned on, we get 20 to 23% speed up. Below table summarizes the model throughput (simulated years per day) for 3-month long CMOM_JRA.TL319_t232 runs on derecho.intel with various target PE numbers.
896:
1408:
Potential to-do items:
MOM _domains
module to a new module, sayMOM_auto_mask_table
. Note that this would necessitate moving theMOM_define_layout
subroutine to somewhere else (MOM_domains_infra
?) to prevent circular module dependency.ibuf
,r_extreme
, andauto_mask_table_fname
) to runtime parameters (though, if possible, I'd prefer to keep them as hard-coded parameters to keep the UX simple.)Testing
Ongoing. No answer changes and no issues so far.