Automated Runtime Land Block Elimination #263

Merged
merged 10 commits into from
Dec 20, 2023

Member

@alperaltuntas alperaltuntas commented Nov 5, 2023

This PR introduces two enhancements:

  1. Automatic Mask Table Generation: This feature allows for the elimination of land blocks by automatically generating a mask_table during domain initialization at runtime. It eliminates the need for any preprocessing steps and tools, simplifying the user experience. It also ensures that the initial PE count set by the user is fully utilized, thereby eliminating the need for a clean re-build in CESM. Users can activate this option by setting the AUTO_MASKTABLE parameter to True. Relevant commits:
  2. Land block elimination support in the NUOPC cap. Relevant commits:

More on how automated land block elimination works:

  • Based on the target PE count npes (set by the user), calculate an upper limit on the potential number of new domain divisions: npes/glob_ocn_frac, where glob_ocn_frac is the ratio of total ocean cells to total cells. While this upper limit may be overly optimistic for realistic domains, it can be achievable for idealized domains.
  • Starting from this upper limit, iteratively explore different domain division counts p to determine the layout and identify the maximum number of blocks that can be eliminated to meet the target npes. Once this condition is met, stop the iteration and adopt the determined p as the new number of divisions (where p - npes corresponds to the number of masked blocks).
  • Write out the mask table based on the number of divisions determined in the previous step, for FMS to read back in, and adopt the corresponding layout as the new domain layout.

This entire iteration is quite fast, taking less than 0.1 seconds for our 0.66-degree workhorse grid.
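
The three steps above can be sketched in Python, the language of the prototype mentioned in the to-do items. This is a minimal illustration under stated assumptions, not the Fortran implementation in MOM_domains: the helper names (block_bounds, pick_layout, land_blocks, auto_mask) are hypothetical, only one factor pair is tried per division count p, and the ibuf and r_extreme settings are ignored.

```python
def block_bounds(n, ndiv):
    """Split n cells into ndiv contiguous, nearly equal ranges [start, end)."""
    return [(k * n // ndiv, (k + 1) * n // ndiv) for k in range(ndiv)]


def pick_layout(p, nx, ny):
    """Assumed rule: among factor pairs of p, keep the one with ni/nj closest to nx/ny."""
    pairs = [(ni, p // ni) for ni in range(1, p + 1) if p % ni == 0]
    pairs = [(ni, nj) for ni, nj in pairs if ni <= nx and nj <= ny]
    if not pairs:
        return None
    return min(pairs, key=lambda l: abs(l[0] / l[1] - nx / ny))


def land_blocks(ocean, layout):
    """Return 1-based (i, j) indices of blocks that contain no ocean cell."""
    ny, nx = len(ocean), len(ocean[0])
    ni, nj = layout
    found = []
    for j, (j0, j1) in enumerate(block_bounds(ny, nj)):
        for i, (i0, i1) in enumerate(block_bounds(nx, ni)):
            if not any(ocean[jj][ii] for jj in range(j0, j1) for ii in range(i0, i1)):
                found.append((i + 1, j + 1))
    return found


def auto_mask(ocean, npes):
    """Find the largest division count p whose land blocks leave at most npes active blocks."""
    ny, nx = len(ocean), len(ocean[0])
    n_ocean = sum(1 for row in ocean for cell in row if cell)
    if n_ocean == 0:
        raise ValueError("domain has no ocean cells")
    glob_ocn_frac = n_ocean / (nx * ny)
    p_max = int(npes / glob_ocn_frac)          # optimistic upper limit
    for p in range(p_max, npes - 1, -1):       # walk down from the upper limit
        layout = pick_layout(p, nx, ny)
        if layout is None:
            continue
        land = land_blocks(ocean, layout)
        if len(land) >= p - npes:              # enough land blocks to mask
            return layout, land[: p - npes]
    raise ValueError("no feasible layout found")


# 4x4 grid whose western half is land (False) and eastern half is ocean (True).
ocean = [[False, False, True, True] for _ in range(4)]
layout, masked = auto_mask(ocean, npes=2)
print(layout, masked)
```

For this toy grid the sketch settles on a 2x2 layout with the two western blocks masked, which leaves two active blocks for npes=2.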

Performance results:

When auto land block elimination is turned on, we get a 20 to 23% speedup. The table below summarizes the model throughput (simulated years per day) for 3-month-long CMOM_JRA.TL319_t232 runs on derecho.intel with various target PE numbers.

MOM6 PEs   throughput (base)   throughput (auto LBE on)
640        18.89               24.29
896        25.75               32.11
1152       31.18               40.01
1280       34.62               42.15
1408       37.19               46.81

  • The images below display the land block elimination for two selected PE counts, 896 (default) and 1408 (highest tried):

896:
pe896

1408:
pe1408
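
The description does not state how the quoted speedup percentage was derived from the throughput table. As a hedged illustration, the sketch below computes two common readings from the tabulated values: the relative gain in throughput, and the relative reduction in wallclock time per simulated year (which is inversely proportional to throughput). The function names are illustrative only.

```python
# (MOM6 PEs, base throughput, throughput with auto LBE on), in simulated years per day
runs = [
    (640, 18.89, 24.29),
    (896, 25.75, 32.11),
    (1152, 31.18, 40.01),
    (1280, 34.62, 42.15),
    (1408, 37.19, 46.81),
]


def throughput_gain(base, lbe):
    """Percent increase in simulated years per day."""
    return 100.0 * (lbe / base - 1.0)


def wallclock_reduction(base, lbe):
    """Percent decrease in wallclock time per simulated year (time ~ 1 / throughput)."""
    return 100.0 * (1.0 - base / lbe)


for pes, base, lbe in runs:
    print(f"{pes:5d}  throughput gain {throughput_gain(base, lbe):5.1f}%  "
          f"wallclock reduction {wallclock_reduction(base, lbe):5.1f}%")
```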

Potential to-do items:

  • We may consider moving the newly added subroutines (for auto masking) from the MOM_domains module to a new module, say MOM_auto_mask_table. Note that this would necessitate moving the MOM_define_layout subroutine somewhere else (MOM_domains_infra?) to prevent a circular module dependency.
  • We may consider turning some hard-coded parameters and filenames (e.g., ibuf, r_extreme, and auto_mask_table_fname) into runtime parameters (though, if possible, I'd prefer to keep them hard-coded to keep the UX simple).
  • In my Python prototype, memoization of determine_land_blocks sped up the iteration massively. If iteration performance becomes an issue in high resolution domains (e.g., in 1/12 deg), we may consider memoizing the Fortran version of determine_land_blocks subroutine as well.
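
As an illustration of the memoization idea in the last item, here is a minimal Python sketch using functools.lru_cache. The signature of the prototype's determine_land_blocks is not shown in this PR, so the (ni, nj) arguments, the factory function, and the block-splitting rule are assumptions; caching only pays off if the search revisits the same layout.

```python
from functools import lru_cache


def make_determine_land_blocks(ocean):
    """Return a cached land-block finder bound to one ocean mask (hypothetical helper)."""
    ny, nx = len(ocean), len(ocean[0])

    @lru_cache(maxsize=None)
    def determine_land_blocks(ni, nj):
        blocks = []
        for j in range(nj):
            j0, j1 = j * ny // nj, (j + 1) * ny // nj
            for i in range(ni):
                i0, i1 = i * nx // ni, (i + 1) * nx // ni
                if not any(ocean[jj][ii] for jj in range(j0, j1) for ii in range(i0, i1)):
                    blocks.append((i + 1, j + 1))
        return tuple(blocks)  # immutable, so cached results can be shared safely

    return determine_land_blocks


ocean = [[False, False, True, True] for _ in range(4)]
determine_land_blocks = make_determine_land_blocks(ocean)
determine_land_blocks(2, 2)  # computed on the first call
determine_land_blocks(2, 2)  # served from the cache on the second call
print(determine_land_blocks.cache_info())
```

Binding the cache to a single mask through the factory avoids having to hash the grid itself on every call.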

Testing

Ongoing. No answer changes and no issues so far.

- Add sum_across_PEs_int4_2d to the sum_across_PEs interface
- Allow mask_table file to be placed in run directory (now, the first dir that is looked at).
- Determine masked blocks.
- Evenly distribute eliminated cells.
- Fill ESMF gindex array accordingly.
- During Export phase, set fields of eliminated cells to zero.
@codecov-commenter

codecov-commenter commented Nov 5, 2023

Codecov Report

Attention: 117 lines in your changes are missing coverage. Please review.

Comparison is base (d363034) 37.90% compared to head (ef3e5a6) 37.85%.
Report is 1 commits behind head on dev/ncar.

❗ Current head ef3e5a6 differs from pull request most recent head 05cd9b9. Consider uploading reports for the commit 05cd9b9 to get more accurate results

Files                                        Patch %   Lines
src/framework/MOM_domains.F90                9.60%     108 Missing and 5 partials ⚠️
config_src/infra/FMS1/MOM_domain_infra.F90   0.00%     3 Missing ⚠️
src/ocean_data_assim/MOM_oda_driver.F90      0.00%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           dev/ncar     #263      +/-   ##
============================================
- Coverage     37.90%   37.85%   -0.06%     
============================================
  Files           269      269              
  Lines         77176    77302     +126     
  Branches      14170    14194      +24     
============================================
+ Hits          29255    29263       +8     
- Misses        42641    42754     +113     
- Partials       5280     5285       +5     


@alperaltuntas
Member Author

This PR is fully tested and ready to be merged, but if @marshallward or others have comments, particularly regarding the to-do items listed in the PR description, please let us know.

@marshallward

Thanks for this @alperaltuntas, this looks potentially very useful in contexts where one is not constrained to a particular layout.

I added some specific comments in the code, and have some general thoughts below.

  • I am guessing that this ignores LAYOUT and IO_LAYOUT if the AUTO_MASKTABLE is turned on, meaning that it would grab npes from the launcher (mpirun, srun, etc.) What happens if LAYOUT is set and there's a conflict? Should there be an error? I would discourage a WARNING, since they often get ignored and lost in the stdout bloat.

    Note that LAYOUT can often be chosen for architectural reasons and might interfere with the ability to use the auto masktable.

  • It appears to me that gen_auto_mask_table would be run on every re-submission, even though the result should not change. This could potentially be stored in the output, right? (Maybe this is what you meant by memoization.) If I am wrong about this and it is stored and reused, then please disregard.

  • On that note, could the static MOM_mask_table and generated MOM_auto_mask_table be merged into some common file (or "format")? Seems like the information ought to be similar.

  • I don't think there's any problem with keeping these subroutines in the MOM_domains module. Moving to MOM_domains_infra would seem incorrect, since there are no framework-dependent operations here. And if there's a way to move them without circular dependencies, we can do it in the future.

  • No thoughts yet on converting ibuf, r_extreme, etc. into parameters, especially if you feel the current values are already well-tuned. If it does ever happen, I'd probably recommend storing them in a parameter block to avoid bloat to MOM_parameter_doc.all.

  • The 23% speedup: Does it refer to fewer CPU cycles? Or an actual reduction of runtime? In other words, if you had defined the layout and preprocessed this land mask, would you have gotten the same result?

  • See the inline comments on dimensionality, I have a feeling you will need to address this at some point.

Unfortunately I am short on time right now, but these are my thoughts for the moment. Feel free to act on them as you wish 😄.

Also, this has absolutely no bearing on the PR, but I like using ib/ie (as ibegin/end) in place of is/ie (as istart/end). Using is as a variable drives me crazy!

@alperaltuntas
Member Author

Thanks, @marshallward!

I couldn't locate your inline comments, but here are my quick responses to your bulletpoints above.

> I am guessing that this ignores LAYOUT and IO_LAYOUT if the AUTO_MASKTABLE is turned on, meaning that it would grab npes from the launcher (mpirun, srun, etc.) What happens if LAYOUT is set and there's a conflict? Should there be an error? I would discourage a WARNING, since they often get ignored and lost in the stdout bloat.

Right, LAYOUT and IO_LAYOUT are ignored when AUTO_MASKTABLE is on, in which case npes is grabbed via MOM_coms_infra::num_PEs, and LAYOUT is auto-determined at runtime to maximize the number of eliminated land blocks. Similarly, IO_LAYOUT is determined at runtime if the AUTO_IO_LAYOUT_FAC parameter is specified. Otherwise it's set to 1,1. It's a good idea to throw an error when there is a discrepancy. Will do.

> It appears to me that gen_auto_mask_table would be run on every re-submission, even though the result should not change. This could potentially be stored in the output, right? (Maybe this is what you meant by memoization.) If I am wrong about this and it is stored and reused, then please disregard.

Right, gen_auto_mask_table is run on every re-submission if AUTO_MASKTABLE remains True. I thought it would be fine to do so because it only takes around ~0.1 sec of runtime, and doing so would prevent the usage of outdated mask tables when the user changes things like the PE count, the topography, or minimum depth.

> On that note, could the static MOM_mask_table and generated MOM_auto_mask_table be merged into some common file (or "format")? Seems like the information ought to be similar.

Indeed, MOM_auto_mask_table has the same format as MOM_mask_table. And, from FMS's point of view, there is no difference between the two.
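
To make the shared format concrete, the sketch below writes and reads a table following the example given in the mask_table parameter description quoted in the inline review further down (2 masked blocks, (1,2) and (3,6), in a 4x6 layout). The function names are hypothetical, and details such as leading whitespace are assumptions; this does not reproduce the FMS parser.

```python
def format_mask_table(layout, masked_blocks):
    """Line 1: number of masked blocks; line 2: layout 'ni,nj'; then one 'i,j' per masked block."""
    lines = [str(len(masked_blocks)), f"{layout[0]},{layout[1]}"]
    lines += [f"{i},{j}" for i, j in masked_blocks]
    return "\n".join(lines) + "\n"


def parse_mask_table(text):
    """Inverse of format_mask_table; tolerates blank lines and surrounding whitespace."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    n_masked = int(rows[0])
    layout = tuple(int(v) for v in rows[1].split(","))
    masked = [tuple(int(v) for v in row.split(",")) for row in rows[2:2 + n_masked]]
    return layout, masked


table = format_mask_table((4, 6), [(1, 2), (3, 6)])
print(table, end="")
print(parse_mask_table(table))
```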

> I don't think there's any problem with keeping these subroutines in the MOM_domains module. Moving to MOM_domains_infra would seem incorrect, since there are no framework-dependent operations here. And if there's a way to move them without circular dependencies, we can do it in the future.

Sounds good!

> The 23% speedup: Does it refer to fewer CPU cycles? Or an actual reduction of runtime?

It refers to the reduction of runtime when compared to a run with no (static or automated) masking. So, it just shows the impact of masking on the wallclock runtime.

> In other words, if you had defined the layout and preprocessed this land mask, would you have gotten the same result?

Correct.

> See the inline comments on dimensionality

Unfortunately, I can't see your inline comments for some reason.

> Using is as a variable drives me crazy!

I agree!

logical :: tripolar_N !< A flag indicating whether there is n. tripolar connectivity
integer, intent(in) :: npes !< The desired number of active PEs.
type(param_file_type), intent(in) :: param_file !< A structure to parse for run-time parameters
character(len=128), intent(in) :: inputdir !< INPUTDIR parameter>

@marshallward marshallward Nov 22, 2023


">" typo on the end?

"example of mask_table masks out 2 processors, (1,2) and (3,6), out of the 24 "//&
"in a 4x6 layout: \n 2\n 4,6\n 1,2\n 3,6\n", default="MOM_mask_table", &
layoutParam=.true.)
endif


The description string is repeated, could it be saved to a variable?

call get_param(param_file, mdl, 'AUTO_MASKTABLE', auto_mask_table, &
"Turn on automatic mask table generation to eliminate land blocks.", &
default=.false., layoutParam=.true.)
endif


Does this need to be called every time? Could the generated mask_table potentially be saved and reused on subsequent runs?

@@ -1936,7 +1936,7 @@ subroutine get_global_shape(domain, niglobal, njglobal)
njglobal = domain%njglobal
end subroutine get_global_shape

-!> Get the array ranges in one dimension for the divisions of a global index space
+!> Get the array ranges in one dimension for the divisions of a global index space (alternative to compute_extent)


This is fine, but if you can be bothered...

A better description of compute_block_extent vs compute_extent would be nice here, although I would struggle to write one myself. From what I could tell, compute_extent is much more complex (and presumably safer).

enddo
call close_file(file_ascii)

call MOM_error(NOTE, "Wrote an auto-generated mask table at "//trim(filename)//".")


This might end up becoming more spam to stdout, which we are trying to reduce. Maybe only print this in debug mode?


! Read in bathymetric depth.
D(:,:) = -9.0e30 ! Initializing to a very large negative depth (tall mountains) everywhere.
call read_field(topo_filepath, trim(topo_varname), D, start=(/1, 1/), nread=n_global, no_domain=.true.)


Use modern array syntax? start=[1,1]

Also see comments above WRT scale=....

D(:,:) = -9.0e30 ! Initializing to a very large negative depth (tall mountains) everywhere.
call read_field(topo_filepath, trim(topo_varname), D, start=(/1, 1/), nread=n_global, no_domain=.true.)

allocate( mask(nx+2*ibuf, ny+2*jbuf), source=0)


space after allocate(

character(len=:), allocatable, intent(in) :: filename !< Mask table file path (to be auto-generated.)
integer, dimension(2), intent(out) :: layout !< The generated layout of PEs (incl. masked blocks)
!local
real, dimension(n_global(1), n_global(2)) :: D ! Bathymetric depth (to be read in from TOPO_FILE)


This probably needs dimensions ([m], or maybe [Z]; see below)

real :: min_depth ! The minimum ocean depth in the same units as D
real :: mask_depth ! The depth shallower than which to mask a point as land.
real :: glob_ocn_frac ! ratio of ocean points to total number of points
real :: r_p ! aspect ratio for division count p.


These should also have dimensions ([m] or [nondim]).

ny = n_global(2)

! Read in bathymetric depth.
D(:,:) = -9.0e30 ! Initializing to a very large negative depth (tall mountains) everywhere.


In this current form, I believe this should not be redimensionalized from m to Z, since TOPO_FILE will also be in meters.

However, I suspect that the "correct" way to do this is to use read_field(..., scale=GV%m_to_Z...) (or something like that) so that topo_field comes in with [Z] units, and the whole calculation is done in rescaled units.

I also reckon that MOM_read_data should be used in place of read_field, to avoid the explicit reference to MOM_IO_infra. This might also help sort out any potential issues around rotational testing.

If the output from read_field/MOM_read_data is rescaled, then the D max value should also be rescaled. (Bob can set his 2**N to very high values in his testing.)
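
The rescaling point can be illustrated with a small Python sketch: if depths, the fill value, and the masking threshold are all multiplied by the same power-of-two factor, the resulting land mask should not change. The masking rule (land where depth does not exceed mask_depth), the read_depth helper, and the name m_to_Z as a plain scalar are assumptions made for illustration; this is not the MOM6 code.

```python
FILL_M = -9.0e30  # sentinel depth in meters, as in the snippet under review


def read_depth(raw_m, m_to_Z):
    """Mimic a scaled read: missing cells (None) keep the fill value, rescaled by hand."""
    return [[FILL_M * m_to_Z if d is None else d * m_to_Z for d in row] for row in raw_m]


def land_mask(depth, mask_depth):
    """Assumed rule: a cell is land when its depth does not exceed mask_depth."""
    return [[d <= mask_depth for d in row] for row in depth]


raw_m = [[None, 0.0, 12.5], [0.4, 250.0, 4000.0]]
mask_depth_m = 0.5

reference = land_mask(read_depth(raw_m, 1.0), mask_depth_m)
for power in (-20, 20, 40):
    m_to_Z = 2.0 ** power  # power-of-two factors rescale floats exactly
    rescaled = land_mask(read_depth(raw_m, m_to_Z), mask_depth_m * m_to_Z)
    assert rescaled == reference
print(reference)
```

If the fill value were left in meters while the data came in rescaled, the comparison against the threshold would still happen to work for a large negative sentinel, but any later arithmetic on D would mix units, which is what dimensional testing is meant to catch.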

@marshallward

Sorry about that, I thought my overall comment was in the report. I've just submitted it.

> Right, gen_auto_mask_table is run on every re-submission if AUTO_MASKTABLE remains True. I thought it would be fine to do so because it only takes around ~0.1 sec of runtime, and doing so would prevent the usage of outdated mask tables when the user changes things like the PE count, the topography, or minimum depth.

If the runtime is this small, then perhaps it's not too important. (I believe I misinterpreted a comment somewhere.)

It might be meaningful to not enable the automasking if a MOM_mask_table is present and enabled in MOM_input. I don't have strong feelings about this, but maybe others do.

Aside from the inline comments (which have now been submitted), I think this looks good.

- Dimensionalize topographic depth variables used to determine cell masks in auto masktable routine.
- Raise error if the user provided PE layout is inconsistent with auto masktable generation.
- Save the masktable parameter description to a string variable to avoid repetition.
- Fix typos, whitespaces, use modern array syntax.
@alperaltuntas
Member Author

@gustavo-marques This PR is ready to be reviewed and merged. I believe @marshallward is working on a fix for the failing macOS tests, so I suppose we can ignore those CI failures for now.

Due to poor handling of floating point in HDF5 1.14.3, it is currently
not possible to use floating point exceptions (FPEs) whenever this
version is present.

The GitHub Actions CI nodes would randomly select either 1.14.2 or
1.14.3, and would raise an FPE error if 1.14.3 was selected.
Additionally, the homebrew installation does not provide a clean method
for selecting a different version of HDF5.

Thus, for now we disable FPEs in the MacOS testing, and hope to catch
any legitimate FP errors in the Ubuntu version.  We will restore these
tests as soon as this has been fixed in an easily-accessible version of
HDF5.

As part of this PR, I have also moved the FCFLAGS configuration to the
platform specific Actions files, allowing for independent compiler
configuration for each platform.
@gustavo-marques gustavo-marques merged commit ab3b0aa into dev/ncar Dec 20, 2023
17 checks passed
@alperaltuntas alperaltuntas deleted the lbe branch August 27, 2024 01:03