Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add floating point exception flags for BOPT=g. #148

Draft
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

theurich
Copy link
Member

@theurich theurich commented Jun 13, 2023

This PR originates from work by @danrosen25. The following compiler flags are being added to the ESMF_F90OPTFLAG_G and ESMF_CXXPTFLAG_G variables in respective build_rules.mk files:
* ifort -fpe0
* icpc -fp-trap=common
* gfortran -ffpe-trap=zero,overflow
* nagfor -ieee=stop
* nvhpc -Ktrap=fp

One outstanding issue we have noticed by activating the above flags is that in many of the ESMF regression tests that use RouteHandle Store() calls of any flavor (Regrid/Redist/SMM) do not initialize the source data before the Store() call. In a way this is on purpose, because user code might also do the same, and therefore we should test this scenario. However, this then leads to issues within the SMMStore() where actual SMM() is executed during the pre-compute step, but now has uninitialized data on the right hand side.

This issue must be resolved as part of this PR. Notice that the associated failures are system dependent, and can be non-deterministic as they depend on the specifics of the uninitialized data!

Here are the tests that crashed (intel + mpt)

CRASHED: mpt/g: src/Infrastructure/Array/tests/ESMF_ArrayCreateGetUTest.F90
CRASHED: mpt/g: src/Infrastructure/Array/tests/ESMF_ArrayRedistUTest.F90
CRASHED: mpt/g: src/Infrastructure/Field/tests/ESMF_FieldRegridCsrv2ndUTest.F90
CRASHED: mpt/g: src/Infrastructure/Field/tests/ESMF_FieldRegridCsrvUTest.F90
CRASHED: mpt/g: src/Infrastructure/Field/tests/ESMF_FieldRegridUTest.F90
CRASHED: mpt/g: src/Infrastructure/Field/tests/ESMF_FieldRegridXGSMMUTest.F90
CRASHED: mpt/g: src/Infrastructure/Field/tests/ESMF_FieldRegridXGUTest.F90
CRASHED: mpt/g: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleCrGetUTest.F90
CRASHED: mpt/g: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
CRASHED: mpt/g: src/Infrastructure/Grid/tests/ESMF_GridCreateUTest.F90
CRASHED: mpt/g: src/Infrastructure/LocalArray/tests/ESMF_LocalArrayUTest.F90
CRASHED: mpt/g: src/Infrastructure/Route/tests/ESMF_RouteHandleUTest.F90
CRASHED: mpt/g: src/Infrastructure/Util/tests/ESMF_FortranWordsizeUTest.F90
CRASHED: mpt/g: src/Infrastructure/Util/tests/ESMF_TypeKindGetUTest.F90
CRASHED: mpt/g: src/Infrastructure/VM/tests/ESMF_VMSendNbVMRecvNbUTest.F90
CRASHED: mpt/g: src/Infrastructure/VM/tests/ESMF_VMSendVMRecvUTest.F90
CRASHED: mpt/g: src/Infrastructure/XGrid/tests/ESMF_XGridMaskingUTest.F90
CRASHED: mpt/g: src/Infrastructure/XGrid/tests/ESMF_XGridUTest.F90
CRASHED: mpt/g: src/Superstructure/PreESMFMod/tests/ESMF_FileRegridUTest.F90
CRASHED: mpt/g: src/Superstructure/PreESMFMod/tests/ESMF_RegridWeightGenUTest.F90

RESEARCH NOTES

Intel
For the 'intel' combos add -fpe0 to the ESMF_F90OPTFLAG_G in the respective build_rules.mk.
For the 'intel' combos add -fp-trap=common to the ESMF_CXXPTFLAG_G in the respective build_rules.mk

GCC
For the 'gfortran' and 'gfortranclang' combos add -ffpe-trap=zero,overflow to the ESMF_F90OPTFLAG_G in the respective build_rules.mk. Notice that I don't think we can add the "invalid" option here, because (as far as I could figure out) gfortran for GCC < v12 will otherwise abort even on quite NANs!
For the g++ part, let's not make any changes here!

NAG
For the 'nag' combo add -ieee=stop to the ESMF_F90OPTFLAG_G in the respective build_rules.mk.

NVHPC
Since this is a run-time environment variable, let me work on that part via the esmf-test-scripts. I will do some testing. Thanks for doing the research!

Let's not worry about any of the other compilers for 8.5.0.

    * ifort -fpe0
    * icpc -fp-trap=common
    * gfortran -ffpe-trap=zero,overflow
    * nagfor -ieee=stop
    * nvhpc -Ktrap=fp
Copy link
Member

@danrosen25 danrosen25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good. However I should have been a co-author on the commit 😆.

@theurich
Copy link
Member Author

@danrosen25 really it should be you owning the PR. Maybe we should close it, and you can open it?

Also, it cannot move forward until we have figured out what to do with the uninitialized src data we use in our regression testing.

@danrosen25 danrosen25 marked this pull request as draft June 13, 2023 17:00
@danrosen25 danrosen25 self-assigned this Sep 6, 2023
@danrosen25
Copy link
Member

@theurich
I'm having trouble finding failures. I ran systems tests using -ffpe-trap=zero,overflow,invalid -finit-real=snan and none failed. Next I plan to turn exhaustive testing on and see what fails. Do you have an example of a test failure?

@danrosen25
Copy link
Member

@theurich
The only failure I saw during exhaustive unit tests was related to ESMF exceptions testing. The deepThrow routine threw an exception and it caused the executable to sigabort. It's probably related to compiling with gcc-9. Anyway, I can't find anything crashing with the -finit-real=snan flag so if you can't point me to something that can replicate the crash that would help.

@theurich
Copy link
Member Author

theurich commented Sep 7, 2023

@danrosen25
I think the behavior you get is system-dependent. Probably best to rebase this branch against develop, and then have a few nightlies test the branch. I expect we will see failures on a few system/compiler combos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants