improve OpenMP usage #2
Conversation
1) Reduced the number of fork-joins per iteration. 'omp parallel for' does a fork-join, which can get expensive at large thread counts. When this construct is used many times in a function, it should be replaced with a single 'omp parallel' around multiple 'omp for'. The code previously found between parallel regions is assumed to require serialization and uses 'pragma omp single' for protection. 'single' is used instead of 'master' to allow the first encountering thread in the team to do the work, rather than waiting for the master thread. Technically, but never in practice, 'single' requires MPI_THREAD_SERIALIZED instead of MPI_THREAD_FUNNELED; 'master' only requires MPI_THREAD_FUNNELED. It is possible that 'single nowait' is sufficient, in which case a few barriers can be eliminated. (Aside: 'master' does not imply a barrier.)
2) 'pragma omp simd' wherever 'pragma ivdep' is used. The OpenMP standard defines 'pragma omp simd' with semantics identical to the conventional meaning of the non-standard 'pragma ivdep'. The Intel compiler treats 'pragma omp simd' as an assertion rather than a hint, so if SIMD isn't appropriate, this pragma should be conditionalized using the preprocessor (C99/C++11 '_Pragma' being the O(1) solution here).
I don't know what sort of QA is required to be sure the changes are correct. I'll be happy to run whatever you require.
Jeff,
Have you run Intel Thread Inspector on the code to be sure there are no race conditions? Pushing the parallel region up higher makes it more likely that a race condition occurs.
No, I haven't used that tool before. The side-by-side changes were simple enough that I felt I could reason about the race conditions from the OpenMP semantics. I'll figure out Intel Thread Checker and try that. I hope it is not making any assumptions about x86 consistency in its analysis...
Jeff - For QA I typically run the five small test problems (nohsmall, noh, sedovsmall, sedov, leblanc) and verify that the outputs match the gold standard to within roundoff error. If you could do that, that would be great; or I can do it next week (I'm offsite this week and am not set up to run remotely).
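That procedure could be scripted roughly as follows (a sketch only: the binary location, input-file layout, and gold-standard directory are assumptions, and a plain diff would need to be replaced by a numeric comparison to tolerate roundoff-level differences):

```shell
#!/bin/sh
# Assumed build location; override with PENNANT=/path/to/pennant
PENNANT=${PENNANT:-./build/pennant}

for t in nohsmall noh sedovsmall sedov leblanc; do
    if [ -x "$PENNANT" ]; then
        # Input naming follows the ${TEST_NAME}.in convention shown below.
        "$PENNANT" "$t.in" > "$t.out"
        # gold/ is an assumed directory of reference outputs; a numeric
        # diff tool would be needed to ignore roundoff error.
        diff "$t.out" "gold/$t.out" && echo "$t: PASS"
    else
        echo "$t: SKIPPED (no pennant binary found)"
    fi
done
```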
You can run a GUI version or the command-line version of the Intel Thread Inspector. From our test script:
inspxe-cl -collect=ti2 -result-dir ${TEST_NAME} -- ${executable} ${TEST_NAME}.in >& ${TEST_NAME}_openmp.out || true
inspxe-cl -report problems -result-dir ${TEST_NAME}
The GUI is launched with inspxe-gui and is a little more intuitive to use.
You need to use the Intel compiler and the Intel tools (intel-performance-tools/2017.1.024).
It will flag race conditions that might not appear until later.