Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve OpenMP usage #2

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jeffhammond
Copy link

@jeffhammond jeffhammond commented Feb 7, 2017

  1. reduced the number of fork-join per iteration

omp parallel for does a fork-join, which can get expensive at large thread-counts. when this construct is used many times in a function, it should be replaced with a single omp parallel around multiple omp for.

The code previously found between parallel regions is assumed to require serialization and uses pragma omp single for protection. single is used instead of master to allow the first encountering thread in the team to do the work, rather than waiting for the master thread.

Technically, but never in practice, single requires MPI_THREAD_SERIALIZED instead of MPI_THREAD_FUNNELED. master only requires MPI_THREAD_FUNNELED.

It is possible that single nowait is sufficient, in which case a few barriers can be eliminated. (aside: master does not imply a barrier).

  1. pragma omp simd wherever pragma ivdep is used

The OpenMP standard defines pragma omp simd semantics identical to the convention meaning of the non-standard pragma ivdep.

The Intel compiler treats pragma omp simd as an assertion rather than a hint so if SIMD isn't appropriate, this pragma should be conditionalized using preprocessor (C99/C++11 _Pragma being the O(1) solution here).

1) reduced the number of fork-join per iteration

'omp parallel for' does a fork-join, which can get expensive at large
thread-counts.  when this construct is used many times in a function, it
should be replaced with a single 'omp parallel' around multiple 'omp
for'.

the code previously found between parallel regions is assumed to
require serialization and uses 'pragma omp single' for protectin.
'single' is used instead of 'master' to allow the first encountering
thread in the team to do the work, rather than waiting for the master
thread.

technically, but never in practice, 'single' requires
MPI_THREAD_SERIALZIED instead of MPI_THREAD_FUNNELED.  'master' only
requires MPI_THREAD_FUNNELED.

it is possible that 'single nowait' is sufficient, in which case a few
barriers can be eliminated.  (aside: 'master' does not imply a barrier).

2) pragma omp simd wherever pragma ivdep is used

the OpenMP standard defines 'pragma omp simd' semantics identical to the
convention meaning of the non-standard 'pragma ivdep'.

Intel compiler treats 'pragma omp simd' as an assertion rather than a
hint so if SIMD isn't appropriate, this pragma should be conditionalized
using preprocessor (C99/C++11 _Pragma being the O(1) solution here).
@jeffhammond
Copy link
Author

I don't know what sort of QA is required to be sure the changes are correct. I'll be happy to run whatever you require.

@jeffhammond jeffhammond changed the title improved OpenMP usage improve OpenMP usage Feb 7, 2017
@brobey
Copy link
Member

brobey commented Feb 8, 2017 via email

@jeffhammond
Copy link
Author

No, I haven't used that tool before. The side-by-side changes were simple enough that I felt I could reason about the race conditions from the OpenMP semantics.

I'll figure out Intel Thread Checker and try that. I hope it is not making any assumptions about x86 consistency in its analysis...

@cferenba
Copy link
Member

cferenba commented Feb 9, 2017

Jeff - For QA I typically run the five small test problems (nohsmall, noh, sedovsmall, sedov, leblanc) and verify that the outputs match the gold standard to within roundoff error. If you could do that, that would be great; or I can do it next week (I'm offsite this week and am not set up to run remotely).

@brobey
Copy link
Member

brobey commented Feb 9, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants