[WIP] Add a conditional definition of AccThreadSeq for the Omp4 backend #157
This doesn't harm, but it doesn't help my test case either. Before:
After:
While using alpaka directly:
Thanks for testing, @fwyzard.
Ah, the
…_KERNEL_OPTI This is now consistent with the Omp2Blocks backend, fixes alpaka-group#156
333e4a1 to bcbc071
Updated the PR; maybe it works now.
It does:
Thank you.
I will check this PR soon. I need to check whether swapping is allowed for the OMP4 backend. Swapping block size and elements is more or less a workaround that makes it possible to use alpaka backends which restrict the block to one thread. The OMP4 backend does not have this restriction.
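To illustrate the swap described above, here is a minimal sketch in plain C++. It does not use alpaka's or cupla's actual types: `WorkDiv`, `totalElems`, and `swapThreadsAndElems` are hypothetical names introduced only for this example. It assumes a work division is described by three extents (blocks, threads per block, elements per thread) and shows how exchanging the last two keeps the total amount of work unchanged while letting a one-thread-per-block backend accept the request.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical work division; NOT alpaka's or cupla's real type.
struct WorkDiv {
    std::size_t blocks;           // number of blocks in the grid
    std::size_t threadsPerBlock;  // threads inside one block
    std::size_t elemsPerThread;   // elements processed by each thread
};

// Total number of elements covered by the whole grid.
constexpr std::size_t totalElems(WorkDiv const& w) {
    return w.blocks * w.threadsPerBlock * w.elemsPerThread;
}

// Exchange the thread extent and the element extent.
// A request such as {64, 256, 1} becomes {64, 1, 256}, which a backend
// limited to one thread per block can execute.
inline WorkDiv swapThreadsAndElems(WorkDiv const& w) {
    WorkDiv swapped{w.blocks, w.elemsPerThread, w.threadsPerBlock};
    // The swap must not change how much work is done overall.
    assert(totalElems(swapped) == totalElems(w));
    return swapped;
}
```

For a backend without the one-thread restriction, such a swap is optional, which is why it needs checking whether it is appropriate for OMP4.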
I see... then I guess there is an underlying problem in alpaka's OpenMP 4 backend that leads to very poor performance.
I will close the PR for now. Although it provided a workaround for the issue that initiated it, there are other ways to achieve the same result, and the PR does not align well with the behavior for other accelerators. The reasoning is given in the discussion of #156.
This is now consistent with the Omp2Blocks backend, fixes #156.