WIP: Add duplicated memory to force_lj #21
Here is some performance data for ExaMiniMD on a 4-core Linux box.
@stanmoore1 any comments on differences in memory usage for OpenMP and Pthreads vs. using atomics?
There is some memory overhead because the force array is duplicated. The force array is the second-largest data structure, after the neighbor list. However, each atom typically has many neighbors, so the neighbor list is much larger than the force array, and we typically duplicate the force array only 8 or fewer times.
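To make the overhead argument concrete, here is a rough back-of-the-envelope sketch. It assumes forces are stored as 3 doubles per atom and the neighbor list as one 32-bit index per neighbor; both layouts, the function names, and the figure of 75 neighbors per atom used below are illustrative assumptions, not values taken from ExaMiniMD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout: 3 doubles (fx, fy, fz) per atom, replicated `copies` times.
constexpr std::size_t force_bytes(std::size_t n_atoms, std::size_t copies) {
  return n_atoms * 3 * sizeof(double) * copies;
}

// Assumed layout: one 32-bit neighbor index per (atom, neighbor) entry.
constexpr std::size_t neighbor_bytes(std::size_t n_atoms,
                                     std::size_t neighbors_per_atom) {
  return n_atoms * neighbors_per_atom * sizeof(std::int32_t);
}

// Ratio of duplicated force storage to neighbor-list storage.
inline double overhead_ratio(std::size_t n_atoms, std::size_t copies,
                             std::size_t neighbors_per_atom) {
  assert(n_atoms > 0 && neighbors_per_atom > 0);
  return static_cast<double>(force_bytes(n_atoms, copies)) /
         static_cast<double>(neighbor_bytes(n_atoms, neighbors_per_atom));
}
```

Under these assumptions, with 8-byte doubles a single force array costs 24 bytes per atom, while a hypothetical 75 neighbors per atom costs 300 bytes per atom. Calling `overhead_ratio(n, copies, 75)` with the number of copies in use shows how the duplicated storage compares to the neighbor list for a given thread count.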
Also, the numbers for data duplication may be a little better now, because we fixed this bug: #22.
What's the status of this?
Don't merge yet. This is an example of using the new duplicated memory feature in Kokkos as an alternative to thread atomics; see kokkos/kokkos#1225 and kokkos/kokkos#825. For OpenMP or Pthreads it uses a duplicated, non-atomic view; for CUDA it still uses a non-duplicated atomic view; and for Serial it uses a non-duplicated, non-atomic view. On my Linux box with OpenMP it does give a speedup over atomics. The API and naming may change a bit before it is formally released in Kokkos.
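For readers unfamiliar with the two strategies, here is a minimal sketch of the idea in plain C++ with `std::thread`. It is not the Kokkos API and not the code in this PR; the `Pair` struct, the function names, and the scalar per-pair force are hypothetical simplifications. One variant scatters into a single shared array with atomic updates, the other gives each thread a private copy and sums the copies afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical pair interaction: adds +f to atom i and -f to atom j.
struct Pair { std::size_t i, j; double f; };

// Atomic variant: every thread updates one shared force array.
std::vector<double> accumulate_atomic(const std::vector<Pair>& pairs,
                                      std::size_t n_atoms, unsigned n_threads) {
  assert(n_threads > 0);
  std::vector<std::atomic<double>> force(n_atoms);
  for (auto& f : force) f.store(0.0);
  // Compare-and-swap loop, since fetch_add on double requires C++20.
  auto add = [](std::atomic<double>& a, double v) {
    double old = a.load();
    while (!a.compare_exchange_weak(old, old + v)) {}
  };
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t)
    workers.emplace_back([&, t] {
      for (std::size_t p = t; p < pairs.size(); p += n_threads) {
        add(force[pairs[p].i], pairs[p].f);
        add(force[pairs[p].j], -pairs[p].f);
      }
    });
  for (auto& w : workers) w.join();
  std::vector<double> out(n_atoms);
  for (std::size_t a = 0; a < n_atoms; ++a) out[a] = force[a].load();
  return out;
}

// Duplicated variant: one private copy per thread, no atomics, then a reduction.
std::vector<double> accumulate_duplicated(const std::vector<Pair>& pairs,
                                          std::size_t n_atoms, unsigned n_threads) {
  assert(n_threads > 0);
  std::vector<std::vector<double>> copies(n_threads,
                                          std::vector<double>(n_atoms, 0.0));
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t)
    workers.emplace_back([&, t] {
      std::vector<double>& mine = copies[t];  // only thread t writes here
      for (std::size_t p = t; p < pairs.size(); p += n_threads) {
        mine[pairs[p].i] += pairs[p].f;
        mine[pairs[p].j] -= pairs[p].f;
      }
    });
  for (auto& w : workers) w.join();
  // Reduction step: sum the per-thread copies into the final array.
  std::vector<double> total(n_atoms, 0.0);
  for (const auto& c : copies)
    for (std::size_t a = 0; a < n_atoms; ++a) total[a] += c[a];
  return total;
}
```

The duplicated variant trades memory (one array per thread) and a final reduction pass for contention-free writes in the inner loop, which is the trade-off discussed in this thread.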
@crtrott