Tpetra: Improve performance of CrsMatrix::copyAndPermute #13587
Timings
jhux2 changed the title from "PackageName: General Summary of the Enhancement" to "Tpetra: Improve performance of CrsMatrix::copyAndPermute" on Nov 11, 2024.
Current timing snapshot. Original code ("capsg_M" is the timer around CrsMatrix::copyAndPermute)
new code
Enhancement - Improve performance of CrsMatrix::copyAndPermute
When evaluating the Jacobian in a Panzer mini-EM test, a significant fraction of the time is spent in CrsMatrix::copyAndPermute. Timing calipers show that replaceGlobalValues (in practice, combineGlobalValues called with REPLACE) and getGlobalRowCopy are two culprits. Additional time is spent in the hash table lookups behind getLocalElement and getGlobalElement.
Serial speedups have been obtained by writing a batch/array version of getGlobalElement (which amortizes the 'if(contiguous)' check and other if-statements over many indices) and by rewriting replaceGlobalValuesImpl and getGlobalRowCopy. Timing results will follow in an update to this issue.
This bottom-up approach prompted a top-down investigation, which led to using Kokkos to parallelize the main copyAndPermute loop over the rows of the source matrix.
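To illustrate the batch/array idea for getGlobalElement mentioned above, here is a minimal standalone sketch. It is not the Tpetra implementation: `SimpleMap`, its members, `getGlobalElements`, and `kInvalidGO` are hypothetical names, and the non-contiguous case is modeled as a plain array lookup. The point is only that the per-call branch on `contiguous` is taken once per batch rather than once per index.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using LO = std::int32_t;  // local ordinal (assumed type)
using GO = std::int64_t;  // global ordinal (assumed type)

constexpr GO kInvalidGO = -1;  // hypothetical "not found" marker

// Hypothetical, simplified stand-in for a distributed map.
struct SimpleMap {
  bool contiguous;                // true if global indices are minGlobalIndex + lid
  GO minGlobalIndex;              // used only in the contiguous case
  LO numLocal;                    // number of locally owned indices
  std::vector<GO> localToGlobal;  // used only in the non-contiguous case

  // Per-element lookup: every call re-checks the range and the contiguous flag.
  GO getGlobalElement(LO lid) const {
    if (lid < 0 || lid >= numLocal) return kInvalidGO;
    if (contiguous) return minGlobalIndex + lid;
    return localToGlobal[static_cast<std::size_t>(lid)];
  }

  // Batch lookup: the contiguous branch is hoisted out of the loop,
  // so each inner loop body contains only the range check and the lookup.
  void getGlobalElements(const std::vector<LO>& lids,
                         std::vector<GO>& gids) const {
    gids.resize(lids.size());
    if (contiguous) {
      for (std::size_t i = 0; i < lids.size(); ++i) {
        const LO lid = lids[i];
        gids[i] = (lid >= 0 && lid < numLocal) ? minGlobalIndex + lid
                                               : kInvalidGO;
      }
    } else {
      for (std::size_t i = 0; i < lids.size(); ++i) {
        const LO lid = lids[i];
        gids[i] = (lid >= 0 && lid < numLocal)
                      ? localToGlobal[static_cast<std::size_t>(lid)]
                      : kInvalidGO;
      }
    }
  }
};
```

A caller that converts the column indices of a whole row would call `getGlobalElements` once per row instead of calling `getGlobalElement` once per entry. Whether this pays off in practice depends on row lengths and on how well the compiler already hoists the branch.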