Fix a race condition in splitVertices #45656
Add alpaka::syncBlockThreads(acc); at the end of the loop on the vertices to ensure that all threads are properly synchronised before resetting the shared memory. Clean up the kernel to use the SoA accessors and the cms::alpakatools utilities.
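The synchronisation pattern described above can be illustrated with a minimal host-side sketch. This is not the splitVertices kernel and does not use alpaka: CPU threads stand in for the threads of one block, and the hypothetical `BlockBarrier` class plays the role of `alpaka::syncBlockThreads(acc)`. The names `BlockBarrier`, `run_block`, `shared_count` and `seen` are assumptions made for this illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier, standing in for alpaka::syncBlockThreads(acc).
class BlockBarrier {
public:
  explicit BlockBarrier(int n) : n_(n) {}
  void wait() {
    std::unique_lock<std::mutex> lock(m_);
    const unsigned gen = generation_;
    if (++arrived_ == n_) {
      // last thread to arrive releases everybody and re-arms the barrier
      arrived_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return generation_ != gen; });
    }
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  int n_;
  int arrived_ = 0;
  unsigned generation_ = 0;
};

// Runs num_items loop iterations on num_threads threads that share one counter.
// Returns how many reads saw a value different from num_threads.
int run_block(int num_threads, int num_items) {
  assert(num_threads > 0 && num_items >= 0);
  BlockBarrier sync(num_threads);
  std::atomic<int> shared_count{0};  // stands in for a block-shared variable
  std::vector<int> seen(static_cast<std::size_t>(num_threads) * num_items, -1);

  auto worker = [&](int tid) {
    for (int item = 0; item < num_items; ++item) {
      if (tid == 0)
        shared_count = 0;  // one thread resets the shared state
      sync.wait();         // reset is visible before anybody writes
      shared_count.fetch_add(1);  // every thread contributes
      sync.wait();                // all contributions are in before anybody reads
      seen[static_cast<std::size_t>(item) * num_threads + tid] = shared_count.load();
      // Barrier at the end of the loop body: without it, thread 0 could start
      // the next iteration and reset shared_count while others are still reading.
      sync.wait();
    }
  };

  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t)
    threads.emplace_back(worker, t);
  for (auto& t : threads)
    t.join();

  int mismatches = 0;
  for (int v : seen)
    if (v != num_threads)
      ++mismatches;
  return mismatches;
}
```

With the final `sync.wait()` in place, `run_block(4, 100)` is expected to return 0, since no thread can reset the shared counter before every thread has read it. Dropping that last barrier reintroduces the kind of race this PR addresses, although, as with the real kernel, it may only show up rarely.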
cms-bot internal usage
type bugfix
enable gpu
please test
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45656/41184
A new Pull Request was created by @fwyzard for master. It involves the following packages:
@jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks. cms-bot commands are listed here
auto& __restrict__ data = pdata;
auto& __restrict__ ws = pws;
auto nt = ws.ntrks();
float const* __restrict__ zt = ws.zt();
float const* __restrict__ ezt2 = ws.ezt2();
float* __restrict__ zv = data.zv();
float* __restrict__ wv = data.wv();
float const* __restrict__ chi2 = data.chi2();
These should no longer be needed, because the SoA accessors include the __restrict__ qualifier.
ALPAKA_ASSERT_ACC(zt);
ALPAKA_ASSERT_ACC(wv);
ALPAKA_ASSERT_ACC(chi2);
ALPAKA_ASSERT_ACC(nn);
These are no longer needed, because the SoA View should guarantee that these are non-null.
I think all changes should be fine. Before merging this PR I would like to
+1
Size: This PR adds an extra 20KB to repository
Comparison Summary:
GPU Comparison Summary:
@fwyzard, thanks for the fix! Based on the check below, the HLT throughput is basically unaffected (same settings as in #45631 (comment), but a different Hilton node [*]).
[*]
I tried to evaluate this with this script (which makes use of some of the error stream input files from #44923 (comment)) with and without this PR and I am seeing some (small) changes on CPU only:
when running on GPU differences seem to be more contained:
Interesting... I did not expect any changes to the CPU results, since on CPU the code runs single-threaded 🤔
I've run a comparison of the HLT results using
and comparing
In all cases, I did not observe any differences in the HLT results running on CPU, and only small discrepancies running on GPU, consistent with the usual discrepancies.
#45655 vs reference:
$ hltDiff -o reference_cpu1.root -n fix_splitVertices_140x_cpu1.root
Found 10000 matching events, out of which 82 have different HLT results
    Events    Accepted      Gained        Lost       Other  Trigger
     10000          10          +9          -8           -  Dataset_EventDisplay
     10000         177         +24         -26           -  Dataset_ExpressPhysics
     10000          49         +10          -8           -  Dataset_HLTMonitor
     10000           5          +2          -5           -  Dataset_ScoutingPFMonitor
@cms-sw/reconstruction-l2 could you review and sign this PR and its backport, #45655?
Comparisons are clean except for this one: Is that expected/understood?
Mhm, I do not expect any differences from these changes (other than preventing a rare crash induced by the race condition being fixed), only the usual GPU vs GPU lack of reproducibility.
I also do not observe any impact on the HLT timing, using a subset of the HLT menu that runs the Pixel, ECAL and HCAL+PF reconstruction on GPUs on every event.
reference
#45655
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.
PR description:
Add alpaka::syncBlockThreads(acc); at the end of the loop on the vertices to ensure that all threads are properly synchronised before resetting the shared memory.
Clean up the kernel to use the SoA accessors and the cms::alpakatools utilities.
PR validation:
Running the HLT with compute-sanitizer --tool=racecheck without this PR warns about a potential race condition in splitVertices().
This PR fixes the warning: see #44923 (comment).
Running the current online HLT menu over 20k events on top of CMSSW 14.0.13-patch2 does not result in any changes to the HLT results (see #45656 (comment)) or to the performance on GPU (see #45656 (comment)).
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
To be backported to 14.0.x for data taking.