-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix extra allocations when benchmarking with no motion #483
Conversation
- Remove unnecessary indexing and transposing in BlochCPU.jl - New `get_mask` method (dispatch `t`) in Flow.jl
…h/KomaMRI.jl into fix-extra-allocations
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #483 +/- ##
==========================================
+ Coverage 89.86% 89.88% +0.01%
==========================================
Files 54 54
Lines 3050 3055 +5
==========================================
+ Hits 2741 2746 +5
Misses 309 309
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KomaMRI Benchmarks
Benchmark suite | Current: c53ac04 | Previous: 927de27 | Ratio |
---|---|---|---|
MRI Lab/Bloch/CPU/2 thread(s) |
226335431.5 ns |
224153374 ns |
1.01 |
MRI Lab/Bloch/CPU/4 thread(s) |
131397873 ns |
135808380 ns |
0.97 |
MRI Lab/Bloch/CPU/8 thread(s) |
139887517 ns |
149398614 ns |
0.94 |
MRI Lab/Bloch/CPU/1 thread(s) |
402131918 ns |
413189478 ns |
0.97 |
MRI Lab/Bloch/GPU/CUDA |
55012781 ns |
55925086 ns |
0.98 |
MRI Lab/Bloch/GPU/oneAPI |
495516018 ns |
496536737.5 ns |
1.00 |
MRI Lab/Bloch/GPU/Metal |
549471062.5 ns |
553973500 ns |
0.99 |
MRI Lab/Bloch/GPU/AMDGPU |
34639175 ns |
34739822 ns |
1.00 |
Slice Selection 3D/Bloch/CPU/2 thread(s) |
1026742878 ns |
1032666827.5 ns |
0.99 |
Slice Selection 3D/Bloch/CPU/4 thread(s) |
609839640 ns |
624017416 ns |
0.98 |
Slice Selection 3D/Bloch/CPU/8 thread(s) |
380207796 ns |
388957396 ns |
0.98 |
Slice Selection 3D/Bloch/CPU/1 thread(s) |
2246134204 ns |
2245739282 ns |
1.00 |
Slice Selection 3D/Bloch/GPU/CUDA |
101042475 ns |
101367770.5 ns |
1.00 |
Slice Selection 3D/Bloch/GPU/oneAPI |
637531750 ns |
636447326 ns |
1.00 |
Slice Selection 3D/Bloch/GPU/Metal |
554793625 ns |
554868916 ns |
1.00 |
Slice Selection 3D/Bloch/GPU/AMDGPU |
58987788.5 ns |
58876104 ns |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
The line in BlochCPU.jl that was generating extra allocations in benchmarks (when no motion is involved) was:
Indexing
seq.t
inside the function call generates extra allocations. Even more if we convert it into a matrix ([seq_idx,:]
) and when we transpose it. This is now longer necessary sinceoutflow_spin_reset!
andget_spin_coords
admit both scalars and vectors. So doing this:and this:
Solves the problem