WIP: PERF: Use NeighborhoodRange for metric image computation #98
base: master
Depends on: http://review.source.kitware.com/#/c/23795/4 Currently 4X slower
Hey @thewtex, what script/test are you using for testing the performance?
@phcerdan
using RangeType = Experimental::ShapedImageNeighborhoodRange<const MetricImageType,
  Experimental::ConstantBoundaryImageNeighborhoodPixelAccessPolicy<const MetricImageType> >;
const RangeType movingRange{ *movingMinusMean, denomIt.GetIndex(), offsets };
const RangeType kernelRange{ *fixedMinusMean, denomIt.GetIndex(), offsets };
Is it really intended that the kernel is set on a new location (denomIt.GetIndex()) with every iteration?
const RangeType movingRange{ *movingMinusMean, denomIt.GetIndex(), offsets };
const RangeType kernelRange{ *fixedMinusMean, denomIt.GetIndex(), offsets };

const MetricImagePixelType normXcorr = std::inner_product( movingRange.begin(), movingRange.end(), kernelRange.begin(), 0.0 ) / denomIt.Get();
I have to admit I could not get optimal performance when using std::inner_product at https://github.com/InsightSoftwareConsortium/ITK/blob/master/Modules/Core/ImageFunction/include/itkGaussianDerivativeImageFunction.hxx#L219, so instead I just manually wrote the inner product calculation in a few lines of code. In this case, it might look as follows:
auto normXcorr = NumericTraits<MetricImagePixelType>::Zero;
auto movingRangeIterator = movingRange.cbegin();
for (const auto& kernelValue : kernelRange)
{
normXcorr += kernelValue * (*movingRangeIterator);
++movingRangeIterator;
}
normXcorr /= denomIt.Get();
This use case might suggest adding
Update: The member function is added with patch set 5: http://review.source.kitware.com/#/c/23795/5
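For anyone who wants to try the hand-written loop outside of ITK, here is a minimal sketch that compares it against std::inner_product on plain std::vector data. The names PixelType, InnerProductStd and InnerProductLoop are hypothetical stand-ins and are not part of the patch; PixelType is assumed to be float purely for illustration. One detail worth noting: std::inner_product accumulates in the type of its initial value, so passing 0.0 accumulates in double, whereas the loop accumulates in the pixel type.

```cpp
#include <numeric>
#include <vector>

// Hypothetical stand-in for MetricImagePixelType (assumed to be float here).
using PixelType = float;

// Inner product via the standard algorithm. The accumulator has the type of
// the initial value, so an initial value of 0.0 means accumulation in double.
double InnerProductStd(const std::vector<PixelType>& moving,
                       const std::vector<PixelType>& kernel)
{
  return std::inner_product(moving.begin(), moving.end(), kernel.begin(), 0.0);
}

// Hand-written loop in the style suggested in the review comment.
// The accumulator has the pixel type. Assumes both vectors have equal size.
PixelType InnerProductLoop(const std::vector<PixelType>& moving,
                           const std::vector<PixelType>& kernel)
{
  PixelType sum{};
  auto movingIterator = moving.cbegin();
  for (const auto& kernelValue : kernel)
  {
    sum += kernelValue * (*movingIterator);
    ++movingIterator;
  }
  return sum;
}
```

For small integer-valued inputs both functions return the same value; with real image data the two results may differ slightly in the last bits because of the different accumulator types.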
Great ideas @N-Dekker! 💡 And thanks for adding the On a related note, I was examining the patch and the existing code, and the construction of the proxy in Do you think the construction here has performance implications and would it be avoidable?
The use of a proxy as return type of However, looking at http://review.source.kitware.com/#/c/23795/5/Modules/Core/Common/include/itkConstantBoundaryImageNeighborhoodPixelAccessPolicy.h my intuition tells me that it might be possible to squeeze some CPU cycles out of the private helper function Update: With patch set 7 I adjusted the private helper function
@thewtex Which compiler do you use? (Release build, optimized for speed?) I'm asking because I observed significant PERF differences between VS2015 and VS2017 (both 64-bit Release), while running the example code that I posted at https://discourse.itk.org/t/custom-border-extrapolation-of-shapedimageneighborhoodrange-by-imageneighborhoodpixelaccesspolicy/879/27
@N-Dekker this was Clang / MinSizeRel build. I will try other compilers and other build configurations, along with your example code!
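When comparing compilers and build configurations, a small timing helper makes the runs easier to repeat. The following is only a sketch using std::chrono::steady_clock; MeasureSeconds is a hypothetical name, not something from ITK or from the linked example code.

```cpp
#include <chrono>

// Hypothetical helper: calls `function` `repetitions` times and returns the
// total elapsed wall-clock time in seconds.
template <typename TFunction>
double MeasureSeconds(TFunction&& function, const int repetitions)
{
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repetitions; ++i)
  {
    function();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

The measured callable should store its result somewhere observable (for example, add it to a variable that is printed afterwards); otherwise an optimizing build may remove the computation entirely and the timing becomes meaningless.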
@thewtex I'm interested to see your results! Using VS2017 Release, I found that NeighborhoodRange based iteration was almost 3x faster than the old-school
Performance results discussed here:
Closes #97