WIP: PERF: Use NeighborhoodRange for metric image computation #98

thewtex · 2018-10-17T21:54:26Z

Depends on:

http://review.source.kitware.com/#/c/23795/4

Currently 4X slower

Closes #97

Depends on: http://review.source.kitware.com/#/c/23795/4 Currently 4X slower

thewtex · 2018-10-17T22:20:43Z

Experimenting with the proposed neighborhood range constant pixel access policy. 🔬 Currently, this seems to be quite a bit slower, though. 🐌

phcerdan · 2018-10-18T00:23:17Z

Hey @thewtex, what script/test are you using for testing the performance?

thewtex · 2018-10-18T02:32:50Z

@phcerdan itkBlockMatchingBayesianRegularizationDisplacementCalculatorTest.

N-Dekker · 2018-10-18T10:17:17Z

include/itkBlockMatchingNormalizedCrossCorrelationNeighborhoodIteratorMetricImageFilter.hxx

+        using RangeType = Experimental::ShapedImageNeighborhoodRange<const MetricImageType,
+          Experimental::ConstantBoundaryImageNeighborhoodPixelAccessPolicy<const MetricImageType> >;
+        const RangeType movingRange{ *movingMinusMean, denomIt.GetIndex(), offsets };
+        const RangeType kernelRange{ *fixedMinusMean, denomIt.GetIndex(), offsets };


Is it really intended that the kernel is set on a new location (denomIt.GetIndex()) with every iteration?

N-Dekker · 2018-10-18T10:23:46Z

include/itkBlockMatchingNormalizedCrossCorrelationNeighborhoodIteratorMetricImageFilter.hxx

+        const RangeType movingRange{ *movingMinusMean, denomIt.GetIndex(), offsets };
+        const RangeType kernelRange{ *fixedMinusMean, denomIt.GetIndex(), offsets };
+
+        const MetricImagePixelType normXcorr = std::inner_product( movingRange.begin(), movingRange.end(), kernelRange.begin(), 0.0 ) / denomIt.Get();


I have to admit I could not get optimal performance when using std::inner_product at https://github.com/InsightSoftwareConsortium/ITK/blob/master/Modules/Core/ImageFunction/include/itkGaussianDerivativeImageFunction.hxx#L219, so instead I just manually wrote the inner product calculation in a few lines of code. In this case, it might look as follows:

auto normXcorr = NumericTraits<MetricImagePixelType>::Zero; auto movingRangeIterator = movingRange.cbegin(); for (const auto& kernelValue : kernelRange) { normXcorr += kernelValue * (*movingRangeIterator); ++movingRangeIterator; } normXcorr /= denomIt.Get();

N-Dekker · 2018-10-18T10:41:04Z

This use case might suggest adding ShapedImageNeighborhoodRange::SetLocation(const IndexType&), so that the ranges could be declared and initialized outside the loop, and then have their location updated inside the loop, by calling range.SetLocation(denomIt.GetIndex()) for each range.

Update: The member function is added with patch set 5: http://review.source.kitware.com/#/c/23795/5

thewtex · 2018-10-18T17:36:42Z

Great ideas @N-Dekker ! 💡

And thanks for adding the SetLocation patch to move reconstruction out of the loop. I will try that and manually expanding the inner product. This will probably happen next week.

On a related note, I was examining the patch and the existing code, and the construction of the proxy in operater*():

https://github.com/InsightSoftwareConsortium/ITK/blob/a5d40b033f809b8d54e0bda9bd1eb5e7613faf69/Modules/Core/Common/include/itkShapedImageNeighborhoodRange.h#L348-L351

Do you think the construction here has performance implications and would it be avoidable?

N-Dekker · 2018-10-18T21:51:35Z

I was examining the patch and the existing code, and the construction of the proxy in operater*():
Do you think the construction here has performance implications and would it be avoidable?

The use of a proxy as return type of operater*() is hard to avoid, for a constant-boundary iterator. Because operater*() cannot return a 'real' reference to an existing pixel from the image, when trying to retrieve a pixel value outside the image boundaries.

However, looking at http://review.source.kitware.com/#/c/23795/5/Modules/Core/Common/include/itkConstantBoundaryImageNeighborhoodPixelAccessPolicy.h my intuition tells me that it might be possible to squeeze some CPU cycles out of the private helper function CalculatePixelIndexValue, which is called with every operater*(). CalculatePixelIndexValue has an if statement and a return inside its for loop which might have performance implications... but I'm not sure! The compiler optimizer might be smarter than my intuition!

Update: With patch set 7 I adjusted the private helper function CalculatePixelIndexValue, removing the return from inside its for loop, and indeed, it appears to significantly improve the performance! Using VS2017 64-bit Release, I observed a reduction of more than 30% of the run-time duration!

N-Dekker · 2018-10-21T10:46:44Z

Experimenting with the proposed neighborhood range constant pixel access policy. 🔬 Currently, this seems to be quite a bit slower, though. 🐌

@thewtex Which compiler do you use? (Release build, optimized for speed?) I'm asking because I observed significant PERF differences between VS2015 and VS2017 (both 64-bit Release), while running the example code that I posted at https://discourse.itk.org/t/custom-border-extrapolation-of-shapedimageneighborhoodrange-by-imageneighborhoodpixelaccesspolicy/879/27

thewtex · 2018-10-21T22:18:24Z

@N-Dekker this was Clang / MinSizeRel build. I will try other compilers and other build configurations, along with your example code!

N-Dekker · 2018-10-22T11:42:28Z

@thewtex I'm interested to see your results! Using VS2017 Release, I found that NeighborhoodRange based iteration was almost 3x faster than the old-school itk::ConstNeighborhoodIterator. Unfortunately, I have bad news... or is it good news?!? With the ConstNeighborhoodIterator patch http://review.source.kitware.com/#/c/23813/ that I just posted, the old-school iteration actually beats the NeighborhoodRange!!! At least when using VS2017 Release, on the example that I posted at https://discourse.itk.org/t/custom-border-extrapolation-of-shapedimageneighborhoodrange-by-imageneighborhoodpixelaccesspolicy/879/27 However, there are still some good reasons to consider NeighborhoodRange:

The range class supports very fast change of location to an arbitrary position, especially when we add ShapedImageNeighborhoodRange::SetLocation (while the old itk::ConstNeighborhoodIterator::SetLocation is clearly "not optimized for speed").
It is thread-safe (wheras the old itk::ConstNeighborhoodIterator has unsafe mutable data).
It has a modern C++ interface.

thewtex · 2018-11-07T18:34:39Z

Performance results discussed here:

https://discourse.itk.org/t/custom-border-extrapolation-of-shapedimageneighborhoodrange-by-imageneighborhoodpixelaccesspolicy/879/29

WIP: PERF: Use NeighborhoodRange for metric image computation

705a715

Depends on: http://review.source.kitware.com/#/c/23795/4 Currently 4X slower

N-Dekker reviewed Oct 18, 2018

View reviewed changes

N-Dekker mentioned this pull request Nov 13, 2018

ENH: Added policy for constant NeighborhoodRange values outside image InsightSoftwareConsortium/ITK#105

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: PERF: Use NeighborhoodRange for metric image computation #98

WIP: PERF: Use NeighborhoodRange for metric image computation #98

thewtex commented Oct 17, 2018

thewtex commented Oct 17, 2018

phcerdan commented Oct 18, 2018

thewtex commented Oct 18, 2018

N-Dekker Oct 18, 2018

N-Dekker Oct 18, 2018

N-Dekker commented Oct 18, 2018 •

edited

Loading

thewtex commented Oct 18, 2018

N-Dekker commented Oct 18, 2018 •

edited

Loading

N-Dekker commented Oct 21, 2018

thewtex commented Oct 21, 2018

N-Dekker commented Oct 22, 2018 •

edited

Loading

thewtex commented Nov 7, 2018

WIP: PERF: Use NeighborhoodRange for metric image computation #98

Are you sure you want to change the base?

WIP: PERF: Use NeighborhoodRange for metric image computation #98

Conversation

thewtex commented Oct 17, 2018

thewtex commented Oct 17, 2018

phcerdan commented Oct 18, 2018

thewtex commented Oct 18, 2018

N-Dekker Oct 18, 2018

Choose a reason for hiding this comment

N-Dekker Oct 18, 2018

Choose a reason for hiding this comment

N-Dekker commented Oct 18, 2018 • edited Loading

thewtex commented Oct 18, 2018

N-Dekker commented Oct 18, 2018 • edited Loading

N-Dekker commented Oct 21, 2018

thewtex commented Oct 21, 2018

N-Dekker commented Oct 22, 2018 • edited Loading

thewtex commented Nov 7, 2018

N-Dekker commented Oct 18, 2018 •

edited

Loading

N-Dekker commented Oct 18, 2018 •

edited

Loading

N-Dekker commented Oct 22, 2018 •

edited

Loading