Some efficiency savings for pycbc_fit_sngls_over_multiparam #4957
Conversation
n_templates = len(nabove)
rang = numpy.arange(0, n_templates)

nabove_smoothed = numpy.zeros_like(parvals[0])
This preallocation saves some time, writing to indices of the array rather than appending.
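A minimal sketch of the pattern this comment describes, with a hypothetical `compute_value()` standing in for the real per-template smoothing work (names here are illustrative, not from the PR):

```python
import numpy

n_templates = 1000

def compute_value(i):
    # Placeholder for the per-template result computed in the real loop
    return float(i) * 0.5

# Old pattern: grow a Python list, then convert to an array at the end
results_list = []
for i in range(n_templates):
    results_list.append(compute_value(i))
results_old = numpy.array(results_list)

# New pattern: preallocate once and write by index; this avoids repeated
# list growth and the final list-to-array conversion pass
results_new = numpy.zeros(n_templates)
for i in range(n_templates):
    results_new[i] = compute_value(i)

assert numpy.array_equal(results_old, results_new)
```

Both loops produce identical arrays; the preallocated version simply does less bookkeeping per iteration.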
Also on this line, why is parvals[0] being used specifically if we have other variables for the number of templates?

I feel someone else should review this, given my involvement in creating it.
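As a toy check of the question raised above (all names illustrative, not from the PR): when parvals[0] is a float array with one entry per template, numpy.zeros_like(parvals[0]) and numpy.zeros(n_templates) produce identical arrays, so the choice between them is purely stylistic, assuming parvals[0] really does have length n_templates.

```python
import numpy

n_templates = 4
# Stand-in for the first parameter-value array, one entry per template
parvals = [numpy.array([0.1, 0.2, 0.3, 0.4])]

# zeros_like copies the shape and dtype of its argument;
# zeros(n_templates) uses the default float64 dtype
a = numpy.zeros_like(parvals[0])
b = numpy.zeros(n_templates)

assert numpy.array_equal(a, b)
assert a.dtype == b.dtype  # both float64
```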
nabove_smoothed.append(smoothed_tuple[0])
alpha_smoothed.append(smoothed_tuple[1])
ntotal_smoothed.append(smoothed_tuple[2])
smoothed_tuple = smooth(
So here one could do e.g. smooth_vals[i, :] = smooth(...) rather than having to get a tuple and assign three things separately.
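A sketch of that suggestion, with a stand-in smooth() returning the usual (nabove, alpha, ntotal) triple; the function body and sizes here are illustrative only:

```python
import numpy

def smooth(i):
    # Placeholder returning a (nabove, alpha, ntotal)-style triple
    return (i + 0.1, i + 0.2, i + 0.3)

n_templates = 5

# One preallocated (n, 3) array instead of three separate lists
smooth_vals = numpy.zeros((n_templates, 3))
for i in range(n_templates):
    # A single row assignment absorbs the whole tuple at once
    smooth_vals[i, :] = smooth(i)

# Columns 0, 1, 2 then hold nabove, alpha and ntotal respectively
```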
Looks good, up to a couple of requests to deal with apparently redundant num_templates variables, and to shorten the code / reduce repetition by bundling the 3 outputs of smoothing together.
invalphan_sort = invalphan[par_sort]
ntotal_sort = ntotal[par_sort]
# Preallocate memory for *param_vals[0]-sorted* smoothing results
nabove_smoothed = numpy.zeros(num_templates)
I think I was envisaging that these separate arrays would go away in favour of the smoothed_vals. Do we need to keep them here?
nabove_smoothed = numpy.array(nabove_smoothed)[unsort]
alpha_smoothed = numpy.array(alpha_smoothed)[unsort]
ntotal_smoothed = numpy.array(ntotal_smoothed)[unsort]
smoothed_vals[:,0] = nabove_smoothed[unsort]
All these assignments in triplicate still seem redundant; can we go directly from smoothed_tuple to smoothed_vals?
I thought we did, as the un-sorting was needed for this part of the calculation, but that can be done on the n*3 array in one go.
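A minimal sketch of un-sorting the bundled n*3 array in a single indexing operation, on toy data (sizes and values are illustrative, not from the PR):

```python
import numpy

rng = numpy.random.default_rng(0)
parvals = rng.random(6)

# Sort order used for the smoothing pass
par_sort = numpy.argsort(parvals)

# Pretend these are the smoothed (nabove, alpha, ntotal) columns,
# computed in parameter-sorted order
smoothed_vals_sorted = numpy.arange(18, dtype=float).reshape(6, 3)

# argsort of the permutation gives its inverse: unsort[par_sort] == arange(n)
unsort = numpy.argsort(par_sort)

# One fancy-indexing call restores original template order for all
# three columns at once, instead of un-sorting three arrays separately
smoothed_vals = smoothed_vals_sorted[unsort]
```

Re-applying par_sort to smoothed_vals recovers the sorted-order array, which confirms the round trip.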
See comments; I think we're nearly there.
Note that I can't be entirely certain that this will translate nicely onto the v23 release branch, but I'm fairly sure it will.
Add in a few efficiency savings for pycbc_fit_sngls_over_multiparam
Standard information about the request

- This is an efficiency update
- This change affects the offline search and the live search
- This change changes nothing for the output
- This change follows style guidelines (see e.g. PEP8) and has been proposed using the contribution guidelines
Motivation
I was looking at pycbc_fit_sngls_over_multiparam, thinking it would be a good place to learn / implement some GPU efficiency, but then we (mainly Ian) saw that there were some huge efficiency savings that could be made fairly quickly.

Contents
Biggest saving:
Additional savings: bypassing numpy.average() to revert to the more-efficient numpy.mean() method rather than using equal weights.

Testing performed
Note that for this testing, I set the loop over templates to break at 10,000 iterations.
This meant that for the "old" testing, I needed to implement the pre-allocated array saving described above in order to get outputs to match.
Differences in output files
All equivalent files' datasets match using equality (i.e. numpy.array_equal(dataset1, dataset2)), dtypes match and attributes match.

Hashes of the files do not match; I am unsure how this can be the case, though.
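For illustration, a toy version of the dataset comparison described above; it also shows why dtypes need a separate check, since numpy.array_equal only compares shapes and element values:

```python
import numpy

# Two stand-in datasets with equal values but different dtypes
dataset1 = numpy.array([1, 2, 3], dtype=numpy.int64)
dataset2 = numpy.array([1, 2, 3], dtype=numpy.float64)

# Elementwise equality passes even across dtypes...
values_match = numpy.array_equal(dataset1, dataset2)   # True

# ...so the dtype comparison has to be done explicitly
dtypes_match = dataset1.dtype == dataset2.dtype        # False: int64 vs float64
```

As for the hash mismatch: one plausible (unverified) explanation is that the HDF5 container also stores metadata such as creation order and layout information, which numpy.array_equal never sees, so byte-identical datasets need not mean byte-identical files.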
Profiling
smooth_tophat (default) smoothing:
Summary:
The "old" profiling graph shows that something which is not in a function is the dominant cost. I found that setting the smooth() function to not be called drastically improved performance; we found that the problem was in the arguments being passed to the function, not the function itself.

The time output shows that the 'new' method takes approximately 1/35 of the 'old' time. There are also many more page faults and voluntary context switches in the old version; I don't know exactly what they are, but it sounds bad.
Profiling graphs: new and old

time -v output:

Old:

New:
distance_weighted smoothing
Summary:
In this case there is the extra cost of generating a normal PDF for every template; this means that the savings aren't quite as significant, but they are still noteworthy.

Profiling graphs: new and old

time -v output:

Old:

New:
n_closest smoothing
This is essentially unchanged, as the major savings in this PR do not affect this code path.

For completeness, profiles can be found at this link under fit_over_n_closest_{new,old}.{txt,png} for new/old and time output / profiling graph respectively.