Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/block offsets #550

Merged
merged 23 commits into from
Dec 7, 2023
Merged

Feature/block offsets #550

merged 23 commits into from
Dec 7, 2023

Conversation

sjoerd-bouma
Copy link
Collaborator

(Finally) - added a module that can add or remove 'block offsets' as seen in the voltage traces recorded by RNO-G. The fit is done by assuming a repeating rect shape, and fitting this in the Fourier domain, outside the band where the antennas/amplifiers are expected to be sensitive to real signals.

I also added an additional channelParameter that stores the added/removed block offsets; maybe this can be used for the RNOG data reader in the future to ensure that the correction remains reversible. We can also consider adding (some version of) this module in the RNO-G data reader, while we're still waiting for the integration of the equivalent code on the DAQ side, although it's probably worth making sure first that the performance impact of the fitting and/or Fourier transforms is not too high.

Copy link
Collaborator

@cg-laser cg-laser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me but someone with RNO-G data expertise should also have a look and check/test it.

Copy link
Collaborator

@shallmann shallmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I think it can be merged, since it is standalone... not sure why the doc checks fail.

shallmann
shallmann previously approved these changes Sep 6, 2023
- 'fit' (default): fit the block offsets with a minimizer
- 'approximate' : use the first guess from the out-of-band component,
without any fitting (slightly faster)
- 'stored': use the block offsets already stored in the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you simulate the block offsets would you not simulated it on a sim_station? Hence, the station would not have this parameter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'stored' option is probably superfluous, but I thought maybe one might want to undo the block offsets that have already been removed by e.g. the external calibration, or allow for some other fit to compute the block offsets.

In terms of simulation, right now we don't simulate the block offsets anyway, and because e.g. the timings are different I'm not sure how straightforward it would be to simulate them for the sim_station (rather than for the station, which is what I've been doing so far). Even if we end up doing this for the sim_station, I'd be inclined to leave it to the user to retrieve the simulated parameter from the sim_station to keep a cleaner separation between the two.

filtered_trace = fft.freq2time(filtered_trace_fft, sampling_rate)

# obtain guesses for block offsets
a_guess = np.array([
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not relevant but using np.split seems to be a bit faster

In [17]: %timeit np.mean(np.split(filtered_trace, n_blocks), axis=1)
19 µs ± 849 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [18]: %timeit a_guess = np.array([np.mean(filtered_trace[i*block_size:(i+1)*block_size]) for i in range(n_blocks)])
32.4 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you ever check what the difference is when you use the unfiltered trace?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, implemented (though I agree I don't think this was the bottleneck). Using the unfiltered trace is a little bit faster but gives a ~4 times larger RMS difference between true and fitted block offset

@fschlueter
Copy link
Collaborator

Damn, I did not know that I had to submit my comments/review. Sorry Sjoerd that those come so late (they were written a week ago)

@sjoerd-bouma
Copy link
Collaborator Author

I fixed (hopefully) Felix's comments, and additionally added the fitter to the mattak datareader. A quick comparison of how fast the different options are:

from NuRadioReco.modules.io.RNO_G.readRNOGDataMattak import readRNOGData
import time
reader = readRNOGData()
results = dict()
for mode in ['none', 'median', 'approximate', 'fit']:
    reader.begin('/home/sb/Python/scratch/data/inbox/station11/run1100/', apply_baseline_correction=mode)
    t0 = time.time()
    n_evts = 0
    max_n_evts = 1e4 if mode != 'fit' else 25

    for evt in reader.run():
        n_evts += 1
        if n_evts >= max_n_evts:
            break

    results[mode] = (time.time() - t0) / n_evts
    
for mode, t in results.items():
    print(f"{mode:12s} : {1e3*t:-9.3f} ms / event")

Giving

none         :     1.888 ms / event
median       :     4.409 ms / event
approximate  :     4.824 ms / event
fit          :   768.660 ms / event

Both 'approximate' and 'median' (= the previous implementation in the reader) are probably good enough for most purposes.

fschlueter
fschlueter previously approved these changes Sep 25, 2023
@fschlueter
Copy link
Collaborator

What happens if you relax the tol argument or introduce a max number of iterations within curve_fit? I assume the fit should converge stabil and fast

@sjoerd-bouma
Copy link
Collaborator Author

What happens if you relax the tol argument or introduce a max number of iterations within curve_fit? I assume the fit should converge stabil and fast

I'm definitely spending too much time on this now.... but of course it's a good idea, so I checked using toy Gaussian offsets of 10% on top of a (bandpass-filtered) Gaussian trace.
image
Basically, after 2 iterations the block offsets are already reduced by 2 orders of magnitude (RMS is normalized to the offset size), which is probably enough in most cases unless the offsets are huge, so I've made this the default maxiter.

I've also removed/deprecated the old baseline correction, and made the bandpass-filtered 'approximate' mode the default, seeing as it was about a factor 4 better without a noticeable decrease in performance. Let me know if you disagree.

return wfs - baseline_traces

blockoffsetfitter = channelBlockOffsets()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That that here in between functions

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made it a private attribute of the readRNOGData class, having readRNOGDataMattak.blockoffsetfitter showing up in IDE autocompletion is probably unnecessary clutter.

@fschlueter
Copy link
Collaborator

What happens if you relax the tol argument or introduce a max number of iterations within curve_fit? I assume the fit should converge stabil and fast

I'm definitely spending too much time on this now.... but of course it's a good idea, so I checked using toy Gaussian offsets of 10% on top of a (bandpass-filtered) Gaussian trace. image Basically, after 2 iterations the block offsets are already reduced by 2 orders of magnitude (RMS is normalized to the offset size), which is probably enough in most cases unless the offsets are huge, so I've made this the default maxiter.

I've also removed/deprecated the old baseline correction, and made the bandpass-filtered 'approximate' mode the default, seeing as it was about a factor 4 better without a noticeable decrease in performance. Let me know if you disagree.

Hi, you tested it with 10% amplitudes of what, a baseline ADC of ~ 1800? I am a bit worried that maxiter=2 is a bit low. Not that an accuracy of 1e-2 is not sufficient but what happens if a fit does not converge as fast as in your test? Looking at your plot: Why don't we set it to ~5. It still gives us a time boost of over one order of magnitude right?

@fschlueter
Copy link
Collaborator

Deprecating the old code is fine for me. However, I am still wondering a bit why it is not significant faster than the bandpass-filtered 'approximate' mode. After all this includes running an fft which should be quite time consuming compared to the other mathematical operations ?

@fschlueter
Copy link
Collaborator

PS: An improvement with a factor of 100 in performance is never a waist of time :)

@sjoerd-bouma
Copy link
Collaborator Author

Thanks Felix,
I've increased the number of iterations to 5 as suggested - the 99th percentile is now ~2% (was 4%)
image
The times are a bit different compared to the previous plot, I realize showing the time per channel after earlier providing times per event (= 24 channels) was a bit confusing. So the improvement is not a factor 100 but closer to a factor 10 in speed, unfortunately.

fschlueter
fschlueter previously approved these changes Nov 27, 2023
@sjoerd-bouma
Copy link
Collaborator Author

@fschlueter Sorry, created a merge conflict for myself by removing all trailing whitespaces in #557, I've now turned that feature off again in my IDE. Nothing else should have changed.

@fschlueter
Copy link
Collaborator

I am using a VS code plugin which should just remove ws in lines which I have modified anyway (its not working as promised but if you find one which works let me know)

@fschlueter fschlueter merged commit 778d5cd into develop Dec 7, 2023
9 checks passed
@fschlueter
Copy link
Collaborator

I just merged it. Did not feel like waiting longer ..

@anelles
Copy link
Collaborator

anelles commented Dec 7, 2023

Hmm, reading the history here, @fschlueter may just have avoided (but just barely :) ) the three strikes rule of only merging after 24 hours after final approval. It has been taking long enough, I concur.

@fschlueter
Copy link
Collaborator

Hmm, reading the history here, @fschlueter may just have avoided (but just barely :) ) the three strikes rule of only merging after 24 hours after final approval. It has been taking long enough, I concur.

😇

@anelles anelles deleted the feature/block_offsets branch February 14, 2024 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants