Update rates-based statistics to be modular #4608

GarethCabournDavies · 2024-01-24T12:04:02Z

Overview of changes

The exponential fit statistics are all similar, differing only by simple factors that are added or subtracted, as discussed in #4594.

I have refactored the ExpFitStatistic so that these factors are selected through a --statistic-features option. All of these statistics then use exp_fit as the --ranking-statistic option.

The available features are:

| Feature | Description | Notes |
| --- | --- | --- |
| `phasetd` | Apply a factor of how the phase, time and amplitude differences match up to what is expected for signals. | |
| `kde` | Apply a factor according to a kernel density estimate of the ratio of signal and noise distributions for each template. | |
| `dq` | Apply a factor according to any data quality channel information. | |
| `chirp_mass` | Apply a reweighting according to the chirp mass of the template. | |
| `sensitive_volume` | Apply a factor of the log of sensitive volume (compared to a median for the template). This means that the statistic takes into account any changes in the detector sensitivity and that, e.g., we expect to see more events in the HL network than HV coincidences. | |
| `normalize_fit_rate` | Normalize the rates fits by the analysis time. This is needed so that the statistic is comparable over chunks of different lengths. | This was done for all `exp_fit` statistics, but is implemented here explicitly so that the `exp_fit_csnr` statistic can reuse the `lognoiserate` function from the `ExpFitStatistic` |

I have also removed the different treatment of triggers with sngl_ranking below threshold; this must now be requested explicitly with --statistic-keywords alpha_below_thresh:6. Again, this is so that the exp_fit_csnr statistic can reuse the lognoiserate function from the ExpFitStatistic.
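
As a rough illustration of the `key:value` form used by --statistic-keywords, here is a minimal sketch of how such strings could be split into a dictionary with numeric values converted to floats. The function name `split_keywords` is hypothetical and is not the parser used in PyCBC; it only shows the idea.

```python
def split_keywords(pairs):
    """Turn ['key:value', ...] strings into a dict.

    Hypothetical helper for illustration only. Values that look like
    numbers are converted to float; anything else is kept as a string.
    """
    keywords = {}
    for pair in pairs:
        key, sep, value = pair.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"expected key:value, got {pair!r}")
        try:
            keywords[key] = float(value)
        except ValueError:
            keywords[key] = value
    return keywords


print(split_keywords(["alpha_below_thresh:6"]))
```

Converting at parse time means that code consuming the keyword receives a number rather than the string "6".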

In addition, there are some minor changes to fix some of the statistics that did not work at all on the previous master. There is a reference-only PR at #4607 that shows these fixes.

Testing

I have tested all existing (and working) statistics to check that given the appropriate features, the output remains identical.

The testing is done against the code in #4607, so that we can test statistics against what they should be, rather than what they are.

Initial tests with a very small fraction of the bank have shown that the SNR-like statistics output files have identical hashes. The exp_fit statistics outputs are all the same to within a numpy.isclose test, i.e. ~1e-6 difference for values O(1). I will add the results of more stringent testing here.
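
For context on what "within a numpy.isclose test" means numerically, the sketch below writes out the criterion documented for numpy.isclose with its default tolerances (rtol=1e-5, atol=1e-8) for a single pair of values. The function `isclose_scalar` is a stand-in written for this illustration, and the sample values are made up rather than taken from the test output.

```python
def isclose_scalar(a, b, rtol=1e-5, atol=1e-8):
    """Scalar form of the documented numpy.isclose criterion:
    |a - b| <= atol + rtol * |b|
    """
    return abs(a - b) <= atol + rtol * abs(b)


# Illustrative values only: a statistic of order 1 shifted by 1e-6.
old_stat = 1.0
new_stat = old_stat + 1e-6
print(isclose_scalar(new_stat, old_stat))

# The same absolute shift around zero is outside the default tolerance,
# because only atol contributes when the reference value is 0.
print(isclose_scalar(1e-6, 0.0))
```

With the default tolerances, a ~1e-6 difference is accepted for values of order one but not for values near zero.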

| Statistic | New statistic | Features | Keywords | pycbc_sngls_findtrigs max stat difference | pycbc_coinc_findtrigs max stat difference |
| --- | --- | --- | --- | --- | --- |
| quadsum | quadsum | -- | -- | File hash the same | File hash the same |
| single_ranking_only | single_ranking_only | -- | -- | File hash the same | File hash the same |
| phasetd | phasetd | -- | -- | File hash the same | File hash the same |
| exp_fit_csnr | exp_fit_csnr | -- | -- | File hash the same | File hash the same |
| phasetd_exp_fit_fgbg_norm | exp_fit | `phasetd sensitive_volume normalize_fit_rate` | `alpha_below_thresh:6` | 9.5e-7 | 3.8e-6 |
| phasetd_exp_fit_fgbg_bbh_norm | exp_fit | `phasetd sensitive_volume normalize_fit_rate chirp_mass` | `alpha_below_thresh:6` | 1.9e-6 | 3.8e-6 |
| phasetd_exp_fit_fgbg_kde | exp_fit | `phasetd sensitive_volume normalize_fit_rate kde` | `alpha_below_thresh:6` | 1.9e-6 | 3.8e-6 |
| dq_phasetd_exp_fit_fgbg_norm | exp_fit | `phasetd sensitive_volume normalize_fit_rate dq` | `alpha_below_thresh:6` | 9.5e-7 | 3.8e-6 |
| dq_phasetd_exp_fit_fgbg_kde | exp_fit | `phasetd sensitive_volume normalize_fit_rate dq kde` | `alpha_below_thresh:6` | 1.9e-6 | 3.8e-6 |
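
The mapping in the table can be written down as a small lookup, which may be handy when converting existing configurations. This is only a sketch: the dictionary and the helper `new_style_args` are hypothetical names, and the contents are copied from the table rows above rather than from the code.

```python
# Features shared by every converted exp_fit statistic in the table.
COMMON = ["phasetd", "sensitive_volume", "normalize_fit_rate"]

# Old statistic name -> list of --statistic-features values (from the table).
OLD_TO_FEATURES = {
    "phasetd_exp_fit_fgbg_norm": COMMON,
    "phasetd_exp_fit_fgbg_bbh_norm": COMMON + ["chirp_mass"],
    "phasetd_exp_fit_fgbg_kde": COMMON + ["kde"],
    "dq_phasetd_exp_fit_fgbg_norm": COMMON + ["dq"],
    "dq_phasetd_exp_fit_fgbg_kde": COMMON + ["dq", "kde"],
}


def new_style_args(old_name):
    """Return the new-style command-line arguments for an old statistic name."""
    features = OLD_TO_FEATURES[old_name]
    return (
        ["--ranking-statistic", "exp_fit", "--statistic-features"]
        + features
        + ["--statistic-keywords", "alpha_below_thresh:6"]
    )


print(" ".join(new_style_args("dq_phasetd_exp_fit_fgbg_kde")))
```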

GarethCabournDavies · 2024-01-25T11:33:08Z

pycbc/events/stat.py

+    for feature in opts.statistic_features:
+        if feature not in _allowed_statistic_features:
+            err_msg = f"--statistic-feature {feature} not recognised"
+            raise NotImplementedError(err_msg)


This shouldn't actually happen due to the argparse choices, but safety is best
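
To illustrate why the check is normally redundant, here is a standalone sketch of an argparse option restricted by `choices`, alongside a second explicit check of the kind in the diff. The feature list is taken from the table in the PR description and may not match `_allowed_statistic_features` exactly; `build_parser` and `check_features` are hypothetical names.

```python
import argparse

# Assumed list, copied from the features table in the PR description.
ALLOWED_FEATURES = [
    "phasetd", "kde", "dq", "chirp_mass",
    "sensitive_volume", "normalize_fit_rate",
]


def build_parser():
    parser = argparse.ArgumentParser()
    # argparse validates every value against `choices` and exits on a mismatch.
    parser.add_argument("--statistic-features", nargs="+", default=[],
                        choices=ALLOWED_FEATURES)
    return parser


def check_features(features):
    """Second line of defence for callers that bypass the command line."""
    for feature in features:
        if feature not in ALLOWED_FEATURES:
            raise NotImplementedError(
                f"--statistic-features {feature} not recognised")


opts = build_parser().parse_args(["--statistic-features", "phasetd", "kde"])
check_features(opts.statistic_features)
print(opts.statistic_features)
```

The explicit check still matters if the options object is constructed in code, for example in a test, without going through the parser.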

GarethCabournDavies · 2024-01-25T11:39:51Z

bin/all_sky_search/pycbc_sngls_findtrigs

-                                          **extra_kwargs)
-    trigger_times = sds['end_time']
+    stat_t = rank_method.rank_stat_single((ifo, sds))
+    trigger_times = trigs['end_time'][:][trigger_keep_ids]


Some singles objects don't have the end time included

GarethCabournDavies · 2024-01-25T11:40:32Z

pycbc/events/stat.py

@@ -33,9 +33,18 @@
 from .eventmgr_cython import logsignalrateinternals_computepsignalbins
 from .eventmgr_cython import logsignalrateinternals_compute2detrate

+_allowed_statistic_features = [


To be honest, I'm not sure where best to describe each feature here.

GarethCabournDavies · 2024-01-25T11:43:52Z

pycbc/events/stat.py

        # Assume best case scenario and use maximum signal rate
        s1 -= 2. * self.hist_max
        s1[s1 < 0] = 0
        return s1 ** 0.5


-class ExpFitStatistic(QuadratureSumStatistic):
+class ExpFitStatistic(PhaseTDStatistic):


Subclassing PhaseTDStatistic here so that the phasetd setup in __init__ is inherited.

GarethCabournDavies · 2024-02-06T09:16:54Z

The numbers in the comparison table have been updated

I am sure it is not a coincidence that the error in the coincs is double that of the singles, but I think a ~4e-6 difference is not too important given the dynamic range of the statistic.
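
One possible reading of these numbers, not confirmed anywhere in this thread: to the two digits quoted, the three differences in the table are successive powers of two, which is the pattern expected from single-precision rounding. The sketch below just checks that arithmetic; whether the statistic is actually stored as float32 is an assumption.

```python
# Differences quoted in the comparison table, as strings for comparison.
reported = ["9.5e-07", "1.9e-06", "3.8e-06"]

# Candidate powers of two. A float32 has a 24-bit significand, so the
# spacing between adjacent values in [8, 16) is 2**(3 - 23) = 2**-20.
for exponent, quoted in zip((20, 19, 18), reported):
    value = 2.0 ** -exponent
    print(f"2**-{exponent} = {value:.4e}  (table: {quoted})")
```

If that reading is right, the singles differences are one or two rounding steps at statistic values of order ten, and the coinc difference is a small multiple of the same step.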

GarethCabournDavies · 2024-02-06T10:33:56Z

I am adding a description of the statistics to the docs - I am writing it in the pycbc_make_offline_search_workflow documentation at the moment, but that can be moved if requested.

GarethCabournDavies · 2024-02-06T14:48:42Z

Note that I found and fixed a bug in the way that sngl_ranking_ keywords are handled and passed through to the ranking module

GarethCabournDavies · 2024-05-21T08:16:43Z

I thought it best to check the memory usage, and for pycbc_coinc_findtrigs with 1/140 of the bank and the dq_phasetd_exp_fit_fgbg_norm (and modular equivalent) statistic, we see:
NEW:

	User time (seconds): 3586.64
	System time (seconds): 197.45
	Maximum resident set size (kbytes): 432688

OLD:

	System time (seconds): 184.80
	Maximum resident set size (kbytes): 414844

For the same statistic with pycbc_sngls_findtrigs and 1/1400 of the bank, we get:
NEW:

	User time (seconds): 74.30
	System time (seconds): 13.22
	Maximum resident set size (kbytes): 277556

OLD:

	User time (seconds): 58.19
	System time (seconds): 13.82
	Maximum resident set size (kbytes): 279236

The increase in user time seems to be due to a lower CPU percentage (27 vs 75).

So this does not appear to change anything with regard to performance, as I would expect.
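
For reference, the relative changes implied by the figures quoted above can be computed directly. This is plain arithmetic on the reported numbers; the old user time for pycbc_coinc_findtrigs is not given above, so it is left out.

```python
def percent_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


# (new, old) pairs copied from the resource usage reported above.
measurements = {
    "coinc max RSS (kbytes)": (432688, 414844),
    "coinc system time (s)": (197.45, 184.80),
    "sngls max RSS (kbytes)": (277556, 279236),
    "sngls user time (s)": (74.30, 58.19),
    "sngls system time (s)": (13.22, 13.82),
}

for label, (new, old) in measurements.items():
    print(f"{label}: {percent_change(new, old):+.1f}%")
```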

maxtrevor · 2024-05-21T17:23:34Z

Noting here that it was suggested on today's call that we should wait to merge this until after creating the new PyCBC Live branch intended for the rest of O4

GarethCabournDavies · 2024-12-13T14:31:38Z

I think this is ready for review now. All tests are passing

I haven't been able to test how this works with Live as well as I would like to; there seems to be a problem with my testing environment

GarethCabournDavies added the offline search label Jan 24, 2024

GarethCabournDavies self-assigned this Jan 24, 2024

GarethCabournDavies commented Jan 25, 2024

GarethCabournDavies mentioned this pull request Jan 29, 2024

REFERENCE: fixes to the some broken statistics #4607

Closed

GarethCabournDavies force-pushed the modular_stat branch from 3b1dd80 to bfa528a Compare February 5, 2024 16:28

GarethCabournDavies requested a review from spxiwh February 6, 2024 10:34

GarethCabournDavies force-pushed the modular_stat branch from 5f7cd92 to 9c9bbee Compare March 11, 2024 13:27

GarethCabournDavies force-pushed the modular_stat branch 2 times, most recently from 2af01a3 to 2d7148a Compare April 18, 2024 13:18

GarethCabournDavies force-pushed the modular_stat branch from 2d7148a to e0214a1 Compare May 20, 2024 10:25

GarethCabournDavies force-pushed the modular_stat branch from 55fb556 to 33678e4 Compare June 28, 2024 12:45

GarethCabournDavies mentioned this pull request Jul 16, 2024

Add mechanism for re-loading the statistic files in live #4816

Merged

1 task

GarethCabournDavies added 13 commits November 1, 2024 09:00

fixes to the snr-like statistics

4dde1a7

Move exp_fit statistics into a modular framework

ff7731c

remove unused statistic

8cd3c21

use keyword:value rather than feature for alpha below

7173b48

Codeclimate complaints

1d2058f

use new-style statistic in CI

666f5fe

fix in case the fit_by_template is not stored in the fit_over file

175e43d

remove testing change

3ad042d

fix usage of parse_statistic_feature_options in test

47a2145

Docstrings for various functions

7b95712

Add back in the changes from gwastro#4603

1e0bd5c

Add description of the statistics to the documentation

9e3c0f5

fix error if passing keywords which need to be floats, rework the alpha_below_thresh keyword

3a4bce4

GarethCabournDavies added 17 commits November 1, 2024 09:21

Allow sngl_ranking keywords to actually be used

102203e

CC

eb69bd8

try this

d715d30

maybe

36ee4ab

single-word titles

9e82184

Fix a bunch of line-too-long errors

674a196

lines-too-long

94add88

These tables are annoying me

022997d

CC again

6800377

Fix errors in the tables

745eaf1

run black on pycbc/events/stat.py

26992ae

Start getting recent stat changes into module

d61b997

fixes post-rebase

2bb9a20

run black on pycbc/events/ranking in order to try and get codeclimate to be quiet

9564e5d

Revert "run black on pycbc/events/ranking in order to try and get codeclimate to be quiet" This reverts commit 4f082ea.

e99b941

minor fixes

e9f3c1f

Bring up to date with recent changes

13489d7

GarethCabournDavies force-pushed the modular_stat branch from 33678e4 to 13489d7 Compare November 8, 2024 13:48

GarethCabournDavies added 2 commits November 13, 2024 07:45

Use new ranking statistics in CI

d81a83e

Start working on getting modular statistic into Live usage

4db495a
