Update rates-based statistics to be modular #4608

Status: Open. Wants to merge 32 commits into base: master.

Commits (32)
4dde1a7
fixes to the snr-like statistics
GarethCabournDavies Jan 18, 2024
ff7731c
Move exp_fit statistics into a modular framework
GarethCabournDavies Jan 24, 2024
8cd3c21
remove unused statistic
GarethCabournDavies Jan 24, 2024
7173b48
use keyword:value rather than feature for alpha below
GarethCabournDavies Jan 24, 2024
1d2058f
Codeclimate complaints
GarethCabournDavies Jan 24, 2024
666f5fe
use new-style statistic in CI
GarethCabournDavies Jan 24, 2024
175e43d
fix in case teh fit_by_templte is not stored in the fit_over file
GarethCabournDavies Jan 25, 2024
3ad042d
remove testing change
GarethCabournDavies Jan 26, 2024
47a2145
fix usage of parse_statistic_feature_options in test
GarethCabournDavies Jan 26, 2024
7b95712
Docstrings for various functions
GarethCabournDavies Feb 5, 2024
1e0bd5c
Add back in the changes from #4603
GarethCabournDavies Feb 5, 2024
9e3c0f5
Add description of the statistics to the documentation
GarethCabournDavies Feb 6, 2024
3a4bce4
fix error if passing keywords which need to be floats, rework the alp…
GarethCabournDavies Feb 6, 2024
102203e
Allow sngl_ranking keywords to actually be used
GarethCabournDavies Feb 6, 2024
eb69bd8
CC
GarethCabournDavies Feb 6, 2024
d715d30
try this
GarethCabournDavies Feb 6, 2024
36ee4ab
maybe
GarethCabournDavies Feb 6, 2024
9e82184
single-word titles
GarethCabournDavies Feb 16, 2024
674a196
Fix a bunch of line-too-long errors
GarethCabournDavies Feb 16, 2024
94add88
lines-too-long
GarethCabournDavies Mar 11, 2024
022997d
These tables are annoying me
GarethCabournDavies Mar 11, 2024
6800377
CC again
GarethCabournDavies Mar 11, 2024
745eaf1
Fix errors in the tables
GarethCabournDavies Mar 12, 2024
26992ae
run black on pycbc/events/stat.py
GarethCabournDavies Mar 25, 2024
d61b997
Start getting recent stat changes into module
GarethCabournDavies Apr 18, 2024
2bb9a20
fixes post-rebase
GarethCabournDavies Apr 18, 2024
9564e5d
run black on pycbc/events/ranking in order to try and get codeclimate…
GarethCabournDavies May 20, 2024
e99b941
Revert "run black on pycbc/events/ranking in order to try and get cod…
GarethCabournDavies May 20, 2024
e9f3c1f
minor fixes
GarethCabournDavies Jun 28, 2024
13489d7
Bring up to date with recent changes
GarethCabournDavies Nov 8, 2024
d81a83e
Use new ranking statistics in CI
GarethCabournDavies Nov 13, 2024
4db495a
Start working on getting modular statistic into Live usage
GarethCabournDavies Nov 13, 2024
23 changes: 8 additions & 15 deletions bin/all_sky_search/pycbc_sngls_findtrigs
@@ -131,17 +131,6 @@ if args.cluster_window is not None:
logging.info('Clustering events over %s s window within each template',
args.cluster_window)

extra_kwargs = {}
for inputstr in args.statistic_keywords:
try:
key, value = inputstr.split(':')
extra_kwargs[key] = value
except ValueError:
err_txt = "--statistic-keywords must take input in the " \
"form KWARG1:VALUE1 KWARG2:VALUE2 KWARG3:VALUE3 ... " \
"Received {}".format(args.statistic_keywords)
raise ValueError(err_txt)

loudest_keep_vals = args.loudest_keep_values.strip('[]').split(',')

threshes = []
@@ -172,6 +161,11 @@ for i, tnum in enumerate(template_ids):
"Calculating statistic in template %d out of %d",
i, len(template_ids),
)
else:
logging.debug(
"Calculating statistic in template %d out of %d",
i, len(template_ids),
)
tids_uncut = trigs.set_template(tnum)

trigger_keep_ids = cuts.apply_trigger_cuts(trigs, trigger_cut_dict,
@@ -188,9 +182,8 @@ for i, tnum in enumerate(template_ids):

# Stat class instance to calculate the ranking statistic
sds = rank_method.single(trigs)[trigger_keep_ids]
stat_t = rank_method.rank_stat_single((ifo, sds),
**extra_kwargs)
trigger_times = sds['end_time']
stat_t = rank_method.rank_stat_single((ifo, sds))
trigger_times = trigs['end_time'][:][trigger_keep_ids]
Review comment (Contributor Author): Some singles objects don't have the end time included

if args.cluster_window is not None:
cid = coinc.cluster_over_time(stat_t, trigger_times,
args.cluster_window)
@@ -206,7 +199,7 @@ for i, tnum in enumerate(template_ids):
# Perform decimation
dec_facs = np.ones_like(template_ids_all)
stat_all = np.array(stat_all)
template_ids_all = np.array(template_ids_all)
template_ids_all = np.array(template_ids_all, dtype=int)
trigger_ids_all = np.array(trigger_ids_all)
trigger_times_all = np.array(trigger_times_all)

3 changes: 0 additions & 3 deletions bin/live/pycbc_live_single_significance_fits
@@ -71,8 +71,6 @@ pycbc.init_logging(args.verbose)
sngls_io.verify_live_significance_trigger_pruning_options(args, parser)
sngls_io.verify_live_significance_duration_bin_options(args, parser)

stat_kwargs = stat.parse_statistic_keywords_opt(args.statistic_keywords)

duration_bin_edges = sngls_io.duration_bins_from_cli(args)
logging.info(
"Duration bin edges: %s",
@@ -192,7 +190,6 @@ for counter, filename in enumerate(files):
sds = rank_method[ifo].single(triggers_cut)
sngls_value = rank_method[ifo].rank_stat_single(
(ifo, sds),
**stat_kwargs
)

triggers_cut['stat'] = sngls_value
3 changes: 0 additions & 3 deletions bin/minifollowups/pycbc_sngl_minifollowup
@@ -238,13 +238,10 @@ if args.maximum_duration is not None:
logging.info('Finding loudest clustered events')
rank_method = stat.get_statistic_from_opts(args, [args.instrument])

extra_kwargs = stat.parse_statistic_keywords_opt(args.statistic_keywords)

trigs.mask_to_n_loudest_clustered_events(
rank_method,
n_loudest=num_events,
cluster_window=args.cluster_window,
statistic_kwargs=extra_kwargs,
)

times = trigs.end_time
64 changes: 64 additions & 0 deletions docs/workflow/pycbc_make_offline_search_workflow.rst
@@ -306,9 +306,73 @@ Specify the name of the channel you want to run the inspiral analysis over for H

[coinc]
coinc-threshold = 0.000
ranking-statistic = exp_fit
sngl-ranking = newsnr_sgveto_psdvar
statistic-features = sensitive_volume normalize_fit_rate phasetd
statistic-keywords = alpha_below_thresh:6 sngl_ranking_min_expected_psdvar:0.7

Here we are doing exact-match coincidence: we take the light travel time between detectors and look for triggers which are coincident within this time window. The ``coinc-threshold`` option extends this window by the given amount in seconds; here it is set to zero.

How triggers are ranked is defined by the ``ranking-statistic``, ``sngl-ranking``, ``statistic-features``, ``statistic-keywords`` and ``statistic-files`` options.
- ``sngl-ranking`` = The ranking used for single-detector triggers; this is generally a re-weighting of the SNR.
- ``ranking-statistic`` = How the triggers from a set of detectors are ranked in order to calculate significance. This will take the form of an SNR-like combination (``quadsum``, ``phasetd``, ``exp_fit_csnr``) or a log-rates-like statistic, ``exp_fit``. See the Ranking Statistic table below for the options.
- ``statistic-features`` = If using the ``exp_fit`` ranking statistic, these are the features to add to or subtract from the ranking statistic. They are described in the Statistic Features table below.
- ``statistic-keywords`` = Some statistics require keywords to modify the behavior of the statistic in certain situations. Keywords affecting the sngl-ranking calculation are also given here, starting with ``sngl_ranking_``. These are described in the Statistic Keywords table below.
- ``statistic-files`` = Files to be used in the statistic calculation; of particular note here are the files needed for DQ and KDE reranking.

.. list-table:: Ranking Statistic
:widths: 25 75
:header-rows: 1

* - Statistic name
- Description
* - ``quadsum``
- The quadrature sum of triggers in each detector in the triggered network. ``sngl_ranking_only`` can also be given and is exactly equivalent.
* - ``phasetd``
- The same as ``quadsum``, but reweighted according to the measured time and phase differences and amplitude ratios between detectors.
* - ``exp_fit_csnr``
- A reworking of the exponential fit designed to resemble network SNR. It uses a monotonic function of the negative log noise rate density which approximates the combined sngl-ranking for coincidences with similar newsnr in each ifo.
* - ``exp_fit``
- The ratio of the signal rate to the noise rate in the triggered network of detectors. The noise trigger density at a given sngl-ranking value is approximated for each template, and combined over the triggered network.
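
The executables touched by this PR turn these options into a statistic instance via ``pycbc.events.stat`` (see the changes to ``bin/minifollowups/pycbc_sngl_minifollowup`` and ``bin/all_sky_search/pycbc_sngls_findtrigs`` elsewhere in this PR). The following is a minimal sketch of that flow, assuming the ``insert_statistic_option_group`` and ``get_statistic_from_opts`` helpers behave as in the current module and that a dictionary of numpy arrays with ``snr``, ``chisq`` and ``chisq_dof`` fields is an acceptable stand-in for a trigger object; it is illustrative only, not the definitive API::

    # Sketch: build a ranking statistic from command-line style options and
    # rank some stand-in single-detector triggers with it.
    import argparse
    import numpy as np
    from pycbc.events import stat

    parser = argparse.ArgumentParser()
    stat.insert_statistic_option_group(parser)  # adds --ranking-statistic etc.
    args = parser.parse_args([
        "--ranking-statistic", "quadsum",   # needs no statistic-files;
        "--sngl-ranking", "newsnr",         # exp_fit would also need fit files
    ])

    ifo = "H1"
    rank_method = stat.get_statistic_from_opts(args, [ifo])

    # Stand-in triggers for one template; real code reads these from HDF files
    trigs = {
        "snr": np.array([8.0, 10.0, 12.0]),
        "chisq": np.array([20.0, 60.0, 30.0]),
        "chisq_dof": np.array([20, 20, 20]),
    }
    sds = rank_method.single(trigs)                  # single-detector info
    stat_values = rank_method.rank_stat_single((ifo, sds))
    print(stat_values)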

.. list-table:: Statistic Features
:widths: 25 75
:header-rows: 1

* - Feature name
- Description
* - ``phasetd``
- Use a histogram of expected phase and time differences, and amplitude ratio, for signals to determine a factor to be added for the signal rate.
* - ``sensitive_volume``
- Signal rate is expected to be proportional to the cube of the sensitive distance. This feature adds a factor of :math:`\log(\sigma^3)` minus a benchmark value, to make this zero in many cases.
* - ``normalize_fit_rate``
- Normalise the exponential fits to use a rate rather than an absolute count of triggers. This means that statistics should be comparable over differently-sized analyses.
* - ``dq``
- Apply a reweighting factor according to the rate of triggers during times flagged by data-quality flags compared to the rate outside them. A reranking file must be supplied using ``statistic-files`` for each detector, with the stat attribute ``{detector}-dq_stat_info``.
* - ``kde``
- Use files to re-rank according to the signal and template densities calculated using a KDE approach. Two reranking files must be supplied using ``statistic-files``, with stat attributes ``signal-kde_file`` and ``template-kde_file`` respectively.
* - ``chirp_mass``
- Apply a factor of :math:`\log((M_c / 20)^{11/3})` to the statistic. This makes the signal rate uniform over chirp mass, as this factor cancels out the power of :math:`-11/3` arising from the density of template placement.
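
As a concrete illustration of the ``chirp_mass`` factor quoted above, the snippet below evaluates :math:`\log((M_c / 20)^{11/3})` for a few chirp masses, with an optional cap that mimics the effect described for the ``max_chirp_mass`` keyword in the table further down. The ``chirp_mass_weight`` helper is hypothetical and purely illustrative; it is not the statistic module's implementation::

    # Illustrative only: the chirp-mass weighting described above.
    import numpy as np

    def chirp_mass_weight(mchirp, max_chirp_mass=None):
        """Return log((M_c / 20) ** (11 / 3)), with M_c optionally capped."""
        mc = np.asarray(mchirp, dtype=float)
        if max_chirp_mass is not None:
            mc = np.minimum(mc, max_chirp_mass)  # cap the weighting
        return (11.0 / 3.0) * np.log(mc / 20.0)

    # Negative for light systems, zero at M_c = 20, capped above 40 here
    print(chirp_mass_weight([1.2, 20.0, 60.0], max_chirp_mass=40.0))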

.. list-table:: Statistic Keywords
:widths: 25 75
:header-rows: 1

* - Keyword
- Description
* - ``benchmark_lograte``
- This is a numerical factor to be subtracted from the log rate ratio in order to alter the dynamic range. Default -14.6.
* - ``minimum_statistic_cutoff``
- Cutoff for the statistic in order to avoid underflow and very small statistic values. Default -30.
* - ``alpha_below_thresh``
- The fit coefficient (alpha) below the fit threshold (defined in the fit_by_template jobs) is replaced by a standard value. Below this threshold Gaussian noise can dominate over the glitch response that dominates above it, so rates would be underestimated, boosting quiet triggers in noisy templates. For Gaussian noise the coefficient will be approximately 6 (the default). To use the fitted value instead, supply ``alpha_below_thresh:None``.
* - ``reference_ifos``
- If using the ``sensitive_volume`` feature, these are the detectors used to determine the benchmark value against which the sensitive volume is compared. We use the median sensitive volume in the network of detectors supplied. Default H1,L1.
* - ``max_chirp_mass``
- If using the ``chirp_mass`` feature, this chirp mass defines a maximum weighting which can be applied to the statistic.
* - ``sngl_ranking_*``
- This is used to provide the keyword arguments to functions in `the events.ranking module <https://pycbc.org/pycbc/latest/html/_modules/pycbc/events/ranking.html>`_. For example, to use a different psdvar threshold in the newsnr_sgveto_psdvar_threshold function, we would use ``sngl_ranking_psd_var_val_threshold:10``.
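
The ``statistic-keywords`` entries are plain ``KWARG:VALUE`` strings. In this PR the parsing is handled inside the statistic module (see ``parse_statistic_keywords_opt`` and the new ``parse_statistic_feature_options`` in the diffs) rather than in each executable; the sketch below mirrors the ``extra_kwargs`` loop removed from ``pycbc_sngls_findtrigs`` and is illustrative only, with a hypothetical ``parse_keywords`` name::

    # Sketch of KWARG:VALUE parsing; the real logic lives in pycbc.events.stat.
    def parse_keywords(keyword_strings):
        kwargs = {}
        for inputstr in keyword_strings:
            try:
                key, value = inputstr.split(":")
            except ValueError:
                raise ValueError(
                    "statistic keywords must take the form "
                    "KWARG1:VALUE1 KWARG2:VALUE2 ...; "
                    "received {!r}".format(inputstr)
                )
            kwargs[key] = value
        return kwargs

    print(parse_keywords(
        ["alpha_below_thresh:6", "sngl_ranking_min_expected_psdvar:0.7"]
    ))
    # {'alpha_below_thresh': '6', 'sngl_ranking_min_expected_psdvar': '0.7'}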

::

[coinc-full]
9 changes: 8 additions & 1 deletion examples/live/run.sh
@@ -186,7 +186,14 @@ python -m mpi4py `which pycbc_live` \
--output-path output \
--day-hour-output-prefix \
--sngl-ranking newsnr_sgveto_psdvar_threshold \
--ranking-statistic phasetd_exp_fit_fgbg_norm \
--ranking-statistic \
exp_fit \
--statistic-features \
phasetd \
sensitive_volume \
normalize_fit_rate \
--statistic-keywords \
alpha_below_thresh:6 \
--statistic-files \
statHL.hdf \
statHV.hdf \
4 changes: 3 additions & 1 deletion examples/search/analysis.ini
@@ -171,7 +171,9 @@ smoothing-width = 0.4
analyze =

[coinc&sngls]
ranking-statistic = phasetd_exp_fit_fgbg_norm
ranking-statistic = exp_fit
statistic-features = phasetd sensitive_volume normalize_fit_rate
statistic-keywords = alpha_below_thresh:6
sngl-ranking = newsnr_sgveto_psdvar
randomize-template-order =
statistic-files = ${resolve:./statHL.hdf} ${resolve:./statLV.hdf} ${resolve:./statHV.hdf} ${resolve:./statHLV.hdf}
4 changes: 4 additions & 0 deletions examples/search/plotting.ini
@@ -38,6 +38,8 @@ non-coinc-time-only =
vetoed-time-only =
ranking-statistic = ${sngls|ranking-statistic}
statistic-files = ${sngls|statistic-files}
statistic-features = ${sngls|statistic-features}
statistic-keywords = ${sngls|statistic-keywords}

[injection_minifollowup]
ifar-threshold = 1
@@ -50,6 +52,8 @@ sngl-ranking = newsnr_sgveto_psdvar

[page_snglinfo-vetoed]
ranking-statistic = ${sngls|ranking-statistic}
statistic-features = ${sngls|statistic-features}
statistic-keywords = ${sngls|statistic-keywords}

[single_template_plot]

6 changes: 5 additions & 1 deletion pycbc/events/coinc.py
@@ -973,12 +973,16 @@ def from_cli(cls, args, num_templates, analysis_chunk, ifos):

# Allow None inputs
stat_files = args.statistic_files or []
stat_features = args.statistic_features or []
stat_keywords = args.statistic_keywords or []

# flatten the list of lists of filenames to a single list (may be empty)
stat_files = sum(stat_files, [])

kwargs = pycbcstat.parse_statistic_keywords_opt(stat_keywords)
kwargs = pycbcstat.parse_statistic_feature_options(
stat_features,
stat_keywords,
)

return cls(num_templates, analysis_chunk,
args.ranking_statistic,