Bootstrapping and `Dataset` updates #10

denehoffman · 2024-10-31T18:34:51Z

This PR mainly adds methods for resampling `Event`s via the bootstrap, which allows for more accurate error estimation. A bootstrapped `Dataset` contains the same number of events as the original but draws them from the original with replacement, so some events may be duplicated and others may be missing entirely. This simulates the process of collecting new data: by repeating a fit on many bootstrapped datasets, we can more accurately estimate how uncertainties propagate through the fitting process.
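As a rough illustration of the resampling idea described above, here is a minimal, dependency-free sketch. It is not the code from this PR: `XorShift64`, `resample_indices`, `bootstrap_std`, and `mean` are hypothetical names, plain `f64` values stand in for events, and a toy generator replaces whatever RNG the crate actually uses.

```rust
/// Toy xorshift64 generator so the sketch needs no external crates.
/// (An assumption for illustration only, not what the PR uses.)
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so substitute a fixed nonzero seed.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Index in `0..n` (slight modulo bias, acceptable for a sketch).
    pub fn index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Draw `n` indices from `0..n` with replacement: some indices repeat,
/// others never appear, and the result always has length `n`.
pub fn resample_indices(n: usize, rng: &mut XorShift64) -> Vec<usize> {
    (0..n).map(|_| rng.index(n)).collect()
}

/// Arithmetic mean, used here as a stand-in for "the result of a fit".
/// Returns NaN for an empty slice.
pub fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample standard deviation of `statistic` over `n_boot` bootstrap
/// replicas of `data`. Requires `n_boot >= 2`.
pub fn bootstrap_std<F: Fn(&[f64]) -> f64>(
    data: &[f64],
    n_boot: usize,
    seed: u64,
    statistic: F,
) -> f64 {
    let mut rng = XorShift64::new(seed);
    let estimates: Vec<f64> = (0..n_boot)
        .map(|_| {
            // Build one replica by looking up the resampled indices.
            let replica: Vec<f64> = resample_indices(data.len(), &mut rng)
                .into_iter()
                .map(|i| data[i])
                .collect();
            statistic(&replica)
        })
        .collect();
    let center = estimates.iter().sum::<f64>() / n_boot as f64;
    let variance = estimates
        .iter()
        .map(|e| (e - center).powi(2))
        .sum::<f64>()
        / (n_boot as f64 - 1.0);
    variance.sqrt()
}
```

For example, `bootstrap_std(&data, 200, 7, mean)` would estimate the uncertainty on the mean of `data` from 200 replicas; in a real analysis the statistic would be a full fit rather than a mean.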

Additionally, this PR reorganizes how data is stored by wrapping each `Event` in an `Arc`. Bootstrapped and binned `Dataset`s therefore use less memory than full copies would, since they only store references to the original `Event`s. I removed the `open_binned` function in favor of a `Dataset::bin_by` method.
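To make the memory argument concrete, the sketch below bins a dataset by cloning `Arc` pointers instead of events. The `Event` and `Dataset` types are simplified stand-ins, and the `bin_by` signature shown is an assumption for illustration; the actual signature of `Dataset::bin_by` is not given in this description.

```rust
use std::sync::Arc;

/// Hypothetical, simplified event; the real `Event` holds more data.
#[derive(Debug)]
pub struct Event {
    pub mass: f64,
}

/// Simplified dataset that stores shared pointers rather than owned events.
#[derive(Debug, Default)]
pub struct Dataset {
    pub events: Vec<Arc<Event>>,
}

impl Dataset {
    /// Split into `bins` equal-width bins over `[lo, hi)` according to
    /// `variable`. Events outside the range are dropped.
    /// Illustrative signature only, not the crate's API.
    pub fn bin_by<F: Fn(&Event) -> f64>(
        &self,
        variable: F,
        bins: usize,
        lo: f64,
        hi: f64,
    ) -> Vec<Dataset> {
        let mut out: Vec<Dataset> = (0..bins).map(|_| Dataset::default()).collect();
        if bins == 0 {
            return out;
        }
        let width = (hi - lo) / bins as f64;
        for event in &self.events {
            let value = variable(event);
            if value >= lo && value < hi {
                // Clamp guards against floating-point round-up at the upper edge.
                let index = (((value - lo) / width) as usize).min(bins - 1);
                // Cloning the `Arc` copies a pointer, not the event itself.
                out[index].events.push(Arc::clone(event));
            }
        }
        out
    }
}
```

Each binned `Dataset` then points at the same allocations as the original, which can be checked with `Arc::ptr_eq` or `Arc::strong_count`.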

This PR also switches the benchmark CI to CodSpeed and adds a data-related benchmark.

…ith method on `Dataset`

Since `Event`s are now wrapped in `Arc`, there's no need to open a `Dataset` multiple times; we can just make new `Dataset`s by referencing the events in the original. We keep the `Arc` wrapper on `Dataset` so that a whole `Dataset` can be copied quickly (one reference-count increment rather than one for each `Event`).
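The difference between the two kinds of copy can be sketched as follows. The types and the helper names `make_dataset`, `share`, and `copy_event_list` are hypothetical and exist only for this illustration.

```rust
use std::sync::Arc;

/// Hypothetical, simplified event.
#[derive(Debug)]
pub struct Event {
    pub weight: f64,
}

/// Simplified dataset holding shared pointers to events.
#[derive(Debug, Clone)]
pub struct Dataset {
    pub events: Vec<Arc<Event>>,
}

/// Build a dataset of `n` unit-weight events.
pub fn make_dataset(n: usize) -> Dataset {
    Dataset {
        events: (0..n).map(|_| Arc::new(Event { weight: 1.0 })).collect(),
    }
}

/// Cheap copy: increments a single reference count on the dataset,
/// regardless of how many events it contains.
pub fn share(dataset: &Arc<Dataset>) -> Arc<Dataset> {
    Arc::clone(dataset)
}

/// More expensive copy: allocates a new `Vec` and increments the
/// reference count of every `Event`, though no event data is duplicated.
pub fn copy_event_list(dataset: &Dataset) -> Dataset {
    dataset.clone()
}
```

With `share`, `Arc::strong_count` on the dataset goes up by one while the per-event counts stay unchanged; with `copy_event_list`, every per-event count goes up by one.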

codecov · 2024-10-31T18:36:10Z

Codecov Report

Attention: Patch coverage is 0% with 166 lines in your changes missing coverage. Please review.

Project coverage is 10.81%. Comparing base (50cd28f) to head (f075d8e).
Report is 15 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/data.rs | 0.00% | 150 Missing ⚠️ |
| src/python.rs | 0.00% | 16 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #10      +/-   ##
==========================================
+ Coverage   10.36%   10.81%   +0.45%     
==========================================
  Files          15       15              
  Lines        3935     3771     -164     
  Branches     3935     3771     -164     
==========================================
  Hits          408      408              
+ Misses       3527     3363     -164

codspeed-hq · 2024-10-31T18:42:26Z

CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

🆕 2 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

kmatrix benchmark (nll) (44.5 ms)
open benchmark (77.5 ms)

denehoffman added 10 commits October 30, 2024 14:50

style: refactor data loading code into a shared function

0ccf3fa

get bin edges from laddu rather than matplotlib

d23e457

feat: add method to resample datasets (bootstrapping)

9645ee4

feat: update example_1 to use bootstrapping for binned fit errors and to run multiple randomized iterations for each fit

3e3403c

docs: update plot and add output txt file for example_1 and reorganize directory structure

83d0df9

feat: add benchmark for opening datasets

b298502

feat: wrap Events inside Datasets in Arc to reduce bootstrap copying

099973b

ci: switch to Codspeed for benchmarking

09ec6cc

feat: update example_1 with new binning code

7cd54c6

This also adds a bootstrap to the unbinned fit which should yield more accurate uncertainties

denehoffman added 2 commits October 31, 2024 14:44

fix: remove AmpTools version of example_1

5fe6bbc

It's not wrong, it's just missing error estimation and the total unbinned fit.

feat: add logging to example_1

f075d8e

denehoffman merged commit b8156fe into main Oct 31, 2024
19 checks passed

denehoffman deleted the bootstrapping branch October 31, 2024 19:55
