Bootstrapping and Dataset updates #10

Merged
merged 12 commits into from
Oct 31, 2024

denehoffman
Owner

This PR mainly adds methods to resample `Event`s via a bootstrap, allowing for more accurate error estimation. A bootstrapped `Dataset` contains the same number of events as the original but resamples it with replacement, so some events may be duplicated and others may be missing entirely. This simulates the process of collecting new data, and by redoing a fit on several of these bootstrapped datasets, we can more accurately represent how uncertainties accumulate in the fitting process.
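A minimal sketch of resampling with replacement, assuming events are stored as `Vec<Arc<Event>>` as described in the next paragraph. The `Event` struct, the `bootstrap` function, and the small xorshift generator are hypothetical stand-ins rather than the crate's actual API; a real implementation would more likely use a seeded RNG crate.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's `Event` type.
#[derive(Debug)]
pub struct Event {
    pub weight: f64,
}

/// Tiny xorshift64 generator so the sketch needs no external crates
/// (assumption: a real implementation would use a proper seeded RNG).
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must never hold an all-zero state
        XorShift64(seed.max(1))
    }

    /// Index in `0..n`; modulo bias is ignored for brevity.
    pub fn index(&mut self, n: usize) -> usize {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x % n as u64) as usize
    }
}

/// Draw `events.len()` events with replacement. Only the `Arc` pointers are
/// cloned, so duplicated events share the original allocation.
pub fn bootstrap(events: &[Arc<Event>], seed: u64) -> Vec<Arc<Event>> {
    let mut rng = XorShift64::new(seed);
    (0..events.len())
        .map(|_| Arc::clone(&events[rng.index(events.len())]))
        .collect()
}
```

Passing a fixed seed makes each bootstrapped dataset reproducible, which helps when comparing repeated fits.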

Additionally, this PR reorganizes the way data is stored by wrapping each `Event` in an `Arc`. As a result, bootstrapped and binned `Dataset`s don't use nearly as much memory as the original, since they only store references to the original `Event`s rather than copies. I removed the `open_binned` function in favor of a `Dataset::bin_by` method.
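The binning side of this can be sketched as follows. The signature of the real `Dataset::bin_by` is not shown in this PR, so the `bin_by` function below, its equal-width `(nbins, lo, hi)` parameters, and the stand-in `Event` with a `mass` field are all hypothetical. Each bin holds cloned `Arc` pointers, so binning copies no event data.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's `Event` type.
#[derive(Debug)]
pub struct Event {
    pub mass: f64,
}

/// Sort events into `nbins` equal-width bins over [lo, hi) by `value(event)`.
/// Events outside the range are dropped; bins hold `Arc` clones, not copies.
pub fn bin_by<F>(events: &[Arc<Event>], value: F, nbins: usize, lo: f64, hi: f64) -> Vec<Vec<Arc<Event>>>
where
    F: Fn(&Event) -> f64,
{
    assert!(nbins > 0 && hi > lo, "need at least one bin and a non-empty range");
    let mut bins: Vec<Vec<Arc<Event>>> = vec![Vec::new(); nbins];
    let width = (hi - lo) / nbins as f64;
    for event in events {
        let v = value(&**event);
        if v >= lo && v < hi {
            // clamp guards against floating-point round-up at the upper edge
            let idx = (((v - lo) / width) as usize).min(nbins - 1);
            bins[idx].push(Arc::clone(event));
        }
    }
    bins
}
```

Calling `Arc::strong_count` on an original event after binning shows one extra reference per bin it landed in, confirming that the underlying data is shared.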

This PR also switches the benchmark CI to CodSpeed and adds a data-related benchmark.

… to run multiple randomized iterations for each fit
…ith method on `Dataset`

Since `Event`s are now wrapped in `Arc`, there's no need to open a `Dataset` multiple times; we can just make new `Dataset`s that reference the events in the original. We keep the `Arc` wrapper on `Dataset` so that datasets can be copied quickly (incrementing a single reference count rather than one for each `Event`).
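To illustrate why the outer `Arc` helps: cloning an `Arc<Dataset>` is a single reference-count increment, whereas cloning the `Dataset` itself increments the count of every `Event` it holds. The `Event`, `Dataset`, `from_weights`, `share`, and `copy_events` names below are hypothetical stand-ins, not the crate's API.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's `Event` type.
#[derive(Debug)]
pub struct Event {
    pub weight: f64,
}

/// Hypothetical `Dataset`: a list of shared events.
#[derive(Debug, Clone)]
pub struct Dataset {
    pub events: Vec<Arc<Event>>,
}

impl Dataset {
    /// Illustrative constructor, not the crate's API.
    pub fn from_weights(weights: &[f64]) -> Arc<Dataset> {
        Arc::new(Dataset {
            events: weights.iter().map(|&w| Arc::new(Event { weight: w })).collect(),
        })
    }
}

/// Cheap copy: one atomic increment on the outer `Arc`, regardless of size.
pub fn share(ds: &Arc<Dataset>) -> Arc<Dataset> {
    Arc::clone(ds)
}

/// Copy of the event list: one atomic increment per `Event`.
pub fn copy_events(ds: &Dataset) -> Dataset {
    ds.clone()
}
```

Both copies still point at the same `Event` allocations; the difference is only how much reference-count bookkeeping each copy costs.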
This also adds a bootstrap to the unbinned fit, which should yield more accurate uncertainties.
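The bootstrap error estimate amounts to refitting many resampled datasets and taking the spread of the fitted values. In the minimal sketch below, the sample mean stands in for the actual unbinned fit, and `fit`, `bootstrap_error`, and the xorshift generator are hypothetical; the PR's own fit and RNG are not shown here. The sample standard deviation of the bootstrap fits serves as the uncertainty estimate.

```rust
/// Minimal xorshift64 RNG (assumption: stands in for a real RNG crate).
struct XorShift64(u64);

impl XorShift64 {
    fn next_index(&mut self, n: usize) -> usize {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x % n as u64) as usize
    }
}

/// Hypothetical "fit": the sample mean stands in for a real likelihood fit.
pub fn fit(data: &[f64]) -> f64 {
    data.iter().sum::<f64>() / data.len() as f64
}

/// Refit `n_boot` (>= 2) bootstrap resamples of non-empty `data` and return
/// the sample standard deviation of the fitted values.
pub fn bootstrap_error(data: &[f64], n_boot: usize, seed: u64) -> f64 {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be nonzero
    let fits: Vec<f64> = (0..n_boot)
        .map(|_| {
            // resample with replacement, same size as the original
            let resample: Vec<f64> = (0..data.len())
                .map(|_| data[rng.next_index(data.len())])
                .collect();
            fit(&resample)
        })
        .collect();
    let mean = fits.iter().sum::<f64>() / n_boot as f64;
    let var = fits.iter().map(|f| (f - mean).powi(2)).sum::<f64>() / (n_boot as f64 - 1.0);
    var.sqrt()
}
```

For the sample mean, the result should land near σ/√n of the original data, which gives a quick sanity check before swapping in a real fit.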

codecov bot commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 0% with 166 lines in your changes missing coverage. Please review.

Project coverage is 10.81%. Comparing base (50cd28f) to head (f075d8e).
Report is 15 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/data.rs | 0.00% | 150 Missing ⚠️ |
| src/python.rs | 0.00% | 16 Missing ⚠️ |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #10      +/-   ##
==========================================
+ Coverage   10.36%   10.81%   +0.45%
==========================================
  Files          15       15
  Lines        3935     3771     -164
  Branches     3935     3771     -164
==========================================
  Hits          408      408
+ Misses       3527     3363     -164
```



codspeed-hq bot commented Oct 31, 2024

CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

🆕 2 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

- kmatrix benchmark (nll) (44.5 ms)
- open benchmark (77.5 ms)

It's not wrong; it's just missing error estimation and the total unbinned fit.
denehoffman merged commit b8156fe into main on Oct 31, 2024
19 checks passed
denehoffman deleted the bootstrapping branch on October 31, 2024 at 19:55