
NLL Terms #8

Merged: denehoffman merged 32 commits into main from nll-terms on Oct 30, 2024
Conversation

denehoffman (Owner)
Adds a feature that allows NLL-like terms to be added and multiplied together. This can be used to fit the same model over multiple datasets, or to fit multiple models over multiple datasets simultaneously. The interface is very similar to the one that allows Amplitudes to be registered and formed into Expressions. It should also let users introduce their own likelihood terms, such as a LASSO penalty.

Todos:

  1. A Python-side interface for making LikelihoodTerms.
  2. LASSO implementation
  3. Constants in LikelihoodExpressions (this should be easy)

Also changes the K-matrix benchmark to collect more samples
This allows `Evaluator` and `NLL` to be safely cloned without duplicating precomputed data, which is just really nice in general but also lets us copy them around as likelihood terms. The way it's currently implemented doesn't seem to change performance.
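A minimal sketch of how this kind of cheap cloning typically works in Rust, assuming the shared state lives behind an `Arc` (the field names here are hypothetical, not laddu's actual layout):

```rust
use std::sync::Arc;

// Cloning only bumps a reference count, so the precomputed cache is shared,
// never duplicated. (Hypothetical struct; laddu's actual types may differ.)
#[derive(Clone)]
struct Evaluator {
    precomputed: Arc<Vec<f64>>, // read-only precomputed data
}

fn main() {
    let eval = Evaluator { precomputed: Arc::new(vec![1.0, 2.0, 3.0]) };
    let copy = eval.clone(); // O(1): no data is copied
    assert!(Arc::ptr_eq(&eval.precomputed, &copy.precomputed)); // same allocation
}
```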
7000 evals takes too long, let's just shoot for 30 seconds
`LikelihoodTerm`s can be registered in a `LikelihoodManager` just like `Amplitude`s get registered in `Manager`s. Registration returns a `LikelihoodID`, which behaves the same way as an `AmplitudeID` and can be added to or multiplied with other `LikelihoodExpression`s. These `LikelihoodExpression`s can then be loaded by the `LikelihoodManager` to create a `LikelihoodEvaluator`, which can be minimized in the same way as an `NLL`. Because this is an extended implementation, we also keep the minimization method on `NLL` itself, since not all `LikelihoodTerm`s *should* be minimized.
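As a rough usage sketch (the constructors and the method names `register`, `load`, and `minimize` are assumed for illustration and may not match the real API):

```rust
// Hypothetical sketch of the registration interface described above.
let mut manager = LikelihoodManager::default();

// Register two NLL-like terms, e.g. the same model fit to two datasets:
let id_a: LikelihoodID = manager.register(nll_a); // nll_a: some LikelihoodTerm
let id_b: LikelihoodID = manager.register(nll_b);

// LikelihoodIDs combine into LikelihoodExpressions just like AmplitudeIDs:
let expression = &id_a + &id_b; // or &id_a * &id_b

// Load the expression to get an evaluator that minimizes like an NLL:
let evaluator = manager.load(&expression);
let result = evaluator.minimize(&initial_parameters);
```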
I honestly don't think the measurement time does anything, maybe it takes the maximum of either of these settings?
codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 0% with 1070 lines in your changes missing coverage. Please review.

Project coverage is 10.36%. Comparing base (b646360) to head (04cc7b8).
Report is 33 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/likelihoods.rs | 0.00% | 464 Missing ⚠️ |
| src/python.rs | 0.00% | 273 Missing ⚠️ |
| src/amplitudes/mod.rs | 0.00% | 138 Missing ⚠️ |
| src/amplitudes/kmatrix.rs | 0.00% | 117 Missing ⚠️ |
| src/amplitudes/common.rs | 0.00% | 41 Missing ⚠️ |
| src/amplitudes/breit_wigner.rs | 0.00% | 16 Missing ⚠️ |
| src/amplitudes/ylm.rs | 0.00% | 9 Missing ⚠️ |
| src/amplitudes/zlm.rs | 0.00% | 9 Missing ⚠️ |
| src/resources.rs | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main       #8      +/-   ##
==========================================
- Coverage   12.69%   10.36%   -2.33%     
==========================================
  Files          14       15       +1     
  Lines        3213     3935     +722     
  Branches     3213     3935     +722     
==========================================
  Hits          408      408              
- Misses       2805     3527     +722     
```


Benchmark for 545d5f8

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 2.9±0.02ms | 3.0±0.16ms | +3.45% |

Benchmark for 1fce241

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 3.0±0.29ms | 3.0±0.09ms | 0.00% |

Benchmark for 5370a2c

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 3.1±0.24ms | 3.0±0.16ms | -3.23% |

Benchmark for 6d37313

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 3.0±0.25ms | 3.3±0.16ms | +10.00% |

denehoffman (Owner, Author)

This is a huge step forward in terms of efficiency. Since gradient-requiring methods like the L-BFGS-B optimizer spend a lot of time calculating gradients, any improvement in the speed of this calculation is greatly beneficial.

The gradient commit (6af103d) does this in a very easy-to-use way for end users. First, ganesh provides a gradient method which can be overridden. By default, it uses a central finite difference, which means that calculating a gradient requires evaluating the function twice for every free parameter. This is about the best accuracy you can hope for if you don't know the analytic gradient.
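For reference, the central finite difference approximates each gradient component as

$$\frac{\partial f}{\partial p_k} \approx \frac{f(\vec{p} + h\,\hat{e}_k) - f(\vec{p} - h\,\hat{e}_k)}{2h},$$

which costs $2N$ function evaluations for $N$ free parameters.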

However, we can be a bit clever here. Since there are only a few ways we allow users to operate on `Amplitude`s, we can let each amplitude calculate its own central finite difference and then apply the chain rule across each of the standard operations (the `Expression` enum) to build the entire gradient from those pieces. But we are not done just yet! Since not every `Amplitude` relies on every free parameter (unless your expression is just one `Amplitude`), we only have to calculate the finite difference for the free parameters used by that `Amplitude`, setting the other gradient coordinates to zero. Furthermore, if we know the analytic form of the gradient for an `Amplitude`, we can just use that (and this commit provides convenience methods for both). For example, the gradient of the `Scalar` amplitude is just a vector of zeros with a one in the position of the corresponding parameter. The same goes for the `ComplexScalar`, except there are two parameters, and the imaginary one gets an additional factor of $\imath$. Finally, there are some `Amplitude`s, like the `Ylm` and `Zlm`, which are actually constants with respect to the free parameters in the fit. While we need their values to evaluate the gradient of more complicated expressions involving them, their own gradient is just zero, which simplifies the chain rule. There is probably some additional optimization we could do with products to avoid multiplying through this zero gradient, but for right now, this works incredibly well.
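Here is a self-contained sketch of that chain-rule propagation over an expression tree, simplified to real-valued amplitudes (laddu's actual `Expression` type and complex-valued amplitudes differ):

```rust
// Toy expression tree: amplitudes combined by addition and multiplication.
enum Expr {
    Amp(usize), // index of a registered amplitude
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// `vals[i]` and `grads[i]` hold each amplitude's value and gradient, computed
// per-amplitude (analytically or by a local central finite difference).
fn eval_grad(e: &Expr, vals: &[f64], grads: &[Vec<f64>]) -> (f64, Vec<f64>) {
    match e {
        Expr::Amp(i) => (vals[*i], grads[*i].clone()),
        Expr::Add(a, b) => {
            let (fa, ga) = eval_grad(a, vals, grads);
            let (fb, gb) = eval_grad(b, vals, grads);
            // sum rule: d(f + g) = df + dg
            (fa + fb, ga.iter().zip(&gb).map(|(x, y)| x + y).collect())
        }
        Expr::Mul(a, b) => {
            let (fa, ga) = eval_grad(a, vals, grads);
            let (fb, gb) = eval_grad(b, vals, grads);
            // product rule: d(f * g) = g*df + f*dg
            (fa * fb, ga.iter().zip(&gb).map(|(x, y)| fb * x + fa * y).collect())
        }
    }
}
```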

In summary, if you don't implement a gradient for an `Amplitude`, you'll get the same central finite difference that ganesh uses anyway. You can speed that up by only considering the parameters used by your `Amplitude`, and there's a convenience method (`Amplitude::central_difference_with_indices`) for doing this without much additional code. In that case, or in the case where you do know the analytic gradient, you just have to override the `Amplitude::compute_gradient` method (which takes the gradient as a mutable reference to a `nalgebra::DVector` for memory reasons).
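For instance, a hedged sketch of what an analytic override might look like for the `Scalar` case described above (illustrative free function; the real `compute_gradient` signature on the `Amplitude` trait may differ):

```rust
use nalgebra::DVector;
use num_complex::Complex;

// The Scalar amplitude's value is just its one parameter, so its gradient is
// zero everywhere except a one at that parameter's index.
fn scalar_compute_gradient(param_index: usize, gradient: &mut DVector<Complex<f64>>) {
    gradient.fill(Complex::new(0.0, 0.0));
    gradient[param_index] = Complex::new(1.0, 0.0);
}
```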

In my own tests, this speeds up the L-BFGS-B algorithm a ton (I don't have a benchmark for this, but it's actually quite shocking in practice). Of course, this will have no effect on gradient-free methods like Nelder-Mead, although it will speed up the calculation of the Hessian/error matrix at the end of both algorithms. Since computing the Hessian is like taking the gradient of the gradient, and the gradient takes twice as many function evaluations as free parameters ($2N$), the central finite difference Hessian would take the square of this ($4N^2$), which scales horribly if $N$ is large. For example, one of my test calculations has on the order of 40 free parameters and the objective function takes around 700ms to compute. This means it takes around 1.25 hours to compute the Hessian. However, all of the gradients for the Amplitudes being used are known analytically, so this same calculation just takes $2N$ computations of the gradient, or less than a minute (or somewhere around this, since the gradient doesn't take exactly as much time to calculate as the function itself).
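Concretely, with the numbers above: $4N^2 = 4(40)^2 = 6400$ function evaluations at roughly $0.7\,\mathrm{s}$ each is about $4480\,\mathrm{s} \approx 1.25$ hours, while $2N = 80$ gradient evaluations at a comparable per-call cost comes out to around a minute.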

…m_amptools` method in the main python package

Now users can convert files either via the command-line `amptools-to-laddu` script installed automatically with `laddu` or via the `laddu.convert_from_amptools` function directly in their Python code. This greatly simplifies the pipeline for using `AmpTools`-formatted data in `laddu` without much additional effort for users (and hopefully aids in adoption).
Benchmark for 092c0ea

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 3.0±0.13ms | 3.3±0.33ms | +10.00% |

…thon API for `NLL` and add method to turn an `NLL` into a `LikelihoodTerm`
…e the same length

Previously, if two `LikelihoodTerm`s had a different number or layout of parameters, their gradients wouldn't be compatible. This should fix that by using the parameter layouts to map values from each term's local gradient into the global gradient for the `LikelihoodExpression`.
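A hypothetical sketch of that local-to-global mapping (names assumed; `layout[i]` is taken to be the global index of a term's i-th local parameter):

```rust
use nalgebra::DVector;

// Accumulate a term's local gradient into the expression's global gradient.
fn accumulate_gradient(
    local_gradient: &DVector<f64>,
    layout: &[usize],
    global_gradient: &mut DVector<f64>,
) {
    for (local_idx, &global_idx) in layout.iter().enumerate() {
        global_gradient[global_idx] += local_gradient[local_idx];
    }
}
```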
Benchmark for 1690076

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 3.1±0.24ms | 3.3±0.16ms | +6.45% |

Benchmark for 8a8f70d

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 3.1±0.28ms | 3.3±0.17ms | +6.45% |

Benchmark for 83543b7

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 3.1±0.47ms | 3.2±0.19ms | +3.23% |

Floating point addition is not associative, and the standard parallel summation was causing non-deterministic results. This should be fixed with the inclusion of the `accurate` crate in this commit, although I've included a temporary hack for the gradients since you cannot sum `DVector`s using `accurate`. This should eventually be fixed, since I think it might cause a slight unneeded bump in memory (although maybe the compiler optimizes this out?).
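For reference, a minimal sketch of compensated (Kahan) summation, the kind of technique the `accurate` crate builds on (the crate itself uses related, more sophisticated algorithms):

```rust
// Kahan summation: carry a running compensation term so that low-order bits
// lost in each addition are fed back into the next one.
fn kahan_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // compensation for lost low-order bits
    for &v in values {
        let y = v - c;     // correct the next addend
        let t = sum + y;   // add; low-order bits of y may be lost here
        c = (t - sum) - y; // recover what was lost
        sum = t;
    }
    sum
}
```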
…om MC

AmpTools doesn't scale the MC term by the size of the "data" dataset, so I've chosen not to either. This seems to give better results and was probably the correct way to do it in the first place.
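For context, a sketch of the form being compared, assuming the usual extended maximum-likelihood setup (notation mine, not taken from either codebase): the Monte Carlo normalization term enters as

$$-\ln \mathcal{L}(\vec{p}) = -\sum_{i \in \text{data}} \ln I(\vec{p}; e_i) + \frac{1}{N_{\text{MC}}} \sum_{j \in \text{MC}} I(\vec{p}; e_j),$$

rather than with an extra factor of $N_{\text{data}}$ multiplying the second sum.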
This is a simple, self-contained example script that should "just work". The data and MC have been generated with [`gen_amp`](https://github.com/JeffersonLab/halld_sim/tree/962c1fffc29eb4801b146d0a7f1e9aecb417374a/src/programs/Simulation/gen_amp) with the same amplitudes as in the example. Note that the overall scaling of parameters seems to be different: the generated S-wave had magnitude `100+0i` and the D-wave had `50+50i`, but the fit values from `laddu` aren't near those. However, the S/D ratio and relative phase are very close to correct, so this can probably be chalked up to some overall difference in how amplitudes are scaled in the two frameworks.
Benchmark for 4a8984f

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 3.0±0.02ms | 3.3±0.16ms | +10.00% |

Benchmark for 0c1cc69

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 3.1±0.27ms | 3.3±0.19ms | +6.45% |

Benchmark for 6b7faf9

| Test | Base | PR | % |
|---|---|---|---|
| kmatrix benchmark (nll) | 3.0±0.04ms | 3.3±0.17ms | +10.00% |

denehoffman merged commit 9121511 into main on Oct 30, 2024 (20 of 22 checks passed).
denehoffman deleted the nll-terms branch on October 30, 2024 at 01:08.