Merge pull request #32 from denehoffman/development
The MCMC Update
denehoffman authored Dec 21, 2024
2 parents e6319ea + 34024ec commit ccb8764
Showing 224 changed files with 2,502,443 additions and 1,916 deletions.
6 changes: 5 additions & 1 deletion .github/workflows/rust.yml
@@ -15,7 +15,11 @@ jobs:

     steps:
     - uses: actions/checkout@v4
+    - name: Install cargo-hack
+      uses: taiki-e/install-action@cargo-hack
+    - name: Run checks
+      run: cargo hack check --rust-version --each-feature --no-dev-deps
     - name: Build
       run: cargo build -r --verbose
     - name: Run tests
-      run: cargo test -r --verbose
+      run: cargo hack test --verbose --each-feature
1 change: 1 addition & 0 deletions .gitignore
@@ -192,3 +192,4 @@ demos/
 src/main.rs
 *.perf.*
 perf.*
+*.pkl
9 changes: 5 additions & 4 deletions Cargo.toml
@@ -30,12 +30,12 @@ auto_ops = "0.3.0"
 rand = "0.8.5"
 rand_chacha = "0.3.1"
 rayon = { version = "1.10.0", optional = true }
-pyo3 = { version = "0.22.6", optional = true, features = [
+pyo3 = { version = "0.23.3", optional = true, features = [
     "num-complex",
     "abi3-py37",
 ] }
-numpy = { version = "0.22.1", optional = true, features = ["nalgebra"] }
-ganesh = "0.13.1"
+numpy = { version = "0.23.0", optional = true, features = ["nalgebra"] }
+ganesh = "0.15.2"
 thiserror = "2.0.3"
 shellexpand = "3.1.0"
 accurate = "0.4.1"
@@ -44,6 +44,7 @@ serde_with = "3.11.0"
 typetag = "0.2.18"
 serde-pickle = "1.2.0"
 bincode = "1.3.3"
+fastrand = "2.3.0"

[dev-dependencies]
approx = "0.5.1"
@@ -64,7 +65,7 @@ harness = false
 default = ["rayon", "python"]
 extension-module = ["pyo3/extension-module"]
 rayon = ["dep:rayon"]
-f32 = []
+f32 = ["ganesh/f32"]
 python = ["pyo3", "numpy", "extension-module"]
 pyo3 = ["dep:pyo3"]
 numpy = ["dep:numpy"]
23 changes: 20 additions & 3 deletions README.md
@@ -252,6 +252,26 @@ The first example script uses data generated with [gen_amp](https://github.com/J
/>
</p>


Additionally, this example includes an optional MCMC analysis, complete with a custom observer that monitors convergence via the integrated autocorrelation time (IAT). It can be run with the `example_1_mcmc.py` script after `example_1.py` has completed, since it uses data stored during the execution of `example_1.py` to initialize the walkers. A word of warning: this analysis takes a long time and is meant mostly as a demonstration of what is possible with the current system. The custom autocorrelation observer plays an important role in choosing convergence criteria. This problem has an implicit symmetry (the absolute phase between the two waves matters, but its sign is ambiguous), which causes the posterior distributions to sometimes be multimodal, and multimodality can lead to long IATs. Instead, the custom implementation projects the walkers' positions onto the two waves and uses those projections as a proxy for the real chain. This proxy is unimodal by definition, so the IATs calculated from it are much smaller and describe convergence more realistically.
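The effect of this projection can be sketched with plain NumPy. Everything below is an illustrative stand-in, not `laddu`'s implementation: the chain is synthetic, and `integrated_autocorrelation_time` is a simplified emcee-style windowed estimator. A chain that occasionally hops between two sign-symmetric modes has a long IAT, while a projection onto the magnitude is unimodal and decorrelates much faster.

```python
import numpy as np

def integrated_autocorrelation_time(x, c=7.0):
    """Windowed IAT estimate for a 1D chain (emcee-style Sokal window)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Autocorrelation function via FFT (zero-padded to avoid circular wrap-around).
    f = np.fft.fft(x, n=2 * n)
    acf = np.fft.ifft(f * np.conjugate(f))[:n].real
    acf /= acf[0]
    taus = 2.0 * np.cumsum(acf) - 1.0  # IAT estimate at each window size
    small = np.arange(n) < c * taus    # Sokal heuristic: stop once window >= c * tau
    m = int(np.argmin(small)) if not small.all() else n - 1
    return float(taus[m])

rng = np.random.default_rng(0)
n = 20_000
# A correlated, strictly positive "magnitude" chain (AR(1) plus an offset).
mag = np.empty(n)
mag[0] = 0.0
for i in range(1, n):
    mag[i] = 0.9 * mag[i - 1] + rng.normal()
mag = np.abs(mag) + 1.0
# Rare sign flips mimic walkers hopping between the two symmetric modes.
signs = np.cumprod(np.where(rng.random(n) < 1e-3, -1.0, 1.0))
raw = signs * mag    # bimodal chain: very long IAT
proxy = np.abs(raw)  # unimodal projection: much shorter IAT
print(integrated_autocorrelation_time(raw), integrated_autocorrelation_time(proxy))
```

In the real analysis, `laddu`'s `AutocorrelationObserver` and `integrated_autocorrelation_times` play the role of this toy estimator, and the projection is onto the two waves rather than a simple absolute value.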

Some example plots can be seen below for the first data bin:

<p align="center">
<table>
<tr>
<td><img width="250" src="python_examples/example_1/mcmc_plots/corner_0.svg" /></td>
<td><img width="250" src="python_examples/example_1/mcmc_plots/corner_transformed_0.svg" /></td>
<td><img width="250" src="python_examples/example_1/mcmc_plots/iat_0.svg" /></td>
</tr>
<tr>
<td colspan="3" align="center">
<img width="800" src="python_examples/example_1/mcmc_plots/trace_0.svg" />
</td>
</tr>
</table>
</p>

# Data Format
The data format for `laddu` is a bit different from some of the alternatives like [`AmpTools`](https://github.com/mashephe/AmpTools). Since ROOT doesn't yet have Rust bindings and projects to read ROOT files are still largely works in progress (although I hope to use [`oxyroot`](https://github.com/m-dupont/oxyroot) in the future once I can figure out a few bugs), the primary interface for data in `laddu` is Parquet files. These are easily accessible from almost any other language and don't take up much more space than ROOT files. In the interest of future compatibility with any number of experimental setups, the data format consists of an arbitrary number of columns containing the four-momentum of each particle, the polarization vector of each particle (optional), and a single column for the weight. These columns all have standardized names. For example, the following columns would describe a dataset with four particles, the first of which is a polarized photon beam, as in the GlueX experiment:
| Column name | Data Type | Interpretation |
@@ -280,9 +300,6 @@ The data format for `laddu` is a bit different from some of the alternatives lik
To make it easier to get started, we can directly convert from the `AmpTools` format using the provided `amptools-to-laddu` script (see the `bin` directory of this repository). This is not bundled with the Python library (yet) but may be in the future.
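For illustration, a dataset with this layout might be assembled with `pandas` as below. The column names (`p4_0_E`, `eps_0_x`, `weight`, and so on) are hypothetical placeholders that merely follow the pattern described above; the actual standardized names are the ones in the column table.

```python
import os
import tempfile

import numpy as np
import pandas as pd

n_events = 3
rng = np.random.default_rng(7)

columns = {}
# Four-momentum components for each of the four particles
# (hypothetical naming scheme: p4_<index>_<component>).
for i in range(4):
    for comp in ("E", "Px", "Py", "Pz"):
        columns[f"p4_{i}_{comp}"] = rng.normal(size=n_events)
# Optional polarization vector, here only for particle 0 (the photon beam).
for comp in ("x", "y", "z"):
    columns[f"eps_0_{comp}"] = rng.normal(size=n_events)
# A single weight per event.
columns["weight"] = np.ones(n_events)

df = pd.DataFrame(columns)

# Writing Parquet requires a pandas Parquet engine (pyarrow or fastparquet).
out = os.path.join(tempfile.gettempdir(), "four_particles.parquet")
try:
    df.to_parquet(out)
except ImportError:
    pass
```

The resulting file is readable from essentially any analysis environment that speaks Parquet, which is the point of choosing this format.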

# Future Plans
-* Introduce Rust-side function minimization. My [`ganesh`](https://github.com/denehoffman/ganesh) was written with this library in mind, and bindings will eventually be included to smooth over the fitting interface.
-* Allow users to build likelihood functions from multiple terms, including non-amplitude terms like [LASSO](https://en.wikipedia.org/wiki/Lasso_(statistics)).
-* Create a nice interface for binning datasets along a particular variable and fitting the binned data.
 * MPI and GPU integration (these are incredibly difficult to do right now, but it's something I'm looking into).
 * As always, more tests and documentation.

19 changes: 18 additions & 1 deletion python/laddu/__init__.py
@@ -12,7 +12,14 @@
from laddu.amplitudes.zlm import Zlm
from laddu.convert import convert_from_amptools, read_root_file
from laddu.data import BinnedDataset, Dataset, Event, open
-from laddu.likelihoods import NLL, LikelihoodManager, Status
+from laddu.likelihoods import (
+    NLL,
+    LikelihoodManager,
+    Status,
+    Ensemble,
+    AutocorrelationObserver,
+    integrated_autocorrelation_times,
+)
 from laddu.utils.variables import (
     Angles,
     CosTheta,
@@ -40,6 +47,12 @@ def callback(self, step: int, status: Status) -> tuple[Status, bool]:
        pass


class MCMCObserver(metaclass=ABCMeta):
    @abstractmethod
    def callback(self, step: int, ensemble: Ensemble) -> tuple[Ensemble, bool]:
        pass


def open_amptools(
    path: str | Path,
    tree: str = 'kin',
@@ -80,13 +93,15 @@ def open_amptools(
'Mandelstam',
'Mass',
'Observer',
'MCMCObserver',
'Phi',
'PolAngle',
'PolMagnitude',
'PolarComplexScalar',
'Polarization',
'Scalar',
'Status',
'Ensemble',
'Vector3',
'Vector4',
'Ylm',
@@ -102,4 +117,6 @@ def open_amptools(
'open_amptools',
'parameter',
'utils',
'AutocorrelationObserver',
'integrated_autocorrelation_times',
]
18 changes: 17 additions & 1 deletion python/laddu/__init__.pyi
@@ -8,7 +8,14 @@ from laddu.amplitudes.ylm import Ylm
from laddu.amplitudes.zlm import Zlm
from laddu.convert import convert_from_amptools
from laddu.data import BinnedDataset, Dataset, Event, open
-from laddu.likelihoods import NLL, LikelihoodManager, Status
+from laddu.likelihoods import (
+    NLL,
+    Ensemble,
+    LikelihoodManager,
+    Status,
+    AutocorrelationObserver,
+    integrated_autocorrelation_times,
+)
from laddu.utils.variables import (
    Angles,
    CosTheta,
@@ -29,6 +36,11 @@ class Observer(metaclass=ABCMeta):
    @abstractmethod
    def callback(self, step: int, status: Status) -> tuple[Status, bool]: ...

class MCMCObserver(metaclass=ABCMeta):
    @abstractmethod
    def callback(self, step: int, ensemble: Ensemble) -> tuple[Ensemble, bool]:
        pass

def open_amptools(
    path: str | Path,
    tree: str = 'kin',
@@ -55,13 +67,15 @@ __all__ = [
'Mandelstam',
'Mass',
'Observer',
'MCMCObserver',
'Phi',
'PolAngle',
'PolMagnitude',
'PolarComplexScalar',
'Polarization',
'Scalar',
'Status',
'Ensemble',
'Vector3',
'Vector4',
'Ylm',
@@ -76,4 +90,6 @@ __all__ = [
'open_amptools',
'parameter',
'utils',
'AutocorrelationObserver',
'integrated_autocorrelation_times',
]
6 changes: 6 additions & 0 deletions python/laddu/likelihoods/__init__.py
@@ -8,16 +8,22 @@
    LikelihoodScalar,
    LikelihoodTerm,
    Status,
    Ensemble,
    AutocorrelationObserver,
    integrated_autocorrelation_times,
)

__all__ = [
'NLL',
'Status',
'Ensemble',
'Bound',
'LikelihoodID',
'LikelihoodExpression',
'LikelihoodTerm',
'LikelihoodManager',
'LikelihoodEvaluator',
'LikelihoodScalar',
'AutocorrelationObserver',
'integrated_autocorrelation_times',
]
65 changes: 62 additions & 3 deletions python/laddu/likelihoods/__init__.pyi
@@ -4,7 +4,7 @@ from typing import Any, Literal
import numpy as np
import numpy.typing as npt

-from laddu.amplitudes import AmplitudeID, Evaluator, Expression, Manager
+from laddu.amplitudes import Evaluator, Model
from laddu.data import Dataset

class LikelihoodID:
@@ -58,15 +58,25 @@ class LikelihoodEvaluator:
        verbose: bool = False,
        **kwargs,
    ) -> Status: ...
    def mcmc(
        self,
        p0: list[list[float]] | npt.NDArray[np.float64],
        n_steps: int,
        *,
        method: Literal['ESS', 'AIES'] = 'ESS',
        debug: bool = False,
        verbose: bool = False,
        seed: int = 0,
        **kwargs,
    ) -> Ensemble: ...

 class NLL:
     parameters: list[str]
     data: Dataset
     accmc: Dataset
     def __init__(
         self,
-        manager: Manager,
-        expression: Expression | AmplitudeID,
+        model: Model,
         ds_data: Dataset,
         ds_accmc: Dataset,
     ) -> None: ...
@@ -101,6 +111,17 @@ class NLL:
        verbose: bool = False,
        **kwargs,
    ) -> Status: ...
    def mcmc(
        self,
        p0: list[list[float]] | npt.NDArray[np.float64],
        n_steps: int,
        *,
        method: Literal['ESS', 'AIES'] = 'ESS',
        debug: bool = False,
        verbose: bool = False,
        seed: int = 0,
        **kwargs,
    ) -> Ensemble: ...

def LikelihoodScalar(name: str) -> LikelihoodTerm: ...

@@ -125,18 +146,56 @@ class Status:
    def __getstate__(self) -> object: ...
    def __setstate__(self, state: object) -> None: ...

class Ensemble:
    dimension: tuple[int, int, int]

    def __init__(self) -> None: ...
    def save_as(self, path: str) -> None: ...
    @staticmethod
    def load_from(path: str) -> Ensemble: ...
    def __getstate__(self) -> object: ...
    def __setstate__(self, state: object) -> None: ...
    def get_chain(self, *, burn: int = 0, thin: int = 1) -> npt.NDArray[np.float64]: ...
    def get_flat_chain(
        self, *, burn: int = 0, thin: int = 1
    ) -> npt.NDArray[np.float64]: ...
    def get_integrated_autocorrelation_times(
        self, *, c: float = 7.0, burn: int = 0, thin: int = 1
    ) -> npt.NDArray[np.float64]: ...

class Bound:
    lower: float
    upper: float

def integrated_autocorrelation_times(
    x: npt.NDArray[np.float64], *, c: float = 7.0
) -> npt.NDArray[np.float64]: ...

class AutocorrelationObserver:
    taus: npt.NDArray[np.float64]
    def __init__(
        self,
        *,
        n_check=50,
        n_taus_threshold=50,
        dtau_threshold=0.01,
        discard=0.5,
        terminate=True,
        c=7.0,
        verbose=False,
    ) -> None: ...

__all__ = [
'NLL',
'Status',
'Ensemble',
'Bound',
'LikelihoodID',
'LikelihoodExpression',
'LikelihoodTerm',
'LikelihoodManager',
'LikelihoodEvaluator',
'LikelihoodScalar',
'AutocorrelationObserver',
'integrated_autocorrelation_times',
]
