Motivated choice for dlogz #423

Open
segasai opened this issue Feb 19, 2023 · 3 comments
Labels
enhancement (upgrades and improvements), question (questions about stuff)

Comments

Collaborator

segasai commented Feb 19, 2023

Currently the dlogz and dlogz_init values are pretty much taken out of thin air, and they are often set to 0.01. But there is a way of motivating their choice.

The rationale is the following.
I'll assume we're sampling an N-dimensional Gaussian with n live points and aim to have Neff samples. I'll also define Z(r) as the posterior mass within a ball of radius r.

Given that we want $N_{eff}$ samples, we want the innermost point in the samples to satisfy approximately $Z(r_{in}) = 1/N_{eff}$. At the same time, given that the live points are uniformly distributed, the radius of the outermost point is $r_{out}= n^{\frac{1}{N}} r_{in}$.
The evidence accumulated by the time we reach the outermost point is a fraction $1-Z(r_{out})$ of the total, so the remaining $\delta \log Z$ (in the dynesty sense) for the outermost point is $-\log (1-Z(r_{out}))$.
Given that $Z(r) = IncGamma(\frac{N}{2}, \frac{r^2}{2})$ (the regularized lower incomplete gamma function), one can compute $\delta \log Z$ given $N_{eff}$, n live points and N dimensions.
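As a compact restatement of the chain above (a sketch under the same Gaussian assumption, written with the sign that makes $\delta \log Z$ positive, which is what the code in this comment returns):

$$
Z(r_{in}) = \frac{1}{N_{eff}}, \qquad r_{out} = n^{\frac{1}{N}} r_{in}, \qquad \delta \log Z = -\log\left(1 - Z(r_{out})\right)
$$

The special case $N=2$ is added here only as a sanity check and is not part of the original argument. There $Z(r) = 1 - e^{-r^2/2}$ and $r_{out}^2 = n\, r_{in}^2$, so $1 - Z(r_{out}) = (1 - 1/N_{eff})^{n}$ and

$$
\delta \log Z = -n \log\left(1 - \frac{1}{N_{eff}}\right) \approx \frac{n}{N_{eff}} \quad \text{for } N_{eff} \gg 1
$$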
Here is the code doing this calculation:

import numpy as np
import scipy.optimize
import scipy.special


def getdlogz(Neff, ndim, nlive):
    """Return dlogz for an ndim-dimensional Gaussian, given Neff and nlive."""

    def func(r):
        # this is the Z(r) == 1/Neff equation
        return scipy.special.gammainc(ndim / 2., r**2 / 2.) - 1. / Neff

    # initial guess for the root finder
    p0 = scipy.special.gamma(ndim / 2.)**(1. / ndim)
    sol = scipy.optimize.root(func, p0)
    # radius of the ball corresponding to Z = 1/Neff
    r_in = sol.x[0]
    # radius of the outermost point if we have nlive uniform points
    r_out = r_in * nlive**(1. / ndim)
    # dlogz corresponding to the outermost point: -log(1 - Z(r_out))
    return -np.log1p(-scipy.special.gammainc(ndim / 2., r_out**2 / 2.))

For example:
ndim=4, nlive=100, Neff=100 gives dlogz=0.6
ndim=100, nlive=100, Neff=100 gives dlogz=0.04
ndim=10, nlive=100, Neff=10000 gives dlogz=0.005
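As a cross-check of the figures above that avoids the root finder, here is a minimal pure-Python sketch. It is not the code from this comment: it assumes an even ndim, for which the incomplete gamma function has a finite closed form, and it solves $Z(r_{in}) = 1/N_{eff}$ by bisection. The names `upper_tail` and `dlogz_even_ndim` are illustrative only.

```python
import math


def upper_tail(k, x):
    """1 - Z for integer k = ndim / 2: exp(-x) * sum_{j<k} x**j / j!."""
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= x / j
        total += term
    return math.exp(-x) * total


def dlogz_even_ndim(neff, ndim, nlive):
    """dlogz for the Gaussian model above; assumes ndim is even and >= 2."""
    if ndim < 2 or ndim % 2:
        raise ValueError("this sketch only handles even ndim >= 2")
    k = ndim // 2
    # we solve for x = r**2 / 2 such that 1 - Z = 1 - 1/neff
    target = 1.0 - 1.0 / neff
    lo, hi = 0.0, 1.0
    # upper_tail decreases from 1 to 0, so grow hi until it brackets the root
    while upper_tail(k, hi) > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if upper_tail(k, mid) > target:
            lo = mid
        else:
            hi = mid
    x_in = 0.5 * (lo + hi)
    # r_out = nlive**(1/ndim) * r_in, hence x_out = nlive**(2/ndim) * x_in
    x_out = x_in * nlive ** (2.0 / ndim)
    # dlogz = -log(1 - Z(r_out))
    return -math.log(upper_tail(k, x_out))


for neff, ndim, nlive in [(100, 4, 100), (100, 100, 100), (10000, 10, 100)]:
    print(ndim, nlive, neff, dlogz_even_ndim(neff, ndim, nlive))
```

For ndim=2 the result reduces to `-nlive * math.log(1 - 1 / neff)`, which gives a quick way to test the function.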

This is a motivation in terms of Neff. I haven't thought about a motivation in terms of logz accuracy. But presumably, if we require Neff to be at least 100 in the calculation above, that will guarantee that our innermost point corresponds to Z_in/Z_tot of at most 0.01, which should be good enough for good logz accuracy.

Thoughts @joshspeagle ?

segasai added the question and enhancement labels Feb 19, 2023
Collaborator Author

segasai commented Mar 2, 2023

Pinging @joshspeagle again.

@joshspeagle
Owner

Interesting. I don't know how I feel about basing dlogz on neff, since fundamentally it was designed to provide an upper bound on the remainder of the integral. The default choice in many cases was picked assuming some baseline tolerance of a few %, but this does ignore the fact that you are almost always recycling the final set of live points, so you expect to be probing interior to the final threshold anyways.

Given that the default stopping criterion in some cases has been adjusted to depend on neff, this is probably a reasonable (and somewhat justified) choice. I would just think it's probably good to make sure that the initial run is able to probe far enough to not require another batch to try and sample beyond the final live point for people using the dynamic sampler (for the static sampler this should be fine).
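For reference, the "upper bound on the remainder of the integral" can be sketched as follows. This is a minimal illustration, assuming the remaining evidence is bounded by the maximum live-point likelihood times the remaining prior volume, and that the stopping quantity is log(z + z_remain) - log(z). The function and argument names are hypothetical and are not dynesty's internals.

```python
import math


def delta_logz(logz, logl_max, logvol):
    """log(z + z_remain) - log(z), with z_remain = exp(logl_max + logvol)."""
    # log of the assumed upper bound on the remaining evidence
    logz_remain = logl_max + logvol
    # numerically stable log(1 + exp(d)) with d = logz_remain - logz
    d = logz_remain - logz
    if d > 0:
        return d + math.log1p(math.exp(-d))
    return math.log1p(math.exp(d))


# a remainder bounded at 1% of the current evidence
print(delta_logz(0.0, 0.0, math.log(0.01)))
```

With the remainder bounded at 1% of the accumulated evidence, this gives log(1.01), i.e. just under 0.01.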

Collaborator Author

segasai commented Mar 6, 2023

Thanks,

I agree we certainly want the first run to probe "deep" enough in the posterior.
My concern with the current behaviour is the following:

  • It does not depend on nlive.
  • Using dlogz and dlogz_init is somewhat encouraged, but I'm not sure that's very helpful, because the appropriate value depends on nlive. Also, I think a dlogz of like 1e-3 already leads to numerical issues if a large number of points is used.
  • For a person just interested in the posterior, dlogz doesn't have much meaning, and people don't quite know how to interpret it.

Okay, given your comments, let me code a function (while trying to be conservative, i.e. not increasing dlogz above, say, 0.1), and then maybe we can take a look at it again.
