Currently the `dlogz` and `dlogz_init` values are pretty much taken out of thin air, and they are often set to 0.01. But there is a way of motivating their choice.
The rationale is the following.
I'll assume we're sampling an $N$-dimensional Gaussian with $n$ live points and aim to have $N_{eff}$ samples. I'll also define $Z(r)$ as the posterior volume within a ball of radius $r$.
Given that we want $N_{eff}$ samples, we want the innermost point in the samples to satisfy approximately $Z(r_{in}) = 1/N_{eff}$. At the same time, given that the live points are uniformly distributed, the radius of the outermost point is $r_{out}= n^{\frac{1}{N}} r_{in}$.
The remaining $\delta \log Z$ (in the dynesty sense) for the outermost point is then $-\log (1-Z(r_{out}))$.
Given that $Z(r) = \mathrm{IncGamma}(\frac{N}{2}, \frac{r^2}{2})$, the regularized lower incomplete gamma function, one can compute $\delta \log Z$ from $N_{eff}$, the number of live points $n$ and the number of dimensions $N$.
Here is the code doing this calculation:
```python
import scipy.optimize
import scipy.special
import numpy as np


def getdlogz(Neff, ndim, nlive):
    def func(r):
        # this is Z(r) == 1/Neff eqn
        return scipy.special.gammainc(ndim / 2., r**2 / 2.) - 1 / Neff

    p0 = scipy.special.gamma(ndim / 2.)**(1. / ndim)
    ret = scipy.optimize.root(func, p0)
    # this is the radius of the sphere corresponding to 1/neff=Z
    r1 = ret.x[0]
    # this is the radius of outer most point if we have nlive uniform pts
    r2 = r1 * (nlive**(1. / ndim))
    # this is dlogz corresponding to the outermost point
    ret = -np.log1p(-scipy.special.gammainc(ndim / 2., r2**2 / 2))
    return ret
```
For example:
- ndim=4, nlive=100, neff=100 gives dlogz=0.6
- ndim=100, nlive=100, neff=100 gives dlogz=0.04
- ndim=10, nlive=100, neff=10000 gives dlogz=0.005
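
As a cross-check, the same calculation can probably be done without a root finder, since `scipy.special.gammaincinv` inverts the regularized lower incomplete gamma function directly. This is only a sketch under the same Gaussian assumption; the name `dlogz_from_neff` is made up for illustration and is not part of dynesty.

```python
import numpy as np
from scipy import special


def dlogz_from_neff(neff, ndim, nlive):
    """Closed-form variant of the calculation above (hypothetical helper)."""
    a = ndim / 2.0
    # x = r^2 / 2 for the innermost point: solve Z(r_in) = 1 / neff directly
    x_in = special.gammaincinv(a, 1.0 / neff)
    # r_out = nlive**(1/ndim) * r_in, hence x_out = nlive**(2/ndim) * x_in
    x_out = x_in * nlive ** (2.0 / ndim)
    # dlogz for the outermost point: -log(1 - Z(r_out))
    return -np.log1p(-special.gammainc(a, x_out))


# the three (neff, ndim, nlive) combinations quoted above
for neff, ndim, nlive in [(100, 4, 100), (100, 100, 100), (10000, 10, 100)]:
    print(ndim, nlive, neff, dlogz_from_neff(neff, ndim, nlive))
```

Avoiding the initial guess `p0` should make this less sensitive to convergence problems in high dimensions, where $Z(r)$ is extremely flat near the starting point.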
This is a motivation in terms of Neff. I haven't thought about a motivation in terms of logz accuracy. But presumably, if we require neff to be at least 100 in the calculation above, that will guarantee that our innermost point corresponds to Z_in/Z_tot of at most 0.01, which should be good enough for good logz accuracy.
Interesting. I don't know how I feel about benchmarking dlogz based on neff, since fundamentally it was designed to provide an upper bound on the remainder of the integral. The default choice in many cases was picked assuming some baseline tolerance of a few %, but this does ignore the fact that you are almost always recycling the final set of live points, so you expect to be probing interior to the final threshold anyway.
Given that the default stopping criterion in some cases has been adjusted to depend on neff, this is probably a reasonable (and somewhat justified) choice. I would just think it's good to make sure that the initial run is able to probe far enough to not require another batch to try and sample beyond the final live point for people using the dynamic sampler (for the static sampler this should be fine).
I agree we certainly want the first run to probe "deep" enough in the posterior.
My concerns with the current behaviour are:
- It does not depend on nlive.
- Users are somewhat encouraged to tune dlogz and dlogz_init, but I'm not sure that's very helpful because of the dependence on nlive. Also, a dlogz of around 1e-3 I think already leads to numerical issues if a large number of points is used.
- For a person interested only in the posterior, dlogz has little meaning, and people don't quite know how to interpret it.
Okay, given your comments, let me code a function (while trying to be conservative, i.e. not increasing dlogz above, say, 0.1), and then maybe we can take a look at it again.
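
One way such a conservative function could look is sketched below. It is purely illustrative: `suggest_dlogz`, `neff_min` and `dlogz_max` are hypothetical names, the floor of 100 on neff and the ceiling of 0.1 on dlogz are simply the numbers floated in this thread, and the Gaussian assumption from the original calculation still applies.

```python
import numpy as np
from scipy import special


def suggest_dlogz(neff, ndim, nlive, neff_min=100, dlogz_max=0.1):
    """Conservative dlogz suggestion (hypothetical helper, not dynesty API)."""
    # never aim for fewer than neff_min effective samples
    neff = max(neff, neff_min)
    a = ndim / 2.0
    # x = r^2 / 2 at the innermost point, where Z(r_in) = 1 / neff
    x_in = special.gammaincinv(a, 1.0 / neff)
    # outermost of nlive uniformly distributed live points
    x_out = x_in * nlive ** (2.0 / ndim)
    dlogz = -np.log1p(-special.gammainc(a, x_out))
    # never loosen the tolerance beyond dlogz_max
    return min(float(dlogz), dlogz_max)


print(suggest_dlogz(neff=100, ndim=4, nlive=100))
print(suggest_dlogz(neff=10000, ndim=10, nlive=100))
```

With these limits the low-dimensional case is capped at `dlogz_max`, while a large neff request still tightens the tolerance below it.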
Thoughts @joshspeagle ?