Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bispectrum: Bugs and improved implementation in spatialstats #234

Open
4 tasks
e-koch opened this issue Apr 13, 2021 · 1 comment
Open
4 tasks

Bispectrum: Bugs and improved implementation in spatialstats #234

e-koch opened this issue Apr 13, 2021 · 1 comment

Comments

@e-koch
Copy link
Member

e-koch commented Apr 13, 2021

@mjo22 has put together improved + much faster CPU/GPU implementations for calculating the bispectrum in https://github.com/mjo22/spatialstats/.

  • Add spatialstats as an optional dependency and expose the option to use the improved implementations in Bispectrum
  • Ensure the implementations give a common result on the unit test data (depends on bug fixes in TurbuStat, see below).

Michael also caught minor bugs in the turbustat implementations that should be fixed and enable all 3 implementations to return the same answer:

  • Fix bispectral amplitudes by normalizing by the number of samples
  • k1+k2=k3 needs to be bounded such that k3 remains in the region sampled by the image. Current samples "roll-over" using negative indices, but these should instead be thrown out or the sampling should be changed to avoid sampling outside the accepted range.

Neither of these bugs significantly changes the TurbuStat output for its current uses. For example, bispectra are compared using the same number of samples, so the normalization will not affect the difference between the two.

@mjo22, let me know if there are any important points I've missed! Or if you want to clarify anywhere. Thank you!

@mjo22
Copy link

mjo22 commented Apr 13, 2021

Thanks @e-koch. I'll add the following things:

  • To further elaborate on the first minor bug fix, regarding normalizing by the number of samples:

    Fix bispectral amplitudes by normalizing by the number of samples. This will calculate a moving average the correlation function F(k1)F(k2)F*(k1+k2), where F is the FFT of some data and F* is its complex conjugate. To recover the bispectral sum, one must also multiply by a correction factor. This correction factor can be either 1) the exact number of combinations of vectors k1, k2 that one can sample or 2) the volume of the sampling space. spatialstats implements option 1) and therefore converges to the exact sum. Since Turbustat's implementation does not enumerate the sample space like spatialstats, it makes more sense to use option 2).

    This correction factor for 2) is the multiplication of spherical shell volumes. More specifically, dv1*dv2, where dv1, dv2 are the volumes of the k1, k2 bins.

    In 2D,

    dv1 = np.pi*(k1+1)**2 - np.pi*k1**2

    and in 3D

    dv1 = 4/3*np.pi*(k1+1)**3 - 4/3*np.pi*k1**3

    The same calculations are true for dv2 of course.

  • For a real scalar field, the imaginary part of the bispectrum is zero. One can show this with the identity F(-k) = F*(k), where F is a fourier transform of real data. Convergence to zero is not realistic when using sampling because it relies on exact cancellations between evaluations F(k1)F(k2)F*(k1+k2) and F(-k1)F(-k2)F*(-(k1+k2)). It is best I think to throw away the imaginary part of bispectrum for real input, as it will only be a source of numerical error. It can get to be on the order of the real part of the bispectrum for a small number of samples and sometimes it is the main reason things do not converge.

  • For an easy speedup of the Turbustat implementation, I'd also recommend only calculating half of the image since B(k1, k2) = B(k2, k1). For an NxN image, one can do this with the loop

    for k1 in range(N):
        for k2 in range(k1+1):

    and setting both the (k1, k2) and (k2, k1) components of the image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants