Polluted result using CUDA 3D rfft in Float32 data type #326
Comments
Hi @doraemonho — Nice! I’ll try to have a look at it soon. |
Inside the GetFk function the GPUgrid is called instead of the grid from the function’s argument. Is that the problem? |
Thanks for taking a look!
I rewrote the function a bit to define the GPUgrid inside the function (shown below), but the result doesn't change.
|
I’ll have a closer look in the morning from my laptop |
It occurs to me that the root of this problem is the fft plan itself.
I am not sure if this is an FFTW.jl or a CUDA.jl problem. |
Quick Update here |
Can we make sure it's not a plotting issue? I tried to do some FFTs back and forth and couldn't find any disagreement between different data types. Have a look. Here's an example just with FFTW and CUDA:
using CUDA
using FFTW
using LinearAlgebra: mul!, ldiv!
using Test
for T in [Float64, Float32]
    nx, ny, nz = 6, 8, 10
    nk, nl, nm = Int(nx//2)+1, ny, nz
    a = randn(T, (nx, ny, nz))
    a_cu = CuArray(a)
    ah = randn(Complex{T}, (nk, nl, nm))
    ah_cu = CuArray(ah)
    plan = plan_rfft(a)
    plan_cu = plan_rfft(a_cu)
    bh = zeros(Complex{T}, (nk, nl, nm))
    bh_cu = CuArray(bh)
    c = zeros(T, (nx, ny, nz))
    c_cu = CuArray(c)
    mul!(bh, plan, a)
    ldiv!(c, plan, deepcopy(bh))
    mul!(bh_cu, plan_cu, a_cu)
    ldiv!(c_cu, plan_cu, deepcopy(bh_cu))
    @testset "CPU test for $T" begin
        @test bh == rfft(a)
        @test c == irfft(bh, nx)
    end
    @testset "GPU test for $T" begin
        @test bh_cu == rfft(a_cu)
        @test c_cu == irfft(bh_cu, nx)
    end
end

and here's an example with CUDA + FourierFlows:
using CUDA
using LinearAlgebra: mul!, ldiv!
using FourierFlows
using Test
for T in [Float64, Float32]
    nx, ny, nz = 6, 8, 10
    grid = ThreeDGrid(nx, nx, ny, ny, nz, nz; T)
    grid_cu = ThreeDGrid(GPU(), nx, nx, ny, ny, nz, nz; T)
    a = randn(T, (grid.nx, grid.ny, grid.nz))
    a_cu = CuArray(a)
    ah = randn(Complex{eltype(grid)}, (grid.nkr, grid.nl, grid.nm))
    ah_cu = CuArray(ah)
    plan = grid.rfftplan
    plan_cu = grid_cu.rfftplan
    bh = zeros(Complex{eltype(grid)}, (grid.nkr, grid.nl, grid.nm))
    bh_cu = CuArray(bh)
    c = zeros(eltype(grid), (grid.nx, grid.ny, grid.nz))
    c_cu = CuArray(c)
    mul!(bh, plan, a)
    ldiv!(c, plan, deepcopy(bh))
    mul!(bh_cu, plan_cu, a_cu)
    ldiv!(c_cu, plan_cu, deepcopy(bh_cu))
    @testset "CPU test for $T" begin
        @test bh == rfft(a)
        @test c == irfft(bh, nx)
    end
    @testset "GPU test for $T" begin
        @test bh_cu == rfft(a_cu)
        @test c_cu == irfft(bh_cu, nx)
    end
end

All tests pass for me. Same if I choose |
Thanks for checking!
I'm quite certain the problem is not coming from the plot, since the radial spectrum of the FP32 CUDA result shows a spike at large k.
using CUDA
using FFTW
using LinearAlgebra: mul!, ldiv!
using FourierFlows
using Test
nx, ny, nz = 128, 128, 128; # define size of the array
θ = rand(div(nx,2)+1, ny, nz)*2π; # define \theta \in [0,2\pi]
for T in [Float64, Float32]
    grid = ThreeDGrid(nx, nx, ny, ny, nz, nz; T)
    grid_cu = ThreeDGrid(GPU(), nx, nx, ny, ny, nz, nz; T)
    θT = convert(Array{T,3}, θ)
    eⁱᶿk⁻² = @. exp(im*θT)*grid.invKrsq;  # define the spectral filtered function on the CPU
    eⁱᶿk⁻²_cu = CuArray(eⁱᶿk⁻²);          # define the spectral filtered function on the GPU
    plan = grid.rfftplan
    plan_cu = grid_cu.rfftplan
    c = zeros(eltype(grid), (grid.nx, grid.ny, grid.nz))
    c_cu = CuArray(c);
    ldiv!(c, plan, deepcopy(eⁱᶿk⁻²))
    ldiv!(c_cu, plan_cu, deepcopy(eⁱᶿk⁻²_cu))
    c_cu = Array(c_cu);  # move the result from GPU -> CPU
    test_title = "eⁱᶿk⁻² CPU & GPU test for " * string(T)
    @testset "$test_title" begin
        # Compare the results using ≈
        @test c_cu ≈ c
    end
end |
Can you print the types of |
Before and after the conversion to Array for |
Do you mean before/after the rfft or before the testing process?
using CUDA
using FFTW
using LinearAlgebra: mul!, ldiv!
using FourierFlows
using Test
nx, ny, nz = 128, 128, 128; # define size of the array
θ = rand(div(nx,2)+1, ny, nz)*2π; # define \theta \in [0,2\pi]
for T in [Float64, Float32]
    grid = ThreeDGrid(nx, nx, ny, ny, nz, nz; T)
    grid_cu = ThreeDGrid(GPU(), nx, nx, ny, ny, nz, nz; T)
    θT = convert(Array{T,3}, θ)
    eⁱᶿk⁻² = @. exp(im*θT)*grid.invKrsq;  # define the spectral filtered function on the CPU
    eⁱᶿk⁻²_cu = CuArray(eⁱᶿk⁻²);          # define the spectral filtered function on the GPU
    println("before the rfft")
    @show typeof(eⁱᶿk⁻²)
    @show typeof(eⁱᶿk⁻²_cu)
    plan = grid.rfftplan
    plan_cu = grid_cu.rfftplan
    c = zeros(eltype(grid), (grid.nx, grid.ny, grid.nz))
    c_cu = CuArray(c);
    ldiv!(c, plan, deepcopy(eⁱᶿk⁻²))
    ldiv!(c_cu, plan_cu, deepcopy(eⁱᶿk⁻²_cu))
    c_cu = Array(c_cu);  # move the result from GPU -> CPU
    println("before the test")
    @show typeof(c)
    @show typeof(c_cu)
    test_title = "eⁱᶿk⁻² CPU & GPU test for " * string(T)
    @testset "$test_title" begin
        # Compare the results using ≈
        @test c_cu ≈ c
    end
end |
@doraemonho sorry for slacking... I went on leave and am now slowly coming back. Did you make any progress on this or is it still outstanding? |
@navidcy no worries, I still have this issue. |
Just an update from what we were discussing :-) |
@doraemonho thanks for this! |
Hi there! I'm working on cuFFT and following up from JuliaGPU/CUDA.jl#1656 (internal NVIDIA bug# 3847913). Sorry to hijack your issue - I thought it would be easier and faster to chat here directly. Do you have more information about what exactly is the input to the complex-to-real transform that is being done by cuFFT? The reason why this matters is the following. When doing a DFT of length N with a real input, the output satisfies X[k] = conj(X[N-k]); in particular X[0] is purely real and, for even N, so is X[N/2].
Given this, in practice, real-to-complex transforms return only the first N/2+1 complex outputs. Complex-to-real transforms take N/2+1 complex inputs, and those inputs are expected to satisfy the same symmetry: the first element and, for even N, the last element should be purely real.
One might assume that any imaginary part in those elements is simply ignored, but that is not guaranteed. What I suspect is that your input doesn't satisfy one of the above properties. In JuliaGPU/CUDA.jl#1656 you mentioned that
I've attached a cpp repro that shows this: when the last entry of the N/2+1 complex input to the complex-to-real FFT is not real (i.e., the imaginary part is non zero), in some cases (powers of 2 with enough batches on new-enough GPUs), the output will be wrong. If you compile and run it, for instance with CUDA 11.8, you will see
I realize this is a subtle issue. It might also have nothing to do with this, but I thought this was worth a shot. Let me know if this helps, |
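To make the symmetry above concrete, here is a minimal CPU-only sketch (my own illustration, using FFTW and an arbitrary even length N = 8) checking that a real signal's DFT has real k = 0 and k = N/2 entries and that rfft keeps only the non-redundant half:

using FFTW

N = 8
x = randn(N)              # real input of even length
X = fft(x)                # full complex DFT, length N
Xr = rfft(x)              # the N/2 + 1 non-redundant entries kept by rfft

@assert Xr ≈ X[1:N÷2+1]                       # rfft is the first half of the full DFT
@assert abs(imag(X[1])) < 1e-12               # the k = 0 entry is (numerically) real
@assert abs(imag(X[N÷2+1])) < 1e-12           # the k = N/2 entry is (numerically) real
@assert X[2:end] ≈ conj.(reverse(X[2:end]))   # X[k] == conj(X[N-k]) for k = 1, ..., N-1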
Hi, thanks for checking it out!
The input is just a random phase function, eⁱᶿk⁻². We always set the k = 0 element of invKrsq to 0. I have one more question to ask. I was trying the following:
using CUDA
using FFTW
using Random
using LinearAlgebra: mul!, ldiv!
using Test
function getk⁻²_and_rfftplan(dev::Dev;
                             nx = 64, ny = nx, nz = nx, Lx = 2π, Ly = Lx, Lz = Lx,
                             T = Float32, nthreads = Sys.CPU_THREADS, effort = FFTW.MEASURE) where Dev
    ArrayType = dev == "CPU" ? Array : CuArray
    # Define the rfft parameters
    nk = nx
    nl = ny
    nm = nz
    nkr = Int(nx/2 + 1)
    # Wavenumbers
    k  = ArrayType{T}(reshape( fftfreq(nx, 2π/Lx*nx), (nk, 1, 1)))
    l  = ArrayType{T}(reshape( fftfreq(ny, 2π/Ly*ny), (1, nl, 1)))
    m  = ArrayType{T}(reshape( fftfreq(nz, 2π/Lz*nz), (1, 1, nm)))
    kr = ArrayType{T}(reshape(rfftfreq(nx, 2π/Lx*nx), (nkr, 1, 1)))
    Ksq = @. k^2 + l^2 + m^2
    invKsq = @. 1 / Ksq
    CUDA.@allowscalar invKsq[1, 1, 1] = 0
    Krsq = @. kr^2 + l^2 + m^2
    invKrsq = @. 1 / Krsq
    CUDA.@allowscalar invKrsq[1, 1, 1] = 0
    FFTW.set_num_threads(nthreads)
    rfftplan = plan_rfft(ArrayType{T, 3}(undef, nx, ny, nz))
    return invKrsq, rfftplan
end
# Set up the initial condition
Random.seed!(1234); # Set up the random seed
nx = ny = nz = 128
θ = rand(div(nx,2)+1, ny, nz)*2π; # define \theta \in [0,2\pi]
T = Float32
k⁻², rfftplan = getk⁻²_and_rfftplan("CPU"; nx = nx, Lx = 2π, ny = ny, nz = nz, T = T)
k⁻²_c, rfftplan_c = getk⁻²_and_rfftplan("GPU"; nx = nx, Lx = 2π, ny = ny, nz = nz, T = T)
θT = im.*convert(Array{T,3},θ)
eⁱᶿk⁻² = @. exp.(θT)*k⁻²; #define the spectral filtered function in CPU
eⁱᶿk⁻²_cu = CuArray(eⁱᶿk⁻²); #define the spectral filtered function in GPU
c_b4_cleaning = CUDA.zeros(T, nx, ny, nz);
c_af_cleaning = CUDA.zeros(T, nx, ny, nz);
# Work out the rfft before cleaning 0 terms
ldiv!(c_b4_cleaning, rfftplan_c, deepcopy(eⁱᶿk⁻²_cu));
#Clean y[N/2] terms (index 0 = 1 julia)
@. eⁱᶿk⁻²_cu[1,:,:] .= 0;
ldiv!(c_af_cleaning, rfftplan_c, deepcopy(eⁱᶿk⁻²_cu));
c_b4_cleaning = Array(c_b4_cleaning); #move the result from GPU-> CPU
c_af_cleaning = Array(c_af_cleaning); #move the result from GPU-> CPU
using PyPlot  # plotting package assumed from the figure/subplot/imshow calls
figure(figsize=(12,6))
subplot(121); imshow(c_b4_cleaning[:,:,100]); title("Before cleaning y[1,:,:]")
subplot(122); imshow(c_af_cleaning[:,:,100]); title("After cleaning y[1,:,:]") |
Yes, there are occasional changes. It doesn't surprise me that you didn't see this behavior in older versions.
Awesome! If that solves your problem, we'd appreciate it if you could follow up on the bug ticket. I think, in summary:
|
Thanks for the information! I appreciate your help. I will follow up on the ticket then. @navidcy do you think we should keep this issue open or just close it? |
Let’s keep it open for now so that people can see that there is an issue. Perhaps we should issue a warning if someone makes a ThreeDGrid with Float32 on GPU. The issue is only for 3D, right? |
From the description, it seems to apply to all cases (1D/2D/3D) as long as the Fourier-transformed array has a nonzero imaginary part in its N/2 element. |
This applies to 1D, 2D, and 3D, yes. If the imaginary part of the N/2 element is nonzero, the complex-to-real transform can produce wrong results. FYI, this is also documented in section 2.3 of cuFFT's online documentation. |
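One possible workaround sketch (my assumption, not an official fix): when a spectral array is built directly in k-space, as with eⁱᶿk⁻² above, zero the imaginary part of its kr = N/2 plane before the complex-to-real transform, since that plane is the trigger described above. This assumes a 3D array whose first dimension is the truncated (rfft) one, as in the FourierFlows grids used in this thread:

using CUDA

function make_nyquist_plane_real!(ah::CuArray{<:Complex, 3})
    # zero the imaginary part of the kr = N/2 plane (the last slice along the truncated dimension)
    ah[end, :, :] .= real.(ah[end, :, :])
    return ah
end

# e.g. make_nyquist_plane_real!(eⁱᶿk⁻²_cu); ldiv!(c_cu, plan_cu, eⁱᶿk⁻²_cu)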
I'm a bit confused. If we start with a real signal and Fourier transform it, then the components that correspond to the 0-th and N/2 frequencies are real anyway, aren't they? |
$ julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.2 (2022-09-29)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |
julia> using LinearAlgebra
julia> using CUDA
julia> using CUDA.CUFFT
julia> a = cu(rand(8))
8-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
0.4507753
0.06592999
0.96525353
0.6843841
0.94825345
0.046085697
0.38072583
0.8913641
julia> rfft(a)
5-element CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}:
4.4327717f0 + 0.0f0im
-0.33708918f0 - 0.4522028f0im
0.053049427f0 + 1.4637324f0im
-0.65786713f0 + 0.71685266f0im
1.0572441f0 + 0.0f0im
julia> ah = complex(cu(zeros(5)))
5-element CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}:
0.0f0 + 0.0f0im
0.0f0 + 0.0f0im
0.0f0 + 0.0f0im
0.0f0 + 0.0f0im
0.0f0 + 0.0f0im
julia> rfftplan = plan_rfft(CuArray{Float32, 1}(undef, 8))
CUFFT real-to-complex forward plan for 8-element CuArray of Float32
julia> mul!(ah, rfftplan, a)
5-element CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}:
4.4327717f0 + 0.0f0im
-0.33708918f0 - 0.4522028f0im
0.053049427f0 + 1.4637324f0im
-0.65786713f0 + 0.71685266f0im
1.0572441f0 + 0.0f0im
julia> b = deepcopy(a*0)
8-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
julia> ldiv!(b, rfftplan, ah)
8-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
0.45077527
0.06593
0.9652535
0.68438405
0.9482534
0.046085723
0.38072583
0.89136404 |
(same results in 2D and 3D...) |
OK, I see... the issue comes when we generate the Fourier coefficients directly in spectral space, so the N/2 component is not necessarily real. But why is that not an issue with Float64? |
Because cuFFT FP64 transforms use (slightly) different algorithms, so the issue is not visible. |
I guess a warning should at least be displayed when a user provides an N/2 Fourier component with a non-zero imaginary part? I guess by CUDA.jl? |
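Hypothetically, such a check could look something like the sketch below (this is not an existing CUDA.jl or FourierFlows API; the function name and tolerance are assumptions, and it assumes a 3D spectral array whose first dimension is the truncated one):

using CUDA

function warn_if_nyquist_not_real(ah::CuArray{Complex{T}, 3}) where {T<:AbstractFloat}
    nyquist_imag = maximum(abs, imag.(ah[end, :, :]))  # largest imaginary part on the N/2 plane
    if nyquist_imag > eps(T)
        @warn "The N/2 (Nyquist) plane has a nonzero imaginary part; cuFFT complex-to-real " *
              "transforms in Float32 may return polluted results." nyquist_imag
    end
    return nothing
end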
Hi, I am working on the 3D RFFT but found that the result of rfft using CUDA is a bit weird in Float32.
I define a function F(k) in k-space as a Gaussian function like e^{-(k-k0)^2} e^{-iϕ}, where k0 is some number and ϕ is a random number array between 0 and 2π. Here is the implementation.
If I do an rfft on this function, the result of the CUDA rfft will overlay a moiré-like pattern like below:
Here is the implementation:
However, if you change T from Float32 to Float64, the result becomes normal again:
The problem itself is rare and only happens for this kind of function.
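For reference, here is a minimal sketch of the kind of spectral function described above (assuming FourierFlows' ThreeDGrid; the body of GetFk, the value of k₀, and the normalization are illustrative guesses, not the original implementation):

using CUDA, FourierFlows
using Random: rand!

function GetFk(grid; k₀ = 10)
    T = eltype(grid)
    ϕ = T(2π) .* rand!(similar(grid.Krsq))          # random phase ϕ ∈ [0, 2π) on the grid's device
    k = @. sqrt(grid.Krsq)                          # |k| on the rfft wavenumber grid
    return @. exp(-(k - T(k₀))^2) * exp(-im * ϕ)    # Gaussian amplitude times random phase
end

# e.g. Fk = GetFk(ThreeDGrid(GPU(), 128, 2π, 128, 2π, 128, 2π; T = Float32))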