Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in advect_particle #3923

Merged
merged 33 commits into from
Nov 23, 2024
Merged

Fix bug in advect_particle #3923

merged 33 commits into from
Nov 23, 2024

Conversation

simone-silvestri
Copy link
Collaborator

@simone-silvestri simone-silvestri commented Nov 13, 2024

The problem here stems from the fact that fractional_indices used for interpolation are 0-based indices to which, in the interpolate function, is summed 1.

The particle advection module uses fractional_indices directly outside the interpolate function, so the indices are inconsistent (there is a mix of 0-based and 1-based conventions).

There might be two solutions here:

  1. add one to the fractional_indices in the particle module (the solution proposed in this PR)
  2. return a 1-based index from fractional_indices. This might be cleaner from a user and developer perspective (mixing index basis is dangerous). However, as it stands now, we probably support extrapolation (because fractional_index can be negative), so we would lose that behavior.

I also have a straightforward, simple unit test. We probably need some more.

There were also a couple more bugs in the bounce method

closes #3917

@simone-silvestri simone-silvestri changed the title Fix bug in interpolate! for regular grids Fix bug in advect_particle Nov 13, 2024
@@ -68,30 +68,30 @@ end
x₀ = xnode(1, 1, 1, grid, locs...)
Δx = xspacings(grid, locs...)
FT = eltype(grid)
return convert(FT, (x - x₀) / Δx)
return convert(FT, (x - x₀) / Δx) # 0 - based indexing!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return convert(FT, (x - x₀) / Δx) # 0 - based indexing!
return convert(FT, (x - x₀) / Δx) + 1

why isn't this the fix?

@glwagner
Copy link
Member

Why don't we change fractional_indices to always use 1-based indexing? Isn't that the straightforward thing to do?

@simone-silvestri
Copy link
Collaborator Author

I am trying option 2. I also think it might be the cleaner option. Let's see if something breaks

src/Fields/interpolate.jl Outdated Show resolved Hide resolved
@glwagner
Copy link
Member

However, as it stands now, we probably support extrapolation (because fractional_index can be negative), so we would lose that behavior.

Why does 0-based indexing support extrapolation when 1-based does not?

@simone-silvestri
Copy link
Collaborator Author

simone-silvestri commented Nov 14, 2024

I am actually not even sure that it can do extrapolation.
But there is this part of code that confuses me a bit because it allows for a negative fractional_index

i⁻ = Base.unsafe_trunc(Int, fractional_idx)
i⁻ = Int(i⁻ + 1) # convert to "proper" integer?
shift = Int(sign(fractional_idx))
i⁺ = i⁻ + shift
ξ = mod(fractional_idx, 1)

and I kind of deduced is for extrapolation, but I see now that it does not make much sense

@ali-ramadhan
Copy link
Member

Thanks so much for the fix @simone-silvestri! I'm guessing the PR is not ready for testing? Happy to help here if I can.

I tried running the MWE from #3917 (comment) which now throws a GPU exception at the first time step, caught with julia -g2:

[ Info: Initializing simulation...
iteration: 0, time: 0 seconds, Δt: 1.100 seconds, U_max=(4.66e-03, 4.59e-03, 5.99e-05)
[ Info:     ... simulation initialization complete (15.507 seconds)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (2.446 seconds).
ERROR: a undefined variable error was thrown during kernel execution on thread (65, 1, 1) in block (45, 1, 1).
Stacktrace:
 [1] fractional_z_index at /home/alir/atdepth/Oceananigans.jl/src/Fields/interpolate.jl:137
 [2] fractional_z_index at /home/alir/atdepth/Oceananigans.jl/src/ImmersedBoundaries/immersed_boundary_nodes.jl:65
 [3] _fractional_indices at /home/alir/atdepth/Oceananigans.jl/src/Fields/interpolate.jl:169
 [4] fractional_indices at /home/alir/atdepth/Oceananigans.jl/src/Fields/interpolate.jl:156
 [5] advect_particle at /home/alir/atdepth/Oceananigans.jl/src/Models/LagrangianParticleTracking/lagrangian_particle_advection.jl:113
 [6] macro expansion at /home/alir/atdepth/Oceananigans.jl/src/Models/LagrangianParticleTracking/lagrangian_particle_advection.jl:187
 [7] gpu__advect_particles! at /home/alir/.julia/packages/KernelAbstractions/iW1Rw/src/macros.jl:97
 [8] gpu__advect_particles! at ./none:0

ERROR: LoadError: KernelException: exception thrown during kernel execution on device NVIDIA GeForce RTX 4090
Stacktrace:
  [1] check_exceptions()
    @ CUDA ~/.julia/packages/CUDA/2kjXI/src/compiler/exceptions.jl:39
  [2] synchronize(stream::CUDA.CuStream; blocking::Bool, spin::Bool)
    @ CUDA ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/synchronization.jl:207
  [3] synchronize (repeats 2 times)
    @ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/synchronization.jl:194 [inlined]
  [4] (::CUDA.var"#1127#1128"{Float64, Vector{Float64}, Int64, CUDA.CuArray{Float64, 3, CUDA.DeviceMemory}, Int64, Int64})()
    @ CUDA ~/.julia/packages/CUDA/2kjXI/src/array.jl:554
  [5] #context!#990
    @ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/state.jl:168 [inlined]
  [6] context!
    @ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/state.jl:163 [inlined]
  [7] unsafe_copyto!(dest::Vector{Float64}, doffs::Int64, src::CUDA.CuArray{Float64, 3, CUDA.DeviceMemory}, soffs::Int64, n::Int64)
    @ CUDA ~/.julia/packages/CUDA/2kjXI/src/array.jl:550
  [8] copyto!
    @ ~/.julia/packages/CUDA/2kjXI/src/array.jl:503 [inlined]
  [9] getindex
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:52 [inlined]
 [10] getindex
    @ ~/.julia/packages/OffsetArrays/hwmnB/src/OffsetArrays.jl:440 [inlined]
 [11] _getindex
    @ ./abstractarray.jl:1323 [inlined]
 [12] getindex
    @ ./abstractarray.jl:1290 [inlined]
 [13] getindex
    @ ~/atdepth/Oceananigans.jl/src/Fields/field.jl:401 [inlined]
 [14] first(a::Field{Nothing, Nothing, Nothing, Nothing, ImmersedBoundaryGrid{…}, Tuple{…}, OffsetArrays.OffsetArray{…}, Float64, FieldBoundaryConditions{…}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{…}})
    @ Base ./abstractarray.jl:452
 [15] macro expansion
    @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210 [inlined]
 [16] maximum(f::Function, c::Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{…}, Tuple{…}, OffsetArrays.OffsetArray{…}, Float64, FieldBoundaryConditions{…}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{…}}; condition::Nothing, mask::Float64, dims::Function)
    @ Oceananigans.Fields ~/atdepth/Oceananigans.jl/src/Fields/field.jl:703
 [17] maximum(f::Function, c::Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{…}, Tuple{…}, OffsetArrays.OffsetArray{…}, Float64, FieldBoundaryConditions{…}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{…}})
    @ Oceananigans.Fields ~/atdepth/Oceananigans.jl/src/Fields/field.jl:689
 [18] progress(sim::Simulation{HydrostaticFreeSurfaceModel{…}, Float64, Float64, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}})
    @ Main ~/atdepth/Oceananigans.jl/3917.jl:84
 [19] (::Callback{Nothing, typeof(progress), IterationInterval, Oceananigans.TimeStepCallsite})(sim::Simulation{HydrostaticFreeSurfaceModel{…}, Float64, Float64, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}})
    @ Oceananigans.Simulations ~/atdepth/Oceananigans.jl/src/Simulations/callback.jl:15
 [20] time_step!(sim::Simulation{HydrostaticFreeSurfaceModel{…}, Float64, Float64, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}})
    @ Oceananigans.Simulations ~/atdepth/Oceananigans.jl/src/Simulations/run.jl:143
 [21] run!(sim::Simulation{HydrostaticFreeSurfaceModel{…}, Float64, Float64, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}}; pickup::Bool)
    @ Oceananigans.Simulations ~/atdepth/Oceananigans.jl/src/Simulations/run.jl:97
 [22] run!(sim::Simulation{HydrostaticFreeSurfaceModel{…}, Float64, Float64, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}, OrderedCollections.OrderedDict{…}})
    @ Oceananigans.Simulations ~/atdepth/Oceananigans.jl/src/Simulations/run.jl:85
 [23] top-level scope
    @ ~/atdepth/Oceananigans.jl/3917.jl:105
 [24] include(fname::String)
    @ Base.MainInclude ./client.jl:494
 [25] top-level scope
    @ REPL[1]:1
in expression starting at /home/alir/atdepth/Oceananigans.jl/3917.jl:105
Some type information was truncated. Use `show(err)` to see complete types.

@@ -132,16 +134,16 @@ ZRegGrid = Union{ZRegularRG, ZRegularLLG, ZRegOrthogonalSphericalShellGrid}

@inline function fractional_z_index(z::FT, locs, grid::ZRegGrid) where FT
z₀ = znode(1, 1, 1, grid, locs...)
Δz = @inbounds first(zspacings(grid, locs...))
return convert(FT, (z - z₀) / Δz)
Δz = Δz(1, 1, 1, grid, locs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Δz = Δz(1, 1, 1, grid, locs...)
Δz = zspacing(1, 1, 1, grid, locs...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the pointwise operators defined in the operator module instead of the zspacing function and so on. (there is a conflict in the name here though, I ll correct it)

@ali-ramadhan
Copy link
Member

Looking at the latest Buildkite build, I think one or some of the tests are failing because Δxᶠᵃᶠ is not defined in Operators/spacings_and_areas_and_volumes.jl following PR #3143.

https://buildkite.com/clima/oceananigans/builds/19175#01934dbf-8dcf-471b-a3dc-ffecd42006df/18-582

@simone-silvestri
Copy link
Collaborator Author

Looking at the latest Buildkite build, I think one or some of the tests are failing because Δxᶠᵃᶠ is not defined in Operators/spacings_and_areas_and_volumes.jl following PR #3143.

https://buildkite.com/clima/oceananigans/builds/19175#01934dbf-8dcf-471b-a3dc-ffecd42006df/18-582

Hmmm, shall we define it here in this PR?

@simone-silvestri
Copy link
Collaborator Author

simone-silvestri commented Nov 21, 2024

The problem is that Δxᶠᵃᶠ might be ambiguous for a curvilinear grid, so it is difficult to define. A solution is to use just three-dimensional spacings always with just and . A nothing location can point to .

@ali-ramadhan
Copy link
Member

Hmmm, shall we define it here in this PR?

Yeah I think that's what we decided to do as part of #3143: to define spacing functions as needed. Although if we find ourselves needing new ones all the time, then might as well define them all.

The problem is that Δxᶠᵃᶠ might be ambiguous for a curvilinear grid, so it is difficult to define. A solution is to use just three-dimensional spacings always with just and . A nothing location can point to .

In #3143 we also decided that a nothing location would point to , at least for the purposes of spacings. If we want to stick to this then the better long-term solution might be to try and always use spacings with just and .

@simone-silvestri
Copy link
Collaborator Author

In #3143 we also decided that a nothing location would point to , at least for the purposes of spacings. If we want to stick to this then the better long-term solution might be to try and always use spacings with just and .

I see. I think the problem is that is well-defined only for Flat directions and is almost always ambiguous for three-dimensional grids. There is also a problem with immersed boundary grids (for example, partial cells where dz is always three-dimensional).

In my opinion should be a placeholder for when the location does not matter (as for rectilinear grids for example).

@glwagner
Copy link
Member

glwagner commented Nov 21, 2024

We should never use ^a in numerics. Only for grid-specific operators. Using ^a generally means that the code is incorrect. As @simone-silvestri this is not merely an issue with the underlying grids, but also affects immersed boundary grids which can change the metrics.

@simone-silvestri
Copy link
Collaborator Author

I understand now the role of nothing pointing to a. If it points to a it will spit an error for those grids where the a is ambiguous. It makes sense because it is an extra check that we don't do anything incorrect.

I defined the operators that were not defined and the tests now pass

@glwagner
Copy link
Member

solid

@simone-silvestri simone-silvestri merged commit 9cc7112 into main Nov 23, 2024
46 checks passed
@simone-silvestri simone-silvestri deleted the ss/fix-interpolation-bug branch November 23, 2024 03:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some Lagrangian particles like to travel Southwest on an immersed lat-lon grid + hydrostatic model
3 participants