Hello! I noticed that there is a difference between feeding in a `Vector{SVector{2,Float64}}` depending on whether it is packed in a `StructArray` or passed in "raw". My findings are here:
```julia
using Base.Threads
using Bumper
using StrideArrays # Not necessary, but can make operations like broadcasting with Bumper.jl faster.
using Polyester
using BenchmarkTools
using ChunkSplitters
using StaticArrays
using StructArrays

struct DimensionalData{D, T <: AbstractFloat}
    vectors::Tuple{Vararg{Vector{T}, D}}
    V::StructArray{SVector{D, T}, 1, Tuple{Vararg{Vector{T}, D}}}

    # General constructor for vectors
    function DimensionalData(vectors::Vector{T}...) where {T}
        D = length(vectors)
        V = StructArray{SVector{D, T}}(vectors)
        new{D, T}(Tuple(vectors), V)
    end

    # Constructor for initializing with all zeros, adapting to dimension D
    function DimensionalData{D, T}(len::Int) where {D, T}
        vectors = ntuple(d -> zeros(T, len), D) # Create D vectors of zeros
        V = StructArray{SVector{D, T}}(vectors)
        new{D, T}(vectors, V)
    end
end

function ReductionFunctionChunk!(dρdtI, I, J, drhopLp, drhopLn)
    XT = eltype(dρdtI); XL = length(dρdtI); X0 = zero(XT)
    nchunks = nthreads() # Assuming nchunks is defined somewhere as nthreads()

    @inbounds @no_escape begin
        local_X = @alloc(XT, XL, nchunks)
        fill!(local_X, X0)

        # Directly iterate over the chunks
        @batch for ichunk in 1:nchunks
            chunk_inds = getchunk(I, ichunk; n=nchunks)
            for idx in chunk_inds
                i = I[idx]
                j = J[idx]
                # Accumulate the contributions into the correct place
                local_X[i, ichunk] += drhopLp[idx]
                local_X[j, ichunk] += drhopLn[idx]
            end
        end

        # Reduction step
        # Using @tturbo is slightly faster (28 μs) than @batch (30 μs),
        # but @batch works for SVector, so we prefer it.
        @batch for ix in 1:XL
            for chunk in 1:nchunks
                dρdtI[ix] += local_X[ix, chunk]
            end
        end
    end

    return nothing
end

begin
    ProblemScaleFactor = 1
    NumberOfPoints     = 6195  * ProblemScaleFactor
    NumberOfIterations = 50000 * ProblemScaleFactor
    I   = rand(1:NumberOfPoints, NumberOfIterations)
    J   = I # rand(1:NumberOfPoints, NumberOfIterations)
    V   = zeros(SVector{2, Float64}, NumberOfPoints)
    VL  = rand(eltype(V), NumberOfIterations)
    VD  = DimensionalData{2, Float64}(NumberOfPoints)
    VDL = DimensionalData{2, Float64}(NumberOfIterations)
    VDL.V .= VL

    V .*= 0
    ReductionFunctionChunk!(V, I, J, VL, VL)
    println("Value when doing chunk svector reduction: ", sum(V))

    VD.V .*= 0
    ReductionFunctionChunk!(VD.V, I, J, VDL.V, VDL.V)
    println("Value when doing chunk svector reduction with dimensional data: ", sum(VD.V))

    println("Chunk function:")
    display(@benchmark ReductionFunctionChunk!($V, $I, $J, $VL, $VL))

    println("Chunk function with struct:")
    K  = VD.V
    KL = VDL.V
    @benchmark ReductionFunctionChunk!($K, $I, $J, $KL, $KL)
end
```
```
Value when doing chunk svector reduction: [50079.66957585603, 50105.60196762815]
Value when doing chunk svector reduction with dimensional data: [50079.66957585603, 50105.60196762815]
Chunk function:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  37.800 μs … 434.800 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     54.200 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   51.479 μs ±   9.509 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁                                  █
  ▁█▇▃▁▁▂▃▁▃▃▂▄▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▅▆██▃▁▂▆▇▆▄▆▂▃▂▂▂▂▂▂▂▂▁▁▁▁▁ ▂
  37.8 μs         Histogram: frequency by time           65 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
Chunk function with struct:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  47.900 μs … 355.600 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     77.800 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   68.857 μs ±  15.964 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▁▃▂▁                     ▄▆█▇▆▅▃▂▂▁▁                        ▂
  ███████▇███▇█▆▇▆▅▄▆▄▄▅▄▅▃▄▄▇▆▆█▇████▆▇████████████▇▇▆▆▅▆▅▄▄▂ █
  47.9 μs      Histogram: log(frequency) by time        92.7 μs <

 Memory estimate: 176 bytes, allocs estimate: 2.
```
My findings are as follows:
The 3 * 64 bytes seem to come from indexing operations: if one of the indexing operations inside `ReductionFunctionChunk!` is commented out, the memory usage goes down.
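To pin down per-call allocations without a full BenchmarkTools run, a warmed-up call can be measured directly with `Base.@allocated`. Below is a minimal, dependency-free sketch; `accumulate_into!` is a hypothetical stand-in for `ReductionFunctionChunk!`, not the original code:

```julia
# Hypothetical stand-in for ReductionFunctionChunk!: an in-place scatter-accumulate.
function accumulate_into!(out, I, vals)
    @inbounds for idx in eachindex(I)
        out[I[idx]] += vals[idx]
    end
    return nothing
end

# Measure a warmed-up call; the first call is made beforehand so that
# compilation does not count towards the allocation figure.
function measure_allocations()
    out  = zeros(10)
    I    = rand(1:10, 100)
    vals = rand(100)
    accumulate_into!(out, I, vals)          # warm up / compile first
    return @allocated accumulate_into!(out, I, vals)
end
```

Running the measurement inside a function (rather than at global scope) keeps the call fully inferred, so only genuine runtime allocations are counted.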
More importantly, when removing `@batch`, everything goes down to zero allocations, as expected.
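For comparison, the same per-chunk-column reduction can be sketched with only `Base.Threads`, avoiding `@batch` (and Polyester/Bumper) entirely; the names here are illustrative, not the original API:

```julia
using Base.Threads

# Illustrative sketch of the same reduction pattern with only Base.Threads:
# each chunk scatters into its own column of `local_X` (no write conflicts),
# then a serial pass folds the columns into `out`.
function threaded_reduction!(out, I, J, contribI, contribJ)
    nchunks = nthreads()
    local_X = zeros(eltype(out), length(out), nchunks)
    @threads for ichunk in 1:nchunks
        # Static round-robin chunking of the interaction list
        for idx in ichunk:nchunks:length(I)
            local_X[I[idx], ichunk] += contribI[idx]
            local_X[J[idx], ichunk] += contribJ[idx]
        end
    end
    for ichunk in 1:nchunks, ix in eachindex(out)
        out[ix] += local_X[ix, ichunk]
    end
    return out
end
```

The structure mirrors `ReductionFunctionChunk!`: because every chunk owns exactly one column of the scratch matrix, no locking is needed, and only the final fold touches shared state.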
Kind regards