
Switch CI to 1.11. #462

Merged 1 commit into main on Oct 19, 2024

Conversation

@maleadt (Member) commented Oct 17, 2024

No description provided.

@christiangnrd (Contributor)

Should we also revert 71b784e?

@maleadt (Member, Author) commented Oct 17, 2024

> Should we also revert 71b784e?

Yeah, sure. Is that the reason for the compilation failure? I thought I had fixed the underlying issues, #370 (comment).

@christiangnrd (Contributor) commented Oct 17, 2024

> > Should we also revert 71b784e?
>
> Yeah, sure. Is that the reason for the compilation failure? I thought I had fixed the underlying issues, #370 (comment).

It is not. I just figured that now that we have more than one macOS 15 runner we might as well benchmark with the latest features (potentially) active.

This failure seems to be unrelated to the #370 failure, as this one is fully reproducible.

Working example from perf/byval.jl:

```julia
using Metal

begin

const threads = 256

@inline get_inputs3(indx_y, a, b, c)                            = (a, b, c)
@inline get_inputs3(indx_y, a1, a2, b1, b2, c1, c2)             = indx_y == 1 ? (a1, b1, c1) : (a2, b2, c2)
@inline get_inputs3(indx_y, a1, a2, a3, b1, b2, b3, c1, c2, c3) = indx_y == 1 ? (a1, b1, c1) : indx_y == 2 ? (a2, b2, c2) : (a3, b3, c3)

# add arrays of matrices kernel
function kernel_add_mat_z_slices(n, vararg...)
    x1, x2, y = get_inputs3(threadgroup_position_in_grid_2d().y, vararg...)
    i = thread_position_in_grid_1d()
    if i <= n
        @inbounds y[i] = x1[i] + x2[i]
    end
    return
end

function add_z_slices!(y, x1, x2)
    m1, n1 = size(x1[1])  # get size of first slice
    groups = (m1 * n1 + threads - 1) ÷ threads
    # launch length(x1) times more groups than needed to process 1 slice
    @metal groups = groups, length(x1) threads = threads kernel_add_mat_z_slices(m1 * n1, x1..., x2..., y...)
end

num_z_slices = 3
m, n = 3072, 1536    # 256 multiplier

x1 = [mtl(randn(Float32, (m, n)) .+ Float32(0.5)) for i = 1:num_z_slices]
x2 = [mtl(randn(Float32, (m, n)) .+ Float32(0.5)) for i = 1:num_z_slices]
y1 = [similar(x1[1]) for i = 1:num_z_slices]

# adding arrays in an array
Metal.@sync add_z_slices!(y1[1:num_z_slices], x1[1:num_z_slices], x2[1:num_z_slices])

end
```

It doesn't fail when `num_z_slices = 2`.

@maleadt (Member, Author) commented Oct 18, 2024

Something wrong with the debug info:

      "frames": [
        {
          "imageOffset": 56140,
          "symbol": "__abort_with_payload",
          "symbolLocation": 8,
          "imageIndex": 1
        },
        {
          "imageOffset": 212612,
          "symbol": "abort_with_payload_wrapper_internal",
          "symbolLocation": 104,
          "imageIndex": 1
        },
        {
          "imageOffset": 212508,
          "symbol": "abort_with_reason",
          "symbolLocation": 32,
          "imageIndex": 1
        },
        {
          "imageOffset": 598872,
          "symbol": "fatalErrorHandler(void*, char const*, bool)",
          "symbolLocation": 724,
          "imageIndex": 8
        },
        {
          "imageOffset": 24336744,
          "symbol": "llvm::report_fatal_error(llvm::Twine const&, bool)",
          "symbolLocation": 412,
          "imageIndex": 9
        },
        {
          "imageOffset": 30557624,
          "symbol": "llvm::report_fatal_error(char const*, bool)",
          "symbolLocation": 52,
          "imageIndex": 9
        },
        {
          "imageOffset": 12603572,
          "symbol": "llvm::UpgradeDebugInfo(llvm::Module&)",
          "symbolLocation": 452,
          "imageIndex": 9
        },

EDIT: wait, is this IR even valid?

```
❯ ./metallib-dis broken.metallib -o broken.ll
PHI nodes not grouped at top of basic block!
  %.in.in.in = phi {}* [ %60, %guard_pass35 ], [ %58, %guard_pass27 ], [ %59, %conversion ]
label %L26
PHI nodes not grouped at top of basic block!
  %.in128.in.in = phi {}* [ %63, %guard_pass35 ], [ %61, %guard_pass27 ], [ %62, %conversion ]
label %L26
PHI nodes not grouped at top of basic block!
  %.in129.in.in = phi {}* [ %66, %guard_pass35 ], [ %64, %guard_pass27 ], [ %65, %conversion ]
label %L26
```
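(For context: the LLVM verifier requires all `phi` instructions to form one contiguous group at the very top of their basic block, before any other instruction. A minimal hand-written sketch of the rule, unrelated to the broken module above:)

```llvm
; Valid: both phis sit at the top of %merge, before any other instruction.
define i32 @phi_rule(i1 %c, i32 %x, i32 %y) {
entry:
  br i1 %c, label %a, label %b

a:
  br label %merge

b:
  br label %merge

merge:
  %v = phi i32 [ %x, %a ], [ %y, %b ]   ; phis must come first...
  %w = phi i32 [ %y, %a ], [ %x, %b ]   ; ...as one contiguous group
  %sum = add i32 %v, %w                 ; only then ordinary instructions
  ret i32 %sum
}
```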

@maleadt (Member, Author) commented Oct 18, 2024

Looks like the IR downgrader is messing up the IR.

@maleadt (Member, Author) commented Oct 18, 2024

MWE:

```llvm
define void @kernel() {
top:
  store ptr addrspace(1) null, ptr null, align 8
  br label %block

block:
  %in = phi ptr [ null, %top ]
  ret void
}
```

The problem is that the downgrader inserts bitcasts before the phi:

```llvm
define void @kernel() {
top:
  %0 = bitcast {}* null to {} addrspace(1)**
  store {} addrspace(1)* null, {} addrspace(1)** %0, align 8
  br label %block

block:                                            ; preds = %top
  %1 = bitcast {}* null to {}*
  %in = phi {}* [ %1, %top ]
  ret void
}
```
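For reference, a well-formed downgrade would have to materialize such casts in the predecessor block, ahead of its terminator, so that the `phi` remains the first instruction of `%block`. Roughly (a sketch of the expected output, not what any tool currently emits):

```llvm
define void @kernel() {
top:
  %0 = bitcast {}* null to {} addrspace(1)**
  store {} addrspace(1)* null, {} addrspace(1)** %0, align 8
  %1 = bitcast {}* null to {}*   ; cast hoisted into the predecessor,
  br label %block                ; before the terminator

block:                                            ; preds = %top
  %in = phi {}* [ %1, %top ]     ; phi stays first in its block
  ret void
}
```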

@maleadt (Member, Author) commented Oct 18, 2024

@github-actions (bot) left a comment

Metal Benchmarks

| Benchmark suite | Current: 63c587a | Previous: 100f831 | Ratio |
|---|---|---|---|
| private array/construct | 27428.571428571428 ns | 26340.333333333332 ns | 1.04 |
| private array/broadcast | 459333 ns | 465791 ns | 0.99 |
| private array/random/randn/Float32 | 756541 ns | 827937.5 ns | 0.91 |
| private array/random/randn!/Float32 | 643500 ns | 635250 ns | 1.01 |
| private array/random/rand!/Int64 | 546687.5 ns | 562291.5 ns | 0.97 |
| private array/random/rand!/Float32 | 587166 ns | 594500 ns | 0.99 |
| private array/random/rand/Int64 | 770250 ns | 801791 ns | 0.96 |
| private array/random/rand/Float32 | 602291.5 ns | 591208 ns | 1.02 |
| private array/copyto!/gpu_to_gpu | 643021 ns | 645500 ns | 1.00 |
| private array/copyto!/cpu_to_gpu | 725124.5 ns | 618187.5 ns | 1.17 |
| private array/copyto!/gpu_to_cpu | 637187.5 ns | 798500 ns | 0.80 |
| private array/accumulate/1d | 1343999.5 ns | 1333083 ns | 1.01 |
| private array/accumulate/2d | 1404917 ns | 1424500 ns | 0.99 |
| private array/iteration/findall/int | 2096271 ns | 2100167 ns | 1.00 |
| private array/iteration/findall/bool | 1818541 ns | 1847000 ns | 0.98 |
| private array/iteration/findfirst/int | 1695083 ns | 1696166.5 ns | 1.00 |
| private array/iteration/findfirst/bool | 1674750 ns | 1651958.5 ns | 1.01 |
| private array/iteration/scalar | 3856833 ns | 3657771 ns | 1.05 |
| private array/iteration/logical | 3206167 ns | 3264437.5 ns | 0.98 |
| private array/iteration/findmin/1d | 1762167 ns | 1565166 ns | 1.13 |
| private array/iteration/findmin/2d | 1355333 ns | 1351333.5 ns | 1.00 |
| private array/reductions/reduce/1d | 1025979 ns | 1063291 ns | 0.96 |
| private array/reductions/reduce/2d | 664125 ns | 695645.5 ns | 0.95 |
| private array/reductions/mapreduce/1d | 1035916 ns | 1078084 ns | 0.96 |
| private array/reductions/mapreduce/2d | 665167 ns | 705166 ns | 0.94 |
| private array/permutedims/4d | 2713625 ns | 860084 ns | 3.16 |
| private array/permutedims/2d | 1005167 ns | 862229.5 ns | 1.17 |
| private array/permutedims/3d | 1578979 ns | 919520.5 ns | 1.72 |
| private array/copy | 583396 ns | 574854 ns | 1.01 |
| latency/precompile | 5199144875 ns | 4396587542 ns | 1.18 |
| latency/ttfp | 6483345937.5 ns | 6698494124.5 ns | 0.97 |
| latency/import | 1137595437.5 ns | 722852834 ns | 1.57 |
| integration/metaldevrt | 686916.5 ns | 719875 ns | 0.95 |
| integration/byval/slices=1 | 1559417 ns | 1530167 ns | 1.02 |
| integration/byval/slices=3 | 10793187.5 ns | 9115541.5 ns | 1.18 |
| integration/byval/reference | 1529583 ns | 1520271 ns | 1.01 |
| integration/byval/slices=2 | 2556458.5 ns | 2666416 ns | 0.96 |
| kernel/indexing | 451334 ns | 468541 ns | 0.96 |
| kernel/indexing_checked | 452750 ns | 461292 ns | 0.98 |
| kernel/launch | 10437.333333333334 ns | 8834 ns | 1.18 |
| metal/synchronization/stream | 14000 ns | 14583 ns | 0.96 |
| metal/synchronization/context | 15042 ns | 15250 ns | 0.99 |
| shared array/construct | 26680.5 ns | 26069.5 ns | 1.02 |
| shared array/broadcast | 465167 ns | 468333 ns | 0.99 |
| shared array/random/randn/Float32 | 773792 ns | 785583 ns | 0.98 |
| shared array/random/randn!/Float32 | 647854 ns | 626541.5 ns | 1.03 |
| shared array/random/rand!/Int64 | 546791 ns | 564084 ns | 0.97 |
| shared array/random/rand!/Float32 | 586958 ns | 598792 ns | 0.98 |
| shared array/random/rand/Int64 | 772687.5 ns | 788666 ns | 0.98 |
| shared array/random/rand/Float32 | 566584 ns | 629791 ns | 0.90 |
| shared array/copyto!/gpu_to_gpu | 85208 ns | 96916 ns | 0.88 |
| shared array/copyto!/cpu_to_gpu | 88334 ns | 88583 ns | 1.00 |
| shared array/copyto!/gpu_to_cpu | 82458 ns | 83458 ns | 0.99 |
| shared array/accumulate/1d | 1362375 ns | 1356667 ns | 1.00 |
| shared array/accumulate/2d | 1392229.5 ns | 1421333 ns | 0.98 |
| shared array/iteration/findall/int | 1797792 ns | 1792833 ns | 1.00 |
| shared array/iteration/findall/bool | 1587625 ns | 1620166.5 ns | 0.98 |
| shared array/iteration/findfirst/int | 1409958 ns | 1385791 ns | 1.02 |
| shared array/iteration/findfirst/bool | 1362958.5 ns | 1376291 ns | 0.99 |
| shared array/iteration/scalar | 151708.5 ns | 151458 ns | 1.00 |
| shared array/iteration/logical | 2995771 ns | 3042333 ns | 0.98 |
| shared array/iteration/findmin/1d | 1459063 ns | 1274875 ns | 1.14 |
| shared array/iteration/findmin/2d | 1366021 ns | 1346333 ns | 1.01 |
| shared array/reductions/reduce/1d | 714875 ns | 694458 ns | 1.03 |
| shared array/reductions/reduce/2d | 668479 ns | 702292 ns | 0.95 |
| shared array/reductions/mapreduce/1d | 740917 ns | 754229 ns | 0.98 |
| shared array/reductions/mapreduce/2d | 659625 ns | 705395.5 ns | 0.94 |
| shared array/permutedims/4d | 2707708.5 ns | 858875 ns | 3.15 |
| shared array/permutedims/2d | 1016083 ns | 862292 ns | 1.18 |
| shared array/permutedims/3d | 1581375 ns | 923916.5 ns | 1.71 |
| shared array/copy | 242750 ns | 246583 ns | 0.98 |

This comment was automatically generated by a workflow using github-action-benchmark.

@maleadt maleadt merged commit 15ac66d into main Oct 19, 2024
2 checks passed
@maleadt maleadt deleted the tb/1.11 branch October 19, 2024 18:02