Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoopLoadElim: calling LoopVersioning with single-iteration loop #96656

Closed
artagnon opened this issue Jun 25, 2024 · 5 comments
Closed

LoopLoadElim: calling LoopVersioning with single-iteration loop #96656

artagnon opened this issue Jun 25, 2024 · 5 comments
Labels
llvm:analysis loopoptim question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! regression

Comments

@artagnon
Copy link
Contributor

artagnon commented Jun 25, 2024

The following example:

define void @lver.check.unnecessary(ptr %arg, ptr %arg1, i1 %arg2) {
entry:
  %load = load i32, ptr %arg, align 4
  br i1 %arg2, label %noloop.exit, label %loop.ph

loop.ph:                                          ; preds = %entry
  %sext7 = sext i32 %load to i64
  %gep8 = getelementptr i8, ptr %arg1, i64 8
  br label %loop.body

loop.body:                                        ; preds = %loop.body, %loop.ph
  %phi = phi i64 [ 0, %loop.ph ], [ %add, %loop.body ]
  %mul = mul i64 %phi, %sext7
  %gep10 = getelementptr double, ptr %gep8, i64 %mul
  %load11 = load double, ptr %gep10, align 8
  store double %load11, ptr %arg1, align 8
  %add = add i64 %phi, 1
  %icmp = icmp eq i64 %phi, 0
  br i1 %icmp, label %loop.exit, label %loop.body

noloop.exit:                                      ; preds = %entry
  %sext = sext i32 %load to i64
  %gep = getelementptr double, ptr %arg1, i64 %sext
  %load5 = load double, ptr %gep, align 8
  store double %load5, ptr %arg, align 8
  ret void

loop.exit:                                        ; preds = %loop.body
  ret void
}

has identical output after running loop-versioning before #92119. However, after that patch, the following diff is observed:

diff --git a/lver.before.ll b/lver.main.ll
index 9dce17d..fc81e12 100644
--- a/lver.before.ll
+++ b/lver.main.ll
@@ -4,22 +4,39 @@ source_filename = "hand-reduce.ll"
 define void @lver.check.unnecessary(ptr %arg, ptr %arg1, i1 %arg2) {
 entry:
   %load = load i32, ptr %arg, align 4
-  br i1 %arg2, label %noloop.exit, label %loop.ph
+  br i1 %arg2, label %noloop.exit, label %loop.body.lver.check

-loop.ph:                                          ; preds = %entry
+loop.body.lver.check:                             ; preds = %entry
   %sext7 = sext i32 %load to i64
   %gep8 = getelementptr i8, ptr %arg1, i64 8
+  %ident.check = icmp ne i32 %load, 1
+  br i1 %ident.check, label %loop.body.ph.lver.orig, label %loop.body.ph
+
+loop.body.ph.lver.orig:                           ; preds = %loop.body.lver.check
+  br label %loop.body.lver.orig
+
+loop.body.lver.orig:                              ; preds = %loop.body.lver.orig, %loop.body.ph.lver.orig
+  %phi.lver.orig = phi i64 [ 0, %loop.body.ph.lver.orig ], [ %add.lver.orig, %loop.body.lver.orig ]
+  %mul.lver.orig = mul i64 %phi.lver.orig, %sext7
+  %gep10.lver.orig = getelementptr double, ptr %gep8, i64 %mul.lver.orig
+  %load11.lver.orig = load double, ptr %gep10.lver.orig, align 8
+  store double %load11.lver.orig, ptr %arg1, align 8
+  %add.lver.orig = add i64 %phi.lver.orig, 1
+  %icmp.lver.orig = icmp eq i64 %phi.lver.orig, 0
+  br i1 %icmp.lver.orig, label %loop.exit.loopexit, label %loop.body.lver.orig
+
+loop.body.ph:                                     ; preds = %loop.body.lver.check
   br label %loop.body

-loop.body:                                        ; preds = %loop.body, %loop.ph
-  %phi = phi i64 [ 0, %loop.ph ], [ %add, %loop.body ]
+loop.body:                                        ; preds = %loop.body, %loop.body.ph
+  %phi = phi i64 [ 0, %loop.body.ph ], [ %add, %loop.body ]
   %mul = mul i64 %phi, %sext7
   %gep10 = getelementptr double, ptr %gep8, i64 %mul
   %load11 = load double, ptr %gep10, align 8
   store double %load11, ptr %arg1, align 8
   %add = add i64 %phi, 1
   %icmp = icmp eq i64 %phi, 0
-  br i1 %icmp, label %loop.exit, label %loop.body
+  br i1 %icmp, label %loop.exit.loopexit1, label %loop.body

 noloop.exit:                                      ; preds = %entry
   %sext = sext i32 %load to i64
@@ -28,6 +45,12 @@ noloop.exit:                                      ; preds = %entry
   store double %load5, ptr %arg, align 8
   ret void

-loop.exit:                                        ; preds = %loop.body
+loop.exit.loopexit:                               ; preds = %loop.body.lver.orig
+  br label %loop.exit
+
+loop.exit.loopexit1:                              ; preds = %loop.body
+  br label %loop.exit
+
+loop.exit:                                        ; preds = %loop.exit.loopexit1, %loop.exit.loopexit
   ret void
 }

This is a regression.

The underlying issue is in LoopAccessAnalysis, which produces a false equal predicate. The diff before and after running LAA on the example is:

diff --git a/laa.before b/laa.main
index 17b0a1b..1b3b1b0 100644
--- a/laa.before
+++ b/laa.main
@@ -7,5 +7,9 @@ Printing analysis 'Loop Access Analysis' for function 'lver.check.unnecessary':

     Non vectorizable stores to invariant address were not found in loop.
     SCEV assumptions:
+    Equal predicate: %load == 1

     Expressions re-written:
+    [PSE]  %gep10 = getelementptr double, ptr %gep8, i64 %mul:
+      {(8 + %arg1),+,(8 * (sext i32 %load to i64))<nsw>}<%loop.body>
+      --> {(8 + %arg1),+,8}<%loop.body>
@artagnon artagnon changed the title LoopVersioning: unnecessary lver.check inserted since #92119 LAA: unnecessary equal predicate since #92119 Jun 25, 2024
artagnon added a commit to artagnon/llvm-project that referenced this issue Jun 27, 2024
The issue is in LoopAccessAnalysis, but the regression was seen in the
user LoopVersioning. Hence, add pre-commit tests for both, in
preparation to fix the issue in LoopAccessAnalysis.
artagnon added a commit to artagnon/llvm-project that referenced this issue Jun 27, 2024
Speculating the stride currently inserts a Stride == 1 predicate, which
is equivalent to asserting that the that the loop executes atleast once.
However, when the backedge-taken-count is known-non-negative,
speculating the stride unnecessarily versions the loop. Avoid this.

Fixes llvm#96656.
@efriedma-quic
Copy link
Collaborator

What makes it "unnecessary"? You have a strided load, and if the stride is 1 you can vectorize the load. I mean, in this particular case it's not really profitable because the loop can't be vectorized for other reasons, but the analysis of the load itself seems correct.

@artagnon
Copy link
Contributor Author

What makes it "unnecessary"?

The fact that the loop is executed at least once. I'm not 100% sure, but I think LAA is only supposed to speculate on the stride when the TC is unknown and may be 0, in order to version the loop and insert a Stride == 1 predicate, so that one version of the loop executes with that predicate. See also #96927.

artagnon added a commit that referenced this issue Jun 28, 2024
The issue is in LoopAccessAnalysis, but the regression was seen in the
user LoopVersioning. Hence, add pre-commit tests for both, in
preparation to fix the issue in LoopAccessAnalysis.
artagnon added a commit to artagnon/llvm-project that referenced this issue Jun 28, 2024
Speculating the stride currently inserts a Stride == 1 predicate, which
is equivalent to asserting that the that the loop executes atleast once.
However, when the backedge-taken-count is known-non-negative,
speculating the stride unnecessarily versions the loop. Avoid this.

Fixes llvm#96656.
@fhahn
Copy link
Contributor

fhahn commented Jun 28, 2024

This is form an end-to-end run, right? If so, who is calling LoopVersioning on the loop with a single iteration? I agree that versioning for loops with a single iteration likely isn't profitable.

@artagnon
Copy link
Contributor Author

Yes, it is from an end-to-end run. I'll audit the callers to fix the issue.

artagnon added a commit to artagnon/llvm-project that referenced this issue Jul 3, 2024
After pr96656.ll were added to LAA and LoopVersioning, it was decided
that the bug is in a caller of LoopVersioning, not in LAA or
LoopVersioning itself. The caller has now been found to be
LoopLoadElim. Hence, re-organize the added tests to avoid confusion, and
add a new reduced-test for llvm#96656 to LoopLoadElim, in preparation to fix
the bug.
artagnon added a commit to artagnon/llvm-project that referenced this issue Jul 3, 2024
It is unnecessary for LoopLoadElim to version single-iteration loops.
Don't call LoopVersioning when the BTC is known to be 1.

Fixes llvm#96656.
@artagnon artagnon changed the title LAA: unnecessary equal predicate since #92119 LoopLoadElim: calling LoopVersioning with single-iteration loop Jul 3, 2024
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this issue Jul 3, 2024
The issue is in LoopAccessAnalysis, but the regression was seen in the
user LoopVersioning. Hence, add pre-commit tests for both, in
preparation to fix the issue in LoopAccessAnalysis.
@artagnon
Copy link
Contributor Author

After much confusion (due to blind reliance on llvm-reduce), we now have a resolution. Yes, LoopVersioning versions single-iteration loops, but LoopLoadElimination never calls LoopVersioning with single-iteration loops, since it bails out before versioning the loop: the function responsible for this is findStoreToLoadDependences. This regression is actually about LLE becoming more powerful (although only a small regression is seen in the end-to-end compile), as LAA correctly turns certain dependences from Unknown to a known one.

The actual reproducer for this regression is:

define void @unknown_stride_known_dependence(ptr %x, ptr %y, i1 %cond) {
; CHECK-LABEL: define void @unknown_stride_known_dependence(
; CHECK-SAME: ptr [[X:%.*]], ptr [[Y:%.*]], i1 [[COND:%.*]]) {
; CHECK-NEXT:  [[ENTRY:.*:]]
; CHECK-NEXT:    [[LOAD:%.*]] = load i32, ptr [[X]], align 4
; CHECK-NEXT:    br i1 [[COND]], label %[[NOLOOP_EXIT:.*]], label %[[LOOP_LVER_CHECK:.*]]
; CHECK:       [[LOOP_LVER_CHECK]]:
; CHECK-NEXT:    [[SEXT_X:%.*]] = sext i32 [[LOAD]] to i64
; CHECK-NEXT:    [[GEP_8:%.*]] = getelementptr i8, ptr [[Y]], i64 8
; CHECK-NEXT:    [[GEP_16:%.*]] = getelementptr i8, ptr [[Y]], i64 16
; CHECK-NEXT:    [[IDENT_CHECK:%.*]] = icmp ne i32 [[LOAD]], 1
; CHECK-NEXT:    br i1 [[IDENT_CHECK]], label %[[LOOP_PH_LVER_ORIG:.*]], label %[[LOOP_PH:.*]]
; CHECK:       [[LOOP_PH_LVER_ORIG]]:
; CHECK-NEXT:    br label %[[LOOP_LVER_ORIG:.*]]
; CHECK:       [[LOOP_LVER_ORIG]]:
; CHECK-NEXT:    [[IV_LVER_ORIG:%.*]] = phi i64 [ 0, %[[LOOP_PH_LVER_ORIG]] ], [ [[IV_NEXT_LVER_ORIG:%.*]], %[[LOOP_LVER_ORIG]] ]
; CHECK-NEXT:    [[MUL_LVER_ORIG:%.*]] = mul i64 [[IV_LVER_ORIG]], [[SEXT_X]]
; CHECK-NEXT:    [[GEP_8_MUL_LVER_ORIG:%.*]] = getelementptr double, ptr [[GEP_8]], i64 [[MUL_LVER_ORIG]]
; CHECK-NEXT:    [[LOAD_8_LVER_ORIG:%.*]] = load double, ptr [[GEP_8_MUL_LVER_ORIG]], align 8
; CHECK-NEXT:    [[GEP_16_MUL_LVER_ORIG:%.*]] = getelementptr double, ptr [[GEP_16]], i64 [[MUL_LVER_ORIG]]
; CHECK-NEXT:    store double [[LOAD_8_LVER_ORIG]], ptr [[GEP_16_MUL_LVER_ORIG]], align 8
; CHECK-NEXT:    [[IV_NEXT_LVER_ORIG]] = add i64 [[IV_LVER_ORIG]], 1
; CHECK-NEXT:    [[ICMP_LVER_ORIG:%.*]] = icmp eq i64 [[IV_LVER_ORIG]], 1
; CHECK-NEXT:    br i1 [[ICMP_LVER_ORIG]], label %[[EXIT_LOOPEXIT_LOOPEXIT:.*]], label %[[LOOP_LVER_ORIG]]
; CHECK:       [[LOOP_PH]]:
; CHECK-NEXT:    [[LOAD_INITIAL:%.*]] = load double, ptr [[GEP_8]], align 8
; CHECK-NEXT:    br label %[[LOOP:.*]]
; CHECK:       [[LOOP]]:
; CHECK-NEXT:    [[STORE_FORWARDED:%.*]] = phi double [ [[LOAD_INITIAL]], %[[LOOP_PH]] ], [ [[STORE_FORWARDED]], %[[LOOP]] ]
; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, %[[LOOP_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
; CHECK-NEXT:    [[MUL:%.*]] = mul i64 [[IV]], [[SEXT_X]]
; CHECK-NEXT:    [[GEP_8_MUL:%.*]] = getelementptr double, ptr [[GEP_8]], i64 [[MUL]]
; CHECK-NEXT:    [[LOAD_8:%.*]] = load double, ptr [[GEP_8_MUL]], align 8
; CHECK-NEXT:    [[GEP_16_MUL:%.*]] = getelementptr double, ptr [[GEP_16]], i64 [[MUL]]
; CHECK-NEXT:    store double [[STORE_FORWARDED]], ptr [[GEP_16_MUL]], align 8
; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
; CHECK-NEXT:    [[ICMP:%.*]] = icmp eq i64 [[IV]], 1
; CHECK-NEXT:    br i1 [[ICMP]], label %[[EXIT_LOOPEXIT_LOOPEXIT1:.*]], label %[[LOOP]]
; CHECK:       [[NOLOOP_EXIT]]:
; CHECK-NEXT:    [[SEXT:%.*]] = sext i32 [[LOAD]] to i64
; CHECK-NEXT:    [[GEP_Y:%.*]] = getelementptr double, ptr [[Y]], i64 [[SEXT]]
; CHECK-NEXT:    [[LOAD_Y:%.*]] = load double, ptr [[GEP_Y]], align 8
; CHECK-NEXT:    store double [[LOAD_Y]], ptr [[X]], align 8
; CHECK-NEXT:    br label %[[EXIT:.*]]
; CHECK:       [[EXIT_LOOPEXIT_LOOPEXIT]]:
; CHECK-NEXT:    br label %[[EXIT_LOOPEXIT:.*]]
; CHECK:       [[EXIT_LOOPEXIT_LOOPEXIT1]]:
; CHECK-NEXT:    br label %[[EXIT_LOOPEXIT]]
; CHECK:       [[EXIT_LOOPEXIT]]:
; CHECK-NEXT:    br label %[[EXIT]]
; CHECK:       [[EXIT]]:
; CHECK-NEXT:    ret void
;
entry:
  %load = load i32, ptr %x, align 4
  br i1 %cond, label %noloop.exit, label %loop.ph

loop.ph:                                              ; preds = %entry
  %sext.x = sext i32 %load to i64
  %gep.8 = getelementptr i8, ptr %y, i64 8
  %gep.16 = getelementptr i8, ptr %y, i64 16
  br label %loop

loop:                                                 ; preds = %loop, %loop.ph
  %iv = phi i64 [ 0, %loop.ph ], [ %iv.next, %loop ]
  %mul = mul i64 %iv, %sext.x
  %gep.8.mul = getelementptr double, ptr %gep.8, i64 %mul
  %load.8 = load double, ptr %gep.8.mul, align 8
  %gep.16.mul = getelementptr double, ptr %gep.16, i64 %mul
  store double %load.8, ptr %gep.16.mul
  %iv.next = add i64 %iv, 1
  %icmp = icmp eq i64 %iv, 1
  br i1 %icmp, label %exit, label %loop

noloop.exit:                                          ; preds = %loop.ph
  %sext = sext i32 %load to i64
  %gep.y = getelementptr double, ptr %y, i64 %sext
  %load.y = load double, ptr %gep.y
  store double %load.y, ptr %x
  br label %exit

exit:                                                 ; preds = %loop.body
  ret void
}

This is actually the case of BTC = 1, where the loop needs to be versioned for correctness. Previously, LLE bailed out in findStoreToLoadDependences, as the dependences were unknown, before versioning the loop. Now, there is a valid candidate for LLE, and indeed LLE is more powerful.

Previously, LAA returned the following:

Printing analysis 'Loop Access Analysis' for function 'dgtsv_':
  :
    Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
Unknown data dependence.
    Dependences:
      Unknown:
          %16 = load double, ptr %15, align 8 ->
          store double %16, ptr %17, align 8

    Run-time memory checks:
    Grouped accesses:

    Non vectorizable stores to invariant address were not found in loop.
    SCEV assumptions:

    Expressions re-written:

Now, LAA returns:

Printing analysis 'Loop Access Analysis' for function 'dgtsv_':
  :
    Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
Backward loop carried data dependence.
    Dependences:
      Backward:
          %16 = load double, ptr %15, align 8 ->
          store double %16, ptr %17, align 8

    Run-time memory checks:
    Grouped accesses:

    Non vectorizable stores to invariant address were not found in loop.
    SCEV assumptions:
    Equal predicate: %4 == 1

    Expressions re-written:
    [PSE]  %15 = getelementptr double, ptr %10, i64 %14:
      {(8 + %1),+,(8 * (sext i32 %4 to i64))<nsw>}<%12>
      --> {(8 + %1),+,8}<%12>
    [PSE]  %17 = getelementptr double, ptr %11, i64 %14:
      {(16 + %1),+,(8 * (sext i32 %4 to i64))<nsw>}<%12>
      --> {(16 + %1),+,8}<%12>

Yes, there is an equal-predicate, due to which loop-versioning versions the loop, but this output is correct, and callers of LAA are more powerful now. Hence, I would classify this bug as invalid.

@artagnon artagnon closed this as not planned Won't fix, can't repro, duplicate, stale Aug 30, 2024
artagnon added a commit to artagnon/llvm-project that referenced this issue Aug 30, 2024
@EugeneZelenko EugeneZelenko added the question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! label Aug 30, 2024
artagnon added a commit that referenced this issue Sep 30, 2024
After pr96656.ll were added to LAA and LoopVersioning, it was decided
that the bug is in a caller of LoopVersioning, not in LAA or
LoopVersioning itself. The new candidate was LoopLoadElim, but #96656
has since been marked invalid. Hence, re-organize the added tests to
avoid confusion, and the testcase from the investigation to
LoopLoadElim.
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this issue Oct 2, 2024
After pr96656.ll were added to LAA and LoopVersioning, it was decided
that the bug is in a caller of LoopVersioning, not in LAA or
LoopVersioning itself. The new candidate was LoopLoadElim, but llvm#96656
has since been marked invalid. Hence, re-organize the added tests to
avoid confusion, and the testcase from the investigation to
LoopLoadElim.
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this issue Oct 2, 2024
After pr96656.ll were added to LAA and LoopVersioning, it was decided
that the bug is in a caller of LoopVersioning, not in LAA or
LoopVersioning itself. The new candidate was LoopLoadElim, but llvm#96656
has since been marked invalid. Hence, re-organize the added tests to
avoid confusion, and the testcase from the investigation to
LoopLoadElim.
xgupta pushed a commit to xgupta/llvm-project that referenced this issue Oct 4, 2024
After pr96656.ll were added to LAA and LoopVersioning, it was decided
that the bug is in a caller of LoopVersioning, not in LAA or
LoopVersioning itself. The new candidate was LoopLoadElim, but llvm#96656
has since been marked invalid. Hence, re-organize the added tests to
avoid confusion, and the testcase from the investigation to
LoopLoadElim.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis loopoptim question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! regression
Projects
None yet
4 participants