LoopLoadElim: calling LoopVersioning with single-iteration loop #96656
The issue is in LoopAccessAnalysis, but the regression was seen in its user, LoopVersioning. Hence, add pre-commit tests for both, in preparation for fixing the issue in LoopAccessAnalysis.
Speculating the stride currently inserts a Stride == 1 predicate, which is equivalent to asserting that the loop executes at least once. However, when the backedge-taken count is known to be non-negative, speculating the stride versions the loop unnecessarily. Avoid this. Fixes llvm#96656.
What makes it "unnecessary"? You have a strided load, and if the stride is 1 you can vectorize the load. I mean, in this particular case it's not really profitable because the loop can't be vectorized for other reasons, but the analysis of the load itself seems correct.
The fact that the loop is executed at least once. I'm not 100% sure, but I think LAA is only supposed to speculate on the stride when the TC is unknown and may be 0, in order to version the loop and insert a …
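To make the stride speculation discussed in this exchange concrete, here is a minimal semantic sketch in Python, not LLVM code. "Versioning on the stride" means guarding a specialized copy of the loop with a runtime predicate such as `stride == 1` and keeping the original strided loop as the fallback. The function `strided_load_sum` is hypothetical and only illustrates the idea. A Python slice stands in for the contiguous, vectorizable accesses.

```python
def strided_load_sum(mem, base, n, stride):
    """Hypothetical example: sum mem[base + i*stride] for i in range(n)."""
    if stride == 1:
        # Specialized version, guarded by the speculated predicate stride == 1:
        # the accesses are contiguous, so a vectorizer could use wide loads.
        # A slice models that unit-stride access here.
        return sum(mem[base:base + n])
    # Original version: generic strided accesses, taken when the predicate fails.
    total = 0
    for i in range(n):
        total += mem[base + i * stride]
    return total
```

Both paths compute the same result. The runtime check only chooses which copy of the loop runs, which is why versioning pays off only when the specialized copy is actually cheaper.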
This is from an end-to-end run, right? If so, who is calling LoopVersioning on the loop with a single iteration? I agree that versioning loops with a single iteration likely isn't profitable.
Yes, it is from an end-to-end run. I'll audit the callers to fix the issue.
After the pr96656.ll tests were added to LAA and LoopVersioning, it was decided that the bug is in a caller of LoopVersioning, not in LAA or LoopVersioning itself. That caller has now been identified as LoopLoadElim. Hence, reorganize the added tests to avoid confusion, and add a new reduced test for llvm#96656 to LoopLoadElim, in preparation for fixing the bug.
It is unnecessary for LoopLoadElim to version single-iteration loops. Don't call LoopVersioning when the BTC is known to be 1. Fixes llvm#96656.
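The backedge-taken count (BTC) and the trip count differ by one. A loop whose BTC is 1 executes its body twice, while a loop that truly runs a single iteration has BTC = 0. The minimal Python sketch below models the exit condition used by the reproducer in this thread (`%icmp = icmp eq i64 %iv, 1`, tested after the body). The helper `count_iterations` is hypothetical and assumes a non-negative bound `k`.

```python
def count_iterations(k):
    """Hypothetical helper: return (trip_count, backedge_taken_count) for a
    loop that starts at iv = 0 and exits after the body when iv == k."""
    assert k >= 0, "this model assumes a non-negative exit bound"
    iv = 0
    body_runs = 0
    backedges = 0
    while True:
        body_runs += 1          # the loop body executes
        if iv == k:             # icmp eq i64 %iv, k: take the exit
            break
        iv += 1                 # %iv.next = add i64 %iv, 1
        backedges += 1          # the backedge is taken
    return body_runs, backedges
```

For example, `count_iterations(1)` returns `(2, 1)`: the body runs for `iv = 0` and `iv = 1`.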
After much confusion (due to blind reliance on llvm-reduce), we now have a resolution. Yes, LoopVersioning versions single-iteration loops, but LoopLoadElimination never calls LoopVersioning with single-iteration loops, since it bails out before versioning the loop (the function responsible for this is …). The actual reproducer for this regression is:
define void @unknown_stride_known_dependence(ptr %x, ptr %y, i1 %cond) {
; CHECK-LABEL: define void @unknown_stride_known_dependence(
; CHECK-SAME: ptr [[X:%.*]], ptr [[Y:%.*]], i1 [[COND:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: [[LOAD:%.*]] = load i32, ptr [[X]], align 4
; CHECK-NEXT: br i1 [[COND]], label %[[NOLOOP_EXIT:.*]], label %[[LOOP_LVER_CHECK:.*]]
; CHECK: [[LOOP_LVER_CHECK]]:
; CHECK-NEXT: [[SEXT_X:%.*]] = sext i32 [[LOAD]] to i64
; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr i8, ptr [[Y]], i64 8
; CHECK-NEXT: [[GEP_16:%.*]] = getelementptr i8, ptr [[Y]], i64 16
; CHECK-NEXT: [[IDENT_CHECK:%.*]] = icmp ne i32 [[LOAD]], 1
; CHECK-NEXT: br i1 [[IDENT_CHECK]], label %[[LOOP_PH_LVER_ORIG:.*]], label %[[LOOP_PH:.*]]
; CHECK: [[LOOP_PH_LVER_ORIG]]:
; CHECK-NEXT: br label %[[LOOP_LVER_ORIG:.*]]
; CHECK: [[LOOP_LVER_ORIG]]:
; CHECK-NEXT: [[IV_LVER_ORIG:%.*]] = phi i64 [ 0, %[[LOOP_PH_LVER_ORIG]] ], [ [[IV_NEXT_LVER_ORIG:%.*]], %[[LOOP_LVER_ORIG]] ]
; CHECK-NEXT: [[MUL_LVER_ORIG:%.*]] = mul i64 [[IV_LVER_ORIG]], [[SEXT_X]]
; CHECK-NEXT: [[GEP_8_MUL_LVER_ORIG:%.*]] = getelementptr double, ptr [[GEP_8]], i64 [[MUL_LVER_ORIG]]
; CHECK-NEXT: [[LOAD_8_LVER_ORIG:%.*]] = load double, ptr [[GEP_8_MUL_LVER_ORIG]], align 8
; CHECK-NEXT: [[GEP_16_MUL_LVER_ORIG:%.*]] = getelementptr double, ptr [[GEP_16]], i64 [[MUL_LVER_ORIG]]
; CHECK-NEXT: store double [[LOAD_8_LVER_ORIG]], ptr [[GEP_16_MUL_LVER_ORIG]], align 8
; CHECK-NEXT: [[IV_NEXT_LVER_ORIG]] = add i64 [[IV_LVER_ORIG]], 1
; CHECK-NEXT: [[ICMP_LVER_ORIG:%.*]] = icmp eq i64 [[IV_LVER_ORIG]], 1
; CHECK-NEXT: br i1 [[ICMP_LVER_ORIG]], label %[[EXIT_LOOPEXIT_LOOPEXIT:.*]], label %[[LOOP_LVER_ORIG]]
; CHECK: [[LOOP_PH]]:
; CHECK-NEXT: [[LOAD_INITIAL:%.*]] = load double, ptr [[GEP_8]], align 8
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[STORE_FORWARDED:%.*]] = phi double [ [[LOAD_INITIAL]], %[[LOOP_PH]] ], [ [[STORE_FORWARDED]], %[[LOOP]] ]
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[LOOP_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
; CHECK-NEXT: [[MUL:%.*]] = mul i64 [[IV]], [[SEXT_X]]
; CHECK-NEXT: [[GEP_8_MUL:%.*]] = getelementptr double, ptr [[GEP_8]], i64 [[MUL]]
; CHECK-NEXT: [[LOAD_8:%.*]] = load double, ptr [[GEP_8_MUL]], align 8
; CHECK-NEXT: [[GEP_16_MUL:%.*]] = getelementptr double, ptr [[GEP_16]], i64 [[MUL]]
; CHECK-NEXT: store double [[STORE_FORWARDED]], ptr [[GEP_16_MUL]], align 8
; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
; CHECK-NEXT: [[ICMP:%.*]] = icmp eq i64 [[IV]], 1
; CHECK-NEXT: br i1 [[ICMP]], label %[[EXIT_LOOPEXIT_LOOPEXIT1:.*]], label %[[LOOP]]
; CHECK: [[NOLOOP_EXIT]]:
; CHECK-NEXT: [[SEXT:%.*]] = sext i32 [[LOAD]] to i64
; CHECK-NEXT: [[GEP_Y:%.*]] = getelementptr double, ptr [[Y]], i64 [[SEXT]]
; CHECK-NEXT: [[LOAD_Y:%.*]] = load double, ptr [[GEP_Y]], align 8
; CHECK-NEXT: store double [[LOAD_Y]], ptr [[X]], align 8
; CHECK-NEXT: br label %[[EXIT:.*]]
; CHECK: [[EXIT_LOOPEXIT_LOOPEXIT]]:
; CHECK-NEXT: br label %[[EXIT_LOOPEXIT:.*]]
; CHECK: [[EXIT_LOOPEXIT_LOOPEXIT1]]:
; CHECK-NEXT: br label %[[EXIT_LOOPEXIT]]
; CHECK: [[EXIT_LOOPEXIT]]:
; CHECK-NEXT: br label %[[EXIT]]
; CHECK: [[EXIT]]:
; CHECK-NEXT: ret void
;
entry:
  %load = load i32, ptr %x, align 4
  br i1 %cond, label %noloop.exit, label %loop.ph

loop.ph: ; preds = %entry
  %sext.x = sext i32 %load to i64
  %gep.8 = getelementptr i8, ptr %y, i64 8
  %gep.16 = getelementptr i8, ptr %y, i64 16
  br label %loop

loop: ; preds = %loop, %loop.ph
  %iv = phi i64 [ 0, %loop.ph ], [ %iv.next, %loop ]
  %mul = mul i64 %iv, %sext.x
  %gep.8.mul = getelementptr double, ptr %gep.8, i64 %mul
  %load.8 = load double, ptr %gep.8.mul, align 8
  %gep.16.mul = getelementptr double, ptr %gep.16, i64 %mul
  store double %load.8, ptr %gep.16.mul
  %iv.next = add i64 %iv, 1
  %icmp = icmp eq i64 %iv, 1
  br i1 %icmp, label %exit, label %loop

noloop.exit: ; preds = %entry
  %sext = sext i32 %load to i64
  %gep.y = getelementptr double, ptr %y, i64 %sext
  %load.y = load double, ptr %gep.y
  store double %load.y, ptr %x
  br label %exit

exit: ; preds = %loop, %noloop.exit
  ret void
}
This is actually the case of BTC = 1, where the loop needs to be versioned for correctness. Previously, LLE bailed out in …
Previously, LAA returned the following:
Now, LAA returns:
Yes, there is an Equal predicate, which causes LoopVersioning to version the loop, but this output is correct, and callers of LAA are more powerful now. Hence, I would classify this bug as invalid.
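Why the BTC = 1 reproducer above must be versioned for correctness can be checked with a small semantic model in Python, not LLVM. Memory is modeled as a list of doubles indexed from `%y`, so `%gep.8` is index 1 and `%gep.16` is index 2, and the stride is the loaded value `%load`. The function names are hypothetical. The store-forwarded version follows the CHECK lines: it stores `%load.initial` on every iteration. The still-present but dead `%load.8` is omitted, and so is the `%cond`/`noloop.exit` path.

```python
def original_loop(y, stride):
    """The loop as written: BTC = 1, so the body runs for iv = 0 and iv = 1."""
    y = list(y)
    for iv in (0, 1):
        mul = iv * stride
        load_8 = y[1 + mul]          # %load.8 = load double, ptr %gep.8.mul
        y[2 + mul] = load_8          # store double %load.8, ptr %gep.16.mul
    return y


def forwarded_loop(y, stride):
    """LoopLoadElim's version: the load is replaced by a value forwarded from
    the previous iteration's store, which is only valid when stride == 1."""
    y = list(y)
    store_forwarded = y[1]           # %load.initial = load double, ptr %gep.8
    for iv in (0, 1):
        mul = iv * stride
        y[2 + mul] = store_forwarded  # store double %store.forwarded, ...
    return y


def versioned_loop(y, stride):
    """The LoopVersioning guard from the CHECK lines: %ident.check = icmp ne %load, 1."""
    if stride != 1:
        return original_loop(y, stride)
    return forwarded_loop(y, stride)
```

With `y = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]` and stride 1, both loops produce `[10.0, 20.0, 20.0, 20.0, 50.0, 60.0]`. With stride 2, the original loop stores `40.0` into index 4, but the unguarded forwarded loop stores `20.0` there. That is why the `Stride == 1` predicate, and hence versioning, is required even though the loop is tiny.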
After the pr96656.ll tests were added to LAA and LoopVersioning, it was decided that the bug is in a caller of LoopVersioning, not in LAA or LoopVersioning itself. The new candidate was LoopLoadElim, but llvm#96656 has since been marked invalid. Hence, reorganize the added tests to avoid confusion, and add the test case from the investigation to LoopLoadElim.
The following example:
produced identical (unchanged) output when running loop-versioning before #92119. However, after that patch, the following diff is observed:
This is a regression.
The underlying issue is in LoopAccessAnalysis, which produces a spurious Equal predicate. The diff of LAA's output on the example, before and after that patch, is: