-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Straighten out flow during early jump threading #104603
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
After early jump threading had kicked in we can frequently prove that a branch will go in one direction. This was previously left up to RBO; instead, try to fold it immediately and straighten out the flow before we build SSA, so that we can refine the phis. Fix dotnet#101176
Perfscore regressions seem to just be from us recognizing more loops and applying the questionable "loop-scaling" logic to them. For example, here's a case with no diffs yet massive perfscore regressions: Example@@ -8,58 +8,58 @@
; 0 inlinees with PGO data; 22 single block inlinees; 8 inlinees without PGO data
; Final local variable assignments
;
-;* V00 arg0 [V00 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op single-def <Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
-; V01 loc0 [V01,T13] ( 4, 18 ) ubyte -> rbx
-;* V02 loc1 [V02 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op <Microsoft.CodeAnalysis.SyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
-; V03 loc2 [V03 ] ( 7, 60 ) struct (32) [rsp+0x78] do-not-enreg[XSF] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SeparatedSyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
-; V04 loc3 [V04 ] ( 2, 4 ) struct (24) [rsp+0x60] do-not-enreg[XS] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SeparatedSyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
-; V05 loc4 [V05,T02] ( 6, 48 ) ref -> rbp class-hnd exact <Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax>
-; V06 loc5 [V06 ] ( 2, 16 ) struct (24) [rsp+0x48] do-not-enreg[XS] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SyntaxToken>
-; V07 OutArgs [V07 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;* V08 tmp1 [V08 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "NewObj constructor temp" <Microsoft.CodeAnalysis.SyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
-;* V09 tmp2 [V09 ] ( 0, 0 ) struct ( 8) zero-ref "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
-; V10 tmp3 [V10,T25] ( 4, 6 ) ref -> rdx class-hnd "Inline return value spill temp" <Microsoft.CodeAnalysis.SyntaxNode>
-;* V11 tmp4 [V11 ] ( 0, 0 ) ref -> zero-ref class-hnd "dup spill" <Microsoft.CodeAnalysis.SyntaxNode>
-;* V12 tmp5 [V12 ] ( 0, 0 ) ref -> zero-ref
-;* V13 tmp6 [V13,T31] ( 0, 0 ) int -> zero-ref
-;* V14 tmp7 [V14 ] ( 0, 0 ) int -> zero-ref "Inlining Arg"
-; V15 tmp8 [V15,T21] ( 2, 8 ) ref -> rcx class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
-; V16 tmp9 [V16,T24] ( 2, 8 ) struct (32) [rsp+0x28] do-not-enreg[SF] must-init ld-addr-op "NewObj constructor temp" <Microsoft.CodeAnalysis.SeparatedSyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
-;* V17 tmp10 [V17 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inline return value spill temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
-; V18 tmp11 [V18,T00] ( 4, 64 ) byref -> r14 "Inlining Arg"
-; V19 tmp12 [V19,T05] ( 5, 40 ) ref -> r15 class-hnd "Inline stloc first use temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
-; V20 tmp13 [V20,T07] ( 4, 32 ) ref -> rax class-hnd "Inline stloc first use temp" <Microsoft.CodeAnalysis.GreenNode>
-;* V21 tmp14 [V21 ] ( 0, 0 ) ref -> zero-ref ld-addr-op class-hnd "Inline ldloca(s) first use temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
-; V22 tmp15 [V22,T09] ( 2, 32 ) ref -> rdx class-hnd "impAppendStmt" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
-;* V23 tmp16 [V23 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
-; V24 tmp17 [V24,T01] ( 4, 64 ) ref -> rcx class-hnd "dup spill" <Microsoft.CodeAnalysis.GreenNode>
-;* V25 tmp18 [V25 ] ( 0, 0 ) ref -> zero-ref
-; V26 tmp19 [V26,T11] ( 3, 24 ) ref -> rcx
-; V27 tmp20 [V27,T08] ( 4, 32 ) ref -> rcx
-;* V28 tmp21 [V28,T19] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-; V29 tmp22 [V29,T06] ( 3, 40 ) int -> rcx "Inline stloc first use temp"
-;* V30 tmp23 [V30,T29] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-; V31 tmp24 [V31,T14] ( 3, 18 ) int -> rdi "Inline stloc first use temp"
-; V32 tmp25 [V32,T17] ( 4, 14 ) int -> rax "Inline return value spill temp"
-;* V33 tmp26 [V33 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxNode>
-; V34 tmp27 [V34,T22] ( 2, 8 ) ref -> rax class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
-;* V35 tmp28 [V35 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxNode>
-; V36 tmp29 [V36,T15] ( 4, 16 ) ref -> rcx class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
-; V37 tmp30 [V37,T20] ( 4, 8 ) int -> rax "Inline stloc first use temp"
-; V38 tmp31 [V38,T28] ( 2, 2 ) ref -> rcx single-def "field V00._node (fldOffset=0x0)" P-INDEP
-; V39 tmp32 [V39,T18] ( 4, 13 ) int -> rdi "field V02._index (fldOffset=0x0)" P-INDEP
-; V40 tmp33 [V40,T12] ( 8, 21 ) ref -> rsi single-def "field V02._list (fldOffset=0x8)" P-INDEP
-;* V41 tmp34 [V41,T32] ( 0, 0 ) int -> zero-ref single-def "field V08._index (fldOffset=0x0)" P-INDEP
-; V42 tmp35 [V42,T30] ( 2, 2 ) ref -> rsi single-def "field V08._list (fldOffset=0x8)" P-INDEP
-;* V43 tmp36 [V43 ] ( 0, 0 ) ref -> zero-ref "field V09._node (fldOffset=0x0)" P-INDEP
-;* V44 tmp37 [V44 ] ( 0, 0 ) int -> zero-ref "V16.[000..004)"
-; V45 tmp38 [V45,T23] ( 2, 8 ) ref -> rcx "argument with side effect"
-; V46 tmp39 [V46,T03] ( 3, 48 ) ref -> rcx "argument with side effect"
-; V47 tmp40 [V47,T10] ( 2, 32 ) ref -> rdx "argument with side effect"
-; V48 tmp41 [V48,T04] ( 3, 48 ) ref -> rax "argument with side effect"
-; V49 cse0 [V49,T26] ( 3, 6 ) ref -> rcx "CSE #01: moderate"
-; V50 rat0 [V50,T16] ( 4, 14 ) ref -> rcx "replacement local"
-; V51 rat1 [V51,T27] ( 3, 4 ) long -> rax "CSE for expectedClsNode"
+;* V00 arg0 [V00 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op single-def <Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
+; V01 loc0 [V01,T08] ( 4, 258 ) ubyte -> rbx
+;* V02 loc1 [V02 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op <Microsoft.CodeAnalysis.SyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
+; V03 loc2 [V03 ] ( 7,2576 ) struct (32) [rsp+0x78] do-not-enreg[XSF] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SeparatedSyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
+; V04 loc3 [V04 ] ( 2, 16 ) struct (24) [rsp+0x60] do-not-enreg[XS] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SeparatedSyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
+; V05 loc4 [V05,T01] ( 6,1632 ) ref -> rbp class-hnd exact <Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax>
+; V06 loc5 [V06 ] ( 2, 256 ) struct (24) [rsp+0x48] do-not-enreg[XS] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SyntaxToken>
+; V07 OutArgs [V07 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
+;* V08 tmp1 [V08 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "NewObj constructor temp" <Microsoft.CodeAnalysis.SyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
+;* V09 tmp2 [V09 ] ( 0, 0 ) struct ( 8) zero-ref "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
+; V10 tmp3 [V10,T24] ( 4, 24 ) ref -> rdx class-hnd "Inline return value spill temp" <Microsoft.CodeAnalysis.SyntaxNode>
+;* V11 tmp4 [V11 ] ( 0, 0 ) ref -> zero-ref class-hnd "dup spill" <Microsoft.CodeAnalysis.SyntaxNode>
+;* V12 tmp5 [V12 ] ( 0, 0 ) ref -> zero-ref
+;* V13 tmp6 [V13 ] ( 0, 0 ) int -> zero-ref
+;* V14 tmp7 [V14 ] ( 0, 0 ) int -> zero-ref "Inlining Arg"
+; V15 tmp8 [V15,T20] ( 2, 32 ) ref -> rcx class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
+; V16 tmp9 [V16,T23] ( 2, 32 ) struct (32) [rsp+0x28] do-not-enreg[SF] must-init ld-addr-op "NewObj constructor temp" <Microsoft.CodeAnalysis.SeparatedSyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
+;* V17 tmp10 [V17 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inline return value spill temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
+; V18 tmp11 [V18,T00] ( 4,2176 ) byref -> r14 "Inlining Arg"
+; V19 tmp12 [V19,T03] ( 5,1312 ) ref -> r15 class-hnd "Inline stloc first use temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
+; V20 tmp13 [V20,T12] ( 4, 128 ) ref -> rax class-hnd "Inline stloc first use temp" <Microsoft.CodeAnalysis.GreenNode>
+;* V21 tmp14 [V21 ] ( 0, 0 ) ref -> zero-ref ld-addr-op class-hnd "Inline ldloca(s) first use temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
+; V22 tmp15 [V22,T13] ( 2, 128 ) ref -> rdx class-hnd "impAppendStmt" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
+;* V23 tmp16 [V23 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
+; V24 tmp17 [V24,T04] ( 4,1024 ) ref -> rcx class-hnd "dup spill" <Microsoft.CodeAnalysis.GreenNode>
+;* V25 tmp18 [V25 ] ( 0, 0 ) ref -> zero-ref
+; V26 tmp19 [V26,T07] ( 3, 384 ) ref -> rcx
+; V27 tmp20 [V27,T06] ( 4, 512 ) ref -> rcx
+;* V28 tmp21 [V28 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+; V29 tmp22 [V29,T02] ( 3,1536 ) int -> rcx "Inline stloc first use temp"
+;* V30 tmp23 [V30 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+; V31 tmp24 [V31,T10] ( 3, 160 ) int -> rdi "Inline stloc first use temp"
+; V32 tmp25 [V32,T16] ( 4, 88 ) int -> rax "Inline return value spill temp"
+;* V33 tmp26 [V33 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxNode>
+; V34 tmp27 [V34,T21] ( 2, 32 ) ref -> rax class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
+;* V35 tmp28 [V35 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxNode>
+; V36 tmp29 [V36,T17] ( 4, 64 ) ref -> rcx class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
+; V37 tmp30 [V37,T19] ( 4, 32 ) int -> rax "Inline stloc first use temp"
+; V38 tmp31 [V38,T27] ( 2, 2 ) ref -> rcx single-def "field V00._node (fldOffset=0x0)" P-INDEP
+; V39 tmp32 [V39,T15] ( 4, 105 ) int -> rdi "field V02._index (fldOffset=0x0)" P-INDEP
+; V40 tmp33 [V40,T11] ( 8, 137 ) ref -> rsi single-def "field V02._list (fldOffset=0x8)" P-INDEP
+;* V41 tmp34 [V41,T29] ( 0, 0 ) int -> zero-ref single-def "field V08._index (fldOffset=0x0)" P-INDEP
+; V42 tmp35 [V42,T28] ( 2, 2 ) ref -> rsi single-def "field V08._list (fldOffset=0x8)" P-INDEP
+;* V43 tmp36 [V43 ] ( 0, 0 ) ref -> zero-ref "field V09._node (fldOffset=0x0)" P-INDEP
+;* V44 tmp37 [V44 ] ( 0, 0 ) int -> zero-ref "V16.[000..004)"
+; V45 tmp38 [V45,T22] ( 2, 32 ) ref -> rcx "argument with side effect"
+; V46 tmp39 [V46,T09] ( 3, 192 ) ref -> rcx "argument with side effect"
+; V47 tmp40 [V47,T14] ( 2, 128 ) ref -> rdx "argument with side effect"
+; V48 tmp41 [V48,T05] ( 3, 768 ) ref -> rax "argument with side effect"
+; V49 cse0 [V49,T25] ( 3, 24 ) ref -> rcx "CSE #01: moderate"
+; V50 rat0 [V50,T18] ( 4, 56 ) ref -> rcx "replacement local"
+; V51 rat1 [V51,T26] ( 3, 16 ) long -> rax "CSE for expectedClsNode"
;
; Lcl frame size = 152
@@ -86,13 +86,13 @@ G_M20888_IG02: ; bbWeight=1, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byr
; gcrRegs +[rsi]
mov edi, -1
;; size=10 bbWeight=1 PerfScore 0.75
-G_M20888_IG03: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+G_M20888_IG03: ; bbWeight=64, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
; gcrRegs -[rcx]
inc edi
test rsi, rsi
je SHORT G_M20888_IG05
- ;; size=7 bbWeight=8 PerfScore 12.00
-G_M20888_IG04: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=7 bbWeight=64 PerfScore 96.00
+G_M20888_IG04: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
mov rcx, gword ptr [rsi+0x18]
; gcrRegs +[rcx]
mov rax, rcx
@@ -102,13 +102,13 @@ G_M20888_IG04: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byr
mov eax, 1
; gcrRegs -[rax]
jmp SHORT G_M20888_IG07
- ;; size=21 bbWeight=2 PerfScore 17.00
-G_M20888_IG05: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=21 bbWeight=8 PerfScore 68.00
+G_M20888_IG05: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
; gcrRegs -[rcx]
xor eax, eax
jmp SHORT G_M20888_IG07
- ;; size=4 bbWeight=2 PerfScore 4.50
-G_M20888_IG06: ; bbWeight=2, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=4 bbWeight=8 PerfScore 18.00
+G_M20888_IG06: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref, isz
; gcrRegs +[rcx]
movzx rax, byte ptr [rcx+0x0F]
cmp eax, 255
@@ -118,20 +118,20 @@ G_M20888_IG06: ; bbWeight=2, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {},
call [rax+0x28]<unknown method>
; gcrRegs -[rcx]
; gcr arg pop 0
- ;; size=21 bbWeight=2 PerfScore 20.50
-G_M20888_IG07: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=21 bbWeight=8 PerfScore 82.00
+G_M20888_IG07: ; bbWeight=64, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
cmp edi, eax
jge SHORT G_M20888_IG10
- ;; size=4 bbWeight=8 PerfScore 10.00
-G_M20888_IG08: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=4 bbWeight=64 PerfScore 80.00
+G_M20888_IG08: ; bbWeight=32, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
test rsi, rsi
jne SHORT G_M20888_IG12
- ;; size=5 bbWeight=2 PerfScore 2.50
-G_M20888_IG09: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=5 bbWeight=32 PerfScore 40.00
+G_M20888_IG09: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
mov rdx, rsi
; gcrRegs +[rdx]
jmp SHORT G_M20888_IG13
- ;; size=5 bbWeight=2 PerfScore 4.50
+ ;; size=5 bbWeight=8 PerfScore 18.00
G_M20888_IG10: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcrRegs -[rdx rsi]
mov eax, ebx
@@ -147,7 +147,7 @@ G_M20888_IG11: ; bbWeight=1, epilog, nogc, extend
pop r15
ret
;; size=19 bbWeight=1 PerfScore 5.25
-G_M20888_IG12: ; bbWeight=2, gcVars=0000000000000000 {}, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, gcvars, byref, isz
+G_M20888_IG12: ; bbWeight=8, gcVars=0000000000000000 {}, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, gcvars, byref, isz
; gcrRegs +[rsi]
mov rcx, gword ptr [rsi+0x18]
; gcrRegs +[rcx]
@@ -162,42 +162,42 @@ G_M20888_IG12: ; bbWeight=2, gcVars=0000000000000000 {}, gcrefRegs=0040 {
; gcr arg pop 0
mov rdx, rax
; gcrRegs +[rdx]
- ;; size=29 bbWeight=2 PerfScore 27.50
-G_M20888_IG13: ; bbWeight=2, gcrefRegs=0044 {rdx rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=29 bbWeight=8 PerfScore 110.00
+G_M20888_IG13: ; bbWeight=8, gcrefRegs=0044 {rdx rsi}, byrefRegs=0000 {}, byref, isz
; gcrRegs -[rax]
mov rcx, rdx
; gcrRegs +[rcx]
test rcx, rcx
je SHORT G_M20888_IG15
- ;; size=8 bbWeight=2 PerfScore 3.00
-G_M20888_IG14: ; bbWeight=1, gcrefRegs=0046 {rcx rdx rsi}, byrefRegs=0000 {}, byref
+ ;; size=8 bbWeight=8 PerfScore 12.00
+G_M20888_IG14: ; bbWeight=4, gcrefRegs=0046 {rcx rdx rsi}, byrefRegs=0000 {}, byref
mov rax, 0xD1FFAB1E ; Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax
cmp qword ptr [rcx], rax
jne G_M20888_IG25
- ;; size=19 bbWeight=1 PerfScore 4.25
-G_M20888_IG15: ; bbWeight=2, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
+ ;; size=19 bbWeight=4 PerfScore 17.00
+G_M20888_IG15: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
; gcrRegs -[rdx]
lea rdx, [rsp+0x60]
cmp dword ptr [rcx], ecx
call [Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax:get_Attributes():Microsoft.CodeAnalysis.SeparatedSyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]:this]
; gcrRegs -[rcx]
; gcr arg pop 0
- ;; size=13 bbWeight=2 PerfScore 13.00
-G_M20888_IG16: ; bbWeight=2, nogc, extend
+ ;; size=13 bbWeight=8 PerfScore 52.00
+G_M20888_IG16: ; bbWeight=8, nogc, extend
vmovdqu xmm0, xmmword ptr [rsp+0x60]
vmovdqu xmmword ptr [rsp+0x30], xmm0
mov rcx, qword ptr [rsp+0x70]
mov qword ptr [rsp+0x40], rcx
- ;; size=22 bbWeight=2 PerfScore 12.00
-G_M20888_IG17: ; bbWeight=2, nogc, extend
+ ;; size=22 bbWeight=8 PerfScore 48.00
+G_M20888_IG17: ; bbWeight=8, nogc, extend
vmovdqu ymm0, ymmword ptr [rsp+0x28]
vmovdqu ymmword ptr [rsp+0x78], ymm0
- ;; size=12 bbWeight=2 PerfScore 10.00
-G_M20888_IG18: ; bbWeight=2, isz, extend
+ ;; size=12 bbWeight=8 PerfScore 40.00
+G_M20888_IG18: ; bbWeight=8, isz, extend
mov dword ptr [rsp+0x78], -1
jmp SHORT G_M20888_IG20
- ;; size=10 bbWeight=2 PerfScore 6.00
-G_M20888_IG19: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
+ ;; size=10 bbWeight=8 PerfScore 24.00
+G_M20888_IG19: ; bbWeight=128, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
; gcrRegs +[rcx]
mov edx, 1
call [Microsoft.CodeAnalysis.CSharp.Symbols.QuickAttributeHelpers:GetQuickAttributes(System.String,ubyte):ubyte]
@@ -205,14 +205,12 @@ G_M20888_IG19: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {},
; gcr arg pop 0
or eax, ebx
movzx rbx, al
- ;; size=16 bbWeight=8 PerfScore 30.00
-G_M20888_IG20: ; bbWeight=16, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref
+ ;; size=16 bbWeight=128 PerfScore 480.00
+G_M20888_IG20: ; bbWeight=512, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
mov ecx, dword ptr [rsp+0x78]
inc ecx
cmp ecx, dword ptr [rsp+0x80]
jge G_M20888_IG03
- ;; size=19 bbWeight=16 PerfScore 68.00
-G_M20888_IG21: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
mov dword ptr [rsp+0x78], ecx
lea rcx, [rsp+0x80]
mov r8d, dword ptr [rsp+0x78]
@@ -229,11 +227,13 @@ G_M20888_IG21: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byr
; gcrRegs +[r15]
test r15, r15
jne SHORT G_M20888_IG22
+ ;; size=71 bbWeight=512 PerfScore 8704.00
+G_M20888_IG21: ; bbWeight=32, gcrefRegs=8060 {rbp rsi r15}, byrefRegs=4000 {r14}, byref, isz
+ ; gcrRegs -[rax]
mov rcx, gword ptr [rbp+0x18]
; gcrRegs +[rcx]
xor edx, edx
mov rax, qword ptr [rcx]
- ; gcrRegs -[rax]
mov rax, qword ptr [rax+0x50]
call [rax+0x20]<unknown method>
; gcrRegs -[rcx] +[rax]
@@ -269,8 +269,8 @@ G_M20888_IG21: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byr
; gcr arg pop 0
mov r15, gword ptr [r14]
; gcrRegs +[r15]
- ;; size=128 bbWeight=8 PerfScore 304.00
-G_M20888_IG22: ; bbWeight=8, gcrefRegs=8040 {rsi r15}, byrefRegs=0000 {}, byref, isz
+ ;; size=76 bbWeight=32 PerfScore 808.00
+G_M20888_IG22: ; bbWeight=128, gcrefRegs=8040 {rsi r15}, byrefRegs=0000 {}, byref, isz
; byrRegs -[r14]
mov rcx, r15
; gcrRegs +[rcx]
@@ -294,8 +294,8 @@ G_M20888_IG22: ; bbWeight=8, gcrefRegs=8040 {rsi r15}, byrefRegs=0000 {},
jne SHORT G_M20888_IG23
xor rcx, rcx
jmp SHORT G_M20888_IG24
- ;; size=44 bbWeight=8 PerfScore 156.00
-G_M20888_IG23: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
+ ;; size=44 bbWeight=128 PerfScore 2496.00
+G_M20888_IG23: ; bbWeight=128, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
mov rax, qword ptr [rcx]
mov rax, qword ptr [rax+0x60]
call [rax+0x30]<unknown method>
@@ -303,14 +303,14 @@ G_M20888_IG23: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {},
; gcr arg pop 0
mov rcx, rax
; gcrRegs +[rcx]
- ;; size=13 bbWeight=8 PerfScore 58.00
-G_M20888_IG24: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
+ ;; size=13 bbWeight=128 PerfScore 928.00
+G_M20888_IG24: ; bbWeight=128, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
; gcrRegs -[rax]
test rcx, rcx
jne G_M20888_IG19
mov rcx, 0xD1FFAB1E
jmp G_M20888_IG19
- ;; size=24 bbWeight=8 PerfScore 28.00
+ ;; size=24 bbWeight=128 PerfScore 448.00
G_M20888_IG25: ; bbWeight=0, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref
; gcrRegs -[rcx rsi] +[rdx]
mov rcx, rax
@@ -320,7 +320,7 @@ G_M20888_IG25: ; bbWeight=0, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byr
int3
;; size=9 bbWeight=0 PerfScore 0.00
-; Total bytes of code 516, prolog size 52, PerfScore 811.83, instruction count 141, allocated bytes for code 516 (MethodHash=453aae67) for method Microsoft.CodeAnalysis.CSharp.DeclarationTreeBuilder:GetQuickAttributes(Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]):ubyte (FullOpts)
+; Total bytes of code 516, prolog size 52, PerfScore 14590.08, instruction count 141, allocated bytes for code 516 (MethodHash=453aae67) for method Microsoft.CodeAnalysis.CSharp.DeclarationTreeBuilder:GetQuickAttributes(Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]):ubyte (FullOpts) Diff in the jitdump looks like @@ -1,8 +1,8 @@
*************** In optMarkLoopHeads()
-2 loop heads marked
+4 loop heads marked
*************** In optFindAndScaleGeneralLoopBlocks()
-Marking a loop from BB02 to BB30
+Marking a loop from BB02 to BB26
BB02(wt=400)
BB03(wt=400)
BB04(wt=400)
@@ -19,20 +19,21 @@ Marking a loop from BB02 to BB30
BB15(wt=400)
BB16(wt=400)
BB17(wt=400)
- BB18(wt=400)
+ BB18(wt=800)
BB19(wt=400)
BB20(wt=400)
- BB21(wt=800)
+ BB21(wt=400)
BB22(wt=400)
BB23(wt=400)
BB24(wt=400)
- BB25(wt=400)
- BB26(wt=400)
- BB27(wt=400)
- BB28(wt=800)
- BB29(wt=800)
- BB30(wt=800)
-Marking a loop from BB08 to BB19
+ BB25(wt=800)
+ BB26(wt=800)
+Marking a loop from BB03 to BB26
+ BB03(wt=1600)
+ BB04(wt=1600)
+ BB05(wt=1600)
+ BB06(wt=1600)
+ BB07(wt=1600)
BB08(wt=1600)
BB09(wt=1600)
BB10(wt=1600)
@@ -42,7 +43,35 @@ Marking a loop from BB08 to BB19
BB14(wt=1600)
BB15(wt=1600)
BB16(wt=1600)
- BB17(wt=3200)
- BB18(wt=3200)
- BB19(wt=3200)
-Found a total of 2 general loops.
+ BB17(wt=1600)
+ BB18(wt=6400)
+ BB19(wt=1600)
+ BB20(wt=1600)
+ BB21(wt=1600)
+ BB22(wt=1600)
+ BB23(wt=1600)
+ BB24(wt=1600)
+ BB25(wt=6400)
+ BB26(wt=6400)
+Marking a loop from BB07 to BB16
+ BB07(wt=6400)
+ BB08(wt=6400)
+ BB09(wt=6400)
+ BB10(wt=6400)
+ BB11(wt=6400)
+ BB12(wt=6400)
+ BB13(wt=6400)
+ BB14(wt=6400)
+ BB15(wt=12800)
+ BB16(wt=12800)
+Marking a loop from BB09 to BB16
+ BB09(wt=25600)
+ BB10(wt=25600)
+ BB11(wt=25600)
+ BB12(wt=25600)
+ BB13(wt=25600)
+ BB14(wt=25600)
+ BB15(wt=102400)
+ BB16(wt=102400)
+Found a total of 4 general loops. |
Looking at some regressions.. libraries_tests.run.windows.x64.Release.mch System.Xml.Schema.Preprocessor:GetParentSchema(System.Xml.Schema.XmlSchemaObject):System.Xml.Schema.XmlSchema (Tier1)@@ -9,58 +9,65 @@
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
-; V00 arg0 [V00,T00] ( 7, 6 ) ref -> rbx class-hnd <System.Xml.Schema.XmlSchemaObject>
-; V01 loc0 [V01,T02] ( 4, 3.50) ref -> rax class-hnd <System.Xml.Schema.XmlSchema>
+; V00 arg0 [V00,T00] ( 8, 7.37) ref -> rbx class-hnd <System.Xml.Schema.XmlSchemaObject>
+; V01 loc0 [V01,T02] ( 4, 4 ) ref -> rax class-hnd <System.Xml.Schema.XmlSchema>
; V02 OutArgs [V02 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V03 rat0 [V03,T01] ( 5, 7 ) ref -> rax "replacement local"
; V04 rat1 [V04,T03] ( 3, 2 ) long -> rcx "CSE for expectedClsNode"
;
; Lcl frame size = 32
-G_M65415_IG01: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
+G_M65415_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
push rbx
sub rsp, 32
mov rbx, rcx
; gcrRegs +[rbx]
- ;; size=8 bbWeight=0.50 PerfScore 0.75
-G_M65415_IG02: ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
+ ;; size=8 bbWeight=1 PerfScore 1.50
+G_M65415_IG02: ; bbWeight=1, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
xor rax, rax
; gcrRegs +[rax]
- ;; size=2 bbWeight=0.50 PerfScore 0.12
-G_M65415_IG03: ; bbWeight=1, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
test rbx, rbx
- je SHORT G_M65415_IG06
- mov rbx, gword ptr [rbx+0x18]
- mov rax, rbx
- test rax, rax
- je SHORT G_M65415_IG05
- ;; size=17 bbWeight=1 PerfScore 4.75
-G_M65415_IG04: ; bbWeight=0.50, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
- mov rcx, 0xD1FFAB1E ; System.Xml.Schema.XmlSchema
- cmp qword ptr [rax], rcx
- jne SHORT G_M65415_IG07
- ;; size=15 bbWeight=0.50 PerfScore 2.12
-G_M65415_IG05: ; bbWeight=1, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
- test rax, rax
- je SHORT G_M65415_IG03
- ;; size=5 bbWeight=1 PerfScore 1.25
-G_M65415_IG06: ; bbWeight=1, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, byref, epilog, nogc
+ jne SHORT G_M65415_IG04
+ ;; size=7 bbWeight=1 PerfScore 1.50
+G_M65415_IG03: ; bbWeight=1, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, byref, epilog, nogc
; gcrRegs -[rbx]
add rsp, 32
pop rbx
ret
;; size=6 bbWeight=1 PerfScore 1.75
-G_M65415_IG07: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, gcvars, byref, isz
+G_M65415_IG04: ; bbWeight=1, gcVars=0000000000000000 {}, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, gcvars, byref, isz
; gcrRegs -[rax] +[rbx]
+ mov rbx, gword ptr [rbx+0x18]
+ mov rax, rbx
+ ; gcrRegs +[rax]
+ test rax, rax
+ je SHORT G_M65415_IG06
+ ;; size=12 bbWeight=1 PerfScore 3.50
+G_M65415_IG05: ; bbWeight=0.50, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
+ mov rcx, 0xD1FFAB1E ; System.Xml.Schema.XmlSchema
+ cmp qword ptr [rax], rcx
+ jne SHORT G_M65415_IG08
+ ;; size=15 bbWeight=0.50 PerfScore 2.12
+G_M65415_IG06: ; bbWeight=1, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
+ test rax, rax
+ jne SHORT G_M65415_IG03
+ ;; size=5 bbWeight=1 PerfScore 1.25
+G_M65415_IG07: ; bbWeight=1.37, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
+ test rbx, rbx
+ jne SHORT G_M65415_IG04
+ jmp SHORT G_M65415_IG03
+ ;; size=7 bbWeight=1.37 PerfScore 4.45
+G_M65415_IG08: ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
+ ; gcrRegs -[rax]
mov rdx, rbx
; gcrRegs +[rdx]
call CORINFO_HELP_ISINSTANCEOFCLASS
; gcrRegs -[rdx] +[rax]
; gcr arg pop 0
- jmp SHORT G_M65415_IG05
+ jmp SHORT G_M65415_IG06
;; size=10 bbWeight=0 PerfScore 0.00
-; Total bytes of code 63, prolog size 8, PerfScore 10.75, instruction count 21, allocated bytes for code 63 (MethodHash=360f0078) for method System.Xml.Schema.Preprocessor:GetParentSchema(System.Xml.Schema.XmlSchemaObject):System.Xml.Schema.XmlSchema (Tier1)
+; Total bytes of code 70, prolog size 8, PerfScore 16.08, instruction count 24, allocated bytes for code 70 (MethodHash=360f0078) for method System.Xml.Schema.Preprocessor:GetParentSchema(System.Xml.Schema.XmlSchemaObject):System.Xml.Schema.XmlSchema (Tier1) Looks like we straighten out the flow, and then immediately some other "duplicate tail condition" logic kicks in: @@ -1,119 +1,178 @@
*************** In fgUpdateFlowGraph()
Before updating the flow graph:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 73 [000..004)-> BB03(1) (always) i IBC
BB02 [0001] 1 BB04 1 73 [004..013)-> BB03(1) (always) i IBC hascall bwd bwd-target
BB03 [0002] 2 BB01,BB02 2 146 [013..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC bwd
BB04 [0003] 1 BB03 1 73 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
BB05 [0004] 2 BB03,BB04 1 73 [019..01B) (return) i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Considering uncond to cond BB01 -> BB03
setting likelihood of BB01 -> BB05 from 1 to 0.5
setting likelihood of BB01 -> BB04 to 0.5
fgOptimizeUncondBranchToSimpleCond(from BB01 to cond BB03), modified BB01
expecting opts to key off V01 in BB01
Decreased BB03 profile weight from 146 to 73
-Considering uncond to cond BB02 -> BB03
+Forward substituting local after jump threading. Before:
+STMT00007 ( ??? ... ??? )
+ [000025] ----------- ▌ JTRUE void
+ [000026] J------N--- └──▌ NE int
+ [000027] ----------- ├──▌ LCL_VAR ref V01 loc0
+ [000028] ----------- └──▌ CNS_INT ref null
+
+After:
+STMT00007 ( ??? ... ??? )
+ [000025] ----------- ▌ JTRUE void
+ [000026] J------N--- └──▌ NE int
+ [000029] ----------- ├──▌ CNS_INT ref null
+ [000028] ----------- └──▌ CNS_INT ref null
+
+Now trying to fold...
+
+Folding operator with constant nodes into a constant:
+ [000026] J------N--- ▌ NE int
+ [000029] ----------- ├──▌ CNS_INT ref null
+ [000028] ----------- └──▌ CNS_INT ref null
+Bashed to int constant:
+ [000026] ----------- ▌ CNS_INT int 0
+STMT00007 ( ??? ... ??? )
+ [000025] ----------- ▌ JTRUE void
+ [000026] ----------- └──▌ CNS_INT int 0
+
+removing useless STMT00007 ( ??? ... ??? )
+ [000025] ----------- ▌ JTRUE void
+ [000026] ----------- └──▌ CNS_INT int 0
+ from BB01
+setting likelihood of BB01 -> BB04 from 0.5 to 1
+
+Conditional folded at BB01
+BB01 becomes a BBJ_ALWAYS to BB04
+Trying to compact last pred BB02 of BB03 that we now bypass
Compacting BB03 into BB02:
*************** In fgDebugCheckBBlist
+Considering uncond to cond BB01 -> BB04
After updating the flow graph:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
-BB01 [0000] 1 1 73 [000..004)-> BB05(0.5),BB04(0.5) ( cond ) i IBC
+BB01 [0000] 1 1 73 [000..004)-> BB04(1) (always) i IBC
BB02 [0001] 1 BB04 1 73 [004..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC hascall bwd bwd-target
BB04 [0003] 2 BB01,BB02 1 73 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
-BB05 [0004] 3 BB01,BB02,BB04 1 73 [019..01B) (return) i IBC
+BB05 [0004] 2 BB02,BB04 1 73 [019..01B) (return) i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
*************** Exception Handling table is empty
*************** In fgDebugCheckBBlist
*************** In fgExpandRarelyRunBlocks()
*************** In fgReorderBlocks()
Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
-BB01 [0000] 1 1 73 [000..004)-> BB05(0.5),BB04(0.5) ( cond ) i IBC
+BB01 [0000] 1 1 73 [000..004)-> BB04(1) (always) i IBC
BB02 [0001] 1 BB04 1 73 [004..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC hascall bwd bwd-target
BB04 [0003] 2 BB01,BB02 1 73 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
+BB05 [0004] 2 BB02,BB04 1 73 [019..01B) (return) i IBC
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+
+Duplication of the conditional block BB04 (always branch from BB01) performed, because the cost of duplication (6) is less or equal than 6, validProfileWeights = true
+setting likelihood of BB01 -> BB02 to 0
+setting likelihood of BB01 -> BB05 from 1 to 1
+
+fgOptimizeBranch added these statements(s) at the end of BB01:
+STMT00008 ( 0x016[E-] ... ??? )
+ ( 7, 6) [000030] ----------- ▌ JTRUE void
+ ( 5, 4) [000031] J------N--- └──▌ EQ int
+ ( 3, 2) [000032] ----------- ├──▌ LCL_VAR ref V00 arg0
+ ( 1, 1) [000033] ----------- └──▌ CNS_INT ref null
+
+fgOptimizeBranch changed block BB01 from BBJ_ALWAYS to BBJ_COND.
+
+After this change in fgOptimizeBranch the BB graph is:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+BB01 [0000] 1 1 73 [000..004)-> BB05(1),BB02(0) ( cond ) i IBC
+BB02 [0001] 2 BB01,BB04 1 73 [004..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC hascall bwd bwd-target
+BB04 [0003] 1 BB02 1.37 100 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
BB05 [0004] 3 BB01,BB02,BB04 1 73 [019..01B) (return) i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
*************** Finishing PHASE Optimize control flow
Trees after Optimize control flow
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
-BB01 [0000] 1 1 73 [000..004)-> BB05(0.5),BB04(0.5) ( cond ) i IBC
-BB02 [0001] 1 BB04 1 73 [004..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC hascall bwd bwd-target
-BB04 [0003] 2 BB01,BB02 1 73 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
+BB01 [0000] 1 1 73 [000..004)-> BB05(1),BB02(0) ( cond ) i IBC
+BB02 [0001] 2 BB01,BB04 1 73 [004..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC hascall bwd bwd-target
+BB04 [0003] 1 BB02 1.37 100 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
BB05 [0004] 3 BB01,BB02,BB04 1 73 [019..01B) (return) i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------- BB01 [0000] [000..004) -> BB05(0.5),BB04(0.5) (cond), preds={} succs={BB04,BB05}
+------------ BB01 [0000] [000..004) -> BB05(1),BB02(0) (cond), preds={} succs={BB02,BB05}
***** BB01 [0000]
STMT00000 ( 0x000[E-] ... 0x001 )
[000001] DA---+----- ▌ STORE_LCL_VAR ref V01 loc0
[000000] -----+----- └──▌ CNS_INT ref null
***** BB01 [0000]
-STMT00007 ( ??? ... ??? )
- [000025] ----------- ▌ JTRUE void
- [000026] J------N--- └──▌ NE int
- [000027] ----------- ├──▌ LCL_VAR ref V01 loc0
- [000028] ----------- └──▌ CNS_INT ref null
+STMT00008 ( 0x016[E-] ... ??? )
+ ( 7, 6) [000030] ----------- ▌ JTRUE void
+ ( 5, 4) [000031] J------N--- └──▌ EQ int
+ ( 3, 2) [000032] ----------- ├──▌ LCL_VAR ref V00 arg0
+ ( 1, 1) [000033] ----------- └──▌ CNS_INT ref null
------------- BB02 [0001] [004..016) -> BB05(0.5),BB04(0.5) (cond), preds={BB04} succs={BB04,BB05}
+------------ BB02 [0001] [004..016) -> BB05(0.5),BB04(0.5) (cond), preds={BB01,BB04} succs={BB04,BB05}
***** BB02 [0001]
STMT00005 ( 0x004[E-] ... ??? )
[000015] DA-XG+----- ▌ STORE_LCL_VAR ref V00 arg0
[000021] ---XG+----- └──▌ IND ref
[000024] -----+----- └──▌ ADD byref
[000012] -----+----- ├──▌ LCL_VAR ref V00 arg0
[000023] -----+----- └──▌ CNS_INT long 24 Fseq[<unknown field>]
***** BB02 [0001]
STMT00006 ( 0x00C[E-] ... 0x012 )
[000019] DAC-G+----- ▌ STORE_LCL_VAR ref V01 loc0
[000018] --C-G+----- └──▌ CALL help ref CORINFO_HELP_ISINSTANCEOFCLASS
[000016] -----+----- arg1 in rdx ├──▌ LCL_VAR ref V00 arg0
[000017] H----+-N--- arg0 in rcx └──▌ CNS_INT(h) long 0x7fff78421508 class System.Xml.Schema.XmlSchema
***** BB02 [0001]
STMT00001 ( 0x013[E-] ... 0x014 )
[000005] -----+----- ▌ JTRUE void
[000004] J----+-N--- └──▌ NE int
[000002] -----+----- ├──▌ LCL_VAR ref V01 loc0
[000003] -----+----- └──▌ CNS_INT ref null
------------- BB04 [0003] [016..019) -> BB02(1),BB05(0) (cond), preds={BB01,BB02} succs={BB05,BB02}
+------------ BB04 [0003] [016..019) -> BB02(1),BB05(0) (cond), preds={BB02} succs={BB05,BB02}
***** BB04 [0003]
STMT00003 ( 0x016[E-] ... 0x017 )
- [000011] -----+----- ▌ JTRUE void
- [000010] J----+-N--- └──▌ NE int
- [000008] -----+----- ├──▌ LCL_VAR ref V00 arg0
- [000009] -----+----- └──▌ CNS_INT ref null
+ ( 7, 6) [000011] ----------- ▌ JTRUE void
+ ( 5, 4) [000010] J------N--- └──▌ NE int
+ ( 3, 2) [000008] ----------- ├──▌ LCL_VAR ref V00 arg0
+ ( 1, 1) [000009] ----------- └──▌ CNS_INT ref null
------------ BB05 [0004] [019..01B) (return), preds={BB01,BB02,BB04} succs={}
***** BB05 [0004]
STMT00002 ( 0x019[E-] ... 0x01A )
[000007] -----+----- ▌ RETURN ref
[000006] -----+----- └──▌ LCL_VAR ref V01 loc0
-------------------------------------------------------------------------------------------------------------------
The "duplicate tail" transformation here results in the loop no longer being in loop-inverted shape, so the final block layout looks worse. |
System.Double:Equals(System.Object):ubyte:this (Instrumented Tier1)@@ -12,71 +12,70 @@
;
; V00 this [V00,T01] ( 4, 2.65) byref -> rsi this single-def
; V01 arg1 [V01,T00] ( 5, 4.02) ref -> rbx class-hnd single-def <System.Object>
-; V02 loc0 [V02,T04] ( 4, 1.56) double -> mm0
+; V02 loc0 [V02,T03] ( 4, 1.56) double -> mm6
; V03 OutArgs [V03 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;* V04 tmp1 [V04,T02] ( 0, 0 ) int -> zero-ref "spilling qmarkNull"
-; V05 tmp2 [V05,T03] ( 4, 1.04) ubyte -> rax "Inline return value spill temp"
-; V06 tmp3 [V06,T05] ( 3, 0.78) double -> mm0 "Inlining Arg"
+;* V04 tmp1 [V04 ] ( 0, 0 ) int -> zero-ref "spilling qmarkNull"
+; V05 tmp2 [V05,T02] ( 4, 1.04) ubyte -> rax "Inline return value spill temp"
+; V06 tmp3 [V06,T04] ( 3, 0.78) double -> mm0 "Inlining Arg"
;
-; Lcl frame size = 40
+; Lcl frame size = 56
G_M46727_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
push rsi
push rbx
- sub rsp, 40
+ sub rsp, 56
+ vmovaps xmmword ptr [rsp+0x20], xmm6
mov rsi, rcx
; byrRegs +[rsi]
mov rbx, rdx
; gcrRegs +[rbx]
- ;; size=12 bbWeight=1 PerfScore 2.75
+ ;; size=18 bbWeight=1 PerfScore 4.75
G_M46727_IG02: ; bbWeight=1, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rsi}, byref, isz
test rbx, rbx
- jne SHORT G_M46727_IG05
+ je SHORT G_M46727_IG07
;; size=5 bbWeight=1 PerfScore 1.25
-G_M46727_IG03: ; bbWeight=0.48, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+G_M46727_IG03: ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rsi}, byref, isz
+ mov rcx, 0xD1FFAB1E ; System.Double
+ cmp qword ptr [rbx], rcx
+ jne SHORT G_M46727_IG07
+ ;; size=15 bbWeight=0.50 PerfScore 2.12
+G_M46727_IG04: ; bbWeight=0.52, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rsi}, byref, isz
+ mov rcx, 0xD1FFAB1E
+ call CORINFO_HELP_COUNTPROFILE32
+ ; gcr arg pop 0
+ vmovsd xmm6, qword ptr [rbx+0x08]
+ vucomisd xmm6, qword ptr [rsi]
+ jp SHORT G_M46727_IG09
+ jne SHORT G_M46727_IG09
+ ;; size=28 bbWeight=0.52 PerfScore 6.37
+G_M46727_IG05: ; bbWeight=0.26, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcrRegs -[rbx]
; byrRegs -[rsi]
+ mov eax, 1
+ ;; size=5 bbWeight=0.26 PerfScore 0.07
+G_M46727_IG06: ; bbWeight=0.52, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
+ vmovaps xmm6, xmmword ptr [rsp+0x20]
+ add rsp, 56
+ pop rbx
+ pop rsi
+ ret
+ ;; size=13 bbWeight=0.52 PerfScore 3.25
+G_M46727_IG07: ; bbWeight=0.48, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
mov rcx, 0xD1FFAB1E
call CORINFO_HELP_COUNTPROFILE32
; gcr arg pop 0
xor eax, eax
;; size=17 bbWeight=0.48 PerfScore 0.72
-G_M46727_IG04: ; bbWeight=0.48, epilog, nogc, extend
- add rsp, 40
+G_M46727_IG08: ; bbWeight=0.48, epilog, nogc, extend
+ vmovaps xmm6, xmmword ptr [rsp+0x20]
+ add rsp, 56
pop rbx
pop rsi
ret
- ;; size=7 bbWeight=0.48 PerfScore 1.08
-G_M46727_IG05: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rsi}, gcvars, byref, isz
- ; gcrRegs +[rbx]
- ; byrRegs +[rsi]
- mov rcx, 0xD1FFAB1E ; System.Double
- cmp qword ptr [rbx], rcx
- jne SHORT G_M46727_IG03
- ;; size=15 bbWeight=0.50 PerfScore 2.12
-G_M46727_IG06: ; bbWeight=0.52, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rsi}, byref, isz
- mov rcx, 0xD1FFAB1E
- call CORINFO_HELP_COUNTPROFILE32
- ; gcr arg pop 0
- vmovsd xmm0, qword ptr [rbx+0x08]
- vucomisd xmm0, qword ptr [rsi]
- jp SHORT G_M46727_IG09
- jne SHORT G_M46727_IG09
- ;; size=28 bbWeight=0.52 PerfScore 6.37
-G_M46727_IG07: ; bbWeight=0.26, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- ; gcrRegs -[rbx]
- ; byrRegs -[rsi]
- mov eax, 1
- ;; size=5 bbWeight=0.26 PerfScore 0.07
-G_M46727_IG08: ; bbWeight=0.52, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
- add rsp, 40
- pop rbx
- pop rsi
- ret
- ;; size=7 bbWeight=0.52 PerfScore 1.17
+ ;; size=13 bbWeight=0.48 PerfScore 3.00
G_M46727_IG09: ; bbWeight=0.26, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0040 {rsi}, gcvars, byref, isz
; byrRegs +[rsi]
- vucomisd xmm0, xmm0
+ vucomisd xmm6, xmm6
jp SHORT G_M46727_IG10
je SHORT G_M46727_IG11
;; size=8 bbWeight=0.26 PerfScore 1.04
@@ -85,15 +84,15 @@ G_M46727_IG10: ; bbWeight=0.13, gcrefRegs=0000 {}, byrefRegs=0040 {rsi},
vucomisd xmm0, xmm0
setp al
movzx rax, al
- jmp SHORT G_M46727_IG08
+ jmp SHORT G_M46727_IG06
;; size=16 bbWeight=0.13 PerfScore 1.20
G_M46727_IG11: ; bbWeight=0.13, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
; byrRegs -[rsi]
xor eax, eax
- jmp SHORT G_M46727_IG08
+ jmp SHORT G_M46727_IG06
;; size=4 bbWeight=0.13 PerfScore 0.29
-; Total bytes of code 124, prolog size 6, PerfScore 18.07, instruction count 38, allocated bytes for code 124 (MethodHash=203a4978) for method System.Double:Equals(System.Object):ubyte:this (Instrumented Tier1)
+; Total bytes of code 142, prolog size 12, PerfScore 24.07, instruction count 41, allocated bytes for code 142 (MethodHash=203a4978) for method System.Double:Equals(System.Object):ubyte:this (Instrumented Tier1) Looks like after we folded some control flow we managed to compact some more blocks, and this results in worse block layout now. |
System.Data.Common.DataStorage:GetStorageType(System.Type):int (Tier1)@@ -9,22 +9,22 @@
; 3 inlinees with PGO data; 0 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
-; V00 arg0 [V00,T03] ( 12, 46.60) ref -> rbx class-hnd single-def <System.Type>
-; V01 loc0 [V01,T09] ( 3, 0 ) int -> rax
-; V02 loc1 [V02,T01] ( 6, 65.17) int -> rsi
+; V00 arg0 [V00,T03] ( 15, 20.18) ref -> rbx class-hnd single-def <System.Type>
+; V01 loc0 [V01,T07] ( 3, 0 ) int -> rax
+; V02 loc1 [V02,T00] ( 10, 65.17) int -> rsi
; V03 OutArgs [V03 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-; V04 tmp1 [V04,T05] ( 2, 0 ) ubyte -> rax "Inline return value spill temp"
-; V05 tmp2 [V05,T00] ( 5, 92.71) ref -> rbp class-hnd "Inlining Arg" <System.Type>
-;* V06 tmp3 [V06,T06] ( 0, 0 ) int -> zero-ref "spilling qmarkNull"
-; V07 tmp4 [V07,T08] ( 3, 0.04) int -> rax "Inline return value spill temp"
-; V08 tmp5 [V08,T07] ( 3, 0.08) int -> rax "guarded devirt return temp"
+; V04 tmp1 [V04,T08] ( 2, 0 ) ubyte -> rax "Inline return value spill temp"
+; V05 tmp2 [V05,T01] ( 7, 63.39) ref -> rbp class-hnd "Inlining Arg" <System.Type>
+;* V06 tmp3 [V06 ] ( 0, 0 ) int -> zero-ref "spilling qmarkNull"
+; V07 tmp4 [V07,T06] ( 3, 0.04) int -> rax "Inline return value spill temp"
+; V08 tmp5 [V08,T05] ( 3, 0.08) int -> rax "guarded devirt return temp"
;* V09 tmp6 [V09 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "guarded devirt this exact temp" <System.RuntimeType>
;* V10 tmp7 [V10 ] ( 0, 0 ) int -> zero-ref "Inline return value spill temp"
;* V11 tmp8 [V11 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "Inlining Arg" <System.RuntimeType>
;* V12 tmp9 [V12 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inline stloc first use temp" <System.RuntimeType>
-; V13 tmp10 [V13,T02] ( 2, 63.09) ref -> rcx "arr expr"
+; V13 tmp10 [V13,T02] ( 4, 63.09) ref -> rax "arr expr"
;* V14 tmp11 [V14 ] ( 0, 0 ) ref -> zero-ref "argument with side effect"
-; V15 cse0 [V15,T04] ( 2, 16.77) ref -> rdi hoist "CSE #01: aggressive"
+; V15 cse0 [V15,T04] ( 4, 16.78) ref -> rdi hoist multi-def "CSE #01: aggressive"
;
; Lcl frame size = 40
@@ -37,65 +37,93 @@ G_M6739_IG01: ; bbWeight=1.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre
mov rbx, rcx
; gcrRegs +[rbx]
;; size=11 bbWeight=1.00 PerfScore 4.49
-G_M6739_IG02: ; bbWeight=1.00, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
+G_M6739_IG02: ; bbWeight=1.00, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
xor esi, esi
+ test rbx, rbx
+ je SHORT G_M6739_IG09
+ mov rcx, 0xD1FFAB1E ; System.RuntimeType
+ cmp qword ptr [rbx], rcx
+ jne SHORT G_M6739_IG09
mov rcx, 0xD1FFAB1E ; const ptr
mov rdi, gword ptr [rcx]
; gcrRegs +[rdi]
- ;; size=15 bbWeight=1.00 PerfScore 2.49
-G_M6739_IG03: ; bbWeight=15.77, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
- mov rcx, rdi
- ; gcrRegs +[rcx]
- mov eax, esi
- mov rbp, gword ptr [rcx+8*rax+0x10]
+ ;; size=35 bbWeight=1.00 PerfScore 7.98
+G_M6739_IG03: ; bbWeight=15.62, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
+ mov rax, rdi
+ ; gcrRegs +[rax]
+ mov ecx, esi
+ mov rbp, gword ptr [rax+8*rcx+0x10]
; gcrRegs +[rbp]
cmp rbx, rbp
- je SHORT G_M6739_IG12
- ;; size=15 bbWeight=15.77 PerfScore 59.15
-G_M6739_IG04: ; bbWeight=14.83, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
- ; gcrRegs -[rcx]
- test rbx, rbx
- je SHORT G_M6739_IG07
- ;; size=5 bbWeight=14.83 PerfScore 18.54
-G_M6739_IG05: ; bbWeight=14.81, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
- test rbp, rbp
- je SHORT G_M6739_IG07
- ;; size=5 bbWeight=14.81 PerfScore 18.51
-G_M6739_IG06: ; bbWeight=13.88, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
- mov rcx, 0xD1FFAB1E ; System.RuntimeType
- cmp qword ptr [rbx], rcx
- jne SHORT G_M6739_IG14
- ;; size=15 bbWeight=13.88 PerfScore 58.98
-G_M6739_IG07: ; bbWeight=15.81, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
- ; gcrRegs -[rbp]
+ je SHORT G_M6739_IG15
+ ;; size=15 bbWeight=15.62 PerfScore 58.56
+G_M6739_IG04: ; bbWeight=15.66, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
+ ; gcrRegs -[rax rbp]
inc esi
cmp esi, 41
jl SHORT G_M6739_IG03
- ;; size=7 bbWeight=15.81 PerfScore 23.72
-G_M6739_IG08: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
+ ;; size=7 bbWeight=15.66 PerfScore 23.48
+G_M6739_IG05: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
; gcrRegs -[rdi]
test rbx, rbx
- je SHORT G_M6739_IG16
- ;; size=5 bbWeight=0.04 PerfScore 0.05
-G_M6739_IG09: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
+ je G_M6739_IG19
+ ;; size=9 bbWeight=0.04 PerfScore 0.05
+G_M6739_IG06: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
mov rcx, 0xD1FFAB1E ; System.RuntimeType
cmp qword ptr [rbx], rcx
- jne SHORT G_M6739_IG15
- ;; size=15 bbWeight=0.04 PerfScore 0.18
-G_M6739_IG10: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
+ jne G_M6739_IG18
+ ;; size=19 bbWeight=0.04 PerfScore 0.18
+G_M6739_IG07: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
mov rcx, rbx
; gcrRegs +[rcx]
call [System.RuntimeType:GetTypeCodeImpl():int:this]
; gcrRegs -[rcx rbx]
; gcr arg pop 0
;; size=9 bbWeight=0.04 PerfScore 0.13
-G_M6739_IG11: ; bbWeight=0.04, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
- jmp SHORT G_M6739_IG17
- ;; size=2 bbWeight=0.04 PerfScore 0.08
-G_M6739_IG12: ; bbWeight=0.96, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+G_M6739_IG08: ; bbWeight=0.04, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+ jmp G_M6739_IG20
+ ;; size=5 bbWeight=0.04 PerfScore 0.08
+G_M6739_IG09: ; bbWeight=0.01, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
+ ; gcrRegs +[rbx]
+ mov rax, 0xD1FFAB1E ; const ptr
+ mov rdi, gword ptr [rax]
+ ; gcrRegs +[rdi]
+ ;; size=13 bbWeight=0.01 PerfScore 0.02
+G_M6739_IG10: ; bbWeight=0.16, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
+ mov rax, rdi
+ ; gcrRegs +[rax]
+ mov edx, esi
+ mov rbp, gword ptr [rax+8*rdx+0x10]
+ ; gcrRegs +[rbp]
+ cmp rbx, rbp
+ je SHORT G_M6739_IG15
+ ;; size=15 bbWeight=0.16 PerfScore 0.59
+G_M6739_IG11: ; bbWeight=0.15, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
+ ; gcrRegs -[rax]
+ test rbx, rbx
+ je SHORT G_M6739_IG14
+ ;; size=5 bbWeight=0.15 PerfScore 0.19
+G_M6739_IG12: ; bbWeight=0.15, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
+ test rbp, rbp
+ je SHORT G_M6739_IG14
+ ;; size=5 bbWeight=0.15 PerfScore 0.19
+G_M6739_IG13: ; bbWeight=0.14, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
+ mov rax, 0xD1FFAB1E ; System.RuntimeType
+ cmp qword ptr [rbx], rax
+ jne SHORT G_M6739_IG17
+ ;; size=15 bbWeight=0.14 PerfScore 0.59
+G_M6739_IG14: ; bbWeight=0.16, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
+ ; gcrRegs -[rbp]
+ inc esi
+ cmp esi, 41
+ jl SHORT G_M6739_IG10
+ jmp SHORT G_M6739_IG05
+ ;; size=9 bbWeight=0.16 PerfScore 0.55
+G_M6739_IG15: ; bbWeight=0.96, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+ ; gcrRegs -[rbx rdi]
mov eax, esi
;; size=2 bbWeight=0.96 PerfScore 0.24
-G_M6739_IG13: ; bbWeight=0.96, epilog, nogc, extend
+G_M6739_IG16: ; bbWeight=0.96, epilog, nogc, extend
add rsp, 40
pop rbx
pop rbp
@@ -103,7 +131,7 @@ G_M6739_IG13: ; bbWeight=0.96, epilog, nogc, extend
pop rdi
ret
;; size=9 bbWeight=0.96 PerfScore 3.12
-G_M6739_IG14: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, gcvars, byref, isz
+G_M6739_IG17: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, gcvars, byref, isz
; gcrRegs +[rbx rbp rdi]
mov rdx, rbp
; gcrRegs +[rdx]
@@ -112,7 +140,7 @@ G_M6739_IG14: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=00A8 {r
; gcrRegs -[rdx] +[rax]
; gcr arg pop 0
test rax, rax
- jne SHORT G_M6739_IG07
+ jne SHORT G_M6739_IG14
mov rcx, rbx
; gcrRegs +[rcx]
mov rdx, rbp
@@ -124,10 +152,10 @@ G_M6739_IG14: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=00A8 {r
; gcrRegs -[rcx rdx rbp]
; gcr arg pop 0
test eax, eax
- je SHORT G_M6739_IG07
- jmp SHORT G_M6739_IG12
+ je SHORT G_M6739_IG14
+ jmp SHORT G_M6739_IG15
;; size=48 bbWeight=0 PerfScore 0.00
-G_M6739_IG15: ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
+G_M6739_IG18: ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
; gcrRegs -[rdi]
mov rcx, rbx
; gcrRegs +[rcx]
@@ -136,17 +164,17 @@ G_M6739_IG15: ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byre
call [rax+0x10]<unknown method>
; gcrRegs -[rcx rbx]
; gcr arg pop 0
- jmp SHORT G_M6739_IG11
- ;; size=18 bbWeight=0 PerfScore 0.00
-G_M6739_IG16: ; bbWeight=0.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+ jmp G_M6739_IG08
+ ;; size=21 bbWeight=0 PerfScore 0.00
+G_M6739_IG19: ; bbWeight=0.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
xor eax, eax
;; size=2 bbWeight=0.00 PerfScore 0.00
-G_M6739_IG17: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+G_M6739_IG20: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
xor ecx, ecx
cmp eax, 1
cmove eax, ecx
;; size=8 bbWeight=0 PerfScore 0.00
-G_M6739_IG18: ; bbWeight=0, epilog, nogc, extend
+G_M6739_IG21: ; bbWeight=0, epilog, nogc, extend
add rsp, 40
pop rbx
pop rbp
@@ -155,7 +183,7 @@ G_M6739_IG18: ; bbWeight=0, epilog, nogc, extend
ret
;; size=9 bbWeight=0 PerfScore 0.00
-; Total bytes of code 200, prolog size 11, PerfScore 189.68, instruction count 67, allocated bytes for code 200 (MethodHash=2fe8e5ac) for method System.Data.Common.DataStorage:GetStorageType(System.Type):int (Tier1)
+; Total bytes of code 271, prolog size 11, PerfScore 100.44, instruction count 83, allocated bytes for code 271 (MethodHash=2fe8e5ac) for method System.Data.Common.DataStorage:GetStorageType(System.Type):int (Tier1) This one just looks like loop cloning now managed to kick in. |
/azp run FUzzlyn |
Azure Pipelines successfully started running 1 pipeline(s). |
Recognizing a series of handle compares as a switch is not legal since those handles may change values.
cc @dotnet/jit-contrib PTAL @AndyAyersMS Diffs. They are quite mixed, but overall a decent improvement. TP also improves quite a bit. I analyzed some of the regressions above -- a lot of them seems to be that this change results in various different block layout decisions. |
@@ -86,7 +86,7 @@ bool IsConstantTestCondBlock(const BasicBlock* block, | |||
} | |||
|
|||
// We're looking for "X EQ/NE CNS" or "CNS EQ/NE X" pattern | |||
if (op1->IsCnsIntOrI() ^ op2->IsCnsIntOrI()) | |||
if ((op1->IsCnsIntOrI() && !op1->IsIconHandle()) ^ (op2->IsCnsIntOrI() && !op2->IsIconHandle())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was already merged as a separate change in #104634.
src/coreclr/jit/fgopt.cpp
Outdated
// No point duplicating this block if it would not remove (part of) the joint. | ||
if ((target->GetTrueTarget() == target) || (target->GetFalseTarget() == target)) | ||
{ | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hitting some divergence in fgUpdateFlowGraph
without this check. It only has a handful of diffs on its own.
src/coreclr/jit/fgopt.cpp
Outdated
@@ -1966,6 +1966,12 @@ bool Compiler::fgBlockIsGoodTailDuplicationCandidate(BasicBlock* target, unsigne | |||
return false; | |||
} | |||
|
|||
// No point duplicating this block if it would not remove (part of) the joint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// No point duplicating this block if it would not remove (part of) the joint. | |
// No point duplicating this block if it would not remove (part of) the join. |
src/coreclr/jit/fgopt.cpp
Outdated
@@ -1966,6 +1966,12 @@ bool Compiler::fgBlockIsGoodTailDuplicationCandidate(BasicBlock* target, unsigne | |||
return false; | |||
} | |||
|
|||
// No point duplicating this block if it would not remove (part of) the joint. | |||
if ((target->GetTrueTarget() == target) || (target->GetFalseTarget() == target)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ((target->GetTrueTarget() == target) || (target->GetFalseTarget() == target)) | |
if (target->TrueTargetIs(target) || target->FalseTargetIs(target)) |
Thanks for looking into the regressions!
Interesting -- I would hope more compaction wouldn't regress block layout (I'm guessing it's instead pessimizing some other flow opts). I haven't looked closely at the regressions from #103785 though... |
I think we can certainly consider taking this. Can you look at this diff in benchmarks_pgo?
Curious if it's more cloning or whatnot... |
It doesn't look to be more loop cloning. From the codegen diff it looks like different block layout that leads to different register allocation: https://www.diffchecker.com/O3MjiTIP/ |
This diff is interesting (coming from benchmarks.run_pgo.windows.x64.checked.mch), where it stores -G_M58686_IG01: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- ;; size=0 bbWeight=0 PerfScore 0.00
-G_M58686_IG02: ; bbWeight=0, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref
+G_M58686_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
+ ;; size=0 bbWeight=1 PerfScore 0.00
+G_M58686_IG02: ; bbWeight=1, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref
; gcrRegs +[rcx]
xor eax, eax
mov dword ptr [rcx+0x18], eax
- and dword ptr [rcx+0x18], 0xD1FFAB1E
- ;; size=12 bbWeight=0 PerfScore 0.00
-G_M58686_IG03: ; bbWeight=0, epilog, nogc, extend
+ mov eax, dword ptr [rcx+0x18]
+ and eax, 0xD1FFAB1E
+ mov dword ptr [rcx+0x18], eax
+ ;; size=16 bbWeight=1 PerfScore 4.50
+G_M58686_IG03: ; bbWeight=1, epilog, nogc, extend
ret
- ;; size=1 bbWeight=0 PerfScore 0.00
+ ;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 13, prolog size 0, PerfScore 0.00, instruction count 4, allocated bytes for code 13 (MethodHash=f3a21ac1) for method System.Threading.Tasks.Task+SetOnInvokeMres:.ctor():this (Tier1)
+; Total bytes of code 17, prolog size 0, PerfScore 5.50, instruction count 6, allocated bytes for code 17 (MethodHash=f3a21ac1) for method System.Threading.Tasks.Task+SetOnInvokeMres:.ctor():this (Tier1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM overall.
Looks like in the diff we did a CSE that stops us from doing the containment. The current collections do the same CSE even with the baseline it seems, so I can't repro it. |
Ok, managed to find the right context for the above (turns out my collection was the outdated one). Before flowgraph opts we have these blocks: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 100 [000..001)-> BB02(1) (always) i IBC
BB02 [0013] 1 BB01 0 0 [000..001)-> BB06(1) (always) i IBC rare
BB03 [0014] 0 1 100 [000..001)-> BB05(1),BB04(0.000315) ( cond ) i IBC
BB04 [0015] 1 BB03 0.00 0 [000..001)-> BB06(1) (always) i IBC
BB05 [0016] 1 BB03 1.00 100 [000..001)-> BB06(1) (always) i IBC
BB06 [0017] 3 BB02,BB04,BB05 1 100 [000..001)-> BB08(1),BB07(0) ( cond ) i IBC
BB07 [0009] 1 BB06 0 0 [000..001) (throw ) i IBC rare hascall gcsafe
BB08 [0010] 1 BB06 1 100 [000..008)-> BB10(1) (always) i IBC
BB09 [0041] 0 0 0 [000..001)-> BB10(1) (always) i IBC rare hascall gcsafe
BB10 [0042] 2 BB08,BB09 1 100 [000..001)-> BB12(1),BB11(0) ( cond ) i IBC
BB11 [0046] 1 BB10 0 0 [000..001)-> BB12(1) (always) i IBC rare hascall gcsafe
BB12 [0047] 2 BB10,BB11 1 100 [000..009) (return) i IBC
--------------------------------------------------------------------------------------------------------------------------------------------------------------------- Looks like in the base the flowgraph optimizations we do mean we end up with all 0 weight basic blocks going into CSE: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 0 0 [000..001)-> BB03(1),BB02(0) ( cond ) i IBC rare
BB02 [0009] 1 BB01 0 0 [000..001) (throw ) i IBC rare hascall gcsafe
BB03 [0010] 1 BB01 0 0 [000..008)-> BB05(1),BB04(0) ( cond ) i IBC rare
BB04 [0046] 1 BB03 0 0 [000..001)-> BB05(1) (always) i IBC rare hascall gcsafe
BB05 [0047] 2 BB03,BB04 0 0 [000..009) (return) i IBC rare
--------------------------------------------------------------------------------------------------------------------------------------------------------------------- CSE decides to do no CSE because of the 0 weights. OTOH, with the early folding done by this PR there is just one block going into CSE, which appeared after a bunch of compaction. It seems the result is a different weight that means CSE does end up happening: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 100 [000..009) (return) i IBC
--------------------------------------------------------------------------------------------------------------------------------------------------------------------- The "inconsistent weights" problem is probably (similar to) #96614. In the end we also ought to recognize the address mode regardless of the intervening local created by CSE (it ends up single-def single-used after a bunch of transformations). That goes under the #104538 umbrella. |
After early jump threading has kicked in we can frequently prove that a branch will go in one direction. This was previously left up to RBO; instead, try to fold it immediately and straighten out the flow before we build SSA, so that we can refine the phis.
Fix #101176
Diffs here are very mixed... Would have expected it to be generally a good thing to straighten out flow like this, but apparently not unequivocally so.