feat(perf): Track last loads per block in mem2reg and remove them if possible #6088

vezenovm · 2024-09-18T18:33:29Z

Description

Problem*

Resolves

Part of general effort to improve mem2reg.

Summary*

We sometimes have situations such as the following:

            b3():
              v9 = load v0
              v10 = eq v9, Field 2
              constrain v9 == Field 2
              v11 = load v2
              v12 = load v2
              v13 = eq v12, Field 2
              constrain v11 == Field 2

v2 does not have a known value, so we do not remove the load. The mem2reg pass is acting as expected here. However, without a store or call to the reference between v11 = load v2 and v12 = load v2, we should be able to safely remove v12 = load v2 and map v12 -> v11.

This PR adds this logic as part of the initial mem2reg pass. We have a new last_loads map as part of a Block. It is currently cleared after analyzing a block and is meant to be per block only. Unifying these last loads across blocks and their predecessors can come in a follow-up. This is an initial proof of concept to show the optimization's validity.

Given an instruction, we act as follows:

Load

Check if we have a last load from the current load address. If we do, remove the current load and map its result to the previous load result.
Add a last load for the address.

Store

Remove the last load for the address

Call

Remove the last load for any reference arguments
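The Load/Store/Call rules above can be sketched roughly as follows. This is a minimal, self-contained model and not the code in mem2reg.rs: `ValueId`, `Instruction`, `resolve` and `remove_repeat_loads` are hypothetical stand-ins, aliases are ignored, and there is no type information, so a call drops the last load for any argument that happens to be a tracked address.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for SSA values and instructions.
// These are NOT the types used in noirc_evaluator.
type ValueId = u32;

#[derive(Debug, Clone, PartialEq)]
enum Instruction {
    Load { result: ValueId, address: ValueId },
    Store { address: ValueId, value: ValueId },
    Call { arguments: Vec<ValueId> },
    // Any other instruction, reduced to the values it reads.
    Other { operands: Vec<ValueId> },
}

/// Looks a value up in the replacement map, falling back to the value itself.
fn resolve(replacements: &HashMap<ValueId, ValueId>, value: ValueId) -> ValueId {
    replacements.get(&value).copied().unwrap_or(value)
}

/// Removes repeated loads within a single block.
/// Returns the remaining instructions and the map from each removed load
/// result to the earlier load result that replaces it.
fn remove_repeat_loads(block: &[Instruction]) -> (Vec<Instruction>, HashMap<ValueId, ValueId>) {
    // address -> result of the most recent load from that address
    let mut last_loads: HashMap<ValueId, ValueId> = HashMap::new();
    let mut replacements: HashMap<ValueId, ValueId> = HashMap::new();
    let mut kept = Vec::new();

    for instruction in block {
        match instruction {
            Instruction::Load { result, address } => {
                let address = resolve(&replacements, *address);
                if let Some(previous) = last_loads.get(&address).copied() {
                    // Repeat load: drop it and map its result to the earlier one.
                    replacements.insert(*result, previous);
                } else {
                    last_loads.insert(address, *result);
                    kept.push(Instruction::Load { result: *result, address });
                }
            }
            Instruction::Store { address, value } => {
                let address = resolve(&replacements, *address);
                let value = resolve(&replacements, *value);
                // The memory may have changed, so forget the last load.
                last_loads.remove(&address);
                kept.push(Instruction::Store { address, value });
            }
            Instruction::Call { arguments } => {
                let arguments: Vec<ValueId> =
                    arguments.iter().map(|v| resolve(&replacements, *v)).collect();
                // The callee may write through any reference it receives.
                for argument in &arguments {
                    last_loads.remove(argument);
                }
                kept.push(Instruction::Call { arguments });
            }
            Instruction::Other { operands } => {
                let operands: Vec<ValueId> =
                    operands.iter().map(|v| resolve(&replacements, *v)).collect();
                kept.push(Instruction::Other { operands });
            }
        }
    }
    (kept, replacements)
}
```

Feeding the b3 block from the summary through this sketch (loads of v0 and v2 encoded as addresses 0 and 2) would drop v12 = load v2 and rewrite the use of v12 to v11, while a store or call on v2 between the two loads would keep both.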

I have also added two unit tests to mem2reg.rs.

Additional Context

Documentation*

Check one:

No documentation needed.
Documentation included in this PR.
[For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

I have tested the changes locally.
I have formatted the changes with Prettier and/or cargo fmt on default settings.

github-actions · 2024-09-18T18:36:56Z

Changes to Brillig bytecode sizes

Generated at commit: ca2a1cd548c9e343efa47d56d4a6e52e1c48b817, compared to commit: 649117570b95b26776150e337c458d478eb48c2e

🧾 Summary (10% most significant diffs)

Program	Brillig opcodes (+/-)	%
brillig_rc_regression_6123	+10 ❌	+5.78%
regression_5252	-106 ✅	-2.29%
poseidonsponge_x5_254	-101 ✅	-2.37%
array_sort	-9 ✅	-2.98%
6_array	-14 ✅	-3.45%
array_to_slice	-57 ✅	-7.74%

Full diff report 👇

Program	Brillig opcodes (+/-)	%
brillig_rc_regression_6123	183 (+10)	+5.78%
sha256	2,212 (-1)	-0.05%
sha256_regression	6,541 (-3)	-0.05%
sha256_var_size_regression	1,705 (-1)	-0.06%
sha256_var_witness_const_regression	1,232 (-1)	-0.08%
array_dynamic_blackbox_input	1,021 (-1)	-0.10%
sha256_brillig_performance_regression	1,632 (-2)	-0.12%
sha256_var_padding_regression	4,763 (-6)	-0.13%
slice_regex	2,163 (-3)	-0.14%
7_function	534 (-1)	-0.19%
keccak256	1,784 (-4)	-0.22%
bigint	1,991 (-5)	-0.25%
brillig_cow_regression	2,137 (-6)	-0.28%
ram_blowup_regression	953 (-3)	-0.31%
nested_array_dynamic	1,985 (-7)	-0.35%
conditional_1	1,177 (-6)	-0.51%
schnorr	1,414 (-8)	-0.56%
hashmap	19,872 (-148)	-0.74%
to_le_bytes	132 (-1)	-0.75%
array_if_cond_simple	131 (-1)	-0.76%
uhashmap	13,196 (-107)	-0.80%
tuple_inputs	364 (-3)	-0.82%
generics	94 (-1)	-1.05%
fold_numeric_generic_poseidon	748 (-10)	-1.32%
no_predicates_numeric_generic_poseidon	748 (-10)	-1.32%
nested_array_in_slice	1,098 (-15)	-1.35%
slices	1,742 (-24)	-1.36%
u128	2,757 (-38)	-1.36%
to_be_bytes	208 (-3)	-1.42%
hash_to_field	136 (-2)	-1.45%
poseidon2	340 (-5)	-1.45%
bench_2_to_17	333 (-5)	-1.48%
fold_2_to_17	571 (-10)	-1.72%
poseidon_bn254_hash_width_3	5,330 (-96)	-1.77%
poseidon_bn254_hash	5,330 (-96)	-1.77%
sha2_byte	2,723 (-51)	-1.84%
databus_two_calldata	213 (-4)	-1.84%
slice_dynamic_index	2,523 (-53)	-2.06%
eddsa	10,215 (-219)	-2.10%
regression_5252	4,526 (-106)	-2.29%
poseidonsponge_x5_254	4,164 (-101)	-2.37%
array_sort	293 (-9)	-2.98%
6_array	392 (-14)	-3.45%
array_to_slice	679 (-57)	-7.74%

vezenovm · 2024-09-18T18:42:23Z

reference_only_used_as_alias | +7 ❌ | +2.81%

Hmm this is surprising. I'm guessing that removing some of these loads is perhaps reducing the amount of trivial stores we can remove, but I'm not sure.

vezenovm · 2024-09-18T19:30:05Z

reference_only_used_as_alias | +7 ❌ | +2.81%

Hmm this is surprising. I'm guessing that removing some of these loads is perhaps reducing the amount of trivial stores we can remove, but I'm not sure.

The main difference between the SSA on master and this PR looks to be the inc_rc instructions remaining in place. Before mem2reg we have this pattern:

    v61 = load v52
    inc_rc v61
    inc_rc [u8 0, u8 0, u8 0, u8 0, u8 0]
    inc_rc [u8 0, u8 0, u8 0, u8 0, u8 0]
    v64 = load v51
    inc_rc v64
    v65 = load v52
    inc_rc v65
    v66 = load v52
    inc_rc v66
    v67 = load v52
    v68 = load v53
    v70 = lt v68, u32 4
    constrain v70 == u1 1 '"push out of bounds"'
    v72 = load v52
    v73 = load v53
    v74 = load v52
    v75 = load v53
    v76 = mul v75, u32 4
    v77 = array_set v72, index v76, value Field 0

The repeat loads are removed in this PR, but those follow-up inc_rc instructions remain. I think this can be handled in a follow-up though so I am marking this PR ready for review again.

michaeljklein

A couple cleanup notes, but otherwise LGTM

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

compiler/noirc_evaluator/src/ssa/opt/mem2reg/block.rs

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

github-actions · 2024-11-22T09:17:10Z

Changes to number of Brillig opcodes executed

Generated at commit: ca2a1cd548c9e343efa47d56d4a6e52e1c48b817, compared to commit: 649117570b95b26776150e337c458d478eb48c2e

🧾 Summary (10% most significant diffs)

Program	Brillig opcodes (+/-)	%
brillig_rc_regression_6123	+14 ❌	+4.79%
hash_to_field	-32 ✅	-3.42%
hashmap	-1,975 ✅	-3.53%
eddsa	-27,694 ✅	-3.78%
sha2_byte	-1,895 ✅	-3.90%
array_to_slice	-159 ✅	-8.88%

Full diff report 👇

Program	Brillig opcodes (+/-)	%
brillig_rc_regression_6123	306 (+14)	+4.79%
ram_blowup_regression	778,664 (-17)	-0.00%
sha256_regression	116,177 (-3)	-0.00%
sha256_var_padding_regression	219,713 (-6)	-0.00%
sha256_var_size_regression	16,344 (-1)	-0.01%
sha256	13,844 (-1)	-0.01%
array_dynamic_blackbox_input	18,179 (-2)	-0.01%
sha256_var_witness_const_regression	6,740 (-1)	-0.01%
brillig_cow_regression	518,944 (-96)	-0.02%
sha256_brillig_performance_regression	22,977 (-16)	-0.07%
array_if_cond_simple	537 (-1)	-0.19%
nested_array_dynamic	3,109 (-7)	-0.22%
slice_regex	3,389 (-9)	-0.26%
conditional_1	5,700 (-18)	-0.31%
uhashmap	146,485 (-526)	-0.36%
slices	2,865 (-19)	-0.66%
nested_array_in_slice	1,459 (-12)	-0.82%
databus_two_calldata	445 (-4)	-0.89%
tuple_inputs	632 (-6)	-0.94%
slice_dynamic_index	4,332 (-42)	-0.96%
u128	25,139 (-291)	-1.14%
schnorr	10,226 (-128)	-1.24%
7_function	2,464 (-32)	-1.28%
to_be_bytes	2,448 (-33)	-1.33%
poseidon2	694 (-11)	-1.56%
fold_numeric_generic_poseidon	5,051 (-81)	-1.58%
no_predicates_numeric_generic_poseidon	5,051 (-81)	-1.58%
keccak256	33,047 (-544)	-1.62%
6_array	1,631 (-36)	-2.16%
bench_2_to_17	576,781 (-13,067)	-2.22%
generics	130 (-3)	-2.26%
fold_2_to_17	1,069,061 (-24,854)	-2.27%
to_le_bytes	1,152 (-31)	-2.62%
poseidonsponge_x5_254	182,435 (-6,075)	-3.22%
poseidon_bn254_hash	161,657 (-5,417)	-3.24%
poseidon_bn254_hash_width_3	161,657 (-5,417)	-3.24%
regression_5252	908,592 (-30,508)	-3.25%
array_sort	563 (-19)	-3.26%
hash_to_field	905 (-32)	-3.42%
hashmap	53,917 (-1,975)	-3.53%
eddsa	704,725 (-27,694)	-3.78%
sha2_byte	46,690 (-1,895)	-3.90%
array_to_slice	1,631 (-159)	-8.88%

vezenovm · 2024-11-22T09:31:48Z

Following some regressions from PR #6505 (#6505 (comment)), I decided to update this PR with master. It looks like we get some benefits from the optimizations, aside from brillig_rc_regression_6123. However, that looks to be a minor regression, so I am marking this PR ready for review again.

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

TomAFrench

still need to do a proper review but a couple of nits

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

compiler/noirc_evaluator/src/ssa/opt/mem2reg/alias_set.rs

jfecher

LGTM. I'm still not 100% on the interactions with aliases here but I haven't been able to trick it with e.g. nested mutable references to create aliases, passing one of those to a function, mutating it, then loading again trying to get that load removed since only the alias was passed.
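To make the alias concern concrete, here is a small hypothetical model of what "invalidate through aliases" could look like. It is only a sketch under assumptions: `BlockState`, `add_alias` and `invalidate` are made-up names, the alias map is not transitive, and none of this is claimed to match how mem2reg.rs or alias_set.rs actually track aliases.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for an SSA value id.
type ValueId = u32;

/// Hypothetical per-block state: the last load per address, plus a map from
/// each reference to the references that may point to the same memory.
#[derive(Default)]
struct BlockState {
    last_loads: HashMap<ValueId, ValueId>,
    aliases: HashMap<ValueId, HashSet<ValueId>>,
}

impl BlockState {
    /// Records that `a` and `b` may alias each other (not transitive here).
    fn add_alias(&mut self, a: ValueId, b: ValueId) {
        self.aliases.entry(a).or_default().insert(b);
        self.aliases.entry(b).or_default().insert(a);
    }

    /// Forgets the last load for `reference` and for every known alias of it.
    /// Intended to be called for a store address or a reference call argument.
    fn invalidate(&mut self, reference: ValueId) {
        self.last_loads.remove(&reference);
        if let Some(aliases) = self.aliases.get(&reference) {
            for alias in aliases {
                self.last_loads.remove(alias);
            }
        }
    }
}
```

In this model, if v7 is an alias of v2 and only v7 is passed to a call, invalidating v7 also drops the last load recorded for v2, so a later load v2 would not be removed. Invalidating only the passed value would leave the stale entry for v2 in place, which is the scenario described in the comment above.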

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

compiler/noirc_evaluator/src/ssa/opt/mem2reg/alias_set.rs

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

fix(LSP): use generic self type to narrow down methods to complete (noir-lang/noir#6617)
fix!: Disallow `#[export]` on associated methods (noir-lang/noir#6626)
chore: redo typo PR by donatik27 (noir-lang/noir#6575)
chore: redo typo PR by Dimitrolito (noir-lang/noir#6614)
feat: simplify `jmpif`s by reversing branches if condition is negated (noir-lang/noir#5891)
fix: Do not warn on unused functions marked with #[export] (noir-lang/noir#6625)
chore: Add panic for compiler error described in #6620 (noir-lang/noir#6621)
feat(perf): Track last loads per block in mem2reg and remove them if possible (noir-lang/noir#6088)
fix(ssa): Track all local allocations during flattening (noir-lang/noir#6619)
feat(comptime): Implement blackbox functions in comptime interpreter (noir-lang/noir#6551)
chore: derive PartialEq and Hash for FieldElement (noir-lang/noir#6610)
chore: ignore almost-empty directories in nargo_cli tests (noir-lang/noir#6611)
chore: remove temporary allocations from `num_bits` (noir-lang/noir#6600)
chore: Release Noir(1.0.0-beta.0) (noir-lang/noir#6562)
feat: Add `array_refcount` and `slice_refcount` builtins for debugging (noir-lang/noir#6584)
chore!: Require types of globals to be specified (noir-lang/noir#6592)
fix: don't report visibility errors when elaborating comptime value (noir-lang/noir#6498)
fix: preserve newlines between comments when formatting statements (noir-lang/noir#6601)
fix: parse a bit more SSA stuff (noir-lang/noir#6599)
chore!: remove eddsa from stdlib (noir-lang/noir#6591)
chore: Typo in oracles how to (noir-lang/noir#6598)
feat(ssa): Loop invariant code motion (noir-lang/noir#6563)
fix: remove `compiler_version` from new `Nargo.toml` (noir-lang/noir#6590)
feat: Avoid incrementing reference counts in some cases (noir-lang/noir#6568)
chore: fix typo in test name (noir-lang/noir#6589)
fix: consider prereleases to be compatible with pre-1.0.0 releases (noir-lang/noir#6580)
feat: try to inline brillig calls with all constant arguments (noir-lang/noir#6548)
fix: correct type when simplifying `derive_pedersen_generators` (noir-lang/noir#6579)
feat: Sync from aztec-packages (noir-lang/noir#6576)

handle last_loads per block and remove them if possible

90712c2

vezenovm changed the title ~~feat(perf): Track last loads per block and remove them if possible~~ feat(perf): Track last loads per block in mem2reg and remove them if possible Sep 18, 2024

vezenovm added 4 commits September 18, 2024 18:34

remove debugging ssa file

8bf6ff6

cleanup comments

12cd64e

remove test debugging things

8c077aa

Merge branch 'master' into mv/remove-last-loads-per-block

5b88acd

vezenovm requested a review from a team September 18, 2024 18:40

vezenovm marked this pull request as draft September 18, 2024 18:56

actually remove instruction, do not just map value

ccd7298

vezenovm marked this pull request as ready for review September 18, 2024 19:30

vezenovm mentioned this pull request Sep 18, 2024

feat(perf): Remove inc_rc/dec_rc instructions that follow a removed load in mem2reg #6092

Closed

5 tasks

Merge branch 'master' into mv/remove-last-loads-per-block

65a461f

michaeljklein previously requested changes Sep 19, 2024

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs (outdated, resolved)

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs (outdated, resolved)

compiler/noirc_evaluator/src/ssa/opt/mem2reg/block.rs (outdated, resolved)

vezenovm commented Sep 19, 2024

compiler/noirc_evaluator/src/ssa/opt/mem2reg/block.rs (outdated, resolved)

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg/block.rs

2fccdbc

vezenovm commented Sep 19, 2024

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs (outdated, resolved)

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

26f4041

vezenovm commented Sep 19, 2024

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs (outdated, resolved)

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

18b35e5

vezenovm commented Sep 19, 2024

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs (outdated, resolved)

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

c275357

vezenovm requested review from michaeljklein and jfecher September 19, 2024 16:51

jfecher reviewed Sep 19, 2024

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs (outdated, resolved)

vezenovm added 3 commits September 19, 2024 21:12

fmt

171108f

cleanup

360654e

one more cleanup

0bf0525

merge conflicts w/ master

f66b2c6

vezenovm marked this pull request as ready for review November 22, 2024 09:27

vezenovm requested a review from a team November 22, 2024 09:32

TomAFrench reviewed Nov 22, 2024

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs (resolved)

vezenovm added 2 commits November 22, 2024 11:40

update tests to use SSA parser

cb0dd20

Merge branch 'master' into mv/remove-last-loads-per-block

9ac843a

TomAFrench reviewed Nov 22, 2024

compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs (outdated, resolved)

compiler/noirc_evaluator/src/ssa/opt/mem2reg/alias_set.rs (outdated, resolved)

vezenovm and others added 4 commits November 22, 2024 17:10

use or_default()

65f8fea

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg/alias_set.rs

c8bb5cc

Co-authored-by: Tom French <[email protected]>

Merge remote-tracking branch 'origin/mv/remove-last-loads-per-block' …

7067aff

…into mv/remove-last-loads-per-block

cleanup comments

4ef0204

vezenovm requested a review from TomAFrench November 22, 2024 17:28

jfecher approved these changes Nov 22, 2024

vezenovm and others added 4 commits November 25, 2024 10:39

Update compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs

ffafdcc

Co-authored-by: jfecher <[email protected]>

move last loads clearing

ac925cb

pr comments and cleanup

40396f5

Merge branch 'master' into mv/remove-last-loads-per-block

e18c6a6

TomAFrench approved these changes Nov 26, 2024

merge conflicts w/ master

ce1b777

vezenovm added this pull request to the merge queue Nov 26, 2024

Merged via the queue into master with commit 624ae6c Nov 26, 2024
49 checks passed

vezenovm deleted the mv/remove-last-loads-per-block branch November 26, 2024 14:16

noirwhal mentioned this pull request Nov 26, 2024

chore: Release Noir(1.0.0-beta.1) #6622

Open

AztecBot mentioned this pull request Nov 27, 2024

feat: Sync from noir AztecProtocol/aztec-packages#10110

Open
