feat(perf): Track last loads per block in mem2reg and remove them if possible #6088

Merged
merged 40 commits into from
Nov 26, 2024

Conversation


@vezenovm vezenovm commented Sep 18, 2024

Description

Problem*

Resolves

Part of general effort to improve mem2reg.

Summary*

We sometimes have situations such as the following:

            b3():
              v9 = load v0
              v10 = eq v9, Field 2
              constrain v9 == Field 2
              v11 = load v2
              v12 = load v2
              v13 = eq v12, Field 2
              constrain v11 == Field 2

v2 does not have a known value, so we do not remove the load. The mem2reg pass is acting as expected here. However, since there is no store to, or call involving, the reference between v11 = load v2 and v12 = load v2, we should be able to safely remove v12 = load v2 and map v12 -> v11.

This PR adds this logic as part of the initial mem2reg pass. We have a new last_loads map as part of a Block. It is currently cleared after analyzing each block and is meant to be per-block only. Unifying these last loads across blocks and their predecessors can come in a follow-up. This is an initial proof of concept to show the optimization's validity.

Given an instruction, we act as follows:

Load

  • Check if we have a last load from the current load address. If we do, remove the current load and map its result to the previous load result.
  • Add a last load for the address.

Store

  • Remove the last load for the address

Call

  • Remove the last load for any reference arguments
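The Load / Store / Call rules above can be sketched on a toy model. This is only an illustration under stated assumptions: the `Instruction` enum, `ValueId` alias, `resolve` helper, and `remove_repeated_loads` function are hypothetical names and are not the types or functions in `mem2reg.rs`. The sketch also treats every call argument as a possible reference, whereas the rule above only concerns reference arguments.

```rust
use std::collections::HashMap;

// Hypothetical, simplified SSA model used only for this sketch.
type ValueId = u32;

#[derive(Debug, Clone, PartialEq)]
enum Instruction {
    Load { result: ValueId, address: ValueId },
    Store { address: ValueId, value: ValueId },
    // Assumption: every call argument is treated as a possible reference.
    Call { arguments: Vec<ValueId> },
    // Any other instruction (eq, constrain, ...) that only reads its operands.
    Other { operands: Vec<ValueId> },
}

// Follow the replacement map so later uses see the surviving load result.
fn resolve(replacements: &HashMap<ValueId, ValueId>, value: ValueId) -> ValueId {
    *replacements.get(&value).unwrap_or(&value)
}

// Returns the surviving instructions of one block and the map
// "removed load result -> earlier load result".
fn remove_repeated_loads(
    block: &[Instruction],
) -> (Vec<Instruction>, HashMap<ValueId, ValueId>) {
    // Per-block state: address -> result of the last load from that address.
    let mut last_loads: HashMap<ValueId, ValueId> = HashMap::new();
    let mut replacements: HashMap<ValueId, ValueId> = HashMap::new();
    let mut kept = Vec::new();

    for instruction in block {
        match instruction {
            Instruction::Load { result, address } => {
                let address = resolve(&replacements, *address);
                match last_loads.get(&address).copied() {
                    // Repeated load: drop it and reuse the earlier result.
                    Some(previous) => {
                        replacements.insert(*result, previous);
                    }
                    // First load from this address: remember it.
                    None => {
                        last_loads.insert(address, *result);
                        kept.push(Instruction::Load { result: *result, address });
                    }
                }
            }
            Instruction::Store { address, value } => {
                let address = resolve(&replacements, *address);
                let value = resolve(&replacements, *value);
                // The stored-to address no longer has a reusable load.
                last_loads.remove(&address);
                kept.push(Instruction::Store { address, value });
            }
            Instruction::Call { arguments } => {
                let arguments: Vec<ValueId> =
                    arguments.iter().map(|arg| resolve(&replacements, *arg)).collect();
                // The callee may write through any argument it receives.
                for argument in &arguments {
                    last_loads.remove(argument);
                }
                kept.push(Instruction::Call { arguments });
            }
            Instruction::Other { operands } => {
                let operands: Vec<ValueId> =
                    operands.iter().map(|op| resolve(&replacements, *op)).collect();
                kept.push(Instruction::Other { operands });
            }
        }
    }
    (kept, replacements)
}
```

Feeding the b3 block from the summary into this sketch (with eq and constrain modelled as `Other`) drops v12 = load v2 and rewrites its use to v11, while a store to v2 between the two loads keeps both.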

I have also added two unit tests to mem2reg.rs

Additional Context

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@vezenovm vezenovm changed the title from "feat(perf): Track last loads per block and remove them if possible" to "feat(perf): Track last loads per block in mem2reg and remove them if possible" Sep 18, 2024

github-actions bot commented Sep 18, 2024

Changes to Brillig bytecode sizes

Generated at commit: ca2a1cd548c9e343efa47d56d4a6e52e1c48b817, compared to commit: 649117570b95b26776150e337c458d478eb48c2e

🧾 Summary (10% most significant diffs)

Program | Brillig opcodes (+/-) | %
brillig_rc_regression_6123 | +10 ❌ | +5.78%
regression_5252 | -106 ✅ | -2.29%
poseidonsponge_x5_254 | -101 ✅ | -2.37%
array_sort | -9 ✅ | -2.98%
6_array | -14 ✅ | -3.45%
array_to_slice | -57 ✅ | -7.74%

Full diff report 👇
Program | Brillig opcodes (+/-) | %
brillig_rc_regression_6123 | 183 (+10) | +5.78%
sha256 | 2,212 (-1) | -0.05%
sha256_regression | 6,541 (-3) | -0.05%
sha256_var_size_regression | 1,705 (-1) | -0.06%
sha256_var_witness_const_regression | 1,232 (-1) | -0.08%
array_dynamic_blackbox_input | 1,021 (-1) | -0.10%
sha256_brillig_performance_regression | 1,632 (-2) | -0.12%
sha256_var_padding_regression | 4,763 (-6) | -0.13%
slice_regex | 2,163 (-3) | -0.14%
7_function | 534 (-1) | -0.19%
keccak256 | 1,784 (-4) | -0.22%
bigint | 1,991 (-5) | -0.25%
brillig_cow_regression | 2,137 (-6) | -0.28%
ram_blowup_regression | 953 (-3) | -0.31%
nested_array_dynamic | 1,985 (-7) | -0.35%
conditional_1 | 1,177 (-6) | -0.51%
schnorr | 1,414 (-8) | -0.56%
hashmap | 19,872 (-148) | -0.74%
to_le_bytes | 132 (-1) | -0.75%
array_if_cond_simple | 131 (-1) | -0.76%
uhashmap | 13,196 (-107) | -0.80%
tuple_inputs | 364 (-3) | -0.82%
generics | 94 (-1) | -1.05%
fold_numeric_generic_poseidon | 748 (-10) | -1.32%
no_predicates_numeric_generic_poseidon | 748 (-10) | -1.32%
nested_array_in_slice | 1,098 (-15) | -1.35%
slices | 1,742 (-24) | -1.36%
u128 | 2,757 (-38) | -1.36%
to_be_bytes | 208 (-3) | -1.42%
hash_to_field | 136 (-2) | -1.45%
poseidon2 | 340 (-5) | -1.45%
bench_2_to_17 | 333 (-5) | -1.48%
fold_2_to_17 | 571 (-10) | -1.72%
poseidon_bn254_hash_width_3 | 5,330 (-96) | -1.77%
poseidon_bn254_hash | 5,330 (-96) | -1.77%
sha2_byte | 2,723 (-51) | -1.84%
databus_two_calldata | 213 (-4) | -1.84%
slice_dynamic_index | 2,523 (-53) | -2.06%
eddsa | 10,215 (-219) | -2.10%
regression_5252 | 4,526 (-106) | -2.29%
poseidonsponge_x5_254 | 4,164 (-101) | -2.37%
array_sort | 293 (-9) | -2.98%
6_array | 392 (-14) | -3.45%
array_to_slice | 679 (-57) | -7.74%

@vezenovm vezenovm requested a review from a team September 18, 2024 18:40

vezenovm commented Sep 18, 2024

reference_only_used_as_alias | +7 ❌ | +2.81%

Hmm this is surprising. I'm guessing that removing some of these loads is perhaps reducing the amount of trivial stores we can remove, but I'm not sure.

@vezenovm vezenovm marked this pull request as draft September 18, 2024 18:56
@vezenovm

reference_only_used_as_alias | +7 ❌ | +2.81%

Hmm this is surprising. I'm guessing that removing some of these loads is perhaps reducing the amount of trivial stores we can remove, but I'm not sure.

The main difference between the SSA on master and this PR looks to be the inc_rc instructions remaining in place. Before mem2reg we have this pattern:

    v61 = load v52
    inc_rc v61
    inc_rc [u8 0, u8 0, u8 0, u8 0, u8 0]
    inc_rc [u8 0, u8 0, u8 0, u8 0, u8 0]
    v64 = load v51
    inc_rc v64
    v65 = load v52
    inc_rc v65
    v66 = load v52
    inc_rc v66
    v67 = load v52
    v68 = load v53
    v70 = lt v68, u32 4
    constrain v70 == u1 1 '"push out of bounds"'
    v72 = load v52
    v73 = load v53
    v74 = load v52
    v75 = load v53
    v76 = mul v75, u32 4
    v77 = array_set v72, index v76, value Field 0

The repeated loads are removed in this PR, but the inc_rc instructions that follow them remain. I think this can be handled in a follow-up, though, so I am marking this PR ready for review again.
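To make the effect concrete, here is a toy sketch of why the inc_rc instructions survive once the repeated loads are gone. It assumes a hypothetical two-instruction model (`Load` and `IncRc`) and a hypothetical `dedup_loads` function; neither is taken from the compiler. It covers only the loads and the inc_rc instructions on load results from the snippet above.

```rust
use std::collections::HashMap;

// Hypothetical, simplified model: only loads and inc_rc are represented.
type ValueId = u32;

#[derive(Debug, Clone, PartialEq)]
enum Instruction {
    Load { result: ValueId, address: ValueId },
    IncRc { value: ValueId },
}

// Drops repeated loads from the same address and rewrites inc_rc operands
// to the surviving load result. The inc_rc instructions themselves stay.
fn dedup_loads(block: &[Instruction]) -> Vec<Instruction> {
    let mut last_loads: HashMap<ValueId, ValueId> = HashMap::new();
    let mut replacements: HashMap<ValueId, ValueId> = HashMap::new();
    let mut kept = Vec::new();

    for instruction in block {
        match *instruction {
            Instruction::Load { result, address } => {
                match last_loads.get(&address).copied() {
                    Some(previous) => {
                        replacements.insert(result, previous);
                    }
                    None => {
                        last_loads.insert(address, result);
                        kept.push(Instruction::Load { result, address });
                    }
                }
            }
            Instruction::IncRc { value } => {
                let value = *replacements.get(&value).unwrap_or(&value);
                kept.push(Instruction::IncRc { value });
            }
        }
    }
    kept
}
```

In this model the ten loads in the snippet collapse to three (one each for v51, v52 and v53), while all four inc_rc instructions on load results remain, three of them now pointing at v61. Removing those would need a separate reference-count optimization, which this sketch does not attempt.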


@michaeljklein michaeljklein left a comment


A couple cleanup notes, but otherwise LGTM


github-actions bot commented Nov 22, 2024

Changes to number of Brillig opcodes executed

Generated at commit: ca2a1cd548c9e343efa47d56d4a6e52e1c48b817, compared to commit: 649117570b95b26776150e337c458d478eb48c2e

🧾 Summary (10% most significant diffs)

Program | Brillig opcodes (+/-) | %
brillig_rc_regression_6123 | +14 ❌ | +4.79%
hash_to_field | -32 ✅ | -3.42%
hashmap | -1,975 ✅ | -3.53%
eddsa | -27,694 ✅ | -3.78%
sha2_byte | -1,895 ✅ | -3.90%
array_to_slice | -159 ✅ | -8.88%

Full diff report 👇
Program | Brillig opcodes (+/-) | %
brillig_rc_regression_6123 | 306 (+14) | +4.79%
ram_blowup_regression | 778,664 (-17) | -0.00%
sha256_regression | 116,177 (-3) | -0.00%
sha256_var_padding_regression | 219,713 (-6) | -0.00%
sha256_var_size_regression | 16,344 (-1) | -0.01%
sha256 | 13,844 (-1) | -0.01%
array_dynamic_blackbox_input | 18,179 (-2) | -0.01%
sha256_var_witness_const_regression | 6,740 (-1) | -0.01%
brillig_cow_regression | 518,944 (-96) | -0.02%
sha256_brillig_performance_regression | 22,977 (-16) | -0.07%
array_if_cond_simple | 537 (-1) | -0.19%
nested_array_dynamic | 3,109 (-7) | -0.22%
slice_regex | 3,389 (-9) | -0.26%
conditional_1 | 5,700 (-18) | -0.31%
uhashmap | 146,485 (-526) | -0.36%
slices | 2,865 (-19) | -0.66%
nested_array_in_slice | 1,459 (-12) | -0.82%
databus_two_calldata | 445 (-4) | -0.89%
tuple_inputs | 632 (-6) | -0.94%
slice_dynamic_index | 4,332 (-42) | -0.96%
u128 | 25,139 (-291) | -1.14%
schnorr | 10,226 (-128) | -1.24%
7_function | 2,464 (-32) | -1.28%
to_be_bytes | 2,448 (-33) | -1.33%
poseidon2 | 694 (-11) | -1.56%
fold_numeric_generic_poseidon | 5,051 (-81) | -1.58%
no_predicates_numeric_generic_poseidon | 5,051 (-81) | -1.58%
keccak256 | 33,047 (-544) | -1.62%
6_array | 1,631 (-36) | -2.16%
bench_2_to_17 | 576,781 (-13,067) | -2.22%
generics | 130 (-3) | -2.26%
fold_2_to_17 | 1,069,061 (-24,854) | -2.27%
to_le_bytes | 1,152 (-31) | -2.62%
poseidonsponge_x5_254 | 182,435 (-6,075) | -3.22%
poseidon_bn254_hash | 161,657 (-5,417) | -3.24%
poseidon_bn254_hash_width_3 | 161,657 (-5,417) | -3.24%
regression_5252 | 908,592 (-30,508) | -3.25%
array_sort | 563 (-19) | -3.26%
hash_to_field | 905 (-32) | -3.42%
hashmap | 53,917 (-1,975) | -3.53%
eddsa | 704,725 (-27,694) | -3.78%
sha2_byte | 46,690 (-1,895) | -3.90%
array_to_slice | 1,631 (-159) | -8.88%

@vezenovm vezenovm marked this pull request as ready for review November 22, 2024 09:27
@vezenovm

Following some regressions from PR #6505 (#6505 (comment)), I decided to update this PR with master. It looks like we have some benefits from the optimizations, aside from brillig_rc_regression_6123. However, that looks to be a minor regression, so I am marking this PR ready for review again.

@vezenovm vezenovm requested a review from a team November 22, 2024 09:32

@TomAFrench TomAFrench left a comment


still need to do a proper review but a couple of nits


@jfecher jfecher left a comment


LGTM. I'm still not 100% on the interactions with aliases here but I haven't been able to trick it with e.g. nested mutable references to create aliases, passing one of those to a function, mutating it, then loading again trying to get that load removed since only the alias was passed.
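For readers following the alias concern, one conservative way to handle it can be sketched as below: when an address is invalidated (by a store, or by being passed to a call), also forget the last loads of every reference it may alias. This is an assumption-laden illustration, not a description of what this PR or `alias_set.rs` does; `BlockState`, `add_alias`, `record_load` and `invalidate` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical per-block state for this sketch only.
type ValueId = u32;

#[derive(Default)]
struct BlockState {
    // address -> result of the last load from that address
    last_loads: HashMap<ValueId, ValueId>,
    // reference -> references it may alias (kept symmetric by `add_alias`)
    aliases: HashMap<ValueId, HashSet<ValueId>>,
}

impl BlockState {
    // Record that `a` and `b` may point at the same memory.
    fn add_alias(&mut self, a: ValueId, b: ValueId) {
        self.aliases.entry(a).or_default().insert(b);
        self.aliases.entry(b).or_default().insert(a);
    }

    // Remember the result of the most recent load from `address`.
    fn record_load(&mut self, address: ValueId, result: ValueId) {
        self.last_loads.insert(address, result);
    }

    // Forget the last load for `address` and for everything it may alias.
    fn invalidate(&mut self, address: ValueId) {
        self.last_loads.remove(&address);
        if let Some(aliases) = self.aliases.get(&address) {
            for alias in aliases {
                self.last_loads.remove(alias);
            }
        }
    }
}
```

With this shape, passing only the alias to a function still invalidates the last load recorded for the original reference, so a later load from it would not be removed. An unrelated reference keeps its last load.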

@vezenovm vezenovm added this pull request to the merge queue Nov 26, 2024
Merged via the queue into master with commit 624ae6c Nov 26, 2024
49 checks passed
@vezenovm vezenovm deleted the mv/remove-last-loads-per-block branch November 26, 2024 14:16
AztecBot added a commit to AztecProtocol/aztec-packages that referenced this pull request Nov 27, 2024
fix(LSP): use generic self type to narrow down methods to complete (noir-lang/noir#6617)
fix!: Disallow `#[export]` on associated methods (noir-lang/noir#6626)
chore: redo typo PR by donatik27 (noir-lang/noir#6575)
chore: redo typo PR by Dimitrolito (noir-lang/noir#6614)
feat: simplify `jmpif`s by reversing branches if condition is negated (noir-lang/noir#5891)
fix: Do not warn on unused functions marked with #[export] (noir-lang/noir#6625)
chore: Add panic for compiler error described in #6620 (noir-lang/noir#6621)
feat(perf): Track last loads per block in mem2reg and remove them if possible (noir-lang/noir#6088)
fix(ssa): Track all local allocations during flattening (noir-lang/noir#6619)
feat(comptime): Implement blackbox functions in comptime interpreter (noir-lang/noir#6551)
chore: derive PartialEq and Hash for FieldElement (noir-lang/noir#6610)
chore: ignore almost-empty directories in nargo_cli tests (noir-lang/noir#6611)
chore: remove temporary allocations from `num_bits` (noir-lang/noir#6600)
chore: Release Noir(1.0.0-beta.0) (noir-lang/noir#6562)
feat: Add `array_refcount` and `slice_refcount` builtins for debugging (noir-lang/noir#6584)
chore!: Require types of globals to be specified (noir-lang/noir#6592)
fix: don't report visibility errors when elaborating comptime value (noir-lang/noir#6498)
fix: preserve newlines between comments when formatting statements (noir-lang/noir#6601)
fix: parse a bit more SSA stuff (noir-lang/noir#6599)
chore!: remove eddsa from stdlib (noir-lang/noir#6591)
chore: Typo in oracles how to (noir-lang/noir#6598)
feat(ssa): Loop invariant code motion (noir-lang/noir#6563)
fix: remove `compiler_version` from new `Nargo.toml` (noir-lang/noir#6590)
feat: Avoid incrementing reference counts in some cases (noir-lang/noir#6568)
chore: fix typo in test name (noir-lang/noir#6589)
fix: consider prereleases to be compatible with pre-1.0.0 releases (noir-lang/noir#6580)
feat: try to inline brillig calls with all constant arguments  (noir-lang/noir#6548)
fix: correct type when simplifying `derive_pedersen_generators` (noir-lang/noir#6579)
feat: Sync from aztec-packages (noir-lang/noir#6576)
4 participants