Reduce domain reads during unfold at commitment #11328

Closed · awskii opened this issue Jul 25, 2024 · 1 comment


awskii commented Jul 25, 2024

During HPH.unfold we read accounts, code (if the account states a non-empty code hash), and storage. Some of those reads are unnecessary: the cell's position did not change, and neither did its contents, yet we still read the latest data from the domains just to re-verify the hash of what is already written.

This could be avoided if we carefully track which cells are updated or moved and keep their leaf hashes in BranchData. Cell currently stores the cell hash in its .h field (without the leading 0xa0 RLP prefix). For now, to keep commitment working, I decided to add another field encoded after all the others, so it can be introduced temporarily and removed later.
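
A minimal sketch of that encoding idea in Go, under assumed names: only the Cell `.h` field and BranchData come from this issue; `encodeBranch`, `hl`, and the exact layout are illustrative, not erigon's actual format.

```go
// Sketch only: apart from Cell.h, the names here are hypothetical; the
// real encoding lives in erigon's commitment package.
package commitment

type Cell struct {
	h  [32]byte // cellHash, stored without the leading 0xa0 RLP prefix
	hl int      // number of valid bytes in h (0 means no memoized hash)
	// ... plain keys, extension, etc.
}

// encodeBranch appends cells to a BranchData-like buffer. The temporary
// leaf-hash field is encoded after all existing fields, so old readers
// can skip it and it can be removed later without breaking the format.
func encodeBranch(cells []*Cell, buf []byte) []byte {
	for _, c := range cells {
		// ... existing per-cell fields would be encoded here, unchanged ...

		// temporary trailing field: memoized leaf hash, if present
		buf = append(buf, byte(c.hl))
		buf = append(buf, c.h[:c.hl]...)
	}
	return buf
}
```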

Since unfolding is already quite efficient, the task is getting harder, and I need every piece of improvement here.

  • hash the plain key as soon as it first arrives with ModeDirect; the hash can be nibblized later to keep keys compact during execution. The hashed key should be collected at that moment.
  • the account leaf hash computed during fold should be included in BranchData
  • reuse the leaf hash when the account keeps its position in the tree and is not updated; a position change can be triggered by an update of a similar prefix, e.g. one that splits downHashedKey (see the sketch after this list)
  • load Cells on demand, once we are sure that we cannot reuse the old hash and need to recalculate it
  • during branch unfold, if the branch contains reference keys into domains, we dereference them into plain keys. At that moment we are one read away from the value in the given file, but there is a high probability that the value is already obsolete. At that point we also do not know exactly how the branch will change, so we cannot decide whether we will need that storage and, e.g., put the recent value into the SharedDomains cache (which may be useful for storage keys). But Erigon is sequential for now, and SD is thread-unsafe, so it does not actually matter whether we read from disk now or a bit later, when the BranchData returned by GetBranch is processed.
  • following the point above, it could make sense to keep storage values cached after loading until commitment is finished (at least to estimate whether we read the same values several times)
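
To make the reuse bullet concrete, here is a hedged sketch of the reuse-vs-reread decision, continuing the Cell sketch above; the touched set, the domainReader interface, and the rehash helper are all assumptions for illustration, not erigon APIs.

```go
// Sketch: per-cell decision during unfold, built on the Cell type above.
// touched, domainReader, and rehash are hypothetical stand-ins.

type domainReader interface {
	Latest(plainKey []byte) ([]byte, error) // latest value for a plain key
}

// leafHash reuses the memoized hash when the cell kept its position and
// was not updated in this batch; only otherwise does it read from the
// domain and recompute, which is the read this issue wants to avoid.
func leafHash(c *Cell, plainKey []byte, touched map[string]struct{}, dom domainReader, rehash func([]byte) [32]byte) ([32]byte, error) {
	if _, updated := touched[string(plainKey)]; !updated && c.hl == 32 {
		return c.h, nil // position and contents unchanged: skip the domain read
	}
	val, err := dom.Latest(plainKey) // load on demand only when needed
	if err != nil {
		return [32]byte{}, err
	}
	h := rehash(val)
	c.h, c.hl = h, 32 // re-memoize for the next fold
	return h, nil
}
```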

Memoization of storage hashes beyond what is already kept has failed for now: any change in account storage changes the storage root, which hangs from the account cell as an extension, and that can trigger lots of prefixes to be updated at once.

@awskii awskii self-assigned this Jul 25, 2024
@awskii awskii added this to the 3.0.0-alpha2 milestone Jul 25, 2024
@awskii awskii added imp1 High importance, complexity:medium Issue will take around a week, erigon3 labels Jul 25, 2024
@Giulio2002 Giulio2002 modified the milestones: 3.0.0-alpha2, 3.0.0-alpha3 Aug 1, 2024
@VBulikov VBulikov modified the milestones: 3.0.0-alpha4, 3.0.0-alpha5 Sep 27, 2024
@VBulikov VBulikov modified the milestones: 3.0.0-alpha5, 3.0.0-alpha6 Oct 23, 2024

awskii commented Nov 12, 2024

Closed long ago by #11546.
