perf(gnovm): cache PkgIDFromPkgPath for higher performance #3424

Open
wants to merge 3 commits into
base: master

Conversation

odeke-em
Contributor

This change comes from noticing that PkgIDFromPkgPath is invoked very heavily within the VM, yet it is deterministic: the same input always produces the same result, namely SHA256(path)[:20]. Previously, just spinning up the VM would take about 80 seconds; with this change that is cut by ~8-10 seconds, and the improvement is repeatable and visible in the profiles:

  • Before
(pprof) list PkgIDFromPkgPath
Total: 100.94s
ROUTINE ======================== github.com/gnolang/gno/gnovm/pkg/gnolang.PkgIDFromPkgPath in $PATH
     220ms      9.26s (flat, cum)  9.17% of Total
         .          .     74:func PkgIDFromPkgPath(path string) PkgID {
     220ms      9.26s     75:	return PkgID{HashBytes([]byte(path))}
         .          .     76:}
         .          .     77:
         .          .     78:// Returns the ObjectID of the PackageValue associated with path.
         .          .     79:func ObjectIDFromPkgPath(path string) ObjectID {
         .          .     80:	pkgID := PkgIDFromPkgPath(path)
  • After
(pprof) list PkgIDFromPkgPath
Total: 93.22s
ROUTINE ======================== github.com/gnolang/gno/gnovm/pkg/gnolang.PkgIDFromPkgPath in $PATH
     210ms      1.55s (flat, cum)  1.66% of Total
      50ms       50ms     78:func PkgIDFromPkgPath(path string) PkgID {
         .      490ms     79:	pkgIDMu.Lock()
      10ms       10ms     80:	defer pkgIDMu.Unlock()
         .          .     81:
      10ms      730ms     82:	pkgID, ok := pkgIDFromPkgPathCache[path]
         .          .     83:	if !ok {
         .          .     84:		pkgID = new(PkgID)
         .          .     85:		*pkgID = PkgID{HashBytes([]byte(path))}
         .          .     86:		pkgIDFromPkgPathCache[path] = pkgID
         .          .     87:	}
     140ms      270ms     88:	return *pkgID
         .          .     89:}
         .          .     90:
         .          .     91:// Returns the ObjectID of the PackageValue associated with path.
         .          .     92:func ObjectIDFromPkgPath(path string) ObjectID {
         .          .     93:	pkgID := PkgIDFromPkgPath(path)
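
For readers who want to try the idea outside the VM, here is a minimal, self-contained sketch of the same memoization pattern. It is not the code from this PR: `PkgID` is simplified to a plain 20-byte array, `hashPath` is a hypothetical stand-in for `HashBytes` (assumed to be SHA256(path)[:20], as described above), the cache stores values rather than `*PkgID`, and `hashCalls` exists only to make cache misses observable.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// PkgID is a simplified stand-in for the gnolang type (assumption).
type PkgID [20]byte

// hashPath is a hypothetical replacement for HashBytes: SHA256(path)[:20].
func hashPath(path string) PkgID {
	sum := sha256.Sum256([]byte(path))
	var id PkgID
	copy(id[:], sum[:20])
	return id
}

var (
	pkgIDMu    sync.Mutex
	pkgIDCache = make(map[string]PkgID)
	hashCalls  int // number of cache misses, for illustration only
)

// PkgIDFromPkgPath returns the cached ID for path, hashing only on a miss.
func PkgIDFromPkgPath(path string) PkgID {
	pkgIDMu.Lock()
	defer pkgIDMu.Unlock()

	if id, ok := pkgIDCache[path]; ok {
		return id
	}
	hashCalls++
	id := hashPath(path)
	pkgIDCache[path] = id
	return id
}

func main() {
	a := PkgIDFromPkgPath("gno.land/p/demo/avl") // example path only
	b := PkgIDFromPkgPath("gno.land/p/demo/avl")
	fmt.Println(a == b, hashCalls) // prints "true 1"
}
```

The second call never reaches the hash function, which is where the profile above shows the savings; the remaining cost is the mutex and the map lookup.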

Fixes #3423

@github-actions github-actions bot added the 📦 🤖 gnovm Issues or PRs gnovm related label Dec 28, 2024
@Gno2D2
Collaborator

Gno2D2 commented Dec 28, 2024

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)
  • The pull request description provides enough details

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

🟢 Maintainers must be able to edit this pull request (more info)

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Automated Checks
Maintainers must be able to edit this pull request (more info)

If

🟢 Condition met
└── 🟢 The pull request was created from a fork (head branch repo: odeke-em/gno)

Then

🟢 Requirement satisfied
└── 🟢 Maintainer can modify this pull request

Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission
The pull request description provides enough details

If

🟢 Condition met
└── 🟢 Not (🔴 Pull request author is a member of the team: core-contributors)

Can be checked by

  • team core-contributors


codecov bot commented Dec 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

📢 Thoughts on this report? Let us know!

Contributor

@n0izn0iz n0izn0iz left a comment

Nice one, looks good for now IMO

The cache might use a little bit too much memory in the far future but we can revisit then

Please pass CI :)

@odeke-em
Contributor Author

> Nice one, looks good for now IMO
>
> The cache might use a little bit too much memory in the far future but we can revisit then
>
> Please pass CI :)

@n0izn0iz I considered using a small slice as the cache, say pre-allocated for 100 items, with a linear lookup, but the results weren't much different.

-var pkgIDFromPkgPathCache = make(map[string]*PkgID)
+type pkgCached struct {
+       path string
+       id   *PkgID
+}
+
+var pkgIDFromPkgPathCache = make([]*pkgCached, 0, 100)
 
 func PkgIDFromPkgPath(path string) PkgID {
        pkgIDMu.Lock()
        defer pkgIDMu.Unlock()
 
-       pkgID, ok := pkgIDFromPkgPathCache[path]
-       if !ok {
-               pkgID = new(PkgID)
-               *pkgID = PkgID{HashBytes([]byte(path))}
-               pkgIDFromPkgPathCache[path] = pkgID
+       // Find the entry if possible, using a slice as
+       // we don't expect too many unique packages and
+       // when small, looking up with a slice is much
+       // cheaper than using a map.
+       for _, cached := range pkgIDFromPkgPathCache {
+               if cached.path == path {
+                       return *cached.id
+               }
+       }
+
+       // Now cache the value.
+       cached := &pkgCached{
+               id:   &PkgID{HashBytes([]byte(path))},
+               path: path,
        }
-       return *pkgID
+       pkgIDFromPkgPathCache = append(pkgIDFromPkgPathCache, cached)
+       return *cached.id
 }

A map suffices TBH: even if memory increases, it is unlikely that 5,000 packages will be imported, so we can safely assume memory bloat won't be a problem. To me, and in my experience, the merits of the change massively outweigh any minute RAM increase: starting up 8-10 seconds faster lets every contributor work much faster, and even lets Gno save compute resources :-)
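
To compare the two lookup strategies discussed above on your own machine, a rough micro-benchmark along these lines could be used. This is only a sketch under assumptions: the `entry` type, `buildCaches`, `sliceLookup`, and the `example.com/pkg…` paths are made up for illustration, integer IDs stand in for `PkgID`, and the cache size of 100 mirrors the pre-allocation mentioned in the comment. Results will vary with path lengths, cache size, and hardware, so no expected numbers are given.

```go
package main

import (
	"fmt"
	"testing"
)

// entry pairs a package path with an illustrative integer ID.
type entry struct {
	path string
	id   int
}

// buildCaches fills a map and a slice with the same n synthetic entries.
func buildCaches(n int) (map[string]int, []entry) {
	m := make(map[string]int, n)
	s := make([]entry, 0, n)
	for i := 0; i < n; i++ {
		p := fmt.Sprintf("example.com/pkg%d", i)
		m[p] = i
		s = append(s, entry{path: p, id: i})
	}
	return m, s
}

// sliceLookup scans the slice linearly, as in the slice-based variant.
func sliceLookup(s []entry, path string) (int, bool) {
	for i := range s {
		if s[i].path == path {
			return s[i].id, true
		}
	}
	return 0, false
}

var sink int // keeps the compiler from discarding the lookups

func main() {
	const n = 100
	m, s := buildCaches(n)
	target := s[n/2].path // a mid-slice hit

	mapRes := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink += m[target]
		}
	})
	sliceRes := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			id, _ := sliceLookup(s, target)
			sink += id
		}
	})
	fmt.Printf("n=%d map=%dns/op slice=%dns/op\n", n, mapRes.NsPerOp(), sliceRes.NsPerOp())
}
```

Neither benchmark includes the mutex or the hashing cost, so it isolates only the lookup itself.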

@odeke-em odeke-em force-pushed the gnovm-cache-PkgIDFromPkgPath-results branch 2 times, most recently from 5112853 to 13b78e9 Compare December 29, 2024 00:54
@odeke-em odeke-em force-pushed the gnovm-cache-PkgIDFromPkgPath-results branch from 13b78e9 to b88c690 Compare December 29, 2024 00:56
Member

@moul moul left a comment

That looks reasonable. However, an LRU cache would prevent it from causing a DoS. I'm fine with merging now since this DoS is unlikely to occur before other issues arise, but please add a TODO comment.

It also needs a review from @thehowl, @ltzmaxwell, and @petar-dambovaliev.
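
For context on the LRU suggestion, a bounded cache can be sketched with the standard library's `container/list` plus a map. This is an illustrative assumption, not code from this PR or from gnovm: the `LRU` type, its `Get`/`Put` methods, and the string values are hypothetical, and a real version would hold `PkgID` values and still need the mutex for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruEntry is what each list element stores.
type lruEntry struct {
	key   string
	value string
}

// LRU is a fixed-capacity cache that evicts the least recently used key.
// It is not safe for concurrent use without external locking.
type LRU struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func NewLRU(capacity int) *LRU {
	return &LRU{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the value for key and marks it as most recently used.
func (c *LRU) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*lruEntry).value, true
}

// Put inserts or updates key, evicting the oldest entry when over capacity.
func (c *LRU) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruEntry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
}

func main() {
	c := NewLRU(2)
	c.Put("a", "1")
	c.Put("b", "2")
	c.Get("a")      // "a" becomes most recently used
	c.Put("c", "3") // over capacity: evicts "b"
	_, okA := c.Get("a")
	_, okB := c.Get("b")
	fmt.Println(okA, okB) // prints "true false"
}
```

Because the number of entries never exceeds `capacity`, memory stays bounded no matter how many distinct paths are looked up, at the cost of re-hashing paths that were evicted.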

@odeke-em
Contributor Author

Done, thanks @moul. I've added a TODO about using an LRU in the future, when the trade-off is needed.

Labels
📦 🤖 gnovm Issues or PRs gnovm related
Projects
Status: Triage
4 participants