-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance #975
Merged
Merged
Improve performance #975
Changes from all commits
Commits
Show all changes
23 commits
Select commit
Hold shift + click to select a range
ceb8c9e
Make additional use of the statepoint cache.
joaander 19cc47a
Make JobsCursor.__len__ and .__contains__ O(1).
joaander f7dc01f
Add validate_statepoint argument to Job.init()
joaander 8e6e2fc
Rename statepoint_dict to statepoint_mapping.
joaander df2b495
Read the cache from disk in `open_job`.
joaander e20194f
Restore cache miss logger level to debug.
joaander 712ed09
Instantiate Job by id directly when iterating over ids.
joaander 15c2d60
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 99aaf28
Add statepoint_mapping test.
joaander 57ef8a9
Pass information about the Job directories existence from Project to …
joaander 04fca6e
Populate _statepoint_mapping in additional code paths.
joaander 4f135b6
Increase test coverage.
joaander c95ac0b
Update change log.
joaander f02beb5
Rename statepoint_mapping to cached_statepoint.
joaander 32b8fa9
Doc fixes.
joaander e641333
Update code comments
cbkerr 60e76a2
Use cached_statepoint in to_dataframe.
joaander 7578993
Restore iteration order.
joaander 4ebf74f
Validate cached_statpoing when read from disk.
joaander 1de7155
Use cached_statepoint in groupby.
joaander d48d281
Remove validate argument from update_cache.
joaander 9eb658f
Write state point as two words in doc strings
cbkerr 6639566
Merge branch 'main' into improve-performance
cbkerr File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -78,6 +78,7 @@ The Job class | |
|
||
.. autosummary:: | ||
|
||
Job.cached_statepoint | ||
Job.clear | ||
Job.close | ||
Job.data | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must we add a new public API in order to provide the performance benefits of this PR? I am unsure if we should permit users to call this, or if it should be only leveraged internally as a private property
job._cached_statepoint
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted below, I want to conceal as much as possible about topics like caching and validation from the user API as we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I intentionally make this public.
Job.statepoint
is writeable and carries significant overhead fromsynced_collections
as shown in the benchmarks - reading one key from every job's statepoint takes 116 seconds even when the statepoint is in the cache.Many workflows only need to read the statepoint, flow in particular. While flow could internally use a private
_cached_statepiont
forstr
keys, users need public access to the fast path so that their user-defined callable methods (key
,select
,sort_by
) can complete quickly. Many users are frustrated with 10+ minute flow status updates. As shown in the benchmarks, the same loop over projects accessingcached_statepoint
completes in 0.379 seconds - 306 times faster. This alone improves flow performance tremendously when using aggregates.The alternative API I considered was to replace
statepoint
with the read-only statepoint and requireupdate_statepoint
to change it. I opted for a new attribute as changingstatepoint
semantics is a massive breaking change.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, that is helpful.
I would be open to changing
statepoint
semantics to be read-only in a future major version. We had discussed this at one point as a possibility for signac 2. Let's file an issue for that proposal.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted in #983.