Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance #975
Improve performance #975
Changes from 20 commits
ceb8c9e
19cc47a
f7dc01f
8e6e2fc
df2b495
e20194f
712ed09
15c2d60
99aaf28
57ef8a9
04fca6e
4f135b6
c95ac0b
f02beb5
32b8fa9
e641333
60e76a2
7578993
4ebf74f
1de7155
d48d281
9eb658f
6639566
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must we add a new public API in order to provide the performance benefits of this PR? I am unsure if we should permit users to call this, or if it should be only leveraged internally as a private property
job._cached_statepoint
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted below, I want to conceal as much as possible about topics like caching and validation from the user API as we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I intentionally make this public.
Job.statepoint
is writeable and carries significant overhead fromsynced_collections
as shown in the benchmarks - reading one key from every job's statepoint takes 116 seconds even when the statepoint is in the cache.Many workflows only need to read the statepoint, flow in particular. While flow could internally use a private
_cached_statepiont
forstr
keys, users need public access to the fast path so that their user-defined callable methods (key
,select
,sort_by
) can complete quickly. Many users are frustrated with 10+ minute flow status updates. As shown in the benchmarks, the same loop over projects accessingcached_statepoint
completes in 0.379 seconds - 306 times faster. This alone improves flow performance tremendously when using aggregates.The alternative API I considered was to replace
statepoint
with the read-only statepoint and requireupdate_statepoint
to change it. I opted for a new attribute as changingstatepoint
semantics is a massive breaking change.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, that is helpful.
I would be open to changing
statepoint
semantics to be read-only in a future major version. We had discussed this at one point as a possibility for signac 2. Let's file an issue for that proposal.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted in #983.
Check warning on line 645 in signac/job.py
Codecov / codecov/patch
signac/job.py#L645
Check warning on line 647 in signac/job.py
Codecov / codecov/patch
signac/job.py#L647