Buffered Synced collection #2
Conversation
@vishav1771 nice work, this is a good starting point! It looks like you've successfully ported the existing signac buffering logic, modifying it to take advantage of the new structure. I also think you've implemented something that I suggested, which is treating the buffer as a new back-end. However, I don't think I've given you sufficient guidance on how caching could look, my apologies for that.

Have a look at the signac-2 prototype, particularly the design.md file, which outlines our goals for caching. Some of my next comments might reference ideas in there.

Before getting too much into a line-by-line review of the code, I think we need to zoom out and make sure that the overall API and concepts we're developing here will address the various needs and open issues with the current code base. @csadorf @bdice @mikemhenry @atravitz @b-butler here are a few big picture questions that I think we need to address as part of this project. I apologize if there's overlap or if they're not all completely well thought through, but I wanted to get the discussion started and get my thoughts down before I forgot them:
Those are my first thoughts. I'll add more as I have time to ponder, but that's food for thought for other reviewers. I'd like to hash out more of the concepts here.
Caching and buffering are not the same; see here for the differentiation: https://en.wikipedia.org/wiki/Cache_(computing)#Buffer_vs._cache

In our context, buffering means that I/O operations can be delayed because no concurrent access is expected, i.e., they are non-atomic. The way one typically takes advantage of this kind of mode is by using a cache. Whether a backend uses a cache or another technique to improve performance should be left up to the backend implementation. Concretely, this means that buffered mode is merely an indicator to the backend of whether we expect a completely different instance to access the signac data space concurrently.

Definition: Buffered mode means that the backend can safely assume that non-atomic I/O operations are currently safe.

Whether a specific backend implements a buffered mode or not should be completely transparent. This means that the only difference would be a potential performance increase in buffered mode, but no other change in behavior.
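To make this concrete, here is a minimal sketch of the idea that buffered mode is only a hint to the backend. All names (`JSONBackend`, `buffered`, `_pending`) are hypothetical and not part of signac's actual API; the dict `_store` stands in for the file on disk. The key property is transparency: reads return the same values inside and outside buffered mode, only the timing of the writes differs.

```python
from contextlib import contextmanager


class JSONBackend:
    """Hypothetical backend sketch; names are illustrative only."""

    def __init__(self):
        self._buffered = False
        self._pending = {}  # delayed writes, flushed when buffered mode ends
        self._store = {}    # stands in for the actual file on disk

    @contextmanager
    def buffered(self):
        # Buffered mode: the backend may assume non-atomic I/O is safe
        # and delay writes; observable behavior is otherwise unchanged.
        self._buffered = True
        try:
            yield self
        finally:
            self._buffered = False
            self._flush()

    def write(self, key, value):
        if self._buffered:
            self._pending[key] = value  # delay the I/O
        else:
            self._store[key] = value    # write through immediately

    def read(self, key):
        # Reads see pending writes first, so buffering stays transparent.
        if key in self._pending:
            return self._pending[key]
        return self._store.get(key)

    def _flush(self):
        self._store.update(self._pending)
        self._pending.clear()
```

A backend that has no cheap way to delay I/O could simply ignore the hint and write through in both modes; callers would see identical results, just without the speedup.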
I'm no longer convinced that caching on the "signac level" is a good idea. As long as we are rigorously abstracting all I/O operations by having them implemented purely in the backend, then we do not need to worry about this kind of optimization anymore. In fact, a rigorous backend implementation is going to make this kind of performance optimization much easier. For example, if we use redis or zarr backend, then we can assume that operations are atomic and concurrent access is safe at all times. From the point of signac, the data provided by the backend is the single source of truth, nothing else matters.
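As a sketch of what "caching belongs to the backend" could look like: below, all I/O goes through an abstract backend interface, and one concrete backend happens to cache reads internally, invalidating by file mtime. The class and method names are hypothetical, not signac's API; the point is that signac only ever calls `load`/`save` and never sees the cache.

```python
import abc
import json
import os


class Backend(abc.ABC):
    """Hypothetical abstraction: all I/O lives in the backend, so any
    caching is an internal backend detail invisible to signac."""

    @abc.abstractmethod
    def load(self):
        ...

    @abc.abstractmethod
    def save(self, data):
        ...


class CachingFileBackend(Backend):
    """Illustrative file backend that caches reads, keyed on mtime."""

    def __init__(self, path):
        self._path = path
        self._cache = None
        self._mtime = None

    def load(self):
        mtime = os.path.getmtime(self._path)
        if self._cache is None or mtime != self._mtime:
            # Cache miss or stale cache: re-read from the source of truth.
            with open(self._path) as f:
                self._cache = json.load(f)
            self._mtime = mtime
        return self._cache

    def save(self, data):
        with open(self._path, "w") as f:
            json.dump(data, f)
        self._cache = data
        self._mtime = os.path.getmtime(self._path)
```

A redis or zarr backend implementing the same interface might skip caching entirely, since its operations are already atomic and concurrent access is safe; either way the data held by the backend remains the single source of truth.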
As stated above, caching should be handled by the backend, no caching on the signac level.
See my first paragraph.
I have made the changes.
@vishav1771 Did you accidentally close this?
@csadorf I have moved this PR to glotzerlab#363 . |
This PR enables the buffering feature for SyncedCollection. This is related to glotzerlab#249.
Description
Motivation and Context
glotzerlab#249
Types of Changes
The change breaks (or has the potential to break) existing functionality.