archive/storage: Backport API from chainHead #94
Conversation
My opinion of The reasoning is that, compared to the
I've updated `archive/storage` to be a plain method, while keeping parity with the `chainHead/storage` interface. One side effect of this being a method is the introduction of the `paginationStartKey` parameter.
@tomaka would love to get your feedback again on this 🙏
Looking good to me now, nice one!
I'm a bit skeptical, because I feel like this is designed without a target use case in mind. I have the impression that scanning the archive requires functions such as obtaining the list of blocks where a certain change to a storage value happened, and that kind of thing, and I think that the design proposed here doesn't really address that kind of use case.
One thing I have in mind is that I can fall back to using these functions in subxt in case blocks are unpinned or whatever via the chainHead calls. So basically, if archive fns are available, people can "hold onto" blocks for much longer than if only chainHead is available. Hard to see how often that'll end up being used, but there are things (e.g. network issues) that can cause recent blocks to be unpinned, so it's nice to have a fallback where applicable.
The entire point of splitting between `chainHead` and `archive` is to make the guarantees clear. If you want the guarantee of being able to read any block even in case of networking issues and such, then just use `archive`.
I don't agree; I think it's very reasonable for Subxt to use chainHead to follow and work with current blocks (by far the primary way people will use it), and if we happen to be talking to a node providing archive methods then we can do a little better and still attempt, via those, to return valid details for blocks that are no longer pinned, if the user has held on to them for longer for some reason. This also means that when the user asks for some details at any block hash, subxt will use a pinned block if one is available, or fall back to using an archive call if that is available, or fail if neither is possible. It makes the experience simple and does a "best effort" to retrieve the requested details. Subxt doesn't try to guarantee anything; instead it simply aims to keep the details for any block that the user is interested in available for as long as possible.
I really disagree with that. If SubXT can "sometimes" return information about a block and "sometimes" can't, that isn't simple. Users need to know whether their code is going to work or not, and shrugging at them isn't a very satisfactory thing to do. Telling the user that you're going to try to give them some information, but without any guarantee, ultimately leads either to a bad user experience or to the user having to punch through the abstraction layers in order to understand how things work under the hood. This kind of digging under the hood is precisely why Substrate/Polkadot is so complex nowadays. Everything "sometimes" works, and in order to code anything you have to understand how everything works under the hood. The objective of the new JSON-RPC API is to do what the legacy API didn't do, which is to put a clear barrier here. It is clear and precise enough that users of the API don't have to look under the hood in order to understand whether their request is going to work.
Subxt offers the best guarantees that I think it can here: it will keep blocks pinned for as long as it can while the user is interested in them, and so as long as you ask for information from a block within a reasonable time frame, it will always successfully return it. The new API only guarantees that blocks will be pinned for some (currently unspecified and unknown) amount of time, and so if the user holds onto one for too long, I'll eventually have to say "no". What do you think would be a better approach here? My thought then was that if the user does hold onto a block reference for too long, and they are specifically connected to an archive node (presumably this would be a deliberate choice, especially as light client connections will eventually be the default), then they can, through exactly the same Subxt APIs, continue to access information about older blocks too, in a seamless sort of way.
A user has to be aware that a request will work for some period of time after a block hash is seen. Currently, they won't know how long that period is, but they will know when it's no longer possible because a "stop" event is emitted. Subxt offers the same promise; if you access details about some block within a reasonable time frame, the access will succeed. We don't currently allow the user to be notified when some given block ref is no longer pinned/accessible, but that's definitely something we can add.
It's not clear to me, at least, what should be different between the archive and chainHead methods; do you have any ideas on how the archive methods should differ? I'd likely offer a separate set of APIs in Subxt for accessing archived block information if the methods did differ in a meaningful way, but that would add complexity, and the user would then have to learn two sets of APIs (one for recent blocks and one for old blocks), so I do think there would have to be at least a slightly compelling reason for any significant differences. I'm open to all thoughts here!
I think that the issue here is the "currently unspecified and unknown". To put it otherwise, to me the solution is, for example, for archive nodes to allow an unlimited number of pinned blocks, provided that these blocks are in the finalized canonical chain. In other words, the blocks that you would have been able to access through `archive` anyway. Basically, I don't think that you should be using `archive` as a fallback for `chainHead`.
It's not clear to me either what the differences are. I've designed `chainHead` so that it can be implemented by a light client; for `archive`, that constraint doesn't apply. Furthermore, there are many uncertainties w.r.t. how to manage the old chain. For example, we might want to deprecate things in the future, which means maybe breaking old blocks. Is that acceptable? Is that something to bake into the API? I have no idea.
That makes a lot of sense to me! I still think that Subxt could provide a single interface to allow fetching of current or old blocks, but I could imagine that a user might explicitly pick which of these they are doing and select the appropriate backend for the job (in other words, instruct Subxt to use the archive methods or the chainHead methods to drive things). I guess it depends on how different the two sets of methods end up being.
In the absence of more information, I think that it'd be good to merge this PR and start with the archive and chainHead methods being reasonably aligned (aside from the subscription vs request sort of approach, which makes sense to differ as this PR has done). Then, we can iterate on the design from there. I can imagine that a bunch of things (like #108) may want to exist in chainHead and archive namespaces if possible (albeit with perhaps a slightly different approach to cater for each API; chainHead maybe giving a "live" diff each block for instance, vs archive just being given the block numbers to diff between up front).
That sounds like a plan to me; we can then follow up with use cases for the archive class to make informed decisions about what developers need. @tomaka what do you think?
@tomaka @josepot would love to get your thoughts on this PR. As a first step, having the `archive` methods aligned with the `chainHead` ones seems like a good starting point.
To me what you're suggesting is to change from a bad design that shouldn't be used to a different bad design that shouldn't be used. I don't have an opinion on this.
To me, this PR improves on the state of the `archive` functions. So let's merge this by the end of the week unless there are any remaining objections that we can solve, and when we get more data we can make the necessary changes.
Again, I'm neutral here, but to me consistency between `chainHead` and `archive` shouldn't be an objective in itself.
Agreed; not an objective, but a good default when we don't know better :)
src/api/archive_unstable_storage.md
Yields an item that was found in the storage.
The JSON object corresponds to one of the requested items whose `type` was `"value"` or `"descendantsValues"`.
If the `key` is not associated with a storage value in the trie, then no response is generated.
I guess this is fine, but it would be easier for clients if the server would just send back `null` to indicate that the key was not found.
For this API, a client probably needs to have a timeout for these requests; after, for instance, 1 minute the call would be regarded as failed.
Will modify the phrasing of the spec. What I meant is that, given `archive/storage` is a plain method, no response will be generated for this `key` in the `result` array. Users will still get all the responses back once every query is processed.
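To illustrate the behaviour being discussed, here is a minimal sketch of a response where two items were queried but only one key exists in the trie, so the `result` array simply contains a single entry and nothing marks the missing key. The JSON-RPC envelope and field names here are assumptions for illustration, not the spec's normative text:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": [
    {
      "key": "0x0102aabb",
      "value": "0x0badbeef"
    }
  ]
}
```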
src/api/archive_unstable_storage.md
The `waiting-for-continue` event is generated after at least one `"item"` event has been generated, and indicates that the JSON-RPC client must call `archive_unstable_storageContinue` before more events are generated.
The JSON object corresponds to one of the requested items whose `type` was `"hash"` or `"descendantsHashes"`.
If the `key` is not associated with a storage value in the trie, then no response is generated.
same here 👆
Yeah, I agree with ☝️; there are lots of unknowns for sure, but on the other hand we have to start somewhere to learn what issues come up when "fetching old blocks" with this API. It doesn't have to be perfect, but as long as we can get folks to move away from the current RPC API "to fetch old storage" and get feedback from users about what doesn't work etc., we can iterate further on it.
This PR implements the `archive_unstable_storage` method that offers support for:

- fetching values
- fetching hashes
- iterating over keys and values
- iterating over keys and hashes
- fetching merkle values from the trie-db

A common component dedicated to RPC-V2 storage queries is created to bridge the gap between `chainHead/storage` and `archive/storage`. Query pagination is supported by `paginationStartKey`, similar to the old APIs. Similarly to `chainHead/storage`, the `archive/storage` method accepts a maximum number of queried items.

The design builds upon: paritytech/json-rpc-interface-spec#94. Closes #1512.

cc @paritytech/subxt-team
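As a rough illustration of the query shapes listed in the commit message above, the `items` parameter of such a call might look like the following. The `type` strings mirror the `chainHead/storage` item types; treating them as the exact names for `archive/storage`, and attaching `paginationStartKey` to a descendants query, are assumptions made for this sketch:

```json
[
  { "key": "0x0101", "type": "value" },
  { "key": "0x0101", "type": "hash" },
  { "key": "0x0102", "type": "descendantsValues", "paginationStartKey": "0x01020304" },
  { "key": "0x0102", "type": "descendantsHashes" },
  { "key": "0x0103", "type": "closestDescendantMerkleValue" }
]
```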
This PR backports the `chainHead_storage` method to `archive_storage` to bring it up to date.

The main changes include:

- querying multiple `items` per call
- removing the `includeDescendants` bool parameter in favor of the `type` of the query

Differences between `chainHead_storage` and `archive_storage` include:

- `archive_storage` is a method that provides the responses directly, without any concept of operationIds
- a `paginationStartKey` key for descendants queries to resume pagination

cc @tomaka @jsdw @josepot @paritytech/subxt-team
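To make the first difference concrete, here is a hedged sketch of what a plain `archive_unstable_storage` request could look like under this design: a single request/response exchange with no subscription and no `operationId` events. Parameters are shown by name for readability, the `hash` value is a placeholder block hash, and none of this is normative spec text:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "archive_unstable_storage",
  "params": {
    "hash": "0x0000000000000000000000000000000000000000000000000000000000000000",
    "items": [
      {
        "key": "0x0102",
        "type": "descendantsValues",
        "paginationStartKey": "0x01020304"
      }
    ]
  }
}
```

The server would then answer with a single JSON-RPC response containing the collected results, and a client wanting to continue iterating over descendants would presumably issue a new request with `paginationStartKey` derived from the last key it received.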