archive/storage: Backport API from chainHead #94

Merged · 19 commits · Nov 15, 2023

Conversation

@lexnv (Contributor) commented on Sep 12, 2023:

This PR backports the chainHead_storage method to archive_storage to bring it up to date.

The main changes include:

  • support for batch requests via the introduced items parameter
  • removal of the includeDescendants bool parameter in favor of the query type
  • support for fetching the value, hash, and closest merkle value, as well as iterating via descendantsValues and descendantsHashes

Differences between chainHead_storage and archive_storage include:

  • archive_storage is a plain method that provides the responses directly, without any concept of operationIds
  • an optional paginationStartKey was introduced for descendants queries to resume pagination (see the request sketch below)
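For concreteness, here is a minimal sketch of what one batched request could look like under this design, built with Rust's serde_json. The method name `archive_unstable_storage` comes from the follow-up implementation referenced later in this thread; the block hash and keys are placeholders, and the exact parameter names should be checked against the spec text itself.

```rust
use serde_json::json;

fn main() {
    // Illustrative only: one batched query combining the query types
    // introduced by this PR. The hash and keys are placeholders.
    let params = json!({
        "hash": "0x…",
        "items": [
            { "type": "value", "key": "0x1234" },
            { "type": "hash", "key": "0x1234" },
            { "type": "closestDescendantMerkleValue", "key": "0x12" },
            { "type": "descendantsValues", "key": "0x12" },
            { "type": "descendantsHashes", "key": "0x12" }
        ]
    });
    println!("archive_unstable_storage params: {params}");
}
```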

cc @tomaka @jsdw @josepot @paritytech/subxt-team

@tomaka (Contributor) commented on Sep 14, 2023:

My opinion is that archive_storage should actually be more similar to the legacy JSON-RPC API, closer to returning just the list of values directly in its return value. The same goes for all the archive-prefixed functions in general.

The reasoning is that, compared to the chainHead functions, there's no "real time" component to archive functions.

@lexnv (Contributor, Author) commented on Sep 15, 2023:

I've updated the archive/storage to be a plain method, while keeping parity with the chainHead/storage interface.

One side effect of this being a method is the introduction of paginationStartKey, so that a resumed query doesn't repeat results it has already returned. Would love to get your thoughts on this 🙏
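To make the resume flow concrete, here is a sketch under the assumption that the client simply echoes back the last key it received from a truncated descendantsValues query; all hashes and keys are placeholders.

```rust
use serde_json::json;

fn main() {
    // First request: iterate every descendant of a key prefix.
    let first = json!({
        "hash": "0x…",
        "items": [{ "type": "descendantsValues", "key": "0x12" }]
    });
    // If the server truncated the results, resume by repeating the query
    // with paginationStartKey set to the last key already received.
    let resume = json!({
        "hash": "0x…",
        "items": [{
            "type": "descendantsValues",
            "key": "0x12",
            "paginationStartKey": "0x12ab"
        }]
    });
    println!("{first}\n{resume}");
}
```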

@lexnv (Contributor, Author) commented on Sep 25, 2023:

@tomaka would love to get your feedback again on this 🙏

@jsdw (Collaborator) previously approved these changes on Oct 11, 2023, leaving this comment:


Looking good to me now, nice one!

@tomaka (Contributor) commented on Oct 12, 2023:

I'm a bit skeptical, because I feel like this is designed without a target use case in mind.
Who for example would need the closestDescendantMerkleValue of an old block? I really don't see why you'd want that information.

I have the impression that scanning the archive requires functions such as obtaining the list of blocks where a certain change to a storage value happened, and that kind of thing, and I think that the archive-prefixed functions in general should be redesigned from the ground up.

@jsdw (Collaborator) commented on Oct 12, 2023:

Who for example would need the closestDescendantMerkleValue of an old block? I really don't see why you'd want that information.

One thing I have in mind is that subxt can fall back to using these functions in case blocks are unpinned from the chainHead calls. So basically, if archive fns are available, people can "hold onto" blocks for much longer than if only chainHead is available.

It's hard to say how often that'll end up being used, but there are things (eg network issues) that can cause recent blocks to be unpinned, so it's nice to have a fallback where applicable.

@tomaka (Contributor) commented on Oct 12, 2023:

if archive fns are available, people can "hold onto" blocks for much longer than if only chainHead is available.

The entire point of splitting between archive and chainHead is to specialize the ways the chain is accessed: either you're watching the chain live, or you're watching a replay.
Trying to create an abstraction that works transparently over both is wrong.

If you want the guarantee to be able to read any block even in case of networking issues and such, then just use archive all the time and not chainHead.

@jsdw (Collaborator) commented on Oct 13, 2023:

Trying to create an abstraction that works transparently over both is wrong.

I don't agree; I think it's very reasonable for Subxt to use chainHead to follow and work with current blocks (by far the primary way people will use it), and, if we happen to be talking to a node that provides archive methods, to do a little better and still attempt to return valid details via those methods for blocks that are no longer pinned, if the user has held on to them for longer for some reason.

This also means that when the user asks for some details at any block hash, subxt will use a pinned block if one is available, or fall back to using an archive call if that is available, or fail if neither is possible. It makes the experience simple and does a "best effort" to retrieve the requested details.

Subxt doesn't try to guarantee anything; instead it simply aims to keep the details for any block that the user is interested in available for as long as possible.
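As a minimal sketch of that "best effort" order (not Subxt's actual API; the names here are invented for illustration):

```rust
/// Where a storage query for a given block would be served from.
enum Source {
    PinnedChainHead, // block still pinned via the chainHead subscription
    Archive,         // block no longer pinned, but archive methods exist
}

fn pick_source(block_pinned: bool, archive_available: bool) -> Result<Source, &'static str> {
    if block_pinned {
        Ok(Source::PinnedChainHead)
    } else if archive_available {
        Ok(Source::Archive)
    } else {
        Err("block is unpinned and no archive methods are available")
    }
}

fn main() {
    // An unpinned block on an archive-capable connection still resolves.
    assert!(matches!(pick_source(false, true), Ok(Source::Archive)));
}
```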

@tomaka (Contributor) commented on Oct 15, 2023:

It makes the experience simple and does a "best effort" to retrieve the requested details.

I really disagree with that.

If SubXT can "sometimes" return information about a block and "sometimes" can't, that doesn't make it simple. Users need to know whether their code is going to work or not, and shrugging at them isn't a very satisfactory thing to do.

Telling the user that you're going to try to give them some information, but without any guarantee, ultimately leads either to a bad user experience or to the user having to punch through the abstraction layers in order to understand how things work under the hood.

This kind of digging under the hood is precisely why Substrate/Polkadot is so complex nowadays. Everything "sometimes" works, and in order to code anything you have to understand how everything works under the hood.
If you try, for example, to work on the Substrate networking code, you'll realize that the client is sometimes able to provide you with a block but doesn't offer any strong guarantee as to when, so you have to read the code of the client to know when it can and when it can't. Then you'll realize that the Wasm executor sometimes returns an error, but doesn't offer any strong guarantee as to when, so you have to read the code of the executor. And so on.

The objective of the new JSON-RPC API is to do what the legacy API didn't do, which is to put a clear barrier here. It is clear and precise enough that users of the API don't have to look under the hood in order to understand whether their request is going to work.

@jsdw (Collaborator) commented on Oct 16, 2023:

If SubXT can "sometimes" return information about a block and "sometimes" can't, that doesn't make it simple. Users need to know whether their code is going to work or not, and shrugging at them isn't a very satisfactory thing to do.

Subxt offers the best guarantees that I think it can here; it will keep blocks pinned for as long as it can while the user is interested in them, and so as long as you ask for information from a block within a reasonable time frame, it will always successfully return it. The new API only guarantees that blocks will be pinned for some (currently unspecified and unknown) amount of time, and so if the user holds onto one for too long, I'll eventually have to say "no". What do you think would be a better approach here?

My thought then was that if the user does hold onto a block reference for too long, and they are specifically connected to an archive node (presumably a deliberate choice, especially as light client connections will eventually be the default), then they can, through exactly the same Subxt APIs, continue to access information about older blocks too, in a seamless sort of way.

It is clear and precise enough that users of the API don't have to look under the hood in order to understand whether their request is going to work.

A user has to be aware that a request will work for some period of time after a block hash is seen. Currently, they won't know how long that period is, but they will know when it's no longer possible because a "stop" event is emitted. Subxt offers the same promise; if you access details about some block within a reasonable time frame, the access will succeed. We don't currently allow the user to be notified when some given block ref is no longer pinned/accessible, but that's definitely something we can add.

The entire point of splitting between archive and chainHead is to specialize the ways the chain is accessed: either you're watching the chain live, or you're watching a replay.

It's not clear to me, at least, what should be different between the archive and chainHead methods; do you have any ideas on how the archive methods should differ?

I'd likely offer a separate set of APIs in Subxt for accessing archived block information if the methods did differ in a meaningful way, but that would add complexity, and the user would then have to learn two sets of APIs (one for recent blocks and one for old blocks), so I do think there would have to be at least a slightly compelling reason for any significant differences. But I'm open to all thoughts here!

@tomaka (Contributor) commented on Oct 16, 2023:

The new API only guarantees that blocks will be pinned for some (currently unspecified and unknown) amount of time, and so if the user holds onto one for too long, I'll eventually have to say "no". What do you think would be a better approach here?

I think that the issue here is the "currently unspecified and unknown".
The reason why blocks need to be unpinned is that the node/JSON-RPC server needs to be able to free resources. If the node is later able to serve the same block through archive, that means the resources haven't been freed, and thus that this block didn't really "need" to be unpinned and could have remained pinned.

To put it otherwise, to me the solution is for example for archive nodes to allow an unlimited amount of pinned blocks, provided that these blocks are in the finalized canonical chain. In other words, the blocks that you would have been able to access through archive anyway can remain pinned.

Basically, I don't think that you should be using archive in order to overcome issues that can be solved by tweaking chainHead.
What matters when designing the API is the intent. Whenever you want to watch what happens at the head of the chain, chainHead should be able to provide whatever you need.
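A sketch of that pinning rule, assuming the server can cheaply test whether a block is in the finalized canonical chain (invented names; nothing here is part of the spec):

```rust
/// Whether the server ever needs to force-unpin this block to free resources.
fn may_need_forced_unpin(is_archive_node: bool, in_finalized_canonical_chain: bool) -> bool {
    // An archive node retains finalized canonical blocks forever anyway, so
    // keeping them pinned costs nothing extra and the client faces no deadline.
    !(is_archive_node && in_finalized_canonical_chain)
}

fn main() {
    assert!(!may_need_forced_unpin(true, true)); // archive + finalized: pin indefinitely
    assert!(may_need_forced_unpin(false, true)); // a pruning node must reclaim eventually
}
```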

It's not clear to me at least what should be different between archive and chainhead method; do you have any ideas on how the archive methods should differ?

I'd likely offer a separate set of APIs in Subxt for accessing archived block information if the methods did differ in a meaningful way, but that would add complexity, and the user would then have to learn two sets of APIs (one for recent blocks and one for old blocks), so I do think there would have to be at least a slightly compelling reason for any significant differences. But I'm open to all thoughts here!

It's not clear to me either what the differences are.


I've designed chainHead while having in mind how a full node syncs the head of the chain and how a light client syncs the head of the chain, and how a dApp might want to access the live chain. The API was designed knowing precisely what implementers are capable of and what JSON-RPC clients might want to request.

For archive it's unfortunately way less clear to me how to design it. The users of this namespace would probably be people who use indexers for example, but I don't know precisely what these people want. For example #108 is a legitimate wish, and I don't think I have enough knowledge to anticipate all wishes.

Furthermore there are many uncertainties w.r.t. how to manage the old chain. For example we might want to deprecate things in the future, which means maybe breaking old blocks. Is that acceptable? Is that something to bake in the API? I have no idea.

@jsdw (Collaborator) commented on Oct 17, 2023:

To put it otherwise, to me the solution is for example for archive nodes to allow an unlimited amount of pinned blocks, provided that these blocks are in the finalized canonical chain. In other words, the blocks that you would have been able to access through archive anyway can remain pinned.

That makes a lot of sense to me!

I still think that Subxt could provide a single interface to allow fetching of current or old blocks, but I could imagine a user explicitly picking which of these they are doing and selecting the appropriate backend for the job (in other words, instructing Subxt to use the archive methods or the chainHead methods to drive things). I guess it depends on how different the chainHead_ vs archive_ interfaces end up being as to how I'd design this stuff.

For archive it's unfortunately way less clear to me how to design it. The users of this namespace would probably be people who use indexers for example, but I don't know precisely what these people want. For example #108 is a legitimate wish, and I don't think I have enough knowledge to anticipate all wishes.

In the absence of more information, I think that it'd be good to merge this PR and start with the archive and chainHead methods being reasonably aligned (aside from the subscription vs request sort of approach, which makes sense to differ as this PR has done).

Then, we can iterate on the design from there.

I can imagine that a bunch of things (like #108) may want to exist in chainHead and archive namespaces if possible (albeit with perhaps a slightly different approach to cater for each API; chainHead maybe giving a "live" diff each block for instance, vs archive just being given the block numbers to diff between up front).

@lexnv (Contributor, Author) commented on Oct 17, 2023:

In the absence of more information, I think that it'd be good to merge this PR and start with the archive and chainHead methods being reasonably aligned ..
Then, we can iterate on the design from there.

That sounds like a plan to me; we can then follow up with use cases for the archive class to make informed decisions about what developers need. @tomaka what do you think?

@lexnv (Contributor, Author) commented on Oct 27, 2023:

@tomaka @josepot would love to get your thoughts on this PR

As a first step, having the archive methods slightly aligned with chainHead would allow us to iterate a bit faster on the design, and would also unblock the Substrate-side implementation (paritytech/polkadot-sdk#1846). Then we could gather more data about API usage and add/extend storage differences, for example (#108).

@tomaka (Contributor) commented on Oct 27, 2023:

To me what you're suggesting is to change from a bad design that shouldn't be used to a different bad design that shouldn't be used. I don't have an opinion on this.

@jsdw (Collaborator) commented on Nov 9, 2023:

To me, this PR improves on the state of the archive_ methods by making them more consistent with the chainHead ones. I think that this is a good place to aim for in the absence of any data supporting doing something different.

So let's merge this by the end of the week unless there are any remaining objections that we can solve, and when we get more data we can make the necessary changes.

@tomaka (Contributor) commented on Nov 9, 2023:

Again I'm neutral here, but to me consistency between archive and chainHead is not an objective.
They're two completely different APIs for two completely different usages.

@jsdw (Collaborator) commented on Nov 9, 2023:

consistency between archive and chainHead is not an objective. They're two completely different APIs for two completely different usages.

Agreed; not an objective, but a good default when we don't know better :)

Review comment on the spec text:

> Yields an item that was found in the storage.
> The JSON object corresponds to one of the requested items whose `type` was `"value"` or `"descendantsValues"`.
>
> If the `key` is not associated with a storage value in the trie, then no response is generated.
@niklasad1 (Member) commented on Nov 15, 2023:

I guess this is fine, but it would be easier for clients if the server would just send back null to indicate that the key was not found.

For this API, a client probably needs to have a timeout for these requests, and after, for instance, 1 min, the call is regarded as failed.
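A sketch of that client-side policy, assuming a tokio-based client; the future and the 60-second figure are stand-ins, and none of this is part of the API itself:

```rust
use std::time::Duration;

/// Treat any archive storage call that takes longer than `limit` as failed.
/// `fut` stands in for whatever future the JSON-RPC client library returns.
async fn with_deadline<T>(
    fut: impl std::future::Future<Output = T>,
    limit: Duration,
) -> Result<T, &'static str> {
    tokio::time::timeout(limit, fut)
        .await
        .map_err(|_| "archive call exceeded the client-side deadline")
}
```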

@lexnv (Contributor, Author) replied:

Will modify the phrasing of the spec. What I meant by this: given that archive/storage is a plain method, no response will be generated for this "key" in the "result" array. Users will still get all the responses back once the queries have been processed.
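To illustrate the clarified behavior (placeholder keys and values; the exact result shape is what this PR is specifying): querying two keys where the second is absent yields a result array with a single entry, rather than an explicit null.

```rust
use serde_json::json;

fn main() {
    // Two queried items: the first key exists in the trie, the second doesn't.
    let items = json!([
        { "type": "value", "key": "0xaaaa" },
        { "type": "value", "key": "0xbbbb" }
    ]);
    // Only one entry comes back: the absent key produces no response at all,
    // rather than a null placeholder entry.
    let result = json!([
        { "key": "0xaaaa", "value": "0x2a" }
    ]);
    println!("items: {items}\nresult: {result}");
}
```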

Review comment on the spec text:

> The `waiting-for-continue` event is generated after at least one `"item"` event has been generated, and indicates that the JSON-RPC client must call `archive_unstable_storageContinue` before more events are generated.
> The JSON object corresponds to one of the requested items whose `type` was `"hash"` or `"descendantsHashes"`.
>
> If the `key` is not associated with a storage value in the trie, then no response is generated.
@niklasad1 (Member) replied:

same here 👆

@niklasad1 (Member) commented:

For archive it's unfortunately way less clear to me how to design it. The users of this namespace would probably be people who use indexers for example, but I don't know precisely what these people want. For example #108 is a legitimate wish, and I don't think I have enough knowledge to anticipate all wishes.

Furthermore there are many uncertainties w.r.t. how to manage the old chain. For example we might want to deprecate things in the future, which means maybe breaking old blocks. Is that acceptable? Is that something to bake in the API? I have no idea.

Yeah, I agree with ☝️, there are lots of unknowns for sure, but on the other hand we have to start somewhere to learn what issues come up when fetching old blocks with this archive API.

It doesn't have to be perfect, as long as we can get folks to move away from the current RPC API for fetching old storage and gather feedback from users on what doesn't work etc., so we can iterate further on it.

@niklasad1 previously approved these changes on Nov 15, 2023.

@lexnv dismissed the stale reviews from niklasad1 and jsdw via 00d6825 on November 15, 2023 at 10:56.

@lexnv merged commit 6daf453 into main on Nov 15, 2023 · 3 checks passed.
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Jan 15, 2024
This PR implements the `archive_unstable_storage` method that offers
support for:
- fetching values
- fetching hashes
- iterating over keys and values
- iterating over keys and hashes
- fetching merkle values from the trie-db

A common component dedicated to RPC-V2 storage queries is created to
bridge the gap between `chainHead/storage` and `archive/storage`.
Query pagination is supported by `paginationStartKey`, similar to the
old APIs.
Similarly to `chainHead/storage`, the `archive/storage` method
accepts a maximum number of queried items.

The design builds upon:
paritytech/json-rpc-interface-spec#94.
Closes #1512.

cc @paritytech/subxt-team

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Niklas Adolfsson <[email protected]>
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this pull request on Mar 25, 2024 (same commit message as above; closes paritytech#1512).