Ledger Indexing #53
Replies: 1 comment 1 reply
-
Interesting thoughts. I actually think of another configuration for this: we would use another K/V DB engine that just stores the set of relevant outputs for each address (an output might be stored under multiple addresses, e.g. sender and recipient). This would cause pretty minimal overhead (it's essentially just the old index in a separate DB that you can now turn on and off) and still allows the node to serve today's API requests without any problems. All you'd have to do is load the output set from disk and then bulk-request the ledger DB for the outputs. Maybe a good middle ground for nodes that want to balance performance against serving wallets?

This also brings a third option for the plugin: only load a small set of outputs into the relational DB, resulting in greatly reduced storage overhead, even though we would then have three databases handling ledger state. It comes at the cost that we can only handle requests for addresses we have loaded into that DB. For example, a centralized exchange backend could load only the addresses relevant to the exchange. Loading is simple: just bulk-request the outputs and insert them. You only have to keep track of which addresses you are tracking in the relational DB.

So there are essentially 4 modes: no indexing, K/V indexing only, K/V and relational (although the ruleset for which tx is stored where is out of scope here), and fully relational.

In terms of architecture I would just have the indexer plugin listen for changes on the ledger and then update the database. Periodically updating it by scanning the database is too slow and would cause users to see their transaction as confirmed while their balance is still zero, because their node has not updated its indexes yet.
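A minimal sketch of the K/V-index mode described above. All names (`ledger_db`, `kv_index`, the output fields) are hypothetical stand-ins; a real node would use an on-disk K/V engine rather than Python dicts:

```python
# Sketch of the K/V-only indexing mode: a separate store maps each
# relevant address to the set of output IDs that reference it. An output
# may be registered under several addresses (e.g. sender and recipient).

ledger_db = {}  # output_id -> full output object (the node's ledger store)
kv_index = {}   # address -> set of output_ids (the optional index DB)

def apply_output(output_id, output):
    """Store an output and index it under every address it references."""
    ledger_db[output_id] = output
    for addr in (output["recipient"], output.get("sender")):
        if addr is not None:
            kv_index.setdefault(addr, set()).add(output_id)

def outputs_for_address(addr):
    """Serve an address query: load the ID set, then bulk-request the ledger DB."""
    ids = kv_index.get(addr, set())
    return [ledger_db[i] for i in ids]

apply_output("out1", {"recipient": "addrA", "sender": "addrB", "amount": 100})
apply_output("out2", {"recipient": "addrA", "sender": None, "amount": 50})

print(sum(o["amount"] for o in outputs_for_address("addrA")))  # 150
print(len(outputs_for_address("addrB")))  # 1, indexed as sender
```

Turning the mode off would simply mean not writing to `kv_index`; the ledger store itself is untouched either way.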
-
Introduction
All participants (nodes) in a DLT maintain a shared ledger that records who owns what. Transactions on the network update the ledger. There are two types of ledger models: the UTXO model and the account model.
src: https://academy.horizen.io/technology/expert/utxo-vs-account-model/
Accessing the Ledger
The protocol's job is to verify the state transition of the ledger, that is, whenever a new transaction (state update) comes in, check that the ledger transition rules are respected. As such, the protocol doesn't have to care how the transaction is generated or how clients wishing to transact find out about the current state.
In account based systems like Ethereum, fetching the current state of an account is easy, as balances are grouped and stored per user account. Smart contracts, however, create sub-ledgers inside the main ledger, therefore a client has to know the identifier of the smart contract (SC address) to be able to fetch balances in these sub-ledgers, for example an ERC-20 token balance. If you don't know where to look, you'll never find your tokens.
In UTXO based systems, the balance of a user account comprises several unspent transaction outputs. Each output defines which account it belongs to, but there is no protocol level grouping of all outputs that belong to a specific account. Indeed, there can not be one when an output might be unlocked by different accounts based on some conditions.
This gave rise to so-called indexers and explorers. An indexer monitors the ledger for updates, extracts information from transactions and outputs, and indexes it in a relational database. Clients then run queries on the structured data in the database, for example to fetch all outputs that contain a specific user account identifier.
Explorers are very similar to indexers, but their main purpose is to show the ledger state, its history and evolution, rather than to serve clients like wallets.
Indexers and explorers are not a core part of the protocol; they are tooling to help clients interact with the ledger. As such, they are regarded as layer 2 applications that work on top of the raw ledger.
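The indexer pattern above boils down to two pieces: a callback that fires on every ledger update and writes extracted fields into a relational store, and a two-step client lookup (query the indexer for identifiers, then fetch the outputs themselves from the node). A minimal sketch, with a hypothetical schema and field names:

```python
import sqlite3

raw_outputs = {}  # stand-in for the node's own key-value output store

# The indexer's relational database: extracted fields, queryable by clients.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE outputs (output_id TEXT PRIMARY KEY, address TEXT, amount INTEGER)"
)

def on_ledger_update(output_id, output):
    """Indexer callback: invoked whenever the ledger records a new output."""
    raw_outputs[output_id] = output
    db.execute("INSERT INTO outputs VALUES (?, ?, ?)",
               (output_id, output["address"], output["amount"]))

on_ledger_update("out1", {"address": "addrA", "amount": 100})
on_ledger_update("out2", {"address": "addrB", "amount": 25})
on_ledger_update("out3", {"address": "addrA", "amount": 50})

# Two-step client lookup: ask the indexer for IDs, fetch outputs from the node.
ids = [row[0] for row in db.execute(
    "SELECT output_id FROM outputs WHERE address = ?", ("addrA",))]
outputs = [raw_outputs[i] for i in ids]
print(sorted(ids))  # ['out1', 'out3']
```

Note the separation: the relational database only maps query terms to identifiers; the raw outputs remain in the node's own store.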
Chrysalis Indexing
The current IOTA mainnet (Chrysalis Part 2) contains the indexer application directly in the node software, and exposes REST API routes to run queries on the indexed ledger database. This works because in Chrysalis Part 2 outputs are simple, can only define one user account (address) and a balance. There is no complicated output locking logic.
Therefore, it is impossible to create an indexation lookup from an output to a user account without spending a significant amount (1 Mi) of funds to the targeted address.
There can be any number of outputs linked to a given address, therefore the REST API imposes a limit on how many can be returned in a single call (currently 1000). Unfortunately, pagination of the query results is not possible due to the non-deterministic record lookup, so a client can only get all outputs linked to its address when there are fewer than 1000 of them.
Hence, in Chrysalis Part 2, when an address query through the REST API returns the maximum of 1000 results, the client knows that it should sweep some outputs together to decrease their number and try again.
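The client-side rule can be sketched as follows. The 1000-result cap comes from the API above; `query_outputs` and `sweep` are hypothetical helpers (a real sweep issues a consolidating transaction):

```python
MAX_RESULTS = 1000  # per-call limit of the Chrysalis REST API

def fetch_all_outputs(query_outputs, sweep, address):
    """Query an address; if the result hits the cap, sweep and retry."""
    outputs = query_outputs(address)
    while len(outputs) >= MAX_RESULTS:
        # A full page means the result may be truncated; since pagination is
        # not possible, reduce the output count and query again.
        sweep(address)
        outputs = query_outputs(address)
    return outputs

# Toy demonstration: an address starts with 2500 outputs; each sweep
# consolidates every 10 outputs into 1 (a stand-in for a real sweep tx).
state = {"addr": list(range(2500))}

def query_outputs(address):
    return state[address][:MAX_RESULTS]

def sweep(address):
    state[address] = state[address][::10]

result = fetch_all_outputs(query_outputs, sweep, "addr")
print(len(result))  # 250, now below the cap
```

The loop terminates because every sweep strictly decreases the number of outputs on the address.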
Stardust Indexing
Outputs in Stardust draft TIP-18 are more complicated and can be unlocked by multiple entities based on different conditions.
As a result, not only does the indexing workload increase on nodes, but the fact that an address is linked to an output no longer means that it actually owns it or can unlock it. In our previous example of getting 1000 outputs from an address query, the client would not be able to sweep the outputs, as it might not own them at the moment (or ever). This makes us rethink the current indexing architecture and whether it should be part of the core node logic.
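A sketch of why an indexer hit is no longer proof of ownership under Stardust. The output shape and condition names here are hypothetical, loosely modeled on draft TIP-18's address and expiration unlock conditions:

```python
# A Stardust-style output can reference several addresses through its
# unlock conditions; the indexer links the output to all of them.
output = {
    "amount": 100,
    "unlock_conditions": {
        "address": "addrA",  # default owner
        # After unix_time, only the return address (the sender) can unlock it.
        "expiration": {"return_address": "addrB", "unix_time": 2_000_000_000},
    },
}

def indexed_addresses(output):
    """The indexer links the output to every address it mentions."""
    conds = output["unlock_conditions"]
    addrs = {conds["address"]}
    if "expiration" in conds:
        addrs.add(conds["expiration"]["return_address"])
    return addrs

def can_unlock(output, addr, now):
    """Whether `addr` can actually unlock the output at time `now`."""
    conds = output["unlock_conditions"]
    exp = conds.get("expiration")
    if exp is not None and now >= exp["unix_time"]:
        return addr == exp["return_address"]
    return addr == conds["address"]

# Both addresses show up in an indexer query...
print(sorted(indexed_addresses(output)))               # ['addrA', 'addrB']
# ...but only one of them can spend the output at any given moment.
print(can_unlock(output, "addrB", now=1_000))          # False
print(can_unlock(output, "addrB", now=3_000_000_000))  # True
```

So a wallet querying by address must still inspect each output's unlock conditions before treating it as spendable balance.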
Separating the indexer application from the core protocol application would have the following benefits:
While of course there are downsides as well:
Proposed architecture
The core API of nodes (mandatory) that deals with the UTXO ledger should have only one job: to return an `Output` object by `OutputID` stored in the node's key-value store. Note that submitting new transactions is done via posting them in their encapsulating messages.
An opt-in node plugin or a standalone application fetches raw data from the UTXO DB. Outputs are examined, indexed, possibly filtered, and the extracted information is dumped into a relational database. This data structure maps the indices to `OutputID`s.
Clients such as wallets can run queries via the Indexer API to fetch the identifiers of outputs that are of interest to them. For example: "give me all outputs that sit on my `Address` and were created by a specific `Sender`".
Comparison
Outputs in other UTXO based DLTs such as Bitcoin and Cardano can not be indexed in this way, as usually outputs only contain a commitment to the actual locking script (`P2SH` in Bitcoin). Hence, only the creator of the output knows how exactly the output can be spent and by which address(es), unless the pre-image of the commitment is publicly revealed. This has a serious impact on client UX, especially for L1 smart contracts as in Cardano: wallets need to rely on off-chain, mostly centralized communication to discover the pre-image of locking scripts, assuming the creator made it public.
In IOTA however, the actual locking script (unlocking conditions) is part of the output itself, hence it is public and accessible to anyone. While this increases the ledger size compared to the commitment-only scheme, it provides transparency and on-chain discoverability for wallets.