Ledger Indexing #53
Replies: 1 comment 1 reply
-
Interesting thoughts. I actually think of another configuration for this: we would use another K/V DB engine that just stores the set of relevant outputs for each address (an output might be stored under multiple addresses, e.g. sender and recipient). This would cause pretty minimal overhead (it's essentially just the old index in a separate DB that you can now turn on and off) and still allows the node to serve today's API requests without any problems. All you'd have to do is load the output set from disk and then bulk-request the ledger DB for the outputs. Maybe a good middle ground for nodes that want to balance performance against serving wallets?

This also brings a third option for the plugin: only load a small set of outputs into the relational DB, resulting in greatly reduced storage overhead, even though we would then have three databases handling ledger state. It comes at the cost that we can only handle requests for addresses we have loaded into that DB. For example, a centralized exchange backend could load only the addresses relevant to the exchange. Loading is simple: just bulk-request the outputs and insert them. You only have to keep track of which addresses you are tracking in the relational DB.

So there are essentially 4 modes: no indexing, K/V indexing only, K/V and relational (although the ruleset for which tx is stored where is out of scope here), and fully relational.

In terms of architecture I would just have the indexer plugin listen for changes on the ledger and then update the database. Periodically updating it by scanning the database is too slow and would cause users to see their transaction as confirmed while their balance is still zero, because their node has not updated its indexes yet.
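A minimal sketch of the K/V-index mode described above. All names (`ledger_db`, `kv_index`, the output fields) are hypothetical stand-ins; a real node would use an on-disk K/V engine rather than Python dicts:

```python
# Sketch of the K/V-only indexing mode: a separate store maps each
# relevant address to the set of output IDs that reference it. An output
# may be registered under several addresses (e.g. sender and recipient).

ledger_db = {}  # output_id -> full output object (the node's ledger store)
kv_index = {}   # address -> set of output_ids (the optional index DB)

def apply_output(output_id, output):
    """Store an output and index it under every address it references."""
    ledger_db[output_id] = output
    for addr in (output["recipient"], output.get("sender")):
        if addr is not None:
            kv_index.setdefault(addr, set()).add(output_id)

def outputs_for_address(addr):
    """Serve an address query: load the ID set, then bulk-request the ledger DB."""
    ids = kv_index.get(addr, set())
    return [ledger_db[i] for i in ids]

apply_output("out1", {"recipient": "addrA", "sender": "addrB", "amount": 100})
apply_output("out2", {"recipient": "addrA", "sender": None, "amount": 50})

print(sum(o["amount"] for o in outputs_for_address("addrA")))  # 150
print(len(outputs_for_address("addrB")))  # 1, indexed as sender
```

Turning the mode off would simply mean not writing to `kv_index`; the ledger store itself is untouched either way.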
-
Introduction
All participants (nodes) in a DLT maintain a shared ledger that records who owns what. Transactions on the network update the ledger. There are two types of ledger models: the UTXO model and the account model.
src: https://academy.horizen.io/technology/expert/utxo-vs-account-model/
Accessing the Ledger
The protocol's job is to verify the state transition of the ledger, that is, whenever a new transaction (state update) comes in, check that the ledger transition rules are respected. As such, the protocol doesn't have to care how the transaction is generated or how clients wishing to transact find out about the current state.
In account based systems like Ethereum, fetching the current state of an account is easy, as balances are grouped and stored per user account. Smart contracts, however, create sub-ledgers inside the main ledger, therefore a client has to know the identifier of the smart contract (SC address) to be able to fetch balances in these sub-ledgers, for example an ERC-20 token balance. If you don't know where to look, you'll never find your tokens.
In UTXO based systems, the balance of a user account comprises several unspent transaction outputs. Each output defines which account it belongs to, but there is no protocol level grouping of all outputs that belong to a specific account. Indeed, there can not be one when an output might be unlocked by different accounts based on some conditions.
This gave rise to so-called indexers and explorers. An indexer monitors the ledger for updates, extracts information from transactions and outputs, and indexes it in a relational database. Clients then run queries on the structured data in the database, for example to fetch all outputs that contain a specific user account identifier.
Explorers are very similar to indexers, but their main purpose is to show the ledger state, its history and evolution, rather than to serve clients like wallets.
Indexers and explorers are not a core part of the protocol; they are tooling to help clients interact with the ledger. As such, they are regarded as layer 2 applications that work on top of the raw ledger.
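The indexer pattern above boils down to two pieces: a callback that fires on every ledger update and writes extracted fields into a relational store, and a two-step client lookup (query the indexer for identifiers, then fetch the outputs themselves from the node). A minimal sketch, with a hypothetical schema and field names:

```python
import sqlite3

raw_outputs = {}  # stand-in for the node's own key-value output store

# The indexer's relational database: extracted fields, queryable by clients.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE outputs (output_id TEXT PRIMARY KEY, address TEXT, amount INTEGER)"
)

def on_ledger_update(output_id, output):
    """Indexer callback: invoked whenever the ledger records a new output."""
    raw_outputs[output_id] = output
    db.execute("INSERT INTO outputs VALUES (?, ?, ?)",
               (output_id, output["address"], output["amount"]))

on_ledger_update("out1", {"address": "addrA", "amount": 100})
on_ledger_update("out2", {"address": "addrB", "amount": 25})
on_ledger_update("out3", {"address": "addrA", "amount": 50})

# Two-step client lookup: ask the indexer for IDs, fetch outputs from the node.
ids = [row[0] for row in db.execute(
    "SELECT output_id FROM outputs WHERE address = ?", ("addrA",))]
outputs = [raw_outputs[i] for i in ids]
print(sorted(ids))  # ['out1', 'out3']
```

Note the separation: the relational database only maps query terms to identifiers; the raw outputs remain in the node's own store.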
Chrysalis Indexing
The current IOTA mainnet (Chrysalis Part 2) contains the indexer application directly in the node software, and exposes REST API routes to run queries on the indexed ledger database. This works because in Chrysalis Part 2 outputs are simple, can only define one user account (address) and a balance. There is no complicated output locking logic.
Therefore, it is impossible to create an indexation lookup from an output to a user account without spending a significant amount (1 Mi) of funds to the targeted address.
There can be any number of outputs linked to a given address, therefore the REST API imposes a limit on how many can be returned in a single call (currently 1000). Unfortunately, pagination of the query results is not possible due to the non-deterministic record lookup, so a client can only get all outputs linked to its address when there are fewer than 1000 of them.
Hence, in Chrysalis Part 2, when an address query through the REST API returns the maximum of 1000 results, the client knows that it should sweep some outputs together to decrease their number and try again.
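The client-side rule can be sketched as follows. The 1000-result cap comes from the API above; `query_outputs` and `sweep` are hypothetical helpers (a real sweep issues a consolidating transaction):

```python
MAX_RESULTS = 1000  # per-call limit of the Chrysalis REST API

def fetch_all_outputs(query_outputs, sweep, address):
    """Query an address; if the result hits the cap, sweep and retry."""
    outputs = query_outputs(address)
    while len(outputs) >= MAX_RESULTS:
        # A full page means the result may be truncated; since pagination is
        # not possible, reduce the output count and query again.
        sweep(address)
        outputs = query_outputs(address)
    return outputs

# Toy demonstration: an address starts with 2500 outputs; each sweep
# consolidates every 10 outputs into 1 (a stand-in for a real sweep tx).
state = {"addr": list(range(2500))}

def query_outputs(address):
    return state[address][:MAX_RESULTS]

def sweep(address):
    state[address] = state[address][::10]

result = fetch_all_outputs(query_outputs, sweep, "addr")
print(len(result))  # 250, now below the cap
```

The loop terminates because every sweep strictly decreases the number of outputs on the address.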
Stardust Indexing
Outputs in Stardust draft TIP-18 are more complicated and can be unlocked by multiple entities based on different conditions.
As a result, not only does the indexing workload increase on nodes, but the fact that an address is linked to an output no longer means that it actually owns it or can unlock it. In our previous example of getting 1000 outputs from an address query, the client would not be able to sweep the outputs, as it might not own them at the moment (or ever). This makes us rethink the current indexing architecture and whether it should be part of the core node logic.
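A sketch of why an indexer hit is no longer proof of ownership under Stardust. The output shape and condition names here are hypothetical, loosely modeled on draft TIP-18's address and expiration unlock conditions:

```python
# A Stardust-style output can reference several addresses through its
# unlock conditions; the indexer links the output to all of them.
output = {
    "amount": 100,
    "unlock_conditions": {
        "address": "addrA",  # default owner
        # After unix_time, only the return address (the sender) can unlock it.
        "expiration": {"return_address": "addrB", "unix_time": 2_000_000_000},
    },
}

def indexed_addresses(output):
    """The indexer links the output to every address it mentions."""
    conds = output["unlock_conditions"]
    addrs = {conds["address"]}
    if "expiration" in conds:
        addrs.add(conds["expiration"]["return_address"])
    return addrs

def can_unlock(output, addr, now):
    """Whether `addr` can actually unlock the output at time `now`."""
    conds = output["unlock_conditions"]
    exp = conds.get("expiration")
    if exp is not None and now >= exp["unix_time"]:
        return addr == exp["return_address"]
    return addr == conds["address"]

# Both addresses show up in an indexer query...
print(sorted(indexed_addresses(output)))               # ['addrA', 'addrB']
# ...but only one of them can spend the output at any given moment.
print(can_unlock(output, "addrB", now=1_000))          # False
print(can_unlock(output, "addrB", now=3_000_000_000))  # True
```

So a wallet querying by address must still inspect each output's unlock conditions before treating it as spendable balance.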
Separating the indexer application from the core protocol application would have the following benefits:
While of course there are downsides as well:
Proposed architecture
The core API of nodes (mandatory) that deals with the UTXO ledger should have only one job: to return an `Output` object by `OutputID` stored in the node's key-value store. Note that submitting new transactions is done via posting them in their encapsulating messages.
An opt-in node plugin or a standalone application fetches raw data from the UTXO DB. Outputs are examined, indexed, possibly filtered, and the extracted information is dumped into a relational database. This data structure maps the indices to `OutputID`s.
Clients such as wallets can run queries via the Indexer API to fetch the identifiers of outputs that are of interest to them. For example: "give me all outputs that sit on my `Address` and were created by a specific `Sender`".
Comparison
Outputs in other UTXO based DLTs such as Bitcoin and Cardano can not be indexed in this way, as usually outputs only contain a commitment to the actual locking script (`P2SH` in Bitcoin). Hence, only the creator of the output knows how exactly the output can be spent and by which address(es), unless the pre-image of the commitment is publicly revealed. This has a serious impact on client UX, especially for L1 smart contracts as in Cardano: wallets need to rely on off-chain, mostly centralized communication to discover the pre-image of locking scripts, assuming the creator made it public.
In IOTA however, the actual locking script (unlocking conditions) is part of the output itself, hence it is public and accessible to anyone. While this increases the ledger size compared to the commitment-only scheme, it provides transparency and on-chain discoverability for wallets.