Replies: 7 comments
-
TLDR: Merkle stuff is vulnerable to (possibly maliciously crafted) data sets whose necessarily unique tree representation is degenerate; monoid-tree-based range-based set reconciliation protects against this case.

With the more straightforward Merkle tree approach, all peers need to agree on a single tree shape, because the tree representation of a data set affects its hash. Unfortunately, agreeing on a unique tree shape for every set such that each such tree has logarithmic height can only be done with probabilistic balancing schemes, for which insertion and deletion are efficient only in expectation. In the worst case, these probabilistic solutions degrade to linear height. And unfortunately, maliciously crafting worst-case instances is very much feasible. Since a straightforward implementation of the Merkle reconciliation approach partitions the set along subtree boundaries, the number of roundtrips becomes linear in that case as well.

The whole idea behind monoid-tree-based range-based set reconciliation is that partitioning choices are fully decoupled from any concrete tree representation of the (sub-)set to reconcile. Peers can always split their set into halves of equal size, no matter how degenerate their local trees are, so the number of roundtrips stays logarithmic.
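To make the decoupling concrete, here is a minimal sketch (my own illustration, not iroh's actual protocol or API) of the two ingredients: a fingerprint built from a commutative monoid, which every peer computes identically no matter how its local tree is shaped, and a split at the median item rather than at a tree node. XOR is used only to keep the sketch short; a real deployment needs a collision-resistant fingerprint, such as the constructions discussed in the paper.

```rust
type Hash = [u8; 32];

// The monoid operation. XOR is associative and commutative, so a range's
// fingerprint does not depend on the order or grouping of the combines,
// i.e. not on any tree shape. (XOR is forgeable; illustration only.)
fn combine(a: Hash, b: Hash) -> Hash {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

// Fingerprint of a sorted run of item hashes.
fn fingerprint(items: &[Hash]) -> Hash {
    items.iter().fold([0u8; 32], |acc, it| combine(acc, *it))
}

// When fingerprints for a range disagree, answer with two sub-ranges.
// Splitting at the median *item* (not at a tree node or a key-space
// midpoint) guarantees both halves shrink geometrically, so the number
// of roundtrips stays logarithmic in the set size.
fn split_even(items: &[Hash]) -> ((&[Hash], Hash), (&[Hash], Hash)) {
    let mid = items.len() / 2;
    let (lo, hi) = items.split_at(mid);
    ((lo, fingerprint(lo)), (hi, fingerprint(hi)))
}
```

Nothing here depends on how `items` is stored; a degenerate tree only slows down the local computation of `fingerprint`, never the roundtrip count.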
-
Section 6 of the range-based paper demonstrates how these two worlds can be combined, using treaps as an example. You keep a probabilistic tree with Merkle labels, but the fingerprints you use for reconciliation are not directly those tree labels. By accepting the risk of degenerate trees, you gain the ability to use standard, non-monoidal hash functions. You still need to compute fingerprints for the ranges you receive in time proportional to the height of your tree (so linear in the worst case, but logarithmic in the expected case), but the number of roundtrips is guaranteed to be logarithmic, even if the tree is degenerate. I haven't looked into a similar construction for sparse Merkle trees, but I imagine it should be quite straightforward.
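As a rough sketch of the traversal this implies (names and the toy XOR labels are mine, not iroh's code; the paper's Section 6 construction instead hashes together the O(height) node labels covering a range, but it walks the tree the same way), assuming a binary search tree whose nodes cache a subtree label plus min/max keys:

```rust
type Hash = [u8; 32];

struct Node {
    key: Hash,
    item_fp: Hash,    // fingerprint of this item alone
    subtree_fp: Hash, // cached combine of every item in this subtree
    min_key: Hash,    // smallest key in this subtree (cached)
    max_key: Hash,    // largest key in this subtree (cached)
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Toy XOR monoid standing in for the label-combining step.
fn combine(a: Hash, b: Hash) -> Hash {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

// Fingerprint of all items with lo <= key < hi. Subtrees entirely inside
// or outside the range are answered from cached fields, so only the two
// paths along the range boundaries are walked: O(height) overall, i.e.
// expected logarithmic for a treap, linear if the tree has degenerated.
fn range_fp(node: &Option<Box<Node>>, lo: &Hash, hi: &Hash) -> Hash {
    let Some(n) = node else { return [0u8; 32]; };
    if &n.max_key < lo || &n.min_key >= hi {
        return [0u8; 32]; // disjoint from the query range
    }
    if lo <= &n.min_key && &n.max_key < hi {
        return n.subtree_fp; // fully contained: cached answer
    }
    let mut fp = range_fp(&n.left, lo, hi);
    if lo <= &n.key && &n.key < hi {
        fp = combine(fp, n.item_fp);
    }
    combine(fp, range_fp(&n.right, lo, hi))
}
```

Either way, the cost of answering a range query is bounded by the tree height, while the roundtrip count is controlled by how ranges are split, not by the tree.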
-
Well, I think I understand the attack surface, if not the math; in a p2p network, someone can keep creating CIDs so that the Merkle tree used by Iroh takes O(sqrt n) to sync instead of O(log n). I wonder if this means Strfry is vulnerable to this, especially since it uses integer keys rather than hashes, which makes such a malicious set easier to create. Thanks @AljoschaMeyer for answering my question. I have had more questions about append-only logs, and now I am encouraged to ask those over at Bamboo.
-
O(n) even when the tree degenerates to a linked list. For sparse Merkle trees, a worst-case tree would consist of the minimal number of items needed for a path to a leaf whose full height (256) cannot be compressed.
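If keys can be chosen freely (as with user-submitted keys; with content hashes an attacker would have to grind for them), such a worst-case set is easy to write down. A hypothetical sketch: take one base key and, for every bit position, add a key that first diverges from the base at exactly that bit. Every level of the base key's path then has a non-empty sibling, so none of the 256 levels can be collapsed, using only 257 items.

```rust
// Flip bit `bit` of `key`, counting bit 0 as the most significant bit.
fn flip_bit(mut key: [u8; 32], bit: usize) -> [u8; 32] {
    key[bit / 8] ^= 0x80 >> (bit % 8);
    key
}

// The base key plus, per bit position, one key diverging exactly there:
// 257 keys that force an uncompressible depth-256 path to `base`.
fn uncompressible_path_keys(base: [u8; 32]) -> Vec<[u8; 32]> {
    let mut keys = vec![base];
    for bit in 0..256 {
        keys.push(flip_bit(base, bit));
    }
    keys
}
```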
-
Ah, so something like this but all the way to [0; 32]? That makes sense.

```mermaid
graph TD
    R(["Root"])
    R --> i0(("0"))
    i0 --> i00(("0"))
    i0 --> L01((("01")))
    i00 --> L000((("000")))
    i00 --> L001((("001")))
```

And it would be worse in systems using integer keys submitted by users, not hashes of content.
-
I don't have good input to the discussion topic, but this sentence caught my attention:
@AljoschaMeyer I thought @dignifiedquire ported it to iroh, but I can't find the code... The "english description" I came up with when reimplementing it way back is:
-
@ribasushi Thank you, I didn't know about these. But at first glance, it looks like these are append-only for new items, whereas for sets we need to support item insertion at arbitrary positions (it has to be a search tree, after all). My instinctual reaction to the trickle dags is that they fall into a similar category as linking schemes or transparency logs.
-
Disclaimer: I am mostly learning by asking questions, not speaking from authority, so feel free to ignore this.
I asked what Range-Based Set Reconciliation offers over a binary Merkle tree, and I got this response:
#707 (reply in thread) (thanks @dignifiedquire)
I am not quite sure I understand the response, so I will try to work out why large trees would need much more data transferred than the binary search approach.
What we know about the Merkle tree that Iroh would create:
1- Leaves are all going to be at the same depth (256) by default, since keys are 32-byte hashes.
2- Leaves are going to be very sparse, since they are uniformly distributed over a very large key space.
So a typical tree or a subtree could look like this:
Indeed, too many empty and internal nodes.
We can get rid of empty branches by encoding whether a given internal node has a left child, a right child, or both. So we get this tree:
And by collapsing nodes we can remove even more internal nodes, getting this tree instead:
If that is possible and feasible, wouldn't it reduce the data needed to almost the same as in the case of Range-Based Set Reconciliation?
There is much more elaborate documentation at Quadrable, where I learned about both the left/right branch encoding and node collapsing.
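For what it's worth, here is a dependency-free sketch of those two optimizations combined, loosely inspired by the Quadrable docs (the names and the non-cryptographic stand-in hash are mine): an empty subtree hashes to a constant, and a subtree containing exactly one item collapses straight to its leaf hash, so the number of hashes computed is proportional to the number of items rather than to the 256-level depth.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Key = [u8; 32];

// Bit `depth` of `key`, counting from the most significant bit.
fn bit(key: &Key, depth: usize) -> bool {
    key[depth / 8] & (0x80 >> (depth % 8)) != 0
}

// Stand-in for a cryptographic hash, to keep the sketch dependency-free;
// a real tree would use SHA-256 or similar here.
fn h(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

// `keys` must be sorted and deduplicated; `depth` is how many leading bits
// all keys in this slice already share.
fn root(keys: &[Key], depth: usize) -> u64 {
    match keys {
        // Empty subtree: one constant instead of a chain of empty hashes.
        [] => 0,
        // Exactly one item: collapse the rest of the path to the leaf hash.
        [leaf] => h(leaf),
        // Two or more items: split on the next bit and hash both children.
        _ => {
            let split = keys.partition_point(|k| !bit(k, depth));
            let left = root(&keys[..split], depth + 1);
            let right = root(&keys[split..], depth + 1);
            h(&[left.to_be_bytes(), right.to_be_bytes()].concat())
        }
    }
}
```

Calling `root(&keys, 0)` on a sorted, deduplicated key list yields the root; interior nodes only appear where keys actually diverge, which matches the intuition behind the collapsed tree above.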