threshold handling during message processing? #1506
Comments
Does it make sense to just drop a peer because the balance cannot be loaded / persisted? Such a situation should put the entire failing node offline. As far as I understand it, failing to load / persist balances may represent an I/O error, and thus a situation that needs to be handled by the owner of the node. |
@diegomasini, @mortelli was just depicting the current state. But as a matter of fact, if we have a corrupt database, or something similar, then how do we deal with it in a clean way? Probably we should not deal with the remote peer until the issue is fixed, or balances will diverge, resulting in some (possibly more painful) drop later. The simplest approach is to drop the peer right away. Having said that, I am in favor of better handling, but then we need to specify what it could be. |
I agree on dropping the peer, but it should also trigger a mechanism to put the failing node offline, to avoid other inconsistency errors. |
Keep in mind that even though the remote peer is the one that is dropped in this case, the failure to load or save a state indicates a problem in the local node, not the remote one. |
Exactly, that's my point. |
discrepancies in perceived debt balance between two peers are not fatal, as long as the disagreement amount is small compared to the payment thresholds (or more specifically, smaller than the difference between payment and disconnect thresholds). but of course it is not a desirable state of affairs. my gut tells me: don't be too eager to drop peers. avoid it whenever possible. |
The question is: if you don't drop peers, what do you do? |
@homotopycolimit are you suggesting that when a threshold is reached, the settlement is not for the full amount hence allowing some discrepancies in perceived debt balance? Because if the settlement is for the full amount any discrepancy would likely cause disconnect. |
I think so, yes. Let's say that 80 is the threshold where you require payment. Assume also that you think your peer owes you 80 and they think they owe you 76. Now assume you ask for payment and they send you a cheque over 76, (or even 26 for that matter); why would you disconnect? |
Ah so you are suggesting the nodes just ask "pay me what you think you owe me" rather than "pay me x amount which you owe me". Brilliant! @holisticode @mortelli @diegomasini. |
But this is at a different level. If you reach the payment threshold, the debtor issues a cheque. No disconnection takes place. If the creditor receives the cheque, the balance is restored to 0 (zero). If the cheque does not cover the full payment threshold value, the balance is restored to the difference (say 4). That is why the cheque has an So when a node sends a
Disconnection is something completely different and happens at the disconnect threshold, which is some allowance above the payment threshold. It happens if no cheque at all has been sent and the debtor thus continues consuming from the creditor until it reaches that threshold. Most of the above discussion was about a corrupt database, which, as per the decision in the meeting https://github.com/ethersphere/user-stories/blob/master/doc/incentives/sync-2019-06-25.md, is a fatal error and thus should result in the local node going offline (with a remote peer drop first). |
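To make the distinction between the two thresholds concrete, here is a minimal sketch of the 80 / 76 example from this thread. The function names and the disconnect threshold value of 100 are assumptions made for illustration; they are not taken from the Swarm codebase.

```go
package main

import "fmt"

// Hypothetical thresholds in arbitrary accounting units.
// 80 is the payment threshold used in the example above;
// 100 is an assumed disconnect threshold.
const (
	paymentThreshold    int64 = 80
	disconnectThreshold int64 = 100
)

// applyCheque returns the creditor's view of the debt after
// receiving a cheque for the given amount.
func applyCheque(debt, cheque int64) int64 {
	return debt - cheque
}

// shouldDisconnect reports whether the creditor's view of the debt
// has reached the disconnect threshold.
func shouldDisconnect(debt int64) bool {
	return debt >= disconnectThreshold
}

func main() {
	// The creditor believes it is owed 80; the debtor pays the 76
	// it believes it owes.
	remaining := applyCheque(paymentThreshold, 76)
	fmt.Println("remaining debt:", remaining)
	fmt.Println("disconnect:", shouldDisconnect(remaining))
}
```

Under these assumptions, a partial cheque simply leaves a small residual debt (4) that stays far below the disconnect threshold, so no disconnection follows from the discrepancy alone.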
I never said 'ignore them'. I suggest refusing to serve retrieval requests coming from the offending node, other traffic is still allowable. In particular if you want to get data from them and they serve it to you (thus reducing the debt) that's fine. You only disallow traffic that would further increase the debt. Disconnection is fine in lower Kademlia bins but gets problematic in the most proximate bin. If we can maintain connectivity there at the cost of blocking certain types of request, then that's probably preferable to a disconnect. The problem is that too many disconnects and blacklistings in the most proximate nbhd breaks our routing assumptions. |
This suggestion makes much sense. You say though "I suggest" - can you formalize this into a requirement? I assume this implies getting consensus and approval from the research track, in possibly written form. How do you guys handle that? @mortelli I suggest that in order to implement this, we would have to create a custom error, which is raised if the "disconnect threshold" would be crossed, then when both |
we cannot be so liberal, but indeed we should find a way to correct minor discrepancies resulting from sent but not received messages |
sure we can. after all, the request is really just "pay me some amount to get us back closer to zero balance (under the threshold)". If nodes always pay the full amount they think they owe, and if the discrepancy came about due to random errors, then we can assume they will even out over time. It is only systematic errors that will accrue over time to the point where they become real obstacles. |
I think we're not properly dealing with this scenario based on the current flow. We could have this situation:
In summary, while A owes B, B will not be sending or receiving any messages unless a payment is made, even if this message would reduce the incurred debt. What if we ignored the balance threshold when the message price is negative?
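A rough sketch of what such a check might look like follows. Everything here (`checkMessage`, `errThresholdExceeded`, the threshold value, and the sign convention where a positive `delta` increases the peer's debt) is hypothetical and only illustrates the idea. Zero-priced messages are also let through, which is an extra assumption in line with the earlier suggestion to block only debt-increasing traffic.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical disconnect threshold in arbitrary accounting units.
const disconnectThreshold int64 = 100

// errThresholdExceeded is a hypothetical custom error for messages
// rejected because the peer's debt is already too high.
var errThresholdExceeded = errors.New("disconnect threshold exceeded")

// checkMessage decides whether a message may be processed.
// debt is what the peer currently owes the local node; delta is the
// change in that debt the message would cause (negative = reduces debt).
func checkMessage(debt, delta int64) error {
	if delta <= 0 {
		// The message does not increase the debt, so the
		// threshold is ignored.
		return nil
	}
	if debt >= disconnectThreshold {
		return errThresholdExceeded
	}
	return nil
}

func main() {
	fmt.Println(checkMessage(120, -10)) // debt-reducing message
	fmt.Println(checkMessage(120, 10))  // debt-increasing message
}
```

With a rule like this, a peer that is over the threshold can still work its debt down by serving data, rather than being stuck until a payment arrives.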
I like this. We could use it to make a distinction between errors raised during the accounting process and other errors–maybe this could be a way of dropping chunk messages but allowing the rest. We could drop the messages for these custom errors, but:
|
Here are some potentially problematic situations. Please review if possible. Starting point: Node
|
Following up on this one:
We now have a PR that will cover this scenario:
This one should be solved through #1922 (which closes #1921). |
Hello all,
After discussion with @holisticode, @vojtechsimetka, @diegomasini and others I thought it would be a good idea to document the issue of SWAP thresholds (see #1440) and open up the discussion to everyone.
I'm attaching a flowchart which lays out the flow for message processing considering both the payment and disconnect thresholds, as we understand and are building them today.
Diagram
Notes
peer.Send, peer.handleIncoming is called.
accounting.Send, accounting.Receive is called.
Discussion points
Based on this flow, it follows that:
1.1. This can cause a discrepancy between the balances of each of the nodes in an exchange. In turn, it can lead to further problems, such as a node requesting a cheque from another, the latter possibly rejecting it since it would leave it with a negative balance.
2.1. As a result, the first message that sends the balance over the disconnect threshold will still be sent/processed. Only messages that follow that one (assuming no payment is done) will be dropped.
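The behaviour described in 2.1 can be illustrated with a small sketch in which the threshold is compared against the balance as it stands before the current message's price is applied. The `account` type, its `process` method, and the numbers used are hypothetical and are not the actual Swarm accounting code.

```go
package main

import "fmt"

// Hypothetical disconnect threshold in arbitrary accounting units.
const disconnectThreshold int64 = 100

// account tracks what a single peer owes the local node.
type account struct {
	debt int64
}

// process checks the threshold against the current debt and only
// then adds the message price. It reports whether the message was
// processed.
func (a *account) process(price int64) bool {
	if a.debt >= disconnectThreshold {
		return false // already over the threshold: drop the message
	}
	a.debt += price
	return true
}

func main() {
	a := &account{debt: 95}
	for i := 1; i <= 3; i++ {
		ok := a.process(10)
		fmt.Printf("message %d: processed=%v debt=%d\n", i, ok, a.debt)
	}
}
```

In this sketch the first message takes the debt from 95 to 105 and is still processed; the following ones are dropped. Checking `a.debt+price` against the threshold instead would reject the crossing message itself, which is one possible alternative design.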