Network Risk Score recalibration #99

drinkcoffee · 2023-03-04T00:25:14Z

The Network Risk Score currently goes between 0 and 10. However, the rest of the framework is between 0 and 100.

@ermyas, @prototypo, and others: How important are network issues relative to other issues? That is, if the network issues were the worst they possibly could be (the protocol uses unfinalised information, and the source chain goes offline, is congested, and has safety violations), is that as bad as really bad architectural, implementation, and operational issues? If it is, then we should rescale the network risk score to be out of 100.

chen-robert · 2023-03-07T23:52:38Z

@drinkcoffee Agree that it feels weird to have network consensus risk rated so low. I think there could also be more granularity here. For example, for N001 the "worst" category of 1 in a million is a rather low probability. A 1 in 10 chance, for example, would be much, much worse than 1 in a million (and would probably imply some sort of bug in the bridge). I think this category and N004 (safety violations) should be weighted much more heavily than liveness.

This score is also likely to differ between chains. Would the actual score be an average of all the scores, or perhaps a TVL-weighted average across each chain?

drinkcoffee · 2023-03-08T00:53:08Z

> This score is also likely different when looking at different chains, would the actual score be an average of all the scores?

The scoring should be viewed as a score for a bridge between two chains. The same bridge between two other chains will yield a different score, because the chains it is bridging are different.

drinkcoffee · 2023-03-08T00:56:42Z

> "worst" category of 1 in a million is rather low probability. If there's a 1 in 10 for

A one in a million probability of relaying still implies (to me) an unreliable and unusable bridge. 1 in 10 is just a bit more unusable. However, both are unusable. As such, the score needs to reflect that.

drinkcoffee · 2023-03-08T00:59:11Z

> that it feels weird to have network consensus risk rated so low.

This is being addressed in PR #102. Please have a look and comment on what you think the scores should be.

Note that 0 is perfect; 100 is completely risky. Generally, the numbers add up within a risk category, capping out at 100, and then the overall score is the worst of all risk categories.
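The aggregation rule described above can be sketched as follows. This is a minimal illustration, not the framework's actual code: the function names and the example issue scores are hypothetical, and the category names are taken from the earlier comment.

```python
def category_score(issue_scores):
    """Add up the issue scores within one risk category, capped at 100."""
    return min(100, sum(issue_scores))


def overall_score(categories):
    """The overall score is the worst (highest) of the category scores."""
    return max(category_score(scores) for scores in categories.values())


# Hypothetical issue scores, for illustration only.
example = {
    "network": [30, 50, 40],      # sums to 120, so it is capped at 100
    "architecture": [20, 10],
    "implementation": [5],
    "operational": [0],
}

print(category_score(example["network"]))  # capped category score
print(overall_score(example))              # worst category dominates
```

Taking the maximum means that a single category at 100 makes the whole bridge score 100, regardless of how well the other categories score.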

chen-robert · 2023-03-08T00:59:38Z

I see, thanks for clarifying. Does it make sense to scope the score to within two chains? It seems like some of these (N00{2,4}) are intrinsic properties of the chain itself and would affect all bridges, regardless of the underlying bridge implementation.

From my understanding, bridges have multiple spokes where the bridge contract on any chain can send a message to any other chain. In this case, with N spokes you have N (N - 1) / 2 possible pairs/scores to calculate? Maybe it makes more sense to calculate a "chain score" which is separate from the bridge score itself.
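To make the counting concrete, here is a small sketch of the difference between per-chain scores and per-pair scores. The chain names are placeholders and the helper name is hypothetical.

```python
from itertools import combinations


def bridge_legs(chains):
    """Return every unordered pair of chains, i.e. every bridge leg."""
    return list(combinations(sorted(chains), 2))


# Placeholder chain names, for illustration only.
chains = ["ChainA", "ChainB", "ChainC", "ChainD"]
legs = bridge_legs(chains)

# With N chains there are N * (N - 1) / 2 legs, but only N chains.
n = len(chains)
print(len(legs), n * (n - 1) // 2)
for source, destination in legs:
    print(source, "<->", destination)
```

A separate "chain score" would need to be computed only N times, while anything bridge-specific would still be evaluated per leg.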

chen-robert · 2023-03-08T01:02:16Z

> A one in a million probability of relaying still implies (to me) an unreliable and unusable bridge. 1 in 10 is just a bit more unusable. However, both are unusable. As such, the score needs to reflect that.

Curious how these numbers are calculated. One in a million for a particular message implies a ~1% chance that at least one of 10 thousand messages is not bridged correctly.

Similarly, one in a billion implies a ~0.01% chance that at least one of 100 thousand messages is not bridged correctly. Both these cases feel essentially unusable in practice -- but where is the bright line? I feel like there should be a sharp boolean-like calculation here. Either essentially all messages are bridged correctly, or they aren't and the bridge shouldn't be used.
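The arithmetic behind these figures can be checked with a short sketch. It assumes each message is affected independently with the same probability, which is a simplification. Under that assumption, the ~1% and ~0.01% figures are the chance that at least one message in the batch is affected.

```python
import math


def prob_at_least_one_failure(p, n):
    """Chance that at least one of n independent messages fails.

    Computes 1 - (1 - p)**n using log1p/expm1 to stay accurate
    when p is very small.
    """
    return -math.expm1(n * math.log1p(-p))


# One in a million per message, 10 thousand messages: roughly 1%.
print(prob_at_least_one_failure(1e-6, 10_000))

# One in a billion per message, 100 thousand messages: roughly 0.01%.
print(prob_at_least_one_failure(1e-9, 100_000))
```

For small p the result is close to n * p, which is why the batch-level risk grows roughly linearly with message volume.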

drinkcoffee · 2023-03-08T22:43:03Z

Block (and hence transaction) finality is different from the wrong message being sent across the bridge. Imagine an event is emitted by a transaction on a source chain, and is included in a block. That block ends up not being included in the canonical chain. The transaction will be put back into the transaction pool, and could well end up in another block. It could even be in the block that replaced the original block in the canonical chain.

The problem arises if there is a situation that diverges. For example, account A on the source chain sends 5 tokens to the bridge account (that is, does a crosschain transfer) in a transaction on the source chain. In a separate transaction on the source chain, they send the same tokens to account B on the source chain, thus double spending the tokens. Only one of these transactions can succeed. Suppose the first transaction was going to be in the canonical chain, and the bridging protocol uses a finality threshold of 1 in a million, assuming a 10% attack on the PoW chain. The user mounts such an attack and successfully creates a block, containing the second transaction, that goes into the canonical chain. In this case, the bridge will assume the 5 tokens are in escrow on the source chain, while the user / attacker will have transferred the tokens to account B.
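One common way to relate "1 in a million" to a "10% attack" on a PoW chain is the attacker catch-up calculation from the Bitcoin whitepaper. Whether the framework's finality thresholds were derived this way is an assumption. The sketch below only illustrates how a probability threshold could translate into a number of confirmations, and the function names are hypothetical.

```python
import math


def attacker_success_probability(q, z):
    """Probability that an attacker with hash power share q ever catches up
    from z blocks behind (Bitcoin whitepaper, section 11)."""
    p = 1.0 - q
    lam = z * (q / p)
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total


def confirmations_needed(q, threshold):
    """Smallest z for which the attacker's success probability is below threshold."""
    z = 0
    while attacker_success_probability(q, z) >= threshold:
        z += 1
    return z


# A 10% attacker and a one in a million threshold.
print(confirmations_needed(0.10, 1e-6))
```

Under this model, a 10% attacker first drops below one in a million at 11 confirmations. A bridge that relays after fewer confirmations accepts a higher reorganisation risk.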

drinkcoffee · 2023-03-08T22:44:10Z

The degree of finality is a bridge-specific configuration. As @chen-robert points out, the other factors are chain specific, independent of the bridge. @ermyas, I am keen to hear your thoughts.

ermyas · 2023-03-09T01:24:38Z

Transaction Finality:

The degree of finality to wait for before relaying a message is primarily a parameter of the crosschain protocol. So it makes sense to evaluate the risks associated with it as such.

What the current set of scores doesn't yet capture is that some protocols (partly) delegate this decision to the application layer. In this model, the application decides what level of finality it requires. For such designs, I think we can either assign a score based on the protocol's default finality configuration (i.e. if the application doesn't provide one) or ascribe a very low-risk score.

Network Safety and Liveness Failure:

As mentioned in the network risk section, network safety failures are risks of the underlying chain and impact all protocols that bridge to/from it. This type of risk is largely outside of the control of individual bridges. However, there are three ways they can choose to deal with it:

1. Ignore this risk.
2. Adopt strategies to mitigate the potential contagion of such events to the rest of their connected ecosystem. This might involve limiting the connectivity of less established chains with the rest of the bridge ecosystem and segregating funds bridged to/from those chains. This would help ensure that the weakest link doesn't compromise the entire ecosystem.
3. Only connect to chains with strong safety guarantees.

For option 1, I think we should have a high-risk score that directly impacts the overall score of a bridge protocol, based on its weakest connected chain. For option 2, I think we can have a more nuanced score based on the mitigation strategy employed for the whole protocol and offer an additional score that is bridge-leg specific.

chen-robert · 2023-03-14T05:19:56Z

> block and hence transaction finality is different to the wrong message being sent across the bridge

Yeah, apologies for any confusion. By "wrong" I was referring to a message being sent that is not included in the canonical chain, which is the same as what you mentioned. I suppose my confusion was about whether the 100M cost is meant to be incurred per attempt, or whether it simply means any attacker with 100M can cause an X% chance that a particular message is dropped.

> Network Safety and Liveness Failure:

@ermyas wanted to clarify: how does network liveness affect the security of the rest of the bridge ecosystem? Even when the chain goes down, you would still be unable to spoof messages, right? I'm not sure this is a fair metric for bridge security either. I think most of the risk is borne by applications that choose to deploy their endpoints on the various chains.

ermyas · 2023-03-14T05:42:47Z

@chen-robert good catch, that was a typo, I didn't mean to include Liveness in the heading there... as the sentence below it suggests. I discuss the considerations of network liveness failures on certain types of protocols (e.g. Optimistic protocols) separately here.

chen-robert · 2023-03-16T22:28:46Z

Agreed, just checking :)

sambacha · 2023-08-21T10:21:06Z

How do we price a network outage for our transaction services? The existing Gas Pricing API can be used for this purpose by applying the principle of Little’s Law ¹.

$$ L=λW $$

We can take the value from Little’s Law (we apply this to a new field called ‘networkCongestion’) and apply it to the time of networkOutage, which is the time during which zero transactions can be included in the network. A period of networkOutage can be defined as one in which this value is three standard deviations above the normal networkCongestion. The cost can then be thought of in terms of how long it takes to return to a normal value of networkCongestion after networkOutage is over, i.e. how long it takes to process the transactions that have accumulated during the outage.

New Field: networkCongestion

networkCongestion - A normalized number that can be used to gauge the congestion level of the network, with 0 meaning not congested and 1 meaning extremely congested

New Field: networkOutage

networkOutage - A true/false indicating a recognized network outage event. True means we are currently experiencing a network outage
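A rough sketch of how these pieces might fit together is shown below. None of this reflects the actual Gas Pricing API: the function names, the baseline samples, and the recovery model (a backlog that drains at the spare capacity between service rate and arrival rate) are all assumptions made for illustration.

```python
from statistics import mean, stdev


def pending_transactions(arrival_rate, avg_wait_seconds):
    """Little's Law, L = lambda * W: average number of waiting transactions."""
    return arrival_rate * avg_wait_seconds


def is_network_outage(current, baseline, sigmas=3.0):
    """Flag an outage when the current value is more than `sigmas`
    standard deviations above the baseline mean."""
    return current > mean(baseline) + sigmas * stdev(baseline)


def recovery_seconds(arrival_rate, service_rate, outage_seconds):
    """Time to clear the backlog that accumulated during an outage,
    assuming transactions keep arriving while the backlog drains."""
    if service_rate <= arrival_rate:
        raise ValueError("backlog never drains unless service_rate > arrival_rate")
    backlog = arrival_rate * outage_seconds
    return backlog / (service_rate - arrival_rate)


# Hypothetical numbers: 10 tx/s arriving, 12 s average wait.
print(pending_transactions(10, 12))

# Hypothetical baseline of pending-transaction samples.
baseline = [100, 110, 90, 105, 95]
print(is_network_outage(200, baseline), is_network_outage(115, baseline))

# A 10-minute outage with 15 tx/s capacity once the network is back.
print(recovery_seconds(10, 15, 600))
```

The recovery time is what would be priced under this reading: the longer the backlog takes to clear, the longer users experience abnormal congestion after the outage itself has ended.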

¹ Alberto Leon-Garcia (2008). Probability, Statistics, and Random Processes for Electrical Engineering (3rd ed.). Prentice Hall. ISBN 978-0-13-147122-1.

sambacha · 2023-08-22T09:04:39Z

As a side note to the above suggestion, we considered applying this to 'tax' crosschain transfers of assets being moved to less 'secure' networks. That is, we want the transferred balance to reflect the implicit security subsidy as an actual percentage change in the destination chain balance. For example, if you move 100 tokens from Ethereum to 'LessSecureNetwork', you would receive only 80 tokens, reflecting a risk of 20% or something along those lines.
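As a minimal sketch of that idea, the discount could be applied as below. The function name and the use of basis points are assumptions, and how the risk percentage itself would be derived is left open.

```python
def discounted_amount(amount, risk_bps):
    """Amount credited on the destination chain after a risk discount.

    risk_bps is the risk expressed in basis points (2000 = 20%).
    Integer arithmetic avoids floating-point rounding of token amounts.
    """
    if not 0 <= risk_bps <= 10_000:
        raise ValueError("risk_bps must be between 0 and 10000")
    return amount * (10_000 - risk_bps) // 10_000


# 100 tokens moved to a network carrying a hypothetical 20% risk.
print(discounted_amount(100, 2_000))
```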
