Network Risk Score recalibration #99

Open
drinkcoffee opened this issue Mar 4, 2023 · 14 comments

@drinkcoffee
Collaborator

The Network Risk Score currently ranges from 0 to 10. However, the rest of the framework uses a 0 to 100 scale.

@ermyas and @prototypo, and others: How important are network issues relative to other issues? That is, if the network issues were the worst they possibly could be (protocol uses unfinalised information, the source chain goes offline, is congested, and has safety violations), is that as bad as really bad architectural, implementation, and operational issues? If it is, then we should scale the network risk score up to be out of 100.

@chen-robert

@drinkcoffee Agree that it feels weird to have network consensus risk rated so low. I think there could also be more granularity here, for example for N001 the "worst" category of 1 in a million is rather low probability. If there's a 1 in 10 for example, that'd be much, much worse than 1 in a million (and probably imply some sort of bug in the bridge). I think this category and N004 (safety violations) should be weighted much more heavily than liveness.

This score is also likely different when looking at different chains, would the actual score be an average of all the scores? Or perhaps a TVL weighted average across each chain?

@drinkcoffee
Collaborator Author

> This score is also likely different when looking at different chains, would the actual score be an average of all the scores?

The scoring should be viewed as a score for a bridge between two chains. The same bridge between two other chains will yield a different score, because the chains it is bridging are different.

@drinkcoffee
Collaborator Author

> "worst" category of 1 in a million is rather low probability. If there's a 1 in 10 for

A one in a million probability of relaying still implies (to me) an unreliable and unusable bridge. 1 in 10 is just a bit more unusable. However, both are unusable. As such, the score needs to reflect that.

@drinkcoffee
Collaborator Author

> that it feels weird to have network consensus risk rated so low.

This is being addressed in this PR. Please have a look and comment on what you think the scores should be:
#102

Note that 0 is perfect; 100 is completely risky. Generally, the numbers add up within a risk category, capping out at 100, and then the overall score is the worst of all risk categories.
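The aggregation rule described above can be sketched as follows. This is only an illustration, not code from the framework: the function names, the category names, and every item score in `example` are made up for the example.

```python
def category_score(item_scores, cap=100):
    """Add up the item scores within one risk category, capped at `cap`."""
    return min(cap, sum(item_scores))


def overall_score(categories):
    """The overall score is the worst (highest) capped category score."""
    return max(category_score(items) for items in categories.values())


# Hypothetical item scores, for illustration only (0 = perfect, 100 = completely risky).
example = {
    "network": [30, 45, 40],   # adds up to 115, so it caps out at 100
    "architecture": [20, 10],
    "implementation": [5],
    "operational": [15, 25],
}

network = category_score(example["network"])
overall = overall_score(example)
```

With these made-up numbers, the network category alone drives the overall score, which is the effect of taking the worst category rather than an average.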

@chen-robert

I see, thanks for clarifying. Does it make sense to scope the score to within two chains? It seems like some of these (N00{2,4}) are intrinsic properties of the chain itself and would affect all bridges, regardless of the underlying bridge implementation.

From my understanding, bridges have multiple spokes where the bridge contract on any chain can send a message to any other chain. In this case, with N spokes you have N (N - 1) / 2 possible pairs/scores to calculate? Maybe it makes more sense to calculate a "chain score" which is separate from the bridge score itself.
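To make the pair count concrete, here is a small sketch that enumerates the unordered chain pairs for a bridge with N spokes. The chain names are placeholders, and treating a pair as unordered (A to B scored the same as B to A) is an assumption of the example.

```python
from itertools import combinations


def chain_pairs(chains):
    """Return every unordered pair of chains that a bridge could connect."""
    return list(combinations(chains, 2))


# Placeholder chain names, for illustration only.
chains = ["ChainA", "ChainB", "ChainC", "ChainD"]
pairs = chain_pairs(chains)
expected = len(chains) * (len(chains) - 1) // 2  # N (N - 1) / 2
```

If the score depends on direction (source versus destination chain), the count doubles to N (N - 1), which is another argument for a per-chain score that is computed once and reused.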

@chen-robert

> A one in a million probability of relaying still implies (to me) an unreliable and unusable bridge. 1 in 10 is just a bit more unusable. However, both are unusable. As such, the score needs to reflect that.

Curious how these numbers are calculated. A one in a million failure probability for a particular message implies a ~1% chance that at least one of 10 thousand messages is not bridged correctly.

Similarly, one in a billion implies a ~0.01% chance that at least one of 100 thousand messages is not bridged correctly. Both these cases feel essentially unusable in practice -- but where is the bright line? I feel like there should be a sharp boolean-like calculation here. Either essentially all messages are bridged correctly, or they aren't and the bridge shouldn't be used.
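For reference, a quick sketch of the arithmetic, assuming each message fails independently with the same probability (a simplification; the function name is just for illustration). The chance that at least one of n messages fails is 1 - (1 - p)^n, which is roughly n * p when p is small.

```python
import math


def p_at_least_one_failure(p_fail, n_messages):
    """Probability that at least one of n independent messages fails.

    Computes 1 - (1 - p)^n via log1p/expm1 to stay accurate for tiny p.
    """
    return -math.expm1(n_messages * math.log1p(-p_fail))


one_in_a_million = p_at_least_one_failure(1e-6, 10_000)    # roughly 1%
one_in_a_billion = p_at_least_one_failure(1e-9, 100_000)   # roughly 0.01%
```

So under this model, a one in a million per-message failure rate means about a 1% chance of at least one bad message per 10 thousand messages, and one in a billion means about 0.01% per 100 thousand messages.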

@drinkcoffee
Collaborator Author

block and hence transaction finality is different to the wrong message being sent across the bridge. Imagine an event is emitted by a transaction on a source chain, and is included in a block. That block ends up not being included in the canonical chain. The transaction will be put back into the transaction pool, and could well end up in another block. It could even be in the block that replaced the original block in the canonical chain.

The problem arises if the two outcomes diverge. For example, account A on the source chain sends 5 tokens to the bridge account (that is, does a crosschain transfer) in a transaction on the source chain. In a separate transaction on the source chain, they send the same tokens to account B on the source chain, thus attempting to double spend the tokens. Only one of these transactions can succeed. Suppose the first transaction was going to be in the canonical chain, and the bridging protocol uses a finality of 1 in a million, assuming a 10% attack on the PoW chain. If the user mounts such an attack and successfully creates a block that goes into the canonical chain and includes the second transaction, the bridge will assume the 5 tokens are in escrow on the source chain, while the user / attacker will have transferred the tokens to account B.

@drinkcoffee
Collaborator Author

The degree of finality is a bridge-specific configuration. As @chen-robert points out, the other factors are chain specific, independent of the bridge. @ermyas, I am keen to hear your thoughts.

@ermyas
Collaborator

ermyas commented Mar 9, 2023

Transaction Finality:

The degree of finality to wait for before relaying a message is primarily a parameter of the crosschain protocol. So it makes sense to evaluate the risks associated with it as such.

What the current set of scores doesn't yet capture is that some protocols (partly) delegate this decision to the application layer. In this model, the application decides what level of finality it requires. For such designs, I think we can either assign a score based on the protocol's default finality configuration (i.e. if the application doesn't provide one) or ascribe a very low-risk score.

Network Safety and Liveness Failure:

As mentioned in the network risk section, network safety failures are risks of the underlying chain and impact all protocols that bridge to/from it. This type of risk is largely outside of the control of individual bridges. However, there are three ways they can choose to deal with it:

1. Ignore this risk.
2. Adopt strategies to mitigate the potential contagion of such events to the rest of their connected ecosystem. This might involve limiting the connectivity of less established chains with the rest of the bridge ecosystem and segregating funds bridged to/from those chains. This would help ensure that the weakest link doesn't compromise the entire ecosystem.
3. Only connect to chains with strong safety guarantees.

For option 1, I think we should have a high-risk score that directly impacts the overall score of a bridge protocol, based on its weakest connected chain. For option 2, I think we can have a more nuanced score based on the mitigation strategy employed for the whole protocol and offer an additional score that is bridge-leg specific.

@chen-robert

> block and hence transaction finality is different to the wrong message being sent across the bridge

Yeah apologies for any confusion, by "wrong" I was referring to a message which is not included in the canonical chain being sent, which is the same as what you mentioned. I suppose maybe my confusion was around if the 100M cost is meant to be incurred per attempt, or if it simply means any attacker with 100M can cause an X% chance that a particular message is dropped.

> Network Safety and Liveness Failure:

@ermyas wanted to clarify, how does network liveness affect the security of the rest of the bridge ecosystem? Even when the chain goes down, you would still be unable to spoof messages, right? I'm not sure if this is a fair metric for bridge security either; I think most of the risk is borne by applications that choose to deploy their endpoints on the various chains.

@ermyas
Collaborator

ermyas commented Mar 14, 2023

@chen-robert good catch, that was a typo, I didn't mean to include Liveness in the heading there... as the sentence below it suggests. I discuss the considerations of network liveness failures on certain types of protocols (e.g. Optimistic protocols) separately here.

@chen-robert

Agreed, just checking :)

@sambacha

sambacha commented Aug 21, 2023

How do we price a network outage for our transaction services? The existing Gas Pricing API can be used for this purpose by applying the principle of Little's Law [1].

$$ L=λW $$

We can take the value given by Little's Law (we apply this to a new field called 'networkCongestion') and apply it to the time of networkOutage, which is the time during which zero transactions are able to be included in the network. A period of networkOutage can be defined as a value that is three standard deviations above networkCongestion. Think of this as the time it takes to return to a normal value of networkCongestion after the outage is over, i.e. how long it takes to process the transactions that have accumulated during the outage.

New Field: networkCongestion

networkCongestion - A normalized number that can be used to gauge the congestion level of the network, with 0 meaning not congested and 1 meaning extremely congested

New Field: networkOutage

networkOutage - A true/false flag indicating a recognized network outage event. True means we are currently experiencing a network outage
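A rough sketch of how the two fields could be derived, under several assumptions that are mine rather than part of the proposal: L is read as the average number of pending transactions, λ as the arrival rate, and W as the average time to inclusion; "three standard deviations above" is read as three sample standard deviations above the mean of recent backlog readings; and the normalization bounds are arbitrary. None of these function names exist in the Gas Pricing API.

```python
from statistics import mean, stdev


def backlog(arrival_rate, avg_wait):
    """Little's Law, L = λW: average number of transactions waiting."""
    return arrival_rate * avg_wait


def network_congestion(value, low, high):
    """Normalize a backlog value into [0, 1] (0 = not congested, 1 = extremely congested)."""
    if high <= low:
        return 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))


def network_outage(history, current, k=3.0):
    """True when `current` is more than k standard deviations above the historical mean."""
    if len(history) < 2:
        return False  # not enough data to estimate a standard deviation
    return current > mean(history) + k * stdev(history)


# Made-up readings: 15 tx/s arriving, 12 s average wait -> 180 pending transactions.
current = backlog(15.0, 12.0)
history = [100, 110, 90, 105, 95]
congestion = network_congestion(current, low=0, high=200)
outage = network_outage(history, current)
```

With these made-up readings the backlog is well above the recent norm, so `outage` is flagged; a reading of 120 against the same history would not be.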

Footnotes

  1. Alberto Leon-Garcia (2008). Probability, statistics, and random processes for electrical engineering (3rd ed.). Prentice Hall. ISBN 978-0-13-147122-1.

@sambacha

As a side note to the above suggestion, we considered applying this for 'taxing' cross-chain transfers of assets being moved to less 'secure' networks. That is, we want the transferred balance to reflect the implicit security subsidy as an actual percentage change in the destination chain balances. For example, if you move 100 tokens from Ethereum to 'LessSecureNetwork', you would get only 80 tokens, showing a risk of 20% or something along those lines.
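The arithmetic of that example, as a minimal sketch. The function name and the idea of passing the risk as a fraction are assumptions for illustration; how the 20% would actually be derived is left open above.

```python
from decimal import Decimal


def discounted_transfer(amount, risk_fraction):
    """Amount credited on the destination chain after a risk-based discount."""
    if not (0 <= risk_fraction <= 1):
        raise ValueError("risk_fraction must be between 0 and 1")
    return amount * (1 - risk_fraction)


# 100 tokens moved to a network with an assumed 20% risk discount.
received = discounted_transfer(Decimal("100"), Decimal("0.20"))
```

`Decimal` is used rather than floats so that token amounts are not affected by binary rounding.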
