Skip to content

Commit

Permalink
fix: Fix high vote on informal spec (#215)
Browse files Browse the repository at this point in the history
## What ❔

- Fixed high_vote on informal spec.
- Added comments to explain reasoning behind ignoring 2+ subquorums.
- Added link to blog post that explains why rounds are with 5f+1 fault
tolerance.
  • Loading branch information
brunoffranca authored Nov 21, 2024
1 parent 9df9c11 commit c586f85
Show file tree
Hide file tree
Showing 3 changed files with 78 additions and 57 deletions.
104 changes: 52 additions & 52 deletions node/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion spec/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ We find that most recent research on consensus algorithms unfortunately has beco
For our particular use case, there are a few lessons that we learned from researching and implementing previous consensus algorithms:

- Chained consensus is not worth it. It doesn’t improve the throughput or the latency while increasing systemic complexity. We always finalize every block.
- Lower fault tolerance to reduce voting rounds. This we learned from FaB Paxos. Decreasing our fault tolerance from *3f+1* to *5f+1* allows us to finalize in just one voting round.
- Lower fault tolerance to reduce voting rounds. This we learned from FaB Paxos. Decreasing our fault tolerance from *3f+1* to *5f+1* allows us to finalize in just one voting round. This [blog post from Decentralized Thoughts](https://decentralizedthoughts.github.io/2021-03-03-2-round-bft-smr-with-n-equals-4-f-equals-1/) explains intuitively how this works.
- Linear communication is not worth it. Quadratic communication for replicas simplifies security (there are fewer cases where we need to consider the effect of a malicious leader), implementation (you can fully separate the leader component) and view changes (constant timeouts are enough, [Jolteon/Ditto](https://arxiv.org/abs/2106.10362) ended up going in that direction after trying to implement HotStuff). Further, the performance drop is likely not significant (see [ParBFT](https://eprint.iacr.org/2023/679.pdf)).
- Re-proposals as a way of guaranteeing that there are no “rogue” blocks. This is a problem that didn’t get any attention so far (as far as we know), and is probably somewhat unique to public blockchains. The issue is that in all committee-based consensus algorithms it is possible that a commit QC (to use HotStuff’s terminology) is formed but that not enough replicas receive it. This will cause a timeout and another block to be proposed. Most algorithms just solve this by saying that the old block is no longer valid. All honest replicas will be in agreement about which block is canonical, but someone who just receives that single block and is not aware of the timeout will think that that particular block was finalized. This breaks the very desirable property of being able to verify that a given block is part of the chain just from seeing the block, without being required to have the entire chain. The way we solve this is to require that block proposals after a timeout (where a commit QC might have been formed) re-propose the previous block. This guarantees that if we see a block with a valid commit QC, then that block is part of the chain (maybe it wasn’t finalized in that particular view, but it was certainly finalized).
- Always justify messages to remove time dependencies. That’s something we got from Fast-HotStuff. Messages should have enough information by themselves that any replica is capable of verifying their validity without any other information (with the exception of having previous blocks, but that’s external to the consensus algorithm anyway). If we don’t, then we introduce subtle timing dependencies. For example, Tendermint had a bug that was only discovered years later, where the solution was that the leader had to wait for the maximum network delay at the end of every round. If that wait doesn’t happen, a lock can occur. Funnily enough, Hotstuff-2 reintroduces this timing dependency in order to get rid of one round-trip, which significantly worsens the difficulty of modelling and implementing such a system.
Expand Down
Loading

0 comments on commit c586f85

Please sign in to comment.