polygon/sync: handle bad blocks on chain tip #12320

taratorio · 2024-10-15T13:44:55Z

High level:

When we get an ErrForkChoiceUpdateBadBlock on the chain tip due to an invalid root hash or a receipts mismatch, there are 2 scenarios: either we are wrong (due to a bug in our execution or because we are running on a wrong fork), or the peer which sent us the new block/new block hashes event is wrong.
When we get an ErrForkChoiceUpdateBadBlock while handling a new milestone, that means we are the one who is wrong.

With this in mind a way to handle ErrForkChoiceUpdateBadBlock on tip is to:

If the error happens when handling a new milestone mismatch and/or when in "sync to tip mode" (using checkpoints & milestones on initial start), then terminate the node, since we know the issue is with us and there is no point in looping.
If the error happens while we are in "chain tip mode", building our canonical chain after the latest milestone has been correctly processed, then we do not know whether it is us who is at fault or the block our peer has sent. In this case we mark the block as bad in an LRU cache, penalise the peer (if this is after a NewBlockHashes event) and move on. Any future blocks that link to bad blocks in the LRU cache get discarded with warn messages. The idea behind this is that 1) if it was indeed our peer who was wrong then we did the right thing, and 2) if our peer was correct but we were wrong then we will eventually penalise all our peers and stop receiving events from p2p. When this happens we will eventually get a new milestone event from heimdall and our canonical chain tip will not match it; we will then try to download all blocks for the milestone from any peer that connects to us afterwards, execute those blocks and again fail with a bad block err, in which case this will be treated as a terminal failure and the node will exit.
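
The two cases above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `Hash`, `badBlockCache`, `syncMode`, `action`, `decideOnExecErr` and `shouldDiscard` are hypothetical names, the error variable is a local stand-in for the real ErrForkChoiceUpdateBadBlock, and the LRU is hand-rolled on top of `container/list` to keep the sketch dependency-free.

```go
package main

import (
	"container/list"
	"errors"
)

// Local stand-in for the error named in the description (assumption: the
// real one lives elsewhere in the codebase).
var ErrForkChoiceUpdateBadBlock = errors.New("fork choice update bad block")

// Hash is a simplified block hash type used only in this sketch.
type Hash string

// badBlockCache is a small fixed-capacity LRU set of bad block hashes.
type badBlockCache struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[Hash]*list.Element
}

func newBadBlockCache(capacity int) *badBlockCache {
	return &badBlockCache{capacity: capacity, order: list.New(), items: make(map[Hash]*list.Element)}
}

// Add remembers a bad block and evicts the least recently used entry
// once the capacity is exceeded.
func (c *badBlockCache) Add(h Hash) {
	if el, ok := c.items[h]; ok {
		c.order.MoveToFront(el)
		return
	}
	c.items[h] = c.order.PushFront(h)
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(Hash))
	}
}

// Contains reports whether h is a known bad block and refreshes its recency.
func (c *badBlockCache) Contains(h Hash) bool {
	el, ok := c.items[h]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

type syncMode int

const (
	modeSyncToTip syncMode = iota // initial sync via checkpoints & milestones
	modeChainTip                  // following p2p events after the latest milestone
)

type action int

const (
	actionNone               action = iota // not a bad block error
	actionTerminate                        // we are at fault: stop the node
	actionMarkBadAndPenalise               // unknown fault: remember block, penalise peer
)

// decideOnExecErr maps an execution error to one of the two strategies.
func decideOnExecErr(err error, mode syncMode, handlingMilestone bool) action {
	if !errors.Is(err, ErrForkChoiceUpdateBadBlock) {
		return actionNone
	}
	if handlingMilestone || mode == modeSyncToTip {
		return actionTerminate
	}
	return actionMarkBadAndPenalise
}

// shouldDiscard reports whether an incoming block is itself known to be bad
// or links directly to a known bad parent.
func shouldDiscard(cache *badBlockCache, hash, parent Hash) bool {
	return cache.Contains(hash) || cache.Contains(parent)
}
```

Because the cache is bounded, a bad block can eventually be evicted and retried; under the reasoning above that is harmless, since a repeated failure is either penalised again or caught as terminal at the next milestone.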

Whenever we handle a bad block err we make sure to unwind the bridge back to the last known valid block height, so that it is in a consistent state for the next valid block and, after shutdown, for a clean restart that would not require manual intervention.
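
To illustrate why the unwind matters, here is a small sketch with a toy bridge that only accepts blocks in order. `bridgeState`, `ProcessBlock` and `Unwind` are hypothetical names for this example and are not the bridge API in the codebase; the per-block event IDs are made-up bookkeeping.

```go
package main

import "fmt"

// bridgeState is a toy model of the bridge's per-block bookkeeping.
type bridgeState struct {
	lastProcessed uint64
	eventsByBlock map[uint64][]uint64 // block number -> event ids assigned to it
}

func newBridgeState(start uint64) *bridgeState {
	return &bridgeState{lastProcessed: start, eventsByBlock: make(map[uint64][]uint64)}
}

// ProcessBlock records the events for the next block and rejects gaps or
// replays, which is what an un-unwound bridge would run into after a bad block.
func (b *bridgeState) ProcessBlock(num uint64, eventIDs []uint64) error {
	if num != b.lastProcessed+1 {
		return fmt.Errorf("bridge out of sync: got block %d, want %d", num, b.lastProcessed+1)
	}
	b.eventsByBlock[num] = eventIDs
	b.lastProcessed = num
	return nil
}

// Unwind drops everything recorded above the given height so that the next
// valid block (or a restart) continues from the last known valid block.
func (b *bridgeState) Unwind(to uint64) {
	if to >= b.lastProcessed {
		return // nothing above the target height
	}
	for num := range b.eventsByBlock {
		if num > to {
			delete(b.eventsByBlock, num)
		}
	}
	b.lastProcessed = to
}
```

In this model, if block 11 turns out to be bad after the bridge has already processed it, calling `Unwind(10)` lets a different, valid block 11 be processed next without manual intervention.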

As part of this PR I noticed a few gaps in the logic around unwinding the bridge upon change of forks. I addressed those as part of this change so that we can close the "handle unwinding at tip" ticket.
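
One way to picture unwinding on a change of forks: find the lowest common ancestor (LCA) of the old and new tips and unwind to its height. The sketch below rests on assumptions; `header` and `lowestCommonAncestor` are hypothetical, and the link to the "implement and test LCA" commit in this PR is a guess rather than a description of that code.

```go
package main

// header is a simplified block header with an in-memory parent link.
type header struct {
	Number uint64
	Hash   string
	Parent *header
}

// lowestCommonAncestor walks both tips back towards genesis, always stepping
// the higher one first, until the two cursors meet on the same block.
// It returns nil when the two chains share no ancestor.
func lowestCommonAncestor(a, b *header) *header {
	for a != nil && b != nil {
		if a.Hash == b.Hash {
			return a
		}
		switch {
		case a.Number > b.Number:
			a = a.Parent
		case b.Number > a.Number:
			b = b.Parent
		default:
			a, b = a.Parent, b.Parent
		}
	}
	return nil
}
```

With the LCA in hand, the bridge would be unwound to the LCA's block number before blocks of the new fork are replayed.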

taratorio added 17 commits October 10, 2024 20:00

wip

a24cee2

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bloc…

0f279a4

…k-announcements

polygon/sync,p2p: publish block announcements to devp2p

23062e6

add todo

39c42b8

move publishNewBlock after InsertBlocks for TD availability

510546f

tidy

49feec0

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bloc…

644aab5

…k-announcements

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bloc…

d53dcce

…k-announcements

add peer_tracker_tests

8bc1543

add message_sender_test

c75cb27

add publisher tests

055514c

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bloc…

cc72e2e

…k-announcements

allow NewBlockMsg and NewBlockHashesMsg in sentry SendMessageById

606ce01

fix linter

11ce255

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bloc…

3cfc9ad

…k-announcements

polygon/sync: bad blocks on chain tip

a92f1f4

add comment

fee8637

Base automatically changed from astrid-block-announcements to main October 15, 2024 19:58

taratorio added 12 commits October 15, 2024 22:39

wip

365ea2d

wip

64747a1

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bad-…

794d9e8

…blocks

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bad-…

7fe3b9d

…blocks

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bad-…

ef403b9

…blocks

wip: double check things

2b81794

add waiting for chain tip events info log

11f4ae5

handleWaypointExecutionErr

382ebdd

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bad-…

0fc3d9a

…blocks

implement and test LCA

ddba4c0

header by hash ccb tests

36b816d

prune node tests

69b7969

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bad-…

7c7d261

…blocks

taratorio marked this pull request as ready for review October 16, 2024 21:56

taratorio added 6 commits October 16, 2024 23:00

stylistic

c0c59b8

fix compilation

e3f5a3d

simplify bridge reorg and handling

0811185

Merge branch 'main' of github.com:ledgerwatch/erigon into astrid-bad-…

9b56206

…blocks

add log for waypoin exec err

d7bd861

add log for penalizing peer for bad block

c01db8b

taratorio changed the title ~~polygon/sync: remember bad blocks on chain tip~~ polygon/sync: handle bad blocks on chain tip Oct 17, 2024

taratorio requested review from shohamc1 and mh0lt October 17, 2024 16:34

taratorio added the polygon label Oct 17, 2024

mh0lt approved these changes Oct 17, 2024

taratorio enabled auto-merge (squash) October 17, 2024 16:36

taratorio merged commit b46552d into main Oct 17, 2024
11 checks passed

taratorio deleted the astrid-bad-blocks branch October 17, 2024 16:43
