
Don't Reset Replica Offset When Master Changes #4183

Open
andydunstall (Contributor) opened this issue Nov 25, 2024 · 0 comments

There is currently no way to tell which Dragonfly replica is the most up to date when the replicas are configured to replicate from different masters, since the replica offset (LSN) appears to be reset whenever the master changes.

For example, consider this scenario:

  • You have two nodes, A and B, where A is the master and B is a synced replica.
  • You add two more nodes, C and D, and promote C to master once it has synced with A. You now have a synced replica B, A is shut down, C is the master, and D is still nearly empty as it syncs with C.
  • If C crashes, we can promote either B or D, but we have no way to tell which is more up to date. B could be fully synced while D is almost empty, or D could have finished syncing and hold more recent data than B.

So during an update we effectively have a window where we don't have usable replication.

In Redis/Valkey, it looks like the replica offset isn't reset when the master changes; it continues from the previous offset. So if a replica has offset 100 and is then promoted to master, it still reports offset 100 (instead of being reset to 1).
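
A rough sketch of how this behaviour can be observed against Redis/Valkey using redis-py (the hostnames are placeholders, and the field names are taken from Redis's INFO replication output; Dragonfly's output may differ):

```python
import redis

# Hypothetical replica address; adjust for your deployment.
replica = redis.Redis(host="replica-b", port=6379)

# Offset while the node is still acting as a replica.
before = replica.info("replication")
offset_before = before.get("slave_repl_offset")

# Promote the replica to master.
replica.execute_command("REPLICAOF", "NO", "ONE")

# In Redis/Valkey the offset carries over after promotion
# rather than being reset.
after = replica.info("replication")
offset_after = after.get("master_repl_offset")

print(f"offset before promotion: {offset_before}")
print(f"offset after promotion:  {offset_after}")
```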

Could we do the same in Dragonfly? Then, to select which replica to promote after a master failover, we could simply pick the replica with the highest offset.
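
If offsets were preserved across master changes, a failover controller could pick the promotion candidate with something like the following sketch (hostnames are placeholders, and it assumes the offset is exposed via INFO replication as slave_repl_offset; comparing offsets across different masters is only meaningful if this proposal is implemented):

```python
import redis

# Hypothetical candidate replicas to consider for promotion.
CANDIDATES = ["replica-b:6379", "replica-d:6379"]

def replication_offset(addr: str) -> int:
    """Return the replica's replication offset, or -1 if unreachable."""
    host, port = addr.split(":")
    try:
        info = redis.Redis(host=host, port=int(port)).info("replication")
        return int(info.get("slave_repl_offset", -1))
    except redis.RedisError:
        return -1

# Promote the candidate with the highest offset, i.e. the most
# up-to-date replica -- valid only if offsets are not reset when
# the master changes.
best = max(CANDIDATES, key=replication_offset)
print(f"promote {best}")
```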

(I raised this a while ago, but I think most of the team was on holiday. I'm raising it again because we've just seen this scenario in our system tests: even though replication was enabled, a single node crash at the wrong time meant we dropped data.)
