Some documentation of catch up proccces #852

alievmirza · 2022-06-21T16:26:46Z

I've investigated catchup procces in JRaft, and made some documentation about it and decided to share it with the community. Hope this will be helpful.

Note, that this is actual for 1.3.5, but I haven't checked, if there were some changes in the algorithm.

Catching up process

Entry point:

We have a closure named NodeImpl.OnCaughtUp, which is responsible for the catching up process for every stale node/replicator on a leader.
This closure is created every time we call NodeImpl.ConfigurationCtx#addNewPeers which happens on a raft configuration change, for example, when we call NodeImpl#changePeers. In NodeImpl.ConfigurationCtx#addNewPeers method we assign OnCaughtUp closure with a corresponding replicator for a stale node. This is done inside ReplicatorGroupImpl#waitCaughtUp by calling Replicator#waitForCaughtUp. To be more precise, we save the closure in a field Replicator#catchUpClosure and also we schedule timer on a replicator to call Replicator#onCatchUpTimedOut (by default it is called after election timeout).

Closure invocation

So, we saved closure, it is time to discuss what happens when the closure is run. It happens in Replicator#notifyOnCaughtUp, we call this method with a status of success or failure of a catching up process and propagate it to the closure by setting Status.setError(int, java.lang.String, java.lang.Object...). When the closure is run, NodeImpl#onCaughtUp is called, this method checks the status of a process and here we have several outcomes:

Status is OK and NodeImpl.ConfigurationCtx#onCaughtUp with a success flag equals true is called, so we can move
to NodeImpl.ConfigurationCtx.nextStage in a configuration changing process.
Status is Error and more specific, it is timeout, so we retry catching up process by creating a new NodeImpl.OnCaughtUp closure
calling the same Replicator#waitForCaughtUp that we described before.
If retrying went wrong or Status is Error and is not a timeout, we call NodeImpl.ConfigurationCtx#onCaughtUp with a success flag
equals false, so the whole process of a configuration changing is reset with RaftError.ECATCHUP.

Where the closure is invoked

Now let's discuss when this closure is run. As we said before, it happens in Replicator#notifyOnCaughtUp, so lets track who call Replicator#Replicator#notifyOnCaughtUp

Calls with successful statuses:

Replicator#onInstallSnapshotReturned -- called after a successful installation of a snapshot
Replicator#onAppendEntriesReturned -- called after a successful appending of new entries from a leader

Calls with error statuses:

Replicator#onCatchUpTimedOut with RaftError.ETIMEDOUT, called when a timer event happens. As was described before, this timer is started
when we call Replicator#waitForCaughtUp.
Replicator#onHeartbeatReturned, Replicator#onAppendEntriesReturned, or Replicator#onTimeoutNowReturned with RaftError.EPERM,
called when a follower returns a term higher than a leader's current term. This is a general check for RPC calls where we check terms and
decide, should we step down or not.
Replicator#onTimeoutNowReturned with RaftError.ESTOP, called when we passed to the method flag stopAfterFinish equals true. It happens
when a leader is stepped down and we try to wake up a potential candidate for the optimisation purposes
(see waking up optimisation), so we call
Replicator#sendTimeoutNowAndStop(this.wakingCandidate, this.options.getElectionTimeoutMs()) on a leader. For more details see
NodeImpl#stepDown
Replicator#onError with RaftError.ESTOP. This is a general case when some replicator was stopped. For example, it might happen when
a leader stepped down, or when a node was shutdown, etc. Let's consider all places where Replicator#onError with RaftError.ESTOP can happen, to
do that we need to trace Replicator#stop
- NodeImpl#shutdown(org.apache.ignite.raft.jraft.Closure) -- node shutdown case
- ReplicatorGroupImpl#stopAll -- happens when a leader steps down, including stopping all replicators. See NodeImpl#stepDown.
- ReplicatorGroupImpl#stopReplicator -- this happens when we call ConfigurationCtx#reset(org.apache.ignite.raft.jraft.Status),
  when we successfully or not successfully changed configuration, so we have to start or stop replicators for new peers.
- ReplicatorGroupImpl#stopAllAndFindTheNextCandidate -- called when a leader step down, in case we make
  waking up optimisation

Waking up optimisation

When a leader faces some problem, it makes some optimisation when it steps down, to start a new voting with a new candidate immediately. In that case, instead of stopping all replicators as usual, it preserves one replicator for stopping and sends TimeoutNowRequest to it. When the node receives that request, it elects itself and starts voting. Failed leader chose such node by searching for the node with the largest log id among peers in the current configuration. For more details see ReplicatorGroupImpl#stopAllAndFindTheNextCandidate

The text was updated successfully, but these errors were encountered:

killme2008 · 2022-06-22T02:32:21Z

Very good, thank you for your sharing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some documentation of catch up proccces #852

Some documentation of catch up proccces #852

alievmirza commented Jun 21, 2022 •

edited

Loading

killme2008 commented Jun 22, 2022

Some documentation of catch up proccces #852

Some documentation of catch up proccces #852

Comments

alievmirza commented Jun 21, 2022 • edited Loading

Catching up process

Entry point:

Closure invocation

Where the closure is invoked

Waking up optimisation

killme2008 commented Jun 22, 2022

alievmirza commented Jun 21, 2022 •

edited

Loading