You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've investigated catchup procces in JRaft, and made some documentation about it and decided to share it with the community. Hope this will be helpful.
Note, that this is actual for 1.3.5, but I haven't checked, if there were some changes in the algorithm.
Catching up process
Entry point:
We have a closure named NodeImpl.OnCaughtUp, which is responsible for the catching up process for every stale node/replicator on a leader.
This closure is created every time we call NodeImpl.ConfigurationCtx#addNewPeers which happens on a raft configuration change, for example, when we call NodeImpl#changePeers. In NodeImpl.ConfigurationCtx#addNewPeers method we assign OnCaughtUp closure with a corresponding replicator for a stale node. This is done inside ReplicatorGroupImpl#waitCaughtUp by calling Replicator#waitForCaughtUp. To be more precise, we save the closure in a field Replicator#catchUpClosure and also we schedule timer on a replicator to call Replicator#onCatchUpTimedOut (by default it is called after election timeout).
Closure invocation
So, we saved closure, it is time to discuss what happens when the closure is run. It happens in Replicator#notifyOnCaughtUp, we call this method with a status of success or failure of a catching up process and propagate it to the closure by setting Status.setError(int, java.lang.String, java.lang.Object...). When the closure is run, NodeImpl#onCaughtUp is called, this method checks the status of a process and here we have several outcomes:
Status is OK and NodeImpl.ConfigurationCtx#onCaughtUp with a success flag equals true is called, so we can move
to NodeImpl.ConfigurationCtx.nextStage in a configuration changing process.
Status is Error and more specific, it is timeout, so we retry catching up process by creating a new NodeImpl.OnCaughtUp closure
calling the same Replicator#waitForCaughtUp that we described before.
If retrying went wrong or Status is Error and is not a timeout, we call NodeImpl.ConfigurationCtx#onCaughtUp with a success flag
equals false, so the whole process of a configuration changing is reset with RaftError.ECATCHUP.
Where the closure is invoked
Now let's discuss when this closure is run. As we said before, it happens in Replicator#notifyOnCaughtUp, so lets track who call Replicator#Replicator#notifyOnCaughtUp
Calls with successful statuses:
Replicator#onInstallSnapshotReturned -- called after a successful installation of a snapshot
Replicator#onAppendEntriesReturned -- called after a successful appending of new entries from a leader
Calls with error statuses:
Replicator#onCatchUpTimedOut with RaftError.ETIMEDOUT, called when a timer event happens. As was described before, this timer is started
when we call Replicator#waitForCaughtUp.
Replicator#onHeartbeatReturned, Replicator#onAppendEntriesReturned, or Replicator#onTimeoutNowReturned with RaftError.EPERM,
called when a follower returns a term higher than a leader's current term. This is a general check for RPC calls where we check terms and
decide, should we step down or not.
Replicator#onTimeoutNowReturned with RaftError.ESTOP, called when we passed to the method flag stopAfterFinish equals true. It happens
when a leader is stepped down and we try to wake up a potential candidate for the optimisation purposes
(see waking up optimisation), so we call Replicator#sendTimeoutNowAndStop(this.wakingCandidate, this.options.getElectionTimeoutMs()) on a leader. For more details see NodeImpl#stepDown
Replicator#onError with RaftError.ESTOP. This is a general case when some replicator was stopped. For example, it might happen when
a leader stepped down, or when a node was shutdown, etc. Let's consider all places where Replicator#onError with RaftError.ESTOP can happen, to
do that we need to trace Replicator#stop
NodeImpl#shutdown(org.apache.ignite.raft.jraft.Closure) -- node shutdown case
ReplicatorGroupImpl#stopAll -- happens when a leader steps down, including stopping all replicators. See NodeImpl#stepDown.
ReplicatorGroupImpl#stopReplicator -- this happens when we call ConfigurationCtx#reset(org.apache.ignite.raft.jraft.Status),
when we successfully or not successfully changed configuration, so we have to start or stop replicators for new peers.
ReplicatorGroupImpl#stopAllAndFindTheNextCandidate -- called when a leader step down, in case we make waking up optimisation
Waking up optimisation
When a leader faces some problem, it makes some optimisation when it steps down, to start a new voting with a new candidate immediately. In that case, instead of stopping all replicators as usual, it preserves one replicator for stopping and sends TimeoutNowRequest to it. When the node receives that request, it elects itself and starts voting. Failed leader chose such node by searching for the node with the largest log id among peers in the current configuration. For more details see ReplicatorGroupImpl#stopAllAndFindTheNextCandidate
The text was updated successfully, but these errors were encountered:
I've investigated catchup procces in JRaft, and made some documentation about it and decided to share it with the community. Hope this will be helpful.
Note, that this is actual for 1.3.5, but I haven't checked, if there were some changes in the algorithm.
Catching up process
Entry point:
We have a closure named
NodeImpl.OnCaughtUp
, which is responsible for the catching up process for every stale node/replicator on a leader.This closure is created every time we call
NodeImpl.ConfigurationCtx#addNewPeers
which happens on a raft configuration change, for example, when we callNodeImpl#changePeers
. InNodeImpl.ConfigurationCtx#addNewPeers
method we assignOnCaughtUp
closure with a corresponding replicator for a stale node. This is done insideReplicatorGroupImpl#waitCaughtUp
by callingReplicator#waitForCaughtUp
. To be more precise, we save the closure in a fieldReplicator#catchUpClosure
and also we schedule timer on a replicator to callReplicator#onCatchUpTimedOut
(by default it is called after election timeout).Closure invocation
So, we saved closure, it is time to discuss what happens when the closure is run. It happens in
Replicator#notifyOnCaughtUp
, we call this method with a status of success or failure of a catching up process and propagate it to the closure by settingStatus.setError(int, java.lang.String, java.lang.Object...)
. When the closure is run,NodeImpl#onCaughtUp
is called, this method checks the status of a process and here we have several outcomes:Status
isOK
andNodeImpl.ConfigurationCtx#onCaughtUp
with a success flag equals true is called, so we can moveto
NodeImpl.ConfigurationCtx.nextStage
in a configuration changing process.Status
isError
and more specific, it is timeout, so we retry catching up process by creating a newNodeImpl.OnCaughtUp
closurecalling the same
Replicator#waitForCaughtUp
that we described before.Status
isError
and is not a timeout, we callNodeImpl.ConfigurationCtx#onCaughtUp
with a success flagequals false, so the whole process of a configuration changing is reset with
RaftError.ECATCHUP
.Where the closure is invoked
Now let's discuss when this closure is run. As we said before, it happens in
Replicator#notifyOnCaughtUp
, so lets track who callReplicator#Replicator#notifyOnCaughtUp
Calls with successful statuses:
Replicator#onInstallSnapshotReturned
-- called after a successful installation of a snapshotReplicator#onAppendEntriesReturned
-- called after a successful appending of new entries from a leaderCalls with error statuses:
Replicator#onCatchUpTimedOut
withRaftError.ETIMEDOUT
, called when a timer event happens. As was described before, this timer is startedwhen we call
Replicator#waitForCaughtUp
.Replicator#onHeartbeatReturned
,Replicator#onAppendEntriesReturned
, orReplicator#onTimeoutNowReturned
withRaftError.EPERM
,called when a follower returns a term higher than a leader's current term. This is a general check for RPC calls where we check terms and
decide, should we step down or not.
Replicator#onTimeoutNowReturned
withRaftError.ESTOP
, called when we passed to the method flagstopAfterFinish
equals true. It happenswhen a leader is stepped down and we try to wake up a potential candidate for the optimisation purposes
(see waking up optimisation), so we call
Replicator#sendTimeoutNowAndStop(this.wakingCandidate, this.options.getElectionTimeoutMs())
on a leader. For more details seeNodeImpl#stepDown
Replicator#onError
withRaftError.ESTOP
. This is a general case when some replicator was stopped. For example, it might happen whena leader stepped down, or when a node was shutdown, etc. Let's consider all places where
Replicator#onError
withRaftError.ESTOP
can happen, todo that we need to trace
Replicator#stop
NodeImpl#shutdown(org.apache.ignite.raft.jraft.Closure)
-- node shutdown caseReplicatorGroupImpl#stopAll
-- happens when a leader steps down, including stopping all replicators. SeeNodeImpl#stepDown
.ReplicatorGroupImpl#stopReplicator
-- this happens when we callConfigurationCtx#reset(org.apache.ignite.raft.jraft.Status)
,when we successfully or not successfully changed configuration, so we have to start or stop replicators for new peers.
ReplicatorGroupImpl#stopAllAndFindTheNextCandidate
-- called when a leader step down, in case we makewaking up optimisation
Waking up optimisation
When a leader faces some problem, it makes some optimisation when it steps down, to start a new voting with a new candidate immediately. In that case, instead of stopping all replicators as usual, it preserves one replicator for stopping and sends
TimeoutNowRequest
to it. When the node receives that request, it elects itself and starts voting. Failed leader chose such node by searching for the node with the largest log id among peers in the current configuration. For more details seeReplicatorGroupImpl#stopAllAndFindTheNextCandidate
The text was updated successfully, but these errors were encountered: