Add raft_voter_contact() #187
This returns the number of voting nodes that are recently in contact with the leader, to allow determining if the cluster is currently in a degraded / at risk state.
Hey, thanks, what is your use case for having [...]
We are using the voter contact information in the periodic tick and in the state change callback. In the state change callback it is important for us to be able to distinguish between a leader being disconnected from the rest of the cluster and a leader that is stepping down because leadership is being manually transferred with raft_transfer. In the first case we absolutely want to generate a monitoring event that indicates a problem, but in the second case we do not. I found that using the voter contact count was the most straightforward way of telling these two cases apart, but it then has to be available after the role has changed from leader.
I see, thanks for the detailed explanation. It's an interesting use case, however I'm a bit hesitant to expose the contact details in this way. I'd probably prefer a more generic way that includes other information about followers as well, such as how up-to-date they are. It's something I'd like to do for the v1 version of the API, and it should meet this use case too. This is a long-term plan though. In the meantime, would it be possible for you to perform this kind of bookkeeping from the user side? For example, you could keep track of whether a raft_transfer request is in progress, and use that information to distinguish between the two cases that you describe. Or is there something that would make that not feasible?
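For illustration, the user-side bookkeeping suggested above could be sketched roughly as follows. This is only a minimal sketch, not code from the library: the `transfer_tracker` type, the `ROLE_*` constants and all function names are hypothetical stand-ins, and it assumes the application records the moment it submits a raft_transfer request and is told from the completion callback whether the node is still the leader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the roles reported to a state change callback;
 * real code would use the library's own constants. */
enum role { ROLE_FOLLOWER, ROLE_CANDIDATE, ROLE_LEADER };

enum step_down_reason {
    NOT_A_STEP_DOWN,      /* the transition did not start from leader */
    STEP_DOWN_PLANNED,    /* a transfer we requested was pending */
    STEP_DOWN_UNEXPECTED  /* no transfer pending: worth a monitoring event */
};

/* Application-side state (hypothetical, not part of the library). */
struct transfer_tracker {
    bool transfer_pending;
};

void tracker_init(struct transfer_tracker *t)
{
    assert(t != NULL);
    t->transfer_pending = false;
}

/* Call right after a transfer request has been submitted successfully. */
void tracker_transfer_requested(struct transfer_tracker *t)
{
    assert(t != NULL);
    t->transfer_pending = true;
}

/* Call from the transfer completion callback. If the node is still the
 * leader, the transfer did not happen, so a later step-down is unplanned.
 * Otherwise the flag is left for tracker_classify() to consume, so the
 * result does not depend on which callback runs first. */
void tracker_transfer_finished(struct transfer_tracker *t, bool still_leader)
{
    assert(t != NULL);
    if (still_leader) {
        t->transfer_pending = false;
    }
}

/* Call from the state change callback with the old and new roles. */
enum step_down_reason tracker_classify(struct transfer_tracker *t,
                                       enum role old_role,
                                       enum role new_role)
{
    assert(t != NULL);
    if (old_role != ROLE_LEADER || new_role == ROLE_LEADER) {
        return NOT_A_STEP_DOWN;
    }
    if (t->transfer_pending) {
        t->transfer_pending = false; /* consume the pending transfer */
        return STEP_DOWN_PLANNED;
    }
    return STEP_DOWN_UNEXPECTED;
}
```

A caller would raise its monitoring event only when `tracker_classify()` returns `STEP_DOWN_UNEXPECTED`. The sketch shares the limitation of any flag-based approach: a step-down that happens while a transfer is pending is treated as planned, whatever actually caused it.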
I should think we can do it. I'm not entirely sure what bits are going to be removed under [...] This still isn't a perfect solution, because if we lose network connectivity between the transfer call being made and that call actually being sent, then we would incorrectly assume everything was OK. That should be a pretty unlikely event though.
I'm not sure I understand your question here. All the bits under [...] The v1 design is not based on callbacks, but rather on a "pull" model, where consumers of the library are in control of the logic flow (they call instead of being called back). The [...]

    struct raft raft;
    struct raft_event event;
    struct raft_update update;
    event.type = RAFT_TRANSFER;
    event.transfer.server_id = 123;
    raft_step(&raft, &event, &update);

This initiates the transfer, which can then be monitored by clients using the [...]
I can't quite understand what you mean. If [...] Once the callback fires, you are guaranteed that the transfer request has completed (either successfully or not). You can check if the request was successful by calling [...] If connectivity is lost, you'll know that there was a problem because [...] Hope I'm not missing something.
Thanks for the description. I don't think you've missed anything; I was thinking about it in a slightly wrong way. Where does that leave this PR? Are there any modifications that would make it acceptable for you, or would you prefer to work with your long-term plan only?
Thanks. If the suggested solution is a viable option for you, I'd prefer to park this PR for now, mainly to avoid introducing new public APIs that would likely need to be deprecated later down the road.
This is a port of a change I submitted to the canonical/raft repo, but tweaked to make it a bit more useful: the count is available not just in the leader part of the union.
I don't know what you'd like to do about struct ABI compatibility. Please let me know if there's anything to change around that, or anything else.