Change in chitchat gossiping priority. #120

Merged
merged 4 commits into from
Feb 23, 2024

Conversation

fulmicoton
Contributor

Following the original paper, chitchat currently first shares nodes with the highest number of stale values.

As a side effect, nodes that are not emitting many KVs are gossiped last. In Quickwit, under even a moderate load (1,000 indexes on 10 indexers), this has a dire effect.

Indexers that reconnect have to gossip the entire cluster state (~10 MB) before they can get any information about the metastore.
A node must know at least one node running the metastore service before it can declare itself live.

This PR changes the gossip order.
It prioritizes nodes for which the node that originated the Syn message has not received any KV yet.

Such nodes are identified by the fact that either the node was not part of the digest at all, or its floor_version is equal to 0.

The latter (floor_version == 0) should not happen today, but this case is handled in preparation for another PR updating heartbeat on Syn.
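
For readers skimming the description, here is a minimal sketch of the ordering rule described above. It is not the code from this PR: `DigestEntry`, `Candidate`, `is_unknown_to_sender`, and `gossip_order` are hypothetical names, and the digest is simplified to a map from node ID to `floor_version`. Falling back to the number of stale KVs as a secondary key, and to the node ID as a final tie-breaker, are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical digest entry: what the Syn sender already knows about a node.
#[derive(Debug, Clone, Copy)]
struct DigestEntry {
    floor_version: u64,
}

/// Hypothetical candidate node for the reply delta.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    node_id: String,
    num_stale_kvs: usize,
}

/// True when the Syn sender has not received any KV for this node yet:
/// the node is absent from the digest, or its floor_version is 0.
fn is_unknown_to_sender(node_id: &str, digest: &HashMap<String, DigestEntry>) -> bool {
    match digest.get(node_id) {
        None => true,
        Some(entry) => entry.floor_version == 0,
    }
}

/// Orders candidates so that nodes unknown to the Syn sender come first.
/// Assumption: remaining ties are broken by the number of stale KVs
/// (highest first), then by node ID to keep the order deterministic.
fn gossip_order(
    mut candidates: Vec<Candidate>,
    digest: &HashMap<String, DigestEntry>,
) -> Vec<Candidate> {
    candidates.sort_by(|a, b| {
        let a_unknown = is_unknown_to_sender(&a.node_id, digest);
        let b_unknown = is_unknown_to_sender(&b.node_id, digest);
        // `false < true`, so comparing b against a puts unknown nodes first.
        b_unknown
            .cmp(&a_unknown)
            .then_with(|| b.num_stale_kvs.cmp(&a.num_stale_kvs))
            .then_with(|| a.node_id.cmp(&b.node_id))
    });
    candidates
}
```

With this rule, a metastore node holding only a handful of KVs is sent ahead of an indexer with thousands of stale values, as long as the Syn sender has never heard of the metastore node.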

@fulmicoton fulmicoton force-pushed the issue/119-gossip-priority-order branch 2 times, most recently from 9f4081a to ca80b6c Compare February 23, 2024 03:07
fulmicoton and others added 3 commits February 23, 2024 12:20
Co-authored-by: Adrien "Code Monkey" Guillo <[email protected]>
Co-authored-by: François Massot <[email protected]>
@fulmicoton fulmicoton force-pushed the issue/119-gossip-priority-order branch from ca80b6c to fa82ed1 Compare February 23, 2024 03:20
@fulmicoton fulmicoton force-pushed the issue/119-gossip-priority-order branch from fa82ed1 to 728e715 Compare February 23, 2024 03:29
@fulmicoton fulmicoton merged commit 0ec7a75 into main Feb 23, 2024
2 checks passed