Describe the bug
Crashing nodes accumulate in the cluster state as dead nodes for a long while. As the state grows, it takes longer and more memory to synchronize across the cluster, particularly when a new node is added or restarts. We have seen initial sync times upwards of 20 seconds when the cluster is under stress, with the cluster state in the 5-10 MB range. In many cases this causes Kubernetes to kill the pod, triggering another restart, which adds yet another dead node plus a new node to the state.
This constant startup loop also puts a lot of pressure on the control plane, which can become slow to respond, at which point the cluster becomes mostly unresponsive.
We have seen this problem when the database server (Postgres) is unavailable or unresponsive for a period of time. While it is down, everything in the write path crashes repeatedly, which leads to the growing-state problem described above. The only course of action seems to be to stop everything / scale all components to 0 so there is no starting cluster state.
Steps to reproduce (if applicable)
Steps to reproduce the behavior:
Our ingestion setup is 3 metastore nodes, 1 Postgres primary, 30 indexer pods, and 1 control plane node. We typically ingest around 500 MB/s across 3K indexes.
After running for some time, kill the Postgres instance and leave it down for about 10 minutes so the cluster state accumulates a few hundred dead nodes. Then restart Postgres and wait for ingestion to become healthy again.
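For reference, a rough sketch of how we simulate the outage. This assumes Postgres runs as a Kubernetes StatefulSet named `postgres` in the same namespace; the names and timings are illustrative, not our exact manifests:

```sh
# Take the metastore database down for ~10 minutes so crashing
# indexer/metastore pods accumulate as dead nodes in the cluster state.
kubectl scale statefulset postgres --replicas=0
sleep 600

# Bring Postgres back and watch the pods churn while the cluster
# tries to sync the now-large state to every restarting node.
kubectl scale statefulset postgres --replicas=1
kubectl get pods -w
```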
Expected behavior
If it isn't really feasible to prune the dead-node list or bound the size of the state because of the details of scuttlebutt, the cluster should at least be much more tolerant of the database being down. Crashing may not be the best course of action, as it ultimately prevents the cluster from ever becoming healthy again.
Configuration: