TiDB operator cannot scale up the tiflash after scaling down to 0 #5834
Comments
Can you show the TiFlash log and the TidbCluster CR?
Here is the TiFlash log, the key error message is ["failed to start node: StoreTombstone(\"store is tombstone\")"]: TiFlash log
Here is the dump of the TiDBCluster CR Status: TiDBCluster CR Status
It seems we need to use a new PV that does not contain the data of the previous store (that is, delete the PVC/PV after scaling in to 0).
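The cleanup suggested above could be sketched roughly as follows. This is only a sketch: it builds the `kubectl` commands instead of running them, and it assumes the TiFlash PVCs carry the labels `app.kubernetes.io/instance=<cluster>` and `app.kubernetes.io/component=tiflash`, which should be confirmed with `kubectl get pvc --show-labels` before deleting anything. The cluster name `basic` and namespace `tidb-cluster` are placeholders, and the function names are hypothetical.

```python
import shlex


def tiflash_pvc_selector(cluster: str) -> str:
    """Label selector assumed to match the TiFlash PVCs of one cluster."""
    # Assumption: the operator labels PVCs with instance and component.
    return f"app.kubernetes.io/instance={cluster},app.kubernetes.io/component=tiflash"


def pvc_cleanup_commands(cluster: str, namespace: str, current_replicas: int) -> list:
    """Build (but do not run) the commands that list and delete TiFlash PVCs.

    Refuses to build them unless TiFlash is already scaled in to 0, so that
    volumes of running pods are never targeted.
    """
    if current_replicas != 0:
        raise ValueError("scale TiFlash in to 0 before deleting its PVCs")
    selector = tiflash_pvc_selector(cluster)
    return [
        # Review what would be deleted first.
        ["kubectl", "get", "pvc", "-n", namespace, "-l", selector],
        # Then delete the matching claims.
        ["kubectl", "delete", "pvc", "-n", namespace, "-l", selector],
    ]


if __name__ == "__main__":
    for cmd in pvc_cleanup_commands("basic", "tidb-cluster", current_replicas=0):
        print(shlex.join(cmd))
```

If the PVs use the `Retain` reclaim policy, deleting the PVC leaves the PV and its data in place, so the PV would need to be removed or cleaned separately.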
Bug Report
What version of Kubernetes are you using?
Client Version: v1.31.1
Kustomize Version: v5.4.2
What version of TiDB Operator are you using?
v1.6.0
What's the status of the TiDB cluster pods?
TiFlash pods are in the `CrashLoopBackOff` state.
What did you do?
We scaled in TiFlash from 3 to 0 and then scaled it out from 0 to 3.
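For reference, the two changes can be expressed as JSON merge patches against the TidbCluster CR. The sketch below only constructs the patch and the matching `kubectl patch` command line; the cluster name `basic` and namespace `tidb-cluster` are placeholders, and the helper names are hypothetical.

```python
import json
import shlex


def tiflash_replicas_patch(replicas: int) -> str:
    """Return a JSON merge patch that sets spec.tiflash.replicas."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return json.dumps({"spec": {"tiflash": {"replicas": replicas}}})


def kubectl_patch_command(cluster: str, namespace: str, replicas: int) -> list:
    """Build (but do not run) the kubectl command that applies the patch."""
    return [
        "kubectl", "patch", "tidbcluster", cluster,
        "-n", namespace,
        "--type", "merge",
        "-p", tiflash_replicas_patch(replicas),
    ]


if __name__ == "__main__":
    # Scale in to 0, then back out to 3, as described in this report.
    for replicas in (0, 3):
        print(shlex.join(kubectl_patch_command("basic", "tidb-cluster", replicas)))
```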
How to reproduce
1. Change `spec.tiflash.replicas` from 3 to 0.
2. Change `spec.tiflash.replicas` back to 3.
What did you expect to see?
We expected the TiFlash pods to be running and in the `Healthy` state.
What did you see instead?
The TiFlash pods kept crashing and stayed in the `CrashLoopBackOff` state.
Root Cause
We think the root cause of this problem is that when TiFlash is scaled in, its stores enter the `Tombstone` state. After we change `spec.tiflash.replicas` from 0 to 3, the operator deletes the original StatefulSet and creates a new one with `replicas` set to 3 instead of updating the original StatefulSet. This behaviour bypasses the `ScaleOut` function (https://github.com/pingcap/tidb-operator/blob/master/pkg/manager/member/tiflash_scaler.go#L52).
After encountering this issue, the user cannot simply delete the CR and apply it again to make TiFlash run correctly: the operator does not delete the PVCs when the user deletes the CR, so the new cluster reuses the stores that are in the `Tombstone` state.