valkey-cluster Readiness probe failed: cluster_state:fail - nodes don't join the cluster #28745

Closed
arpan57 opened this issue Aug 7, 2024 · 4 comments
Labels: solved, stale (15 days without activity), tech-issues (The user has a technical issue about an application), valkey-cluster

Comments


arpan57 commented Aug 7, 2024

Name and Version

bitnami/valkey-cluster

What architecture are you using?

None

What steps will reproduce the bug?

  1. On a MacBook Pro (Apple silicon), after setting up the Helm repo, I am trying to run the valkey-cluster chart (valkey-cluster-0.1.8) on minikube, following the README.
  2. Command used to install the Helm chart: helm install my-release oci://registry-1.docker.io/bitnamicharts/valkey-cluster
  3. It spawned 6 pods.

The pods look like this

k get pods
NAME                          READY   STATUS    RESTARTS      AGE
my-release-valkey-cluster-0   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-1   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-2   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-3   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-4   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-5   0/1     Running   1 (21h ago)   21h

The pod description and events look like this:

❯ k describe pod my-release-valkey-cluster-0
Name:             my-release-valkey-cluster-0

....
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  2m51s (x3257 over 21h)  kubelet  Readiness probe failed: cluster_state:fail

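The probe message above echoes the `cluster_state` field that `CLUSTER INFO` reports. A minimal sketch for reading that field directly, assuming `valkey-cli` inside the pod can connect the same way it does in the session shown in this report; the helper name `cluster_state` is hypothetical:

```shell
# Extract the cluster_state value from `CLUSTER INFO` output on stdin.
# The reply may use CRLF line endings, so strip carriage returns first.
cluster_state() {
  tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}

# Usage (needs the running release; pod name taken from the listing above):
#   kubectl exec my-release-valkey-cluster-0 -- valkey-cli cluster info | cluster_state
```

A value of `fail` on every pod would match the readiness probe events shown above.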

When I connected using valkey-cli, I noticed that it shows only one node (itself) as part of the cluster.

❯ kubectl exec -it my-release-valkey-cluster-1 -- valkey-cli
127.0.0.1:6379> ping
PONG
127.0.0.1:6379> CLUSTER nodes
c2104e2cd9da1efb779c1c1a82ee40c588fa6a0f 10.244.0.41:6379@16379 myself,master - 0 0 0 connected
127.0.0.1:6379>
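To see whether every pod is isolated in the same way, the check above can be repeated across all pods. A rough sketch, assuming the default release name `my-release` and the 6 pods listed earlier; the function names `count_cluster_nodes` and `report_cluster_nodes` are hypothetical helpers, not part of the chart:

```shell
# Count the node entries in `CLUSTER NODES` output read from stdin.
# Each non-empty line describes one node known to the queried instance.
count_cluster_nodes() {
  tr -d '\r' | awk 'NF > 0 { n++ } END { print n + 0 }'
}

# Print how many nodes each pod of the release knows about.
# Arguments: release name (default my-release), replica count (default 6).
report_cluster_nodes() {
  release="${1:-my-release}"
  replicas="${2:-6}"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    pod="${release}-valkey-cluster-${i}"
    n=$(kubectl exec "$pod" -- valkey-cli cluster nodes < /dev/null | count_cluster_nodes)
    echo "$pod knows $n node(s)"
    i=$((i + 1))
  done
}

# Usage (needs the running release):
#   report_cluster_nodes my-release 6
```

If every pod reports 1, none of the nodes has met any of the others, which is consistent with the single `myself,master` line above.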

The pod logs look like this:

valkey-cluster 15:27:46.85 INFO  ==> ** Starting Valkey setup **
valkey-cluster 15:27:46.90 INFO  ==> Initializing Valkey
valkey-cluster 15:27:46.95 INFO  ==> Setting Valkey config file
valkey-cluster 15:27:47.15 INFO  ==> Changing old IP 10.244.0.40 by the new one 10.244.0.40
valkey-cluster 15:27:47.20 INFO  ==> Changing old IP 10.244.0.41 by the new one 10.244.0.41
valkey-cluster 15:27:47.30 INFO  ==> Changing old IP 10.244.0.39 by the new one 10.244.0.39
valkey-cluster 15:27:47.40 INFO  ==> Changing old IP 10.244.0.43 by the new one 10.244.0.43
valkey-cluster 15:27:47.45 INFO  ==> Changing old IP 10.244.0.42 by the new one 10.244.0.42
valkey-cluster 15:27:47.50 INFO  ==> Changing old IP 10.244.0.38 by the new one 10.244.0.38
valkey-cluster 15:27:47.50 INFO  ==> ** Valkey setup finished! **
1:C 06 Aug 2024 15:27:47.612 # WARNING: Changing databases number from 16 to 1 since we are in cluster mode
1:C 06 Aug 2024 15:27:47.654 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
1:C 06 Aug 2024 15:27:47.654 * Valkey version=7.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 06 Aug 2024 15:27:47.654 * Configuration loaded
1:M 06 Aug 2024 15:27:47.654 * monotonic clock: POSIX clock_gettime
1:M 06 Aug 2024 15:27:47.655 * Node configuration loaded, I'm d9827f7db0ee609373fa6b0d43bc525246c57021
1:M 06 Aug 2024 15:27:47.656 * Server initialized
1:M 06 Aug 2024 15:27:47.656 * Reading RDB base file on AOF loading...
1:M 06 Aug 2024 15:27:47.656 * Loading RDB produced by valkey version 7.2.6
1:M 06 Aug 2024 15:27:47.656 * RDB age 596 seconds
1:M 06 Aug 2024 15:27:47.656 * RDB memory usage when created 1.56 Mb
1:M 06 Aug 2024 15:27:47.656 * RDB is base AOF
1:M 06 Aug 2024 15:27:47.656 * Done loading RDB, keys loaded: 0, keys expired: 0.
1:M 06 Aug 2024 15:27:47.656 * DB loaded from base file appendonly.aof.1.base.rdb: 0.000 seconds
1:M 06 Aug 2024 15:27:47.656 * DB loaded from append only file: 0.000 seconds
1:M 06 Aug 2024 15:27:47.656 * Opening AOF incr file appendonly.aof.1.incr.aof on server start
1:M 06 Aug 2024 15:27:47.656 * Ready to accept connections tcp

What am I missing? Any guidelines on debugging further?

Thanks.

Are you using any custom parameters or values?

No custom parameters or values. The chart was installed with only:
helm install my-release oci://registry-1.docker.io/bitnamicharts/valkey-cluster

What is the expected behavior?

valkey-cluster should be up and the pods should be running with ready state 1/1.
Using valkey-cli, we should be able to list all the nodes.

What do you see instead?

k get pods
NAME                          READY   STATUS    RESTARTS      AGE
my-release-valkey-cluster-0   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-1   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-2   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-3   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-4   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-5   0/1     Running   1 (21h ago)   21h
❯ kubectl exec -it my-release-valkey-cluster-1 -- valkey-cli
127.0.0.1:6379> CLUSTER nodes
c2104e2cd9da1efb779c1c1a82ee40c588fa6a0f 10.244.0.41:6379@16379 myself,master - 0 0 0 connected

Additional information

No response

@arpan57 arpan57 added the tech-issues The user has a technical issue about an application label Aug 7, 2024
@github-actions github-actions bot added the triage Triage is needed label Aug 7, 2024
@github-actions github-actions bot removed the triage Triage is needed label Aug 7, 2024
@github-actions github-actions bot assigned andresbono and unassigned carrodher Aug 7, 2024
@andresbono (Contributor)

Not sure what makes your minikube cluster special... Our CI tests the charts on every release, so this default scenario is covered...

I also tested it on a kind cluster and it worked as expected. The cluster is formed.

helm install my-release oci://registry-1.docker.io/bitnamicharts/valkey-cluster --version 0.1.9

You are using Apple silicon, and I'm not sure whether some sort of emulation is active for the minikube VM that could interfere. Also, please make sure there is inter-pod communication:

kubectl exec -it my-release-valkey-cluster-0 -- valkey-cli -h <SOME_OTHER_POD_IP> ping
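The same check can be run against every pod IP in one pass. A sketch under stated assumptions: the label selector `app.kubernetes.io/instance=my-release` is a guess at how the chart labels its pods, and `ping_peers` is a hypothetical helper name:

```shell
# Ping every pod IP of the release from one source pod and report the result.
# Arguments: source pod name, label selector for the release's pods.
# The default selector is an assumption about how the chart labels its pods.
ping_peers() {
  src="${1:-my-release-valkey-cluster-0}"
  selector="${2:-app.kubernetes.io/instance=my-release}"
  kubectl get pods -l "$selector" \
    -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' |
  while read -r ip; do
    [ -n "$ip" ] || continue
    # Redirect stdin so the exec call cannot swallow the remaining IP list.
    reply=$(kubectl exec "$src" -- valkey-cli -h "$ip" ping < /dev/null 2>&1 | tr -d '\r')
    if [ "$reply" = "PONG" ]; then
      echo "$ip ok"
    else
      echo "$ip FAILED: $reply"
    fi
  done
}

# Usage (needs the running release):
#   ping_peers my-release-valkey-cluster-0
```

This only exercises the client port (6379). The cluster bus port that appears after the `@` in the `CLUSTER nodes` output (16379) is not tested here.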


github-actions bot commented Sep 5, 2024

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Sep 5, 2024

arpan57 commented Sep 5, 2024

Thanks. I think this issue occurred on only one laptop. On the other one it worked fine.


atompie commented Dec 14, 2024

Experienced the same issue. It happens when the chart is installed with Helm, then deleted with helm uninstall, and then installed again in the same namespace.
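One thing worth checking in that reinstall scenario is whether storage from the first install is still around, since `helm uninstall` does not remove PersistentVolumeClaims created from a StatefulSet's volume claim templates. Whether leftover data is what breaks cluster formation here is not confirmed in this thread. A minimal sketch for listing such claims, assuming they carry the `app.kubernetes.io/instance` label; `leftover_pvcs` is a hypothetical helper name:

```shell
# List PersistentVolumeClaims still carrying the release's instance label.
# Arguments: release name, namespace.
# The label key is an assumption about how the chart labels its claims.
leftover_pvcs() {
  release="${1:-my-release}"
  namespace="${2:-default}"
  kubectl get pvc -n "$namespace" \
    -l "app.kubernetes.io/instance=${release}" -o name
}

# Usage, after `helm uninstall my-release` and before installing again:
#   leftover_pvcs my-release default
```

If this prints any claims, the reinstalled pods would start from the old data directories rather than empty ones.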
