This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

1.5.1: Cluster never forming #161

Open
bitsofinfo opened this issue Aug 2, 2019 · 3 comments

Comments

bitsofinfo commented Aug 2, 2019

Running 1.5.1

My cluster never forms: no node ever joins itself, and the bootstrap process just runs forever.

All nodes can see each other as reported by LowestAddressJoinDecider. All the logged "discovered" IPs/ports are correct. The plugin appears to be talking to the k8s master fine AND getting back the correct list of pods/ips given my configured selectors.

From inside each pod/container I can manually curl each listed node @ 8552/bootstrap/seed-nodes and get back the following:

{"seedNodes":[],"selfNode":"akka.tcp://[email protected]:2552"}
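
The manual curl check above can be scripted. This is only a minimal sketch, assuming Java 11+ and pod IPs passed on the command line. `ContactPointProbe` and its methods are hypothetical names, not part of Akka Management, and the regular expression only tells an empty `seedNodes` array apart from a non-empty one in the response shape shown above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Akka Management.
class ContactPointProbe {
    // Matches "seedNodes":[] with optional whitespace around the tokens.
    private static final Pattern EMPTY_SEED_NODES =
            Pattern.compile("\"seedNodes\"\\s*:\\s*\\[\\s*\\]");

    // True when the response body reports an empty seedNodes array.
    static boolean hasNoSeedNodes(String body) {
        return EMPTY_SEED_NODES.matcher(body).find();
    }

    // Queries one pod's bootstrap endpoint (port and path taken from the thread).
    static String probe(HttpClient client, String podIp) {
        try {
            URI uri = URI.create("http://" + podIp + ":8552/bootstrap/seed-nodes");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .timeout(Duration.ofSeconds(5))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                return "HTTP " + response.statusCode();
            }
            return hasNoSeedNodes(response.body()) ? "no seed nodes" : "has seed nodes";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return "interrupted";
        } catch (Exception e) {
            // Connection refused, timeout, malformed address, and so on.
            return "unreachable (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        for (String podIp : args) {
            System.out.println(podIp + ": " + probe(client, podIp));
        }
    }
}
```

Run from inside a pod with the discovered IPs as arguments, this prints one status line per contact point, which makes it easy to see which pods answer and which refuse the connection.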

This goes on indefinitely, and not one of these 4 nodes ever joins itself, which, from what I understand, one of them should. Shouldn't at least one of the nodes join itself?

I also notice this:
Exceeded stable margins but missing seed node information from some contact points

I assume this means that a few of the nodes cannot be contacted (there are 6 nodes total, 2 of which intentionally don't expose 8552), and yet required-contact-point-nr is 1...

My cluster config is below. Am I missing something?

"akka.management {\n" +
"  cluster.bootstrap {\n" +
"    contact-point-discovery {\n" +
"      discovery-method = kubernetes-api\n" +
"      required-contact-point-nr: 1\n" +
"      resolve-timeout = 10 seconds\n" +
"      interval = 2 second\n" +
"      stable-margin = 5 seconds\n" +
"      exponential-backoff-random-factor = 0.5\n" +
"    }\n" +
"    contact-point {\n" +
"      probing-failure-timeout: 5 seconds\n" +
"      probe-interval = 2 second\n" +
"      probe-interval-jitter = 0.5\n" +
"    }\n" +
"  }\n" +
"}\n" +

"akka.discovery {\n" +
"  kubernetes-api {\n" +
"    class = akka.discovery.kubernetes.KubernetesApiServiceDiscovery\n" +
"    pod-namespace = \"" + k8Namespace + "\"\n" +
"    pod-label-selector = \"" + k8PodSelector + "\"\n" +
"  }\n" +
"}\n";


bitsofinfo commented Aug 2, 2019

So this seems to be a bug to me:

  1. I have 6 pods that all carry the pod labels referenced by hazelcast-kubernetes for its pod-label-selector, but only 4 of those pods run akka (2552/8552)

  2. I have required-contact-point-nr: 1

  3. All 6 nodes' IPs are properly discovered by kubernetes-api via the k8s master

  4. 4 of the nodes respond fine to http://podip:8552/bootstrap/seed-nodes and return no seed nodes. The other 2 nodes refuse the connection.

  5. Yet despite required-contact-point-nr: 1, no node joins itself; it just goes on forever.

It seems to me that even if the discovery mechanism knows of 6 potential seed node endpoints and some of them are unreachable, as long as a majority are reachable AND required-contact-point-nr is lower than the number of reachable nodes, one of the nodes should still join itself so things can move forward.
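
To make the argument concrete, here is a simplified model of the two rules being contrasted. It is a sketch built only from the symptoms reported in this thread, NOT Akka Management's actual code. It assumes the stable margin has already passed and no contact point has reported any seed nodes. `SelfJoinModel` and both method names are hypothetical.

```java
// Simplified model of the hypothesis in this thread; NOT the real join decider.
// Assumes: stable margin already exceeded, and no contact point reported seed nodes.
class SelfJoinModel {
    // Strict rule (the suspected current behaviour): self-join is only
    // considered once every discovered contact point has answered.
    static boolean strictRuleAllowsSelfJoin(int discovered, int responded, int requiredContactPointNr) {
        return discovered >= requiredContactPointNr && responded == discovered;
    }

    // Lenient rule (what this comment argues for): enough responders is sufficient.
    static boolean lenientRuleAllowsSelfJoin(int discovered, int responded, int requiredContactPointNr) {
        return responded >= requiredContactPointNr;
    }

    public static void main(String[] args) {
        int discovered = 6; // pods matched by the label selector
        int responded = 4;  // pods answering on 8552
        int required = 1;   // required-contact-point-nr
        System.out.println("strict:  " + strictRuleAllowsSelfJoin(discovered, responded, required));
        System.out.println("lenient: " + lenientRuleAllowsSelfJoin(discovered, responded, required));
    }
}
```

With the numbers from this issue (6 discovered, 4 responding, required-contact-point-nr of 1), the strict rule never permits a self-join while the lenient rule does. If the strict rule is what is in effect, that would explain the endless "missing seed node information from some contact points" message.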

leszko added this to the 2.1 milestone Jan 9, 2020

leszko commented Feb 4, 2020

What is this akka.discovery.kubernetes.KubernetesApiServiceDiscovery class? Could you write the steps to reproduce without any frameworks? That would make the issue simpler to analyze.

leszko removed this from the 2.1 milestone Nov 2, 2020