
OKD 4.5 bootstrap fails after API available #294

Closed
ldcunha76 opened this issue Aug 10, 2020 · 5 comments

Comments

@ldcunha76

Describe the Issue.

Cluster bootstrapping fails while waiting for the process to complete.

```
openshift-install --dir=./ wait-for bootstrap-complete --log-level=debug
DEBUG OpenShift Installer 4.5.0-0.okd-2020-07-14-153706-ga
DEBUG Built from commit 290e3b1de6096ecef2133fb071ff3a71c9c78594
INFO Waiting up to 20m0s for the Kubernetes API at https://api.lab.okd.local:6443...
INFO API v1.18.3 up
INFO Waiting up to 40m0s for bootstrapping to complete...
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help
FATAL failed to wait for bootstrapping to complete: timed out waiting for the condition
```
log-bundle-20200810143626.tar.gz

After the first failure, I tried running

```
openshift-install --dir=./ wait-for bootstrap-complete --log-level=debug
```

again and got

```
DEBUG OpenShift Installer 4.5.0-0.okd-2020-07-14-153706-ga
DEBUG Built from commit 290e3b1de6096ecef2133fb071ff3a71c9c78594
INFO Waiting up to 20m0s for the Kubernetes API at https://api.lab.okd.local:6443...
INFO API v1.18.3 up
INFO Waiting up to 40m0s for bootstrapping to complete...
W0810 16:05:56.350181    6990 reflector.go:326] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *v1.ConfigMap ended with: very short watch: k8s.io/client-go/tools/watch/informerwatcher.go:146: Unexpected watch close - watch lasted less than a second and no items received
E0810 16:06:00.359174    6990 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get https://api.lab.okd.local:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: EOF
...
E0810 16:06:16.400338    6990 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get https://api.lab.okd.local:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: EOF
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help
FATAL failed to wait for bootstrapping to complete: timed out waiting for the condition
```
log-bundle-20200810170027.tar.gz

Version
OpenShift Installer 4.5.0-0.okd-2020-07-14-153706-ga
FCOS 32.20200629.3.0

How reproducible
Both times I tried, on different computers, the process hung after the API came up. Several times I also fought issues related to a non-empty install dir, and I got this strange warning:

```
openshift-install create manifests --dir=./
INFO Consuming Install Config from target directory
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings
```
But I don't have to run create ignition-configs twice:

```
openshift-install create ignition-configs --dir=./
INFO Consuming Common Manifests from target directory
INFO Consuming Master Machines from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming Worker Machines from target directory
```
I'm trying to deploy a cluster with no worker nodes. I'm not sure how I have to mark the master as a worker too, since, following the bare-metal install steps, it seems the master nodes can receive workloads by default.
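For reference, a hedged sketch of the relevant install-config.yaml stanza for a zero-worker cluster (the counts here are illustrative, not necessarily my exact file; the attached install-config has the real values). The MastersSchedulable warning above suggests the installer flips the masters to schedulable on its own when compute replicas is 0:

```yaml
# Illustrative install-config.yaml excerpt for a zero-worker cluster.
# With compute replicas set to 0, the installer makes the control plane
# schedulable (the MastersSchedulable warning above), so the master node
# also accepts regular workloads.
compute:
- name: worker
  replicas: 0
controlPlane:
  name: master
  replicas: 1
```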

install-config.yaml.txt

Thanks in advance for any help provided!

@ldcunha76
Author

I would like to add that I got the warning

```
WARNING Discarding the Openshift Manifests that was provided in the target directory because its dependencies are dirty and it needs to be regenerated
```

every time after the first install attempt on each computer.

The last attempt, reported above, produced the warning as well; I just pasted the log without it.

My setup is 1 master and 0 workers just like #238

Thanks a lot!

@ldcunha76
Author

I'd like to close this issue. Found a misconfiguration in my setup.

@vrutkovs
Member

Sure, closing it. Could you add more details so that it would be helpful to others hitting the same symptoms?

@ldcunha76
Author

ldcunha76 commented Aug 11, 2020

Of course @vrutkovs.

I was missing the machine config server part of the load balancer configuration. Because of that, port 22623 was not available and the master node wasn't even booting completely; it was stuck waiting for the machine config server (reporting connection refused). When the LB config is OK but the machine config server is not yet up on the bootstrap node, the master machine reports EOF after each attempt to reach the machine config server on port 22623.
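For anyone hitting the same symptoms, this is roughly the haproxy section I was missing (a sketch only; the server names and addresses are placeholders, not my real ones):

```
# Sketch of the machine config server pass-through; IPs are placeholders.
# Port 22623 must reach the bootstrap node (and later the masters), or
# the nodes hang waiting for their machine config.
frontend machine-config-server
    bind *:22623
    mode tcp
    default_backend machine-config-server

backend machine-config-server
    mode tcp
    balance source
    server bootstrap 192.168.1.210:22623 check
    server master0   192.168.1.211:22623 check
```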

Adding to that, my dhcpd config was leasing IPs for only 600 seconds by default. After that time the master machine was losing connectivity (it started to report "network is unreachable"), so I had to change the lease to a value compatible with the amount of time the bootstrap node needs to bring up the machine config server.
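Roughly, the dhcpd.conf change looked like this (the values are illustrative; anything comfortably longer than the bootstrap window should do):

```
# Illustrative dhcpd.conf excerpt; 600s was the short default that bit me.
# Lease long enough that the master keeps its address while the bootstrap
# node is still bringing up the machine config server.
default-lease-time 28800;
max-lease-time 86400;
```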

I'm following Craig Robinson's tutorial.

After spotting this haproxy misconfig, I'm trying again. So far the master server has booted and is hopefully coming up to join the cluster.

I saw a discussion in #238 about the necessity of creating the manifests twice, and as I reported above, I think I was stumbling on something similar. I'm not sure whether I should report that in #238 as well.

The problem is that I was reusing the same root directory for the openshift-install command's --dir option. I took care to remove the cluster files, including the hidden ones, but even so I was experiencing trouble.
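To illustrate what I mean by removing the cluster files (a sketch, not my exact commands; these are the files openshift-install generates, and the hidden state file is the one that's easy to miss):

```
# Cleanup along these lines between attempts; even removing the hidden
# .openshift_install_state.json did not make reusing the directory safe.
rm -rf auth/ manifests/ openshift/
rm -f bootstrap.ign master.ign worker.ign metadata.json
rm -f .openshift_install.log .openshift_install_state.json
```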

After hints from other OKD users, I recreated my virtual machines and created a new directory (.../okd/root1 instead of .../okd/root) to store the cluster config files. Even after that, the system was reporting the strange warning:

```
WARNING Discarding the Openshift Manifests that was provided in the target directory because its dependencies are dirty and it needs to be regenerated
```

Before the new root directory for cluster configs, my bootstrapping was reporting a wrong signing authority when attempting to access the API on port 6443.

@cgruver suggested testing scenarios for @pjbrzozowski. Perhaps another valid test, if not yet performed, is the scenario with 1 master and 0 workers but using a new directory with a different path from the one previously used. Maybe the warning will still show itself, but the cluster will successfully bring up an accessible Kubernetes API.

@ldcunha76 ldcunha76 changed the title OKD 4.5 bootstrap after API available OKD 4.5 bootstrap fails after API available Aug 12, 2020
@dmaf87

dmaf87 commented Oct 14, 2020

> I'd like to close this issue. Found a misconfiguration in my setup.

Hi @ldcunha76, it seems I hit the same problem. Can you post your corrected haproxy config?
Thanks.
