OKD 4.5 bootstrap fails after API available #294
Comments
I would like to add that I got the warning:
Every time after the first install attempt on each computer. In the last attempt I reported above, I got the warning as well; I just pasted the output without it. My setup is 1 master and 0 workers, just like #238. Thanks a lot!
I'd like to close this issue. I found a misconfiguration in my setup.
Sure, closing it. Could you add more details so that it would be helpful to others hitting the same symptoms?
Of course @vrutkovs. I was missing the Machine Config Server part of the load balancer configuration. Because of that, port 22623 was not available and the master node wasn't even booting completely: it was stuck waiting for the machine config server (reporting connection refused). When the LB config is OK but the machine config server is not up yet on the bootstrap node, the master machine reports EOF after each attempt to reach the machine config server on port 22623.

On top of that, my dhcpd config was leasing the IPs for 600 s by default. After that time, the master machine was losing connectivity (it started to report "network is unreachable"), so I had to change that to a value compatible with the amount of time the bootstrap node needs to bring up the machine config server. I'm following Craig Robinson's tutorial.

After spotting this haproxy misconfiguration, I'm trying again. So far the master server has booted and is hopefully coming up to join the cluster.

I saw a discussion about the need to create the manifests twice in #238, and as I reported above, I think I was stumbling on something similar (not sure if I'd rather report that in #238 as well). The problem is that I was reusing the same root directory for the --dir option of the openshift-install command. I was taking care to remove the cluster files, including the hidden ones, but even with all this care I was experiencing trouble. After hints from other OKD users I recreated my virtual machines and created a new directory (.../okd/root1 instead of .../okd/root) to store the cluster config files. Even after that, the system was still reporting the strange warning:

Before switching to a new root directory for the cluster configs, my bootstrapping was reporting a wrong signing authority when attempting to access the API on port 6443. @cgruver suggested testing scenarios for @pjbrzozowski. Perhaps another valid test, if not yet performed, is the scenario with 1 master and 0 workers but using a new directory with a path different from the one previously used. Maybe the warning will still show, but the cluster will successfully bring up an accessible Kubernetes API.
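To illustrate the dhcpd change mentioned above, a minimal sketch of a lease-time configuration follows; the subnet, address range, and exact times are assumptions for a generic lab setup, not the values from this cluster:

# /etc/dhcp/dhcpd.conf (sketch) - raise lease times well above the 600 s lease
# described above, so the master keeps its address while it is still waiting
# for the machine config server on the bootstrap node
default-lease-time 7200;
max-lease-time 14400;

subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.100 192.168.1.150;
    option routers 192.168.1.1;
    option domain-name-servers 192.168.1.1;
}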
Hi @ldcunha76, it seems I've got the same problem. Can you post your correct haproxy config?
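For others hitting the same symptom, a minimal sketch of the machine config server section that was missing from the load balancer is shown below; the hostnames and IP addresses are placeholders for a single-master lab topology, not the actual config from this thread. The API on port 6443 needs an equivalent frontend/backend pair pointing at the same hosts.

# haproxy.cfg (sketch) - machine config server passthrough on port 22623
frontend machine-config-server
    bind *:22623
    mode tcp
    default_backend machine-config-server

backend machine-config-server
    mode tcp
    balance roundrobin
    # the bootstrap entry is removed once bootstrapping completes
    server bootstrap 192.168.1.200:22623 check
    server master0 192.168.1.201:22623 check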
Describe the Issue.
Cluster bootstrapping fails while waiting for the process to complete.
openshift-install --dir=./ wait-for bootstrap-complete --log-level=debug
DEBUG OpenShift Installer 4.5.0-0.okd-2020-07-14-153706-ga
DEBUG Built from commit 290e3b1de6096ecef2133fb071ff3a71c9c78594
INFO Waiting up to 20m0s for the Kubernetes API at https://api.lab.okd.local:6443...
INFO API v1.18.3 up
INFO Waiting up to 40m0s for bootstrapping to complete...
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help
FATAL failed to wait for bootstrapping to complete: timed out waiting for the condition
log-bundle-20200810143626.tar.gz
After the first failure, I tried to run
openshift-install --dir=./ wait-for bootstrap-complete --log-level=debug
and got
DEBUG OpenShift Installer 4.5.0-0.okd-2020-07-14-153706-ga
DEBUG Built from commit 290e3b1de6096ecef2133fb071ff3a71c9c78594
INFO Waiting up to 20m0s for the Kubernetes API at https://api.lab.okd.local:6443...
INFO API v1.18.3 up
INFO Waiting up to 40m0s for bootstrapping to complete...
W0810 16:05:56.350181 6990 reflector.go:326] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *v1.ConfigMap ended with: very short watch: k8s.io/client-go/tools/watch/informerwatcher.go:146: Unexpected watch close - watch lasted less than a second and no items received
E0810 16:06:00.359174 6990 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get https://api.lab.okd.local:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: EOF
...
E0810 16:06:16.400338 6990 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get https://api.lab.okd.local:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: EOF
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help
FATAL failed to wait for bootstrapping to complete: timed out waiting for the condition
log-bundle-20200810170027.tar.gz
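For reference, bundles like the ones attached are what openshift-install gather bootstrap produces; a sketch of the invocation for a bare-metal setup, with placeholder node addresses and key path:

openshift-install gather bootstrap \
    --dir=./ \
    --bootstrap 192.168.1.200 \
    --master 192.168.1.201 \
    --key ~/.ssh/id_rsa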
Version
OpenShift Installer 4.5.0-0.okd-2020-07-14-153706-ga
FCOS 32.20200629.3.0
How reproducible
Both times I tried, on different computers, the process hung after the API came up. A lot of times I fought against issues related to a non-empty install dir. I got the strange warning too:
openshift-install create manifests --dir=./
INFO Consuming Install Config from target directory
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings
But I don't have to run create ignition-configs twice:
openshift-install create ignition-configs --dir=./
INFO Consuming Common Manifests from target directory
INFO Consuming Master Machines from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming Worker Machines from target directory
I'm trying to deploy a cluster with no worker nodes. I'm not sure how I have to mark the masters as workers too; following the bare-metal install steps, it seems the master nodes can receive workloads by default.
install-config.yaml.txt
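On the zero-worker question above: declaring a compute pool with zero replicas in install-config.yaml is what triggers the MastersSchedulable warning and lets the masters take workloads. A sketch of the relevant fields follows (values are illustrative, not taken from the attached file):

apiVersion: v1
baseDomain: okd.local
metadata:
  name: lab
compute:
- name: worker
  replicas: 0            # no dedicated workers; control plane becomes schedulable
controlPlane:
  name: master
  replicas: 1
platform:
  none: {}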
Thanks in advance for any help provided!