gcloud application-default login required on GCE vm or terraform cluster create hangs #41

Open
joshpadilla opened this issue Feb 5, 2021 · 8 comments

Comments

@joshpadilla
Collaborator

null_resource.deploy_anthos_cluster (remote-exec): Creating Anthos Cluster. This will take about 20 minutes...
null_resource.deploy_anthos_cluster (remote-exec): Cluster Created!
null_resource.deploy_anthos_cluster: Creation complete after 5s [id=2553823925035852511]
null_resource.download_kube_config: Creating...
null_resource.download_kube_config: Provisioning with 'local-exec'...
null_resource.download_kube_config (local-exec): Executing: ["/bin/sh" "-c" "scp -i ~/.ssh/anthos-abm-blue-p17o3 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  root@139.178.86.49:/root/baremetal/bmctl-workspace/abm-blue-p17o3/abm-blue-p17o3-kubeconfig ."]
null_resource.download_kube_config (local-exec): Warning: Permanently added '139.178.86.49' (ECDSA) to the list of known hosts.
null_resource.download_kube_config (local-exec): scp: /root/baremetal/bmctl-workspace/abm-blue-p17o3/abm-blue-p17o3-kubeconfig: No such file or directory

null_resource.kube_vip_install_first_cp: Still creating... [10s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...
null_resource.kube_vip_install_first_cp: Still creating... [20s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...
@joshpadilla
Collaborator Author

https://github.com/equinix/terraform-metal-anthos-on-baremetal/blob/cbcc4542acce2240d57b4de10cd5498acf60d768/main.tf#L237

When I log in to the host:

root@abm-blue-p17o3-cp-01:~# ll /root/baremetal/bmctl-workspace/abm-blue-p17o3/
total 24
drwxr-xr-x 3 root root 4096 Feb  5 18:49 ./
drwxr-xr-x 3 root root 4096 Feb  5 18:49 ../
-rw-r--r-- 1 root root 9683 Feb  5 18:49 abm-blue-p17o3.yaml
drwxr-xr-x 3 root root 4096 Feb  5 18:49 log/

There's no kubeconfig in that directory, just abm-blue-p17o3.yaml.

@joshpadilla
Collaborator Author

Changing the command to:

command = "scp -i ~/.ssh/${local.ssh_key_name} -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  root@${metal_device.control_plane.0.access_public_ipv4}:/root/baremetal/bmctl-workspace/${local.cluster_name}/${local.cluster_name}.yaml ."

This got rid of the scp error, but it did not fix the hanging cluster creation process:

null_resource.kube_vip_install_first_cp: Still creating... [32m10s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...
null_resource.kube_vip_install_first_cp: Still creating... [32m20s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...
null_resource.kube_vip_install_first_cp: Still creating... [32m30s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...
null_resource.kube_vip_install_first_cp: Still creating... [32m40s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...
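
The wait loop above ran for more than 32 minutes without giving up. One possible mitigation is a bounded wait that fails after a timeout instead of looping forever. The sketch below is only an illustration: `wait_for_path` and its default timeout and interval are hypothetical and are not taken from this repository's provisioner scripts.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): wait for a path to appear,
# but give up after a timeout instead of looping forever.
# Usage: wait_for_path PATH [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_path() {
    path=$1
    timeout=${2:-1800}   # assumed default: 30 minutes
    interval=${3:-10}    # assumed default: poll every 10 seconds
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timed out after ${elapsed}s waiting for '$path'" >&2
            return 1
        fi
        echo "Waiting for '$path' to be created..."
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

A provisioner script could then call `wait_for_path /etc/kubernetes/manifests 1800 10 || exit 1`, so that a failed cluster creation surfaces as a Terraform error rather than an endless "Still creating..." loop.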

@joshpadilla
Collaborator Author

The kubeconfig is never created. Terraform should not try to download it until cluster creation has succeeded.
Checking the cluster creation log file:
/root/baremetal/cluster_create.log

cluster_create.log contains a single-line error about GKE Hub and gcloud auth login. However, the GCE VM I'm using already has gcloud auth set up, so I'm still looking into that.
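
As a rough sketch of the idea that the download should only happen after a successful create, the helper below checks for the kubeconfig on the control plane host and, if it is missing, prints the tail of the creation log and fails. The function name `require_kubeconfig` and the 20-line tail are assumptions made for illustration; only the file paths come from this thread.

```shell
#!/bin/sh
# Hypothetical guard (not from the repository): succeed only when the
# kubeconfig exists and is non-empty; otherwise show the end of the
# cluster creation log and return an error.
# Usage: require_kubeconfig KUBECONFIG_PATH LOG_FILE
require_kubeconfig() {
    kubeconfig=$1
    log_file=$2
    if [ -s "$kubeconfig" ]; then
        echo "Found kubeconfig: $kubeconfig"
        return 0
    fi
    echo "kubeconfig not found: $kubeconfig" >&2
    if [ -f "$log_file" ]; then
        echo "Last lines of $log_file:" >&2
        tail -n 20 "$log_file" >&2
    fi
    return 1
}
```

Run on the first control plane node, for example as `require_kubeconfig /root/baremetal/bmctl-workspace/abm-blue-p17o3/abm-blue-p17o3-kubeconfig /root/baremetal/cluster_create.log`, this would have printed the GKE Hub error directly instead of leaving it to be discovered in the log.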

@joshpadilla
Collaborator Author

gcloud auth application-default login

You are running on a Google Compute Engine virtual machine.
The service credentials associated with this virtual machine
will automatically be used by Application Default
Credentials, so it is not necessary to use this command.

If you decide to proceed anyway, your user credentials may be visible
to others with access to this virtual machine. Are you sure you want
to authenticate with your personal account?

Do you want to continue (Y/n)?
Go to the following link in your browser:
Enter verification code:

Credentials saved to file: [~/.config/gcloud/application_default_credentials.json]

@joshpadilla
Collaborator Author

Looks like this is a requirement: if you don't have the file ~/.config/gcloud/application_default_credentials.json, Terraform will hang without an error.
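
One way to turn the silent hang into an immediate failure is a pre-flight check that runs before `terraform apply`. This is a minimal sketch, assuming the credentials file lives at the default path shown in the gcloud output above; the function name `check_adc` is hypothetical and not part of this project.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not from the repository): fail fast when
# the Application Default Credentials file is missing.
# Usage: check_adc [PATH_TO_CREDENTIALS_FILE]
check_adc() {
    # Default path as reported by 'gcloud auth application-default login'.
    adc_file=${1:-$HOME/.config/gcloud/application_default_credentials.json}
    if [ -f "$adc_file" ]; then
        return 0
    fi
    echo "Missing $adc_file" >&2
    echo "Run 'gcloud auth application-default login' before 'terraform apply'." >&2
    return 1
}
```

Calling `check_adc || exit 1` at the top of a wrapper script would stop the run before any resources are created.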

@displague
Member

#28 (comment)

The README is overdue for some updates.

@displague
Member

Actually, we do have some text supporting this:

https://github.com/equinix/terraform-metal-anthos-on-baremetal#install-gcloud

@displague
Member

Do you have ideas on how we can improve this, @joshpadilla?

@joshpadilla changed the title from "Anthos cluster not created (kubeconfig vs yaml)" to "gcloud application-default login required on GCE vm or terraform cluster create hangs" on Mar 12, 2021