Now it’s time to proceed with the installation of the actual cluster.
In order to do this, we’ll use kubespray’s Ansible playbooks, which involves two steps:
1. Telling Ansible where the playbooks will run (in Ansible terms this is called "creating the inventory")
2. Editing some options for the playbooks
If you are using Terraform to set up your infrastructure, the inventory should be generated for you automatically, in the /ansible/<installation_name>/ansible_inventory file.
If not, it can also be created with a Python script (you might need to install Python 3 first, e.g. brew install python3), by running the set of commands described in the kubespray guide for building your own inventory.
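As a rough sketch of what that looks like (the node IPs, file names and paths below are placeholders, and the exact layout may differ between kubespray versions), the inventory builder is typically invoked like this:
cd kubespray
# Declare the IPs of the machines that will make up the cluster (placeholders)
declare -a IPS=(10.10.1.3 10.10.1.4 10.10.1.5)
# Generate an Ansible hosts file from those IPs using the bundled inventory builder
CONFIG_FILE=my_inventory/hosts.ini python3 contrib/inventory_builder/inventory.py ${IPS[@]}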
Once you have your inventory ready, it’s time to let Ansible do the heavy lifting for you.
Please make sure to go through the following steps before actually running Ansible.
We had quite some trouble getting this to work, especially when combined with the use of bastion hosts.
The kubespray deployment example also includes bastion hosts. The idea is that you SSH into these hosts, and from there (and only from there) you’re allowed to SSH into the rest of your cluster.
For this reason, we have bundled an example of our ssh-bastion.conf in /ansible/cytechmobile/ssh-bastion.conf. Please adapt accordingly.
Note: We use Debian Jessie for our bastion hosts and Ubuntu for our k8s nodes, so note the different usernames.
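For illustration only, a minimal bastion SSH config could look roughly like the following. The host names, users and IP range are placeholders (this is not our actual file); adapt them to your own setup, keeping the different users from the note above in mind:
# Sketch of an ssh-bastion.conf (illustrative, not the bundled file)
Host bastion
    HostName <bastion_public_ip>
    User admin                # default user on our Debian Jessie bastion (assumption)

# Private k8s nodes are reachable only through the bastion
Host 10.139.*
    User ubuntu               # default user on the Ubuntu k8s nodes
    ProxyCommand ssh -W %h:%p admin@<bastion_public_ip>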
Add the --flush-cache option to your Ansible run after destroying and recreating EC2 nodes. Ansible keeps some caches that may interfere with your setup.
Make sure to set the right bootstrap_os value in your inventory’s group_vars/all.yml file.
HINT: If you don’t and you’re deploying on Ubuntu 16.04, you’ll run into the /usr/bin/python not found error we explore below.
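In our case (Ubuntu 16.04 nodes) this boils down to one line; the exact value is an assumption based on the kubespray version we ran, so check your own version’s defaults:
# group_vars/all.yml (sketch): let kubespray bootstrap Python on Ubuntu 16.04 nodes
bootstrap_os: ubuntu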
Getting Ansible to set up your Kubernetes cluster is as simple as:
cd ansible/<installation_name> (1)
ansible-playbook \
--inventory-file=ansible_inventory \ # (2)
--become \ # (3)
--user=ubuntu \ # (4)
--flush-cache \
../kubespray/cluster.yml
1. By going into this directory, Ansible will pick up the ansible.cfg and ssh-bastion.conf, giving you access to your hosts.
2. The inventory file (i.e. details about your hosts) that you created in the previous section.
3. According to the Ansible help, this will "run operations with become", i.e. run with su.
4. The user to connect as on your Kubernetes cluster hosts. I am emphasizing this because I was confused about which user to use when the bastion hosts have a different user than the Kubernetes hosts.
Note: Leave the efk attribute set to false, as we’ll deploy our own E(F)(L)K.
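In the kubespray version we used this lives in the cluster group vars; the snippet below is only a sketch, and the exact variable name (efk_enabled) is an assumption that may differ in your version:
# inventory group_vars (sketch, assumed variable name): keep kubespray's bundled EFK disabled
efk_enabled: false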
There is currently an open issue about setting up kubectl locally by automating the process of pulling down the keys from the first master node.
As this is not available yet, we combined some of the solutions there and came up with a couple of plays that handle that for you.
- Please replace <YOUR_DOMAIN_NAME> in kubectl_setup.yml with yours.
1. On your workstation, go to the ./ansible/<installation_name> folder.
2. Ensure you have added the SSH private key of the EC2 key pair you created to your ssh-agent.
3. Run the command below to add the details for connecting to this client’s Kubernetes cluster to the ~/.kube/config.one file. (Repeat for all the different installations you have performed, so that ALL Kubernetes clusters of all your installations end up in one file):
export KUBECONFIG=~/.kube/config.one
ansible-playbook \
--inventory-file=ansible_inventory \
--extra-vars=installation_name=<installation_name> \
--user=ubuntu \
--flush-cache \
../kubectl_setup.yml
4. Then, make sure you switch to the right context (you can verify the switch with the commands shown below):
kubectl config use-context <installation_name>
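As a quick sanity check, using standard kubectl commands, you can confirm that the contexts were merged and that the new cluster responds:
export KUBECONFIG=~/.kube/config.one
kubectl config get-contexts                      # all installations should be listed
kubectl config use-context <installation_name>
kubectl get nodes                                # should list this installation's masters and nodes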
Some issues we came across during the process.
The following errors were pretty misleading, because we did have SSH access; testing manual access via SSH worked fine. You might see errors like this:
fatal: [kubernetes-mcore-gluster1]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"10.139.92.61\". Make sure this host can be reached over ssh", "unreachable": true}
fatal: [kubernetes-mcore-gluster0]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"10.139.91.152\". Make sure this host can be reached over ssh", "unreachable": true}
We even enabled the Ansible debug logs (by using -vvv on the command line) and copy-pasted the resulting ssh command, which worked fine outside of Ansible.
A good way to verify that you are hitting the same problem is to use the Ansible ping module (available out of the box):
ansible gfs-cluster -m ping -i my_inventory/mcore_hosts -u ubuntu

kubernetes-mcore-gluster1 | FAILED! => {
    "changed": false,
    "failed": true,
    "module_stderr": "/bin/sh: 1: /usr/bin/python: not found\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE",
    "rc": 127
}
kubernetes-mcore-gluster0 | FAILED! => {
    "changed": false,
    "failed": true,
    "module_stderr": "/bin/sh: 1: /usr/bin/python: not found\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE",
    "rc": 127
}
The issue therefore was not related to SSH at all! In fact, the issue was that /usr/bin/python was not available on the hosts.
These hosts are started from the official Ubuntu 16.04 EC2 AMIs, where only python3 is available out of the box.
There are two solutions:
1. Set the Ansible Python interpreter to python3, e.g. like so in your inventory file:
[all:vars]
ansible_python_interpreter=/usr/bin/python3
2. Have Ansible install Python 2 for you before gathering facts. This is supported out of the box by the Kubespray repo, as long as you set the bootstrap_os variable to the correct value.
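After applying either fix, re-running the ping module from before (same illustrative inventory path and user as above) should confirm that Ansible can now execute modules on the hosts:
ansible gfs-cluster -m ping -i my_inventory/mcore_hosts -u ubuntu
Each host should now reply with SUCCESS and "ping": "pong" instead of the MODULE FAILURE above.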
During some of the initial Ansible runs, we got:
RUNNING HANDLER [kubernetes/master : Master | wait for kube-scheduler] ***********************************************************************************************************************************
Wednesday 06 September 2017 12:35:20 +0300 (0:00:00.073) 0:39:54.652 ***
FAILED - RETRYING: Master | wait for kube-scheduler (60 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (60 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (60 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (59 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (59 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (59 retries left).
...
fatal: [kubernetes-mcore-master2]: FAILED! => {"attempts": 60, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:10251/healthz"}
FAILED - RETRYING: Master | wait for kube-scheduler (3 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (13 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (2 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (12 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (1 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (11 retries left).
fatal: [kubernetes-mcore-master1]: FAILED! => {"attempts": 60, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:10251/healthz"}
FAILED - RETRYING: Master | wait for kube-scheduler (10 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (9 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (8 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (7 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (6 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (5 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (4 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (3 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (2 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (1 retries left).
fatal: [kubernetes-mcore-master0]: FAILED! => {"attempts": 60, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:10251/healthz"}
The problem turned out to be that the EC2 instances did not have enough resources (we were testing whether t2.micro would be enough in terms of memory / compute).
The solution was to upgrade to t2.small.
Wow! You have your Kubernetes cluster set up!! Congrats!! Now, let’s look at a few Additional HA Considerations.