Instructions for deploying a GPU cluster with Kubernetes
- Control system to run the install process
- One or more servers on which to install Kubernetes
- Install a supported operating system on all nodes.
Install a supported operating system on all servers via a third-party provisioning solution (e.g. MAAS, Foreman) or use the provided OS install container.
- Set up your provisioning machine.
This will install Ansible and other software on the provisioning machine, which will be used to deploy all other software to the cluster. For more information on Ansible and why we use it, consult the Ansible Guide.
# Install software prerequisites and copy default configuration
./scripts/setup.sh
- Create and edit the Ansible inventory.
Ansible uses an inventory which outlines the servers in your cluster. The setup script from the previous step will copy an example inventory configuration to the config directory. Edit the inventory:
# Edit inventory and add nodes to the "KUBERNETES" section
# Note: Etcd requires an odd number of servers
vi config/inventory
# (optional) Modify `config/group_vars/*.yml` to set configuration parameters
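For reference, a minimal single-master inventory might look like the sketch below; the hostnames and addresses are placeholders, and the copied example file may differ in detail:
[all]
mgmt01 ansible_host=10.0.0.1
gpu01 ansible_host=10.0.0.11

[kube-master]
mgmt01

[etcd]
mgmt01

[kube-node]
gpu01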
Note that as part of the Kubernetes deployment process, the default behavior is to also deploy the NVIDIA k8s-device-plugin for GPU support. The GPU Operator is an alternative deployment method, which will deploy the device plugin and leverage driver containers within Kubernetes. To enable the GPU Operator in DeepOps...
# Set: deepops_gpu_operator_enabled: true
vi config/group_vars/k8s-cluster.yml
- Verify the configuration.
ansible all -m raw -a "hostname"
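As an optional extra check, the generic Ansible ping module confirms that Python and privilege escalation work on every node (this is standard Ansible, not a DeepOps-specific script):
# Log in, escalate with become, and run Ansible's Python test module on all nodes
ansible all -b -m ping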
- Install Kubernetes using Ansible and Kubespray.
# NOTE: If SSH requires a password, add: `-k`
# NOTE: If sudo on remote machine requires a password, add: `-K`
# NOTE: If SSH user is different than current user, add: `-u ubuntu`
ansible-playbook -l k8s-cluster playbooks/k8s-cluster.yml
More information on Kubespray can be found in the official Getting Started Guide.
- Verify that the Kubernetes cluster is running.
# You may need to manually run: `sudo cp ./config/artifacts/kubectl /usr/local/bin`
kubectl get nodes
Optionally, run a test GPU job to ensure that your Kubernetes setup can access GPUs.
kubectl run gpu-test --rm -t -i --restart=Never --image=nvcr.io/nvidia/cuda:10.1-base-ubuntu18.04 --limits=nvidia.com/gpu=1 -- nvidia-smi
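Equivalently, the same smoke test can be written as a pod manifest; the sketch below is a minimal illustration (the file name gpu-test.yml is hypothetical) using the same CUDA image:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: gpu-test
    image: nvcr.io/nvidia/cuda:10.1-base-ubuntu18.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1    # request one GPU from the device plugin
Apply it with kubectl apply -f gpu-test.yml and inspect the result with kubectl logs gpu-test.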
Optionally, verify the GPU device plugin on all GPU nodes in the Kubernetes cluster with the following script.
export CLUSTER_VERIFY_EXPECTED_PODS=1 # Expected number of GPUs in the cluster
./scripts/k8s/verify_gpu.sh
Now that Kubernetes is installed, consult the Kubernetes Usage Guide for examples of how to use Kubernetes.
The following components are completely optional and can be installed on an existing Kubernetes cluster.
Run the following script to create an administrative user and print out the dashboard URL and access token:
./scripts/k8s/deploy_dashboard_user.sh
The default behavior of DeepOps is to set up an NFS server on the first kube-master node. This temporary NFS server is used by the nfs-client-provisioner, which is installed as the default StorageClass of a standard DeepOps deployment.
To use an existing NFS server, update the k8s_nfs_server and k8s_nfs_export_path variables in config/group_vars/k8s-cluster.yml, and set k8s_deploy_nfs_server to false in the same file. Additionally, the k8s_nfs_mkdir variable can be set to false if the export directory is already configured on the server.
To manually install or re-install the nfs-client-provisioner, run:
ansible-playbook playbooks/k8s-cluster/nfs-client-provisioner.yml
To skip this installation, set k8s_nfs_client_provisioner to false.
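For example, pointing the cluster at a pre-existing NFS export might look like the following in config/group_vars/k8s-cluster.yml; the server address and export path are placeholders:
k8s_deploy_nfs_server: false       # use an existing NFS server instead of deploying one
k8s_nfs_mkdir: false               # the export directory already exists on the server
k8s_nfs_server: 10.0.0.10          # placeholder: address of the existing NFS server
k8s_nfs_export_path: /export/deepops_nfs   # placeholder: existing export path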
For a non-NFS-based alternative, deploy a Ceph cluster running on Kubernetes for services that require persistent storage (such as Kubeflow):
./scripts/k8s/deploy_rook.sh
Poll the Ceph status by running (this script will return when Ceph initialization is complete):
./scripts/k8s/deploy_rook.sh -w
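You can also inspect the individual Ceph pods directly; assuming Rook's default rook-ceph namespace, a quick check is:
kubectl -n rook-ceph get pods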
Deploy NetApp Trident for services that require persistent storage (such as Kubeflow). Note that you must have a NetApp storage system/instance in order to use Trident to provision persistent storage.
- Set configuration parameters.
vi config/group_vars/netapp-trident.yml
- Deploy Trident using Ansible.
# NOTE: If SSH requires a password, add: `-k`
# NOTE: If sudo on remote machine requires a password, add: `-K`
# NOTE: If SSH user is different than current user, add: `-u ubuntu`
ansible-playbook -l k8s-cluster playbooks/k8s-cluster/netapp-trident.yml
- Verify that Trident is running.
./tridentctl -n deepops-trident version
Output of the above command should resemble the following:
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 21.01.2        | 21.01.2        |
+----------------+----------------+
Deploy Prometheus and Grafana to monitor Kubernetes and cluster nodes:
./scripts/k8s/deploy_monitoring.sh
The services can be reached from the following addresses:
- Grafana: http://<kube-master>:30200
- Prometheus: http://<kube-master>:30500
- Alertmanager: http://<kube-master>:30400
We deploy our monitoring services using the prometheus-operator project. For documentation on configuring and managing the monitoring services, please see the prometheus-operator user guides. The source for our built-in Grafana dashboards can be found in src/dashboards.
To enable syslog forwarding from the cluster nodes to the first Kubernetes controller node, you can set the following variables in your DeepOps configuration:
kube_enable_rsyslog_server: true
kube_enable_rsyslog_client: true
For more information about our syslog forwarding functionality, please see the centralized syslog guide.
Follow the ELK logging Guide to set up logging in the cluster.
The service can be reached from the following address:
- Kibana: http://<kube-master>:30700
The default container registry hostname is registry.local. To set another hostname (for example, one that is resolvable outside the cluster), add -e container_registry_hostname=registry.example.com.
ansible-playbook --tags container-registry playbooks/k8s-cluster/container-registry.yml
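For example, to deploy the registry with an externally resolvable hostname (registry.example.com is a placeholder), the full command would be:
ansible-playbook --tags container-registry -e container_registry_hostname=registry.example.com playbooks/k8s-cluster/container-registry.yml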
Many K8s applications require the deployment of a Load Balancer and Ingress. To deploy one, or both, of these services, refer to the Load Balancer and Ingress Guide.
Kubeflow is a popular way for multiple users to run ML workloads. It exposes a Jupyter Notebook interface where users can request access to GPUs via the browser GUI and allows a user to build automated AI pipelines. To deploy Kubeflow refer to the DeepOps Kubeflow Guide.
For more information on Kubeflow, please refer to the official documentation.
DeepOps uses Kubespray to deploy Kubernetes and therefore common cluster actions (such as adding nodes, removing them, draining and upgrading the cluster) should be performed with it. Kubespray is included as a submodule in the submodules/kubespray directory.
To add K8s nodes, modify the config/inventory file to include the new nodes under [all]. Then list the nodes as relevant under the [kube-master], [etcd], and [kube-node] sections. For example, if adding a new master node, list it under [kube-master] and [etcd]. A new worker node would go under [kube-node].
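For instance, adding a hypothetical worker node gpu05 would touch two sections of config/inventory:
[all]
# ...existing nodes...
gpu05 ansible_host=10.0.0.15   # placeholder address

[kube-node]
# ...existing nodes...
gpu05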
Then run the Kubespray scale.yml playbook...
# NOTE: If SSH requires a password, add: `-k`
# NOTE: If sudo on remote machine requires a password, add: `-K`
# NOTE: If SSH user is different than current user, add: `-u ubuntu`
ansible-playbook -l k8s-cluster submodules/kubespray/scale.yml
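Once the playbook completes, confirm that the new nodes have joined the cluster:
kubectl get nodes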
More information on this topic may be found in the Kubespray docs.
Removing nodes can be performed with Kubespray's remove-node.yml playbook, supplying the node names as extra vars...
# NOTE: If SSH requires a password, add: `-k`
# NOTE: If sudo on remote machine requires a password, add: `-K`
# NOTE: If SSH user is different than current user, add: `-u ubuntu`
ansible-playbook submodules/kubespray/remove-node.yml --extra-vars "node=nodename0,nodename1"
This will drain nodename0 and nodename1, stop Kubernetes services, delete certificates, and finally execute the kubectl command to delete the nodes.
More information on this topic may be found in the Kubespray docs.
Sometimes a cluster will get into a bad state, for example when certificates are misconfigured or inconsistent across nodes. When this occurs, it is often helpful to completely reset the cluster. To accomplish this, run the remove-node.yml playbook for all k8s nodes...
# NOTE: Explicitly list ALL nodes in the cluster. Do not use an ansible group name such as k8s-cluster.
ansible-playbook submodules/kubespray/remove-node.yml --extra-vars "node=nodename0,nodename1,<...>"
NOTE: There is also a Kubespray reset.yml playbook, but this does not do a complete tear-down of the cluster. Certificates and other artifacts might persist on each host, leading to a problematic redeployment in the future. The remove-node.yml playbook runs reset.yml as part of the process.
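After the teardown completes, the cluster can be redeployed from scratch with the same playbook used for the initial install, run from the provisioning machine:
ansible-playbook -l k8s-cluster playbooks/k8s-cluster.yml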
Refer to the Kubespray Upgrade docs for instructions on how to upgrade the cluster.