All of these scripts and configurations are specific to my home cluster. Do not expect any configurations to "just work" if you plan on using them.
This repo contains the Argo app-of-apps configuration, which installs Argo projects and apps. See `apps/apps`.
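The app-of-apps pattern is just an Argo CD `Application` whose source directory contains more `Application` manifests. The real definition lives in `apps/apps`; a minimal sketch of what such a root app looks like (repo URL, paths, and names here are illustrative, not the ones in this repo):

```sh
# Illustrative only -- the actual root app lives in apps/apps.
kubectl apply -n argocd -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: apps               # root "app of apps"
  namespace: argocd        # assuming Argo CD is installed in the argocd namespace
spec:
  project: default
  source:
    repoURL: https://github.com/example/home-cluster.git  # hypothetical URL
    targetRevision: HEAD
    path: apps             # directory containing the child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated: {}          # let Argo sync the child apps automatically
EOF
```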
- `create-cluster.sh`: Installs and configures k3s on all nodes.
- `destroy-cluster.sh`: Uninstalls k3s from all nodes. (I have had to rebuild the cluster many, many times.)
- `install-metallb.sh`: Installs metallb
- `uninstall-metallb.sh`: Uninstalls metallb
- `install-argo.sh`: Installs Argo CD
- `uninstall-argo.sh`: Uninstalls Argo CD
k3s is installed with as little as possible: neither Traefik (we will use our own ingress controller) nor servicelb (we're using metallb) is installed.
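For reference, disabling those components boils down to passing `--disable` flags to the k3s installer. A rough sketch of that kind of invocation (the actual flags live in `create-cluster.sh` and may differ):

```sh
# Server node: skip the bundled Traefik ingress and the built-in service load balancer.
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --disable traefik --disable servicelb" sh -
```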
I'm using metallb instead of servicelb because I use a LoadBalancer service with a `loadBalancerIP`, which is unsupported by servicelb. It's also very convenient to have a virtual IP address for external services.
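With metallb managing an address pool, a Service can pin itself to a specific virtual IP via `spec.loadBalancerIP`. A hedged example (names and the IP are made up, and the address must fall inside a configured metallb pool):

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx            # hypothetical service name
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.240  # must be inside a configured metallb address pool
  selector:
    app: ingress-nginx
  ports:
    - port: 80
      targetPort: 80
EOF
```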
Re-run `create-cluster.sh`.
Re-run `install-argo.sh`.
To check which charts are out-of-date, run `./scripts/helm-tools/compare-helm-versions.js`.
The process:
- Manually bump the chart dependency version
- Run `helm dependencies update && helm dependencies build` to create an updated Chart.lock
- Make one PR per updated dependency and roll out changes one-by-one (see the sketch below)
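For a single chart the loop looks roughly like this (the chart path is illustrative):

```sh
cd charts/some-chart             # hypothetical chart directory
# 1. Edit Chart.yaml and bump the dependency's version field.
# 2. Refresh Chart.lock and fetch the new dependency archive:
helm dependencies update && helm dependencies build
# 3. Commit Chart.yaml and Chart.lock, open a PR for just this dependency.
```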
As soon as you set up a cluster, do yourself a favour and test that inter-node communication works. The cluster can appear to be working initially, but big and confusing issues crop up if this is broken and you don't know it.
I test with:

```sh
krun nicolaka/netshoot -H snowkube
krun nicolaka/netshoot -H suplex
krun ubuntu -H sentinel
```

(`nicolaka/netshoot` isn't available on arm64, hence the plain `ubuntu` image on sentinel.)
Run `ip a` and note the IP address, and then run `iperf -s` on one of the pods. Use `iperf -c <IP>` on all other nodes. They should all be communicating at roughly network speed.
Note: `krun` is a custom fish script.
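`krun` isn't included here; a rough kubectl equivalent that pins an interactive pod to a specific node would look something like this (pod and node names are illustrative):

```sh
kubectl run netshoot --rm -it --restart=Never --image=nicolaka/netshoot \
  --overrides='{"apiVersion": "v1", "spec": {"nodeName": "snowkube"}}' -- bash
```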
If you find there is no communication between nodes, try:
- Restarting k3s on the main node: `sudo systemctl restart k3s`
- Restarting all nodes
- Destroying the cluster and starting again
- Ensuring routes are correctly set up on the nodes, iptables is configured, etc. (some generic starting points below). See also: k3s known issues
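A few generic things to check on each node for that last point:

```sh
ip route                         # flannel should have added routes for the other nodes' pod subnets
sudo iptables -L FORWARD -n -v   # forwarded traffic must not be dropped
sudo journalctl -u k3s -e        # server logs (use k3s-agent on worker nodes)
```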
Caused by an x86 image running on ARM.
Unfortunately, ARM is a second-class citizen in the k8s world and many images don't support it. You can either build your own ARM image, or use the following to de-select ARM machines from scheduling:
```yaml
nodeSelector:
  kubernetes.io/arch: amd64
```
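To see which nodes that selector would rule out, the arch label can be listed directly:

```sh
kubectl get nodes -L kubernetes.io/arch
```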
I found the Argo Application Controller in a crash loop; tailing the logs showed:
```
runtime: mlock of signal stack failed: 12
runtime: increase the mlock limit (ulimit -l) or
runtime: update your kernel to 5.3.15+, 5.4.2+, or 5.5+
fatal error: mlock failed
```
Manually upgrading the kernel to 5.4.28 appears to have fixed the issue. For Ubuntu, download the files listed under *Build for amd64 succeeded* at https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.4.28/ (except the `lowlatency`-labelled packages) and install them with `dpkg -i *.deb`.
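A sketch of that upgrade (download step elided; grab the generic amd64 `.deb` files from the page above):

```sh
mkdir -p ~/kernel-5.4.28 && cd ~/kernel-5.4.28
# ...download the generic amd64 headers/image/modules packages here,
#    skipping anything labelled "lowlatency"...
sudo dpkg -i *.deb
sudo reboot
# after the reboot:
uname -r   # should now report 5.4.28
```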
| Hostname | Arch | OS | CPU | RAM | Storage |
|---|---|---|---|---|---|
| Suplex | x86_64 | Arch Linux | E3-1245 v3 @ 3.40GHz | 32GB | 458GB SSD, 30TB spinning rust (ZFS) |
| Snowkube | x86_64 | Ubuntu Server 20.04 LTS | i7-8700B @ 3.20GHz | 22GB | 200GB SSD |
| Sentinel | aarch64 | Ubuntu Server 20.04 LTS | ARM Cortex-A72 @ 1.50GHz | 2GB | 59GB MicroSD |