By default, Kubernetes stores cluster state information in an etcd datastore. This datastore is usually distributed across the master nodes. The data is highly dynamic and should be backed up frequently in case a restore is needed.
Red Hat OpenShift Container Platform (OCP) provides a utility for taking point-in-time etcd backups (/usr/local/bin/cluster-backup.sh). This script is available on each master node and creates a local backup of the etcd datastore and a set of critical Kubernetes manifests.
There are several problems with this approach:
- The backup is kept on the local system. If that system fails, the backup is no longer available.
- There is no facility to manage a history of backup sets.
- There is no native facility to manage scheduling of backups.
This utility will accomplish three goals:
- Create a regular schedule of backups.
- Synchronize backup sets to nodes and external storage.
- Purge stale backup sets that are no longer viable.
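The purge goal amounts to a retention rule. The following is a minimal sketch, not the logic this utility ships: it assumes each backup set is a directory directly under one backup root and that directory names sort chronologically (for example, timestamps). The function name `purge_stale_sets` and the retention count are hypothetical.

```shell
# Hypothetical retention helper: keep the newest KEEP backup sets, remove the rest.
# Assumes every backup set is a directory directly under ROOT, with a name that
# sorts chronologically (e.g. 2024-01-31T0200) and contains no newlines.
purge_stale_sets() {
  local root="$1" keep="$2"
  # List the sets newest-first, skip the first KEEP entries, delete what is left.
  find "$root" -mindepth 1 -maxdepth 1 -type d | sort -r | tail -n "+$((keep + 1))" |
    while IFS= read -r stale; do
      rm -rf -- "$stale"
    done
}
```

Calling `purge_stale_sets /path/to/backups 7` would leave the seven most recent sets in place under that assumption.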
There are two operating modes available. Both can be configured at the same time, but the intent is that only one is used. They are functionally equivalent, except that systemd Mode requires hand-coding to sync backup sets to external storage. The preferred mode is Privileged Pod Mode.
systemd Mode uses a unit file and timer to schedule the backup and, optionally, a second unit file and timer to schedule the syncing ("spraying") of the backup sets to other nodes and external storage.
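To make the unit-plus-timer pairing concrete, here is a rough sketch that writes one oneshot service and one timer into a directory. These are not the unit files shipped with this utility: the unit names, the hourly schedule, and the backup target directory are all assumptions. Only the path to cluster-backup.sh comes from the text above.

```shell
# Sketch only: write a oneshot service and a matching timer into DIR.
# Unit names, schedule, and backup directory are hypothetical placeholders.
write_backup_units() {
  local dir="$1"
  mkdir -p "$dir"

  # Service: runs the OCP backup script once per activation.
  cat > "$dir/etcd-snapshot.service" <<'EOF'
[Unit]
Description=Take an etcd snapshot with cluster-backup.sh

[Service]
Type=oneshot
# The target directory is an assumption; point it at the desired backup location.
ExecStart=/usr/local/bin/cluster-backup.sh /home/core/assets/backup
EOF

  # Timer: activates the service of the same name on a calendar schedule.
  cat > "$dir/etcd-snapshot.timer" <<'EOF'
[Unit]
Description=Schedule etcd snapshots

[Timer]
# Hourly is a placeholder schedule.
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

With the files placed in /etc/systemd/system on the snapshot node, `systemctl daemon-reload` followed by `systemctl enable --now etcd-snapshot.timer` would activate the schedule. A second service and timer pair for syncing would follow the same pattern.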
Privileged Pod Mode uses a Helm chart to deploy a privileged workload (CronJob) that schedules the backup and, optionally, synchronizes the backup sets to other nodes and external storage. Currently, only NFS external storage is supported.
- Access the cluster as a user with a binding to `ClusterRole/cluster-admin`.
- Create a namespace to deploy the manifests to (e.g. `oc new-project etcd-snapshot`).
- Designate one master node as the snapshot node by adding the `etcd-snapshot.example.com: snapshot` label.
- If using sync mode, designate at least one master node as a mirror node by adding the `etcd-snapshot.example.com: mirror` label.
- If using sync mode, create a secret whose name matches `.Values.sync.sshSecret` in the namespace from above. The secret must contain a `private` key whose value is the encoded SSH private key used when originally configuring the OCP cluster.
- Create a `my-values.yaml` file based on the included `values.yaml`.
- Choose the correct `.Values.mode` ("pv" | "sync"). PV mode synchronizes backup sets to external storage via a PersistentVolume that is created and maintained by the chart. Sync mode synchronizes backup sets to other master nodes.
  - PV Mode: Edit the `.Values.pv` section to reflect the target NFS service.
  - Sync Mode: This mode uses rsync over SSH to sync ("spray") the backup sets to other master nodes in the cluster.
    - Designate master nodes to receive the backup sets by adding the `etcd-snapshot.example.com/role: mirror` label.
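The sync-mode steps above could be scripted roughly as follows. This is a sketch under several assumptions rather than a supported installer: the node names, secret name, key file path, chart path, and Helm release name are hypothetical. The label key follows the form used in the node-designation steps; the sync-mode sub-step writes it as `etcd-snapshot.example.com/role`, so check the chart's `values.yaml` for the form it actually selects on. Commands are only printed unless `DRY_RUN=0` is set.

```shell
# Sketch of the sync-mode setup as oc/helm commands.
# Everything marked "hypothetical" is a placeholder, not a value defined by the chart.
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands only, 0 = execute them
NAMESPACE="etcd-snapshot"               # namespace from the example in the steps
SNAPSHOT_NODE="master-0"                # hypothetical node name
MIRROR_NODE="master-1"                  # hypothetical node name
LABEL_KEY="etcd-snapshot.example.com"   # assumption: verify against the chart
SSH_SECRET="etcd-snapshot-ssh"          # hypothetical; must match .Values.sync.sshSecret
SSH_KEY_FILE="$HOME/.ssh/id_rsa"        # hypothetical path to the cluster's SSH private key
CHART_DIR="./chart"                     # hypothetical path to the Helm chart

run() {
  # Print the command; execute it only when DRY_RUN=0.
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

write_values() {
  # Minimal overrides file; keys not listed fall back to the chart's values.yaml.
  cat > "$1" <<EOF
mode: sync
sync:
  sshSecret: $SSH_SECRET
EOF
}

setup_sync_mode() {
  run oc new-project "$NAMESPACE"
  run oc label node "$SNAPSHOT_NODE" "$LABEL_KEY=snapshot"
  run oc label node "$MIRROR_NODE" "$LABEL_KEY=mirror"
  # Stores the key file under the secret key "private". This assumes "encoded"
  # means the standard base64 encoding that oc applies to Secret data.
  run oc create secret generic "$SSH_SECRET" -n "$NAMESPACE" --from-file=private="$SSH_KEY_FILE"
  run helm install etcd-snapshot "$CHART_DIR" -n "$NAMESPACE" -f my-values.yaml
}
```

Running `write_values my-values.yaml` and then `setup_sync_mode` prints the five commands for review. Re-running with `DRY_RUN=0` executes them against the currently logged-in cluster.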