core: add command to restore mon quorum
When quorum is lost, restoring quorum to a single mon is
currently a complex manual process. Now with this krew
command the admin can reset the mon quorum with less risk
and restore the cluster in disaster scenarios.

Signed-off-by: Travis Nielsen <[email protected]>
travisn committed Oct 19, 2022
1 parent e9c483e commit 8d60ac6
Showing 5 changed files with 304 additions and 17 deletions.
11 changes: 10 additions & 1 deletion .github/workflows/ci-for-diff-ns.yaml
@@ -36,7 +36,7 @@ jobs:
POD=$(kubectl -n test-operator get pod -l app=rook-ceph-operator -o jsonpath="{.items[0].metadata.name}")
kubectl rook_ceph -o test-operator -n test-cluster operator restart
# let's wait for operator pod to be restart
# let's wait for operator pod to be restarted
kubectl -n test-operator wait --for=delete pod/$POD --timeout=100s
tests/github-action-helper.sh wait_for_operator_pod_to_be_ready_state_custom
kubectl rook_ceph -o test-operator -n test-cluster operator set ROOK_LOG_LEVEL DEBUG
@@ -49,6 +49,15 @@ jobs:
sleep 5
kubectl rook_ceph -o test-operator -n test-cluster rbd ls replicapool
# test the mon restore to restore to mon a, delete mons b and c, then add d and e
export ROOK_PLUGIN_SKIP_PROMPTS=true
kubectl rook_ceph -o test-operator -n test-cluster mons restore-quorum a
kubectl -n test-cluster wait pod -l app=rook-ceph-mon-b --for=delete --timeout=90s
kubectl -n test-cluster wait pod -l app=rook-ceph-mon-c --for=delete --timeout=90s
tests/github-action-helper.sh wait_for_three_mons test-cluster
kubectl -n test-cluster wait deployment rook-ceph-mon-d --for condition=Available=True --timeout=90s
kubectl -n test-cluster wait deployment rook-ceph-mon-e --for condition=Available=True --timeout=90s
# for testing osd purge, scale down the osd deployment
kubectl --namespace test-cluster scale deploy/rook-ceph-osd-0 --replicas=0
# we need to sleep so the osd will be marked down before purging the osd
4 changes: 3 additions & 1 deletion README.md
@@ -58,6 +58,7 @@ These are args currently supported:
- `rbd <args>` : Call a 'rbd' CLI command with arbitrary args

- `mons` : Print mon endpoints
- `restore-quorum <mon-name>` : Restore the mon quorum from a single healthy mon when quorum was lost with the other mons

- `health` : Check health of the cluster and common configuration issues

@@ -88,7 +89,7 @@ Visit docs below for complete details about each command and their flags uses.

1. [Running ceph commands](docs/ceph.md)
1. [Running rbd commands](docs/rbd.md)
1. [Getting mon endpoints](docs/mons.md)
1. [Get mon endpoints](docs/mons.md#print-mon-endpoints)
1. [Get cluster health status](docs/health.md)
1. [Update configmap rook-ceph-operator-config](docs/operator.md#set)
1. [Restart operator pod](docs/operator.md#restart)
@@ -98,6 +99,7 @@ Visit docs below for complete details about each command and their flags uses.
1. [Get specific CR status](docs/rook.md#status-cr-name)
1. [To purge OSD](docs/rook.md#operator.md)
1. [Debug OSDs and Mons](docs/debug.md)
1. [Restore mon quorum](docs/mons.md#restore-quorum)
1. [Disaster Recovery](docs/dr-health.md)

## Examples
33 changes: 32 additions & 1 deletion docs/mons.md
@@ -1,4 +1,6 @@
# Mons
# Mon Commands

## Print Mon Endpoints

This is used to print mon endpoints.

@@ -7,3 +9,32 @@ kubectl rook-ceph mons

# 10.98.95.196:6789,10.106.118.240:6789,10.111.18.121:6789
```

## Restore Quorum

Mon quorum is critical to the Ceph cluster. If a majority of mons are not in quorum,
the cluster will be down. If a majority of the mons are lost permanently,
quorum will need to be restored to a remaining good mon in order to bring
the Ceph cluster up again.

To restore the quorum in this disaster scenario:

1. Identify that mon quorum is lost. Some indications include:
- The Rook operator log shows timeout errors and continuously fails to reconcile
- All commands in the toolbox are unresponsive
- Multiple mon pods are likely down
2. Identify which mon is in a good state (see the status-check sketch after this list).
- Exec into a mon pod and run the following command
- `ceph daemon mon.<name> mon_status`
- For example, if connecting to mon.a, run: `ceph daemon mon.a mon_status`
- If multiple mons respond, find the mon with the highest `election_epoch`
3. Start the toolbox pod if not already running
4. Run the command below to restore quorum to that good mon
5. Follow the prompts to confirm that you want to continue with each critical step of the restore
6. The final prompt will be to restart the operator, which will add new mons to restore the full quorum size
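
For steps 1 through 3, the commands below are a minimal sketch of how to confirm that quorum is lost and pick the surviving good mon. They assume the default `rook-ceph` namespace, the standard `rook-ceph-operator` and `rook-ceph-mon-<id>` deployment names, and the toolbox manifest path from the Rook repo; adjust all of these for your cluster.

```bash
# Step 1: look for reconcile timeouts in the operator log.
kubectl -n rook-ceph logs deploy/rook-ceph-operator --tail=20

# Step 2: query each surviving mon's admin socket and compare election_epoch values.
for mon in a b c; do
  echo "--- mon.${mon} ---"
  kubectl -n rook-ceph exec deploy/rook-ceph-mon-${mon} -- ceph daemon mon.${mon} mon_status 2>/dev/null \
    | grep -E '"(state|election_epoch)"' || echo "mon.${mon} is not responding"
done

# Step 3: start the toolbox if it is not already running (manifest path may differ).
kubectl create -f deploy/examples/toolbox.yaml
```

The mon that still responds with the highest `election_epoch` is the one to pass to `restore-quorum`.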

In this example, quorum is restored to mon **a**.

```bash
kubectl rook-ceph mons restore-quorum a
```
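
After the restore completes and the operator has been restarted, the old mons are removed and new mons are created to return to the full quorum size. The checks below are a sketch based on the CI test in this commit, assuming the default `rook-ceph` namespace, that quorum was restored to mon **a**, that the removed mons were **b** and **c**, and that the toolbox deployment is named `rook-ceph-tools`; mon names and namespaces will vary per cluster.

```bash
# Wait for the old mons (b and c in this example) to be removed.
kubectl -n rook-ceph wait pod -l app=rook-ceph-mon-b --for=delete --timeout=90s
kubectl -n rook-ceph wait pod -l app=rook-ceph-mon-c --for=delete --timeout=90s

# New mon deployments (names vary) should come up after the operator restart.
kubectl -n rook-ceph get deploy | grep rook-ceph-mon

# From the toolbox, confirm all mons are back in quorum.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph quorum_status
```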