From e43d40fdf299b8f8d220ab7c9e7f85ad9b638d6f Mon Sep 17 00:00:00 2001
From: Milos Stankovic <82043364+morph-dev@users.noreply.github.com>
Date: Wed, 30 Oct 2024 18:25:49 +0200
Subject: [PATCH] docs: update node reboot instructions

---
 .../contributing/releases/deployment.md | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/book/src/developers/contributing/releases/deployment.md b/book/src/developers/contributing/releases/deployment.md
index 85aca014f..402c8eede 100644
--- a/book/src/developers/contributing/releases/deployment.md
+++ b/book/src/developers/contributing/releases/deployment.md
@@ -26,6 +26,11 @@
 - Log in to Docker with: `docker login`
 - Ask Nick to be added as collaborator on Docker repo
 
+- Needed for [rebooting nodes](#what-do-i-do-if-ansible-says-a-node-is-unreachable)
+  - [Install doctl](https://docs.digitalocean.com/reference/doctl/how-to/install/)
+  - Contact `@paulj` to get `doctl` API key
+  - Make sure API key works by running: `doctl auth init`
+
 ## Each Deployment
 
 ### Prepare
@@ -135,10 +140,16 @@ It means your key isn't working. Check with `@paulj`.
 If using `gpg` and decryption problems persist, see [this potential fix](https://github.com/getsops/sops/issues/304#issuecomment-377195341).
 
 ### What do I do if Ansible says a node is unreachable?
+
 You might see this during a deployment:
 
-> fatal: [trin-ams3-18]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 178.128.253.26 port 22: Connection timed out", "unreachable": true}
-Retry once more. If it times out again, ask `@paulj` to reboot the machine.
+> fatal: [trin-ams3-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host XXX.XXX.XXX.XXX port XX: Connection timed out", "unreachable": true}
+
+Retry once more. If it times out again, run [reboot script](https://github.com/ethereum/cluster/blob/master/portal-network/trin/ansible/reboot_node.sh) (check [First time Setup](#first-time-setup) chapter for setup):
+
+```shell
+./reboot_node.sh ,,...,
+```
 
 ### What if everything breaks and I need to rollback the deployment?
 If you observe things breaking or (significantly) degraded network performance after a deployment, you might want to rollback the changes to a previously working version until the breaking change can be identified and fixed. Keep in mind that you might want to rollback just the bridge nodes, or the backfill nodes, as opposed to every node on the network.