add retry limits for ssh related commands #373

Draft. Wants to merge 1 commit into base: main
30 changes: 28 additions & 2 deletions src/nixos-anywhere.sh
@@ -27,6 +27,8 @@ fi
postKexecSshPort=22
buildOnRemote=n
envPassword=
sshRetryLimit=-1
rebootRetryLimit=-1

declare -A diskEncryptionKeys
declare -a nixCopyOptions
@@ -86,6 +88,10 @@ Options:
disko: first unmount and destroy all filesystems on the disks we want to format, then run the create and mount mode
install: install the system
reboot: reboot the machine
* --ssh-retry-limit <limit>
set the number of times to retry the ssh connection before giving up (default: -1, retry indefinitely)
* --reboot-retry-limit <limit>
set the number of times to retry the connection while waiting for the reboot before giving up (default: -1, retry indefinitely)
USAGE
}

@@ -213,6 +219,14 @@ parseArgs() {
--vm-test)
vmTest=y
;;
--ssh-retry-limit)
sshRetryLimit=$2
shift
;;
--reboot-retry-limit)
rebootRetryLimit=$2
shift
;;
*)
if [[ -z ${sshConnection-} ]]; then
sshConnection="$1"
@@ -316,6 +330,7 @@ uploadSshKey() {
fi

step Uploading install SSH keys
local retryCount=0
until
if [[ -n ${envPassword} ]]; then
sshpass -e \
@@ -339,7 +354,11 @@ uploadSshKey() {
"$sshConnection"
fi
do
sleep 3
sleep 5
retryCount=$((retryCount + 1))
if [[ $sshRetryLimit -ne -1 ]] && [[ $retryCount -ge $sshRetryLimit ]]; then
Review comment (Member, Author):

Not super happy about having a lot of these timeouts everywhere...

abort "Reached ssh retry limit of $sshRetryLimit"
fi
done
}
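
The review comment above worries about retry logic being repeated in several places. One way to keep it in a single spot is a small wrapper function. The sketch below is only an illustration: `retryWithLimit` is a hypothetical name that does not appear in this PR, and it returns a non-zero status instead of calling the script's `abort`, so the caller decides how to fail. It keeps the diff's convention that a limit of -1 means "retry indefinitely".

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of this PR.
# Usage: retryWithLimit <limit> <delaySeconds> <command> [args...]
# Runs the command until it succeeds. A limit of -1 retries forever;
# otherwise the command is attempted at most <limit> times
# (at least once, even if <limit> is 0).
retryWithLimit() {
  local limit=$1 delay=$2
  shift 2
  local retryCount=0
  until "$@"; do
    retryCount=$((retryCount + 1))
    if [[ $limit -ne -1 ]] && [[ $retryCount -ge $limit ]]; then
      echo "Reached retry limit of $limit" >&2
      return 1
    fi
    # Only sleep when another attempt will follow.
    sleep "$delay"
  done
}
```

Under these assumptions a call site could look like `retryWithLimit "$sshRetryLimit" 5 runSsh -o ConnectTimeout=10 -- exit 0 || abort "Reached ssh retry limit of $sshRetryLimit"`. Unlike the loops in the diff, this version checks the limit before sleeping, so it does not wait once more after the final failed attempt.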

@@ -581,7 +600,14 @@ main() {

if [[ ${phases[reboot]-} == 1 ]]; then
step Waiting for the machine to become unreachable due to reboot
while runSshTimeout -- exit 0; do sleep 1; done
local retryCount=0
until runSsh -o ConnectTimeout=10 -- exit 0; do
sleep 5
retryCount=$((retryCount + 1))
if [[ $rebootRetryLimit -ne -1 ]] && [[ $retryCount -ge $rebootRetryLimit ]]; then
abort "Machine didn't come back online after reboot: reached limit of $rebootRetryLimit retries"
fi
done
fi

step "Done!"
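
The `--ssh-retry-limit` and `--reboot-retry-limit` handlers in `parseArgs` assign `$2` directly, so a non-numeric value would only surface later in the `-ne` and `-ge` comparisons, where bash evaluates it as an arithmetic expression rather than rejecting it up front. A minimal sketch of an early check follows, assuming the only valid values are -1 or a non-negative integer. `isRetryLimit` is a hypothetical name and is not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical validator, NOT part of this PR.
# Succeeds only for -1 (retry indefinitely) or a non-negative integer.
isRetryLimit() {
  [[ $1 =~ ^(-1|[0-9]+)$ ]]
}
```

In `parseArgs` this could be used as `isRetryLimit "$2" || abort "--ssh-retry-limit expects -1 or a non-negative integer"`, with `abort` being the script's existing error helper.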