CRAYSAT-1893: Make default BOS timeouts infinite #259

Merged Sep 2, 2024 (1 commit)
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -25,6 +25,11 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [3.31.1] - 2024-09-02
+
+### Changed
+- Changed `sat bootsys` to enable default timeouts to infinite for BOS boot, shutdown, and reboot.
+
 ## [3.31.0] - 2024-08-21

 ### Fixed
10 changes: 5 additions & 5 deletions docs/man/sat-bootsys.8.rst
@@ -7,7 +7,7 @@ Perform boot, shutdown, or reboot actions on the system
 -------------------------------------------------------

 :Author: Hewlett Packard Enterprise Development LP.
-:Copyright: Copyright 2020-2023 Hewlett Packard Enterprise Development LP.
+:Copyright: Copyright 2020-2024 Hewlett Packard Enterprise Development LP.
 :Manual section: 8

 SYNOPSIS
@@ -243,7 +243,7 @@ These options set the timeouts of various parts of the stages of the
 **--bos-shutdown-timeout** *BOS_SHUTDOWN_TIMEOUT*
         Timeout, in seconds, to wait until compute and
         application nodes have completed their BOS shutdown.
-        Defaults to 600. Overrides the option
+        No default timeout is set. It is infinite. Overrides the option
         bootsys.bos_shutdown_timeout in the config file.

 **--ncn-shutdown-timeout** *NCN_SHUTDOWN_TIMEOUT*
@@ -297,7 +297,7 @@ These options set the timeouts of various parts of the stages of the
 **--bos-boot-timeout** *BOS_BOOT_TIMEOUT*
         Timeout, in seconds, to wait until compute and
         application nodes have completed their BOS boot.
-        Defaults to 900. Overrides the option
+        No default timeout is set. It is infinite. Overrides the option
         bootsys.bos_boot_timeout in the config file.

 REBOOT TIMEOUT OPTIONS
@@ -309,7 +309,7 @@ These options set the timeouts of various parts of the stages of the
 **--bos-shutdown-timeout** *BOS_SHUTDOWN_TIMEOUT*
         Timeout, in seconds, to wait until compute and
         application nodes have completed their BOS shutdown.
-        Defaults to 600. Overrides the option
+        No default timeout is set. It is infinite. Overrides the option
         bootsys.bos_shutdown_timeout in the config file.

         For a reboot, the --bos-shutdown-timeout and
@@ -318,7 +318,7 @@ These options set the timeouts of various parts of the stages of the
 **--bos-boot-timeout** *BOS_BOOT_TIMEOUT*
         Timeout, in seconds, to wait until compute and
         application nodes have completed their BOS boot.
-        Defaults to 900. Overrides the option
+        No default timeout is set. It is infinite. Overrides the option
         bootsys.bos_boot_timeout in the config file.

         For a reboot, the --bos-shutdown-timeout and
15 changes: 11 additions & 4 deletions sat/cli/bootsys/bos.py
@@ -626,13 +626,16 @@ def do_parallel_bos_operations(session_templates, operation, timeout, limit=None

     completed_state = 'stage' if stage else 'complete'
     completed_state_past_tense = 'staged' if stage else 'completed'
-    LOGGER.info(f'Waiting up to {timeout} seconds for {session_plural} to {completed_state}.')
+    if timeout == -1:
+        LOGGER.info(f'Waiting for {session_plural} to {completed_state}.')
+    else:
+        LOGGER.info(f'Waiting up to {timeout} seconds for {session_plural} to {completed_state}.')

     active_threads = {t.session_template: t for t in bos_session_threads}
     failed_session_templates = []
     just_finished = []

-    while active_threads and elapsed_time < timeout:
+    while active_threads and (timeout == -1 or elapsed_time < timeout):
         if just_finished:
             LOGGER.info(f'Still waiting on session(s) for template(s): '
                         f'{", ".join(active_threads.keys())}')
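
The loop condition in this hunk treats `-1` as "no timeout". A minimal standalone sketch of the same pattern follows; `wait_for`, `NO_TIMEOUT`, and the injectable `clock`/`sleep` parameters are hypothetical names used for illustration and are not part of `sat`.

```python
import time

# Sentinel taken from the diff: a timeout of -1 means "wait indefinitely".
NO_TIMEOUT = -1


def wait_for(is_done, timeout, poll_interval=0.01,
             clock=time.monotonic, sleep=time.sleep):
    """Poll is_done() until it returns True or the timeout elapses.

    Returns True if is_done() returned True, False if the timeout expired.
    With timeout == NO_TIMEOUT the loop only ends when is_done() is True.
    """
    start = clock()
    while timeout == NO_TIMEOUT or clock() - start < timeout:
        if is_done():
            return True
        sleep(poll_interval)
    return False
```

A caller would pass a finite number of seconds to bound the wait, or `NO_TIMEOUT` to wait until the condition holds.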
@@ -940,10 +943,14 @@ def do_bos_reboots(args: Namespace):
     prompt_continue('reboot of nodes using BOS')

     try:
+        boot_timeout = get_config_value("bootsys.bos_boot_timeout")
+        shutdown_timeout = get_config_value("bootsys.bos_shutdown_timeout")
+        total_timeout = boot_timeout + shutdown_timeout
+        if total_timeout < -1:
+            total_timeout = -1
         do_bos_operations(
             "reboot",
-            get_config_value("bootsys.bos_boot_timeout") +
-            get_config_value("bootsys.bos_shutdown_timeout"),
+            total_timeout,
             limit=args.bos_limit,
             recursive=args.recursive,
             stage=args.staged_session
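
The hunk above sums the two timeouts and clamps any result below `-1` back to `-1`, which covers the case where both values are `-1`. The sketch below shows one possible way to combine sentinel-based timeouts under a different, assumed policy: if either stage is unlimited, the combined reboot wait is unlimited too. `combine_timeouts` and `NO_TIMEOUT` are hypothetical names, not functions in `sat`.

```python
# Sentinel taken from the diff: -1 means "no timeout".
NO_TIMEOUT = -1


def combine_timeouts(boot_timeout, shutdown_timeout):
    """Return the total reboot timeout in seconds.

    Assumed policy (not confirmed by the diff): if either stage has no
    timeout, the combined operation has no timeout either.
    """
    if NO_TIMEOUT in (boot_timeout, shutdown_timeout):
        return NO_TIMEOUT
    return boot_timeout + shutdown_timeout
```

Under this policy a mixed configuration such as an unlimited boot with a 600-second shutdown yields `NO_TIMEOUT`, whereas plain addition would yield 599 seconds.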
4 changes: 2 additions & 2 deletions sat/cli/bootsys/parser.py
@@ -51,9 +51,9 @@
                 'BGP routes report that they are established on management switches.'),
     TimeoutSpec('hsn', ['boot'], 300,
                 'the high-speed network (HSN) has returned to its pre-shutdown state.'),
-    TimeoutSpec('bos-shutdown', ['shutdown', 'reboot'], 600,
+    TimeoutSpec('bos-shutdown', ['shutdown', 'reboot'], -1,
                 'compute and application nodes have completed their BOS shutdown.'),
-    TimeoutSpec('bos-boot', ['boot', 'reboot'], 900,
+    TimeoutSpec('bos-boot', ['boot', 'reboot'], -1,
                 'compute and application nodes have completed their BOS boot.'),
     TimeoutSpec('ncn-shutdown', ['shutdown'], 900,
                 'management NCNs have completed a graceful shutdown and have reached '
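
The `TimeoutSpec` entries above change the default for the two BOS timeouts to `-1`. How `sat` turns these specs into command-line options is not shown in this diff, so the following is only a sketch using `argparse` directly; the parser name `bootsys-sketch` and the helper `describe_timeout` are hypothetical.

```python
import argparse


def build_parser():
    """Build a toy parser exposing the two BOS timeout options."""
    parser = argparse.ArgumentParser(prog='bootsys-sketch')
    parser.add_argument(
        '--bos-boot-timeout', type=int, default=-1,
        help='Seconds to wait for BOS boot; -1 (the default) means no timeout.')
    parser.add_argument(
        '--bos-shutdown-timeout', type=int, default=-1,
        help='Seconds to wait for BOS shutdown; -1 (the default) means no timeout.')
    return parser


def describe_timeout(timeout):
    """Return a human-readable description of a timeout value."""
    return 'no timeout' if timeout == -1 else f'{timeout} seconds'
```

With no arguments both options resolve to `-1`; passing `--bos-boot-timeout 900` restores a finite limit for that stage.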
22 changes: 14 additions & 8 deletions sat/config.py
@@ -1,7 +1,7 @@
 #
 # MIT License
 #
-# (C) Copyright 2019-2023 Hewlett Packard Enterprise Development LP
+# (C) Copyright 2019-2024 Hewlett Packard Enterprise Development LP
 #
 # Permission is hereby granted, free of charge, to any person obtaining a
 # copy of this software and associated documentation files (the "Software"),
@@ -30,6 +30,7 @@
 import logging
 import os
 from textwrap import dedent, indent
+import math

 import toml

@@ -342,13 +343,18 @@ def get_config_value(query_string):
     if len(parts) != EXPECTED_LEVELS:
         raise ValueError("Wrong number of levels in query string passed to get_config_value(). "
                          "(Should be {}, was {}.)".format(EXPECTED_LEVELS, len(parts)))
-    else:
-        section, option = parts
-        if not section or not option:
-            raise ValueError("Improperly formatted query string supplied to get_config_value(). "
-                             "(Got '{}'.)".format(query_string))
-        else:
-            return CONFIG.get(section, option)
+
+    section, option = parts
+    if not section or not option:
+        raise ValueError("Improperly formatted query string supplied to get_config_value(). "
+                         "(Got '{}'.)".format(query_string))
+
+    value = CONFIG.get(section, option)
+
+    if option.lower() == 'timeout' and value == '-1':
+        return math.inf
+
+    return value


 def read_config_value_file(query_string):
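
The check added to `get_config_value()` matches only an option named exactly `timeout` whose value is the string `'-1'`. As an illustration of the same idea applied more broadly, the sketch below assumes option names such as `bos_boot_timeout` and values that may arrive as either `int` or `str`; `normalize_timeout` is a hypothetical helper and does not exist in `sat`.

```python
import math


def normalize_timeout(option, value):
    """Map the -1 sentinel to math.inf for timeout options.

    Assumptions (not confirmed by the diff): timeout options are named
    with a 'timeout' suffix, and their values are ints or numeric strings.
    Non-timeout options are returned unchanged.
    """
    if option.lower().endswith('timeout') and int(value) == -1:
        return math.inf
    return value
```

Because any finite elapsed time compares as less than `math.inf`, a condition such as `elapsed_time < timeout` stays true without a separate sentinel check.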