Adding ceph health check bypass in sat bootsys ncn-power
IM:CRAYSAT-1787
Reviewer:Ryan
Add a prompt to bypass the Ceph health check, letting the user decide whether to
wait for the check or skip it once Ceph has been unfrozen. Ceph can take some
time to become healthy, and the next steps may not explicitly require it; by the
time Ceph is needed, it should be ready to use.
Shivaprasad Ashok Metimath committed Jul 25, 2024
1 parent ec8154e commit 53b4981
Showing 2 changed files with 13 additions and 2 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -52,6 +52,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   ignorable.
 - Automate the procedure of setting next boot device to disk before the management nodes are
   powered off as part of the full-system shutdown.
+- Add a Ceph health check bypass prompt to `sat bootsys` that asks the user how to proceed.
+  Ceph is still unfrozen; only the wait for Ceph to become healthy is skipped if the user chooses.

 ### Fixed
 - Updated `sat bootsys` to increase the default management NCN shutdown timeout
13 changes: 11 additions & 2 deletions sat/cli/bootsys/mgmt_power.py
@@ -477,10 +477,19 @@ def do_power_on_ncns(args):
 if ncn_group == included_ncn_groups['storage']:
     try:
         do_ceph_unfreeze(included_ncn_groups)
-        LOGGER.info('Ceph unfreeze completed successfully on storage NCNs.')
-
     except FatalPlatformError as err:
         LOGGER.error(f'Failed to unfreeze Ceph on storage NCNs: {err}')
         sys.exit(1)
+    LOGGER.info('Ceph unfreeze completed successfully on storage NCNs.')
+    # Use pester_choices to prompt the user
+    user_choice = pester_choices('Ceph is not healthy. Do you want to continue anyway?',
+                                 ['yes', 'no'])
+    if user_choice == 'no':
+        LOGGER.info('Exiting as per user\'s decision.')
+        sys.exit(1)
+    else:
+        LOGGER.info('Continuing despite Ceph not being healthy as per user\'s input, '
+                    'make sure to verify it later.')

# Mount Ceph and S3FS filesystems on ncn-m001
try:
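
The wait-or-skip decision described in the commit message could be sketched roughly as follows, assuming the Ceph health state can be queried before the user is prompted. All names here (`prompt_choice`, `ceph_health_gate`, `is_healthy`, `wait_for_health`) are hypothetical and exist only for illustration; `prompt_choice` is a stand-in for `pester_choices`, not its actual implementation.

```python
import logging

LOGGER = logging.getLogger(__name__)


def prompt_choice(prompt, choices, input_fn=input):
    """Hypothetical stand-in for pester_choices: re-prompt until a valid choice is given."""
    while True:
        answer = input_fn(f'{prompt} ({"/".join(choices)}) ').strip().lower()
        if answer in choices:
            return answer


def ceph_health_gate(is_healthy, wait_for_health, input_fn=input):
    """Decide how to proceed once Ceph has been unfrozen.

    is_healthy and wait_for_health are caller-supplied callables (assumptions
    of this sketch). Returns 'healthy', 'skipped', or 'waited'; raises
    SystemExit(1) if the user chose to wait and Ceph never became healthy.
    """
    if is_healthy():
        LOGGER.info('Ceph is healthy.')
        return 'healthy'

    # Only prompt when Ceph is actually unhealthy.
    choice = prompt_choice('Ceph is not healthy yet. Skip waiting for it?',
                           ['yes', 'no'], input_fn)
    if choice == 'yes':
        LOGGER.warning('Skipping the Ceph health wait; verify Ceph health later.')
        return 'skipped'

    if not wait_for_health():
        LOGGER.error('Ceph did not become healthy.')
        raise SystemExit(1)
    return 'waited'
```

Passing the health probe and the wait routine in as callables keeps the decision logic testable without a live Ceph cluster, and the prompt appears only when the probe reports an unhealthy state.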
