CRAYSAT-1787: Adding ceph health check bypass in sat bootsys ncn-power #239

Merged
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -52,6 +52,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   ignorable.
 - Automate the procedure of setting next boot device to disk before the management nodes are
   powered off as part of the full-system shutdown.
+- Added a prompt to `sat bootsys ncn-power` that lets the user bypass the Ceph health check.
+  Ceph is still unfrozen; only the wait period is skipped if the user chooses to continue.
 
 ### Fixed
 - Updated `sat bootsys` to increase the default management NCN shutdown timeout
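The Ceph health check bypass entry describes a prompt that keeps asking until it gets an answer it recognizes. The Python diff calls SAT's `pester_choices` for this, but that helper's implementation is not part of this PR, so the following is only a minimal sketch of such a helper. `ask_until_valid` is a hypothetical name, and the sketch assumes lowercase choices, case-insensitive matching, and `None` on end of input.

```python
from typing import Callable, Optional, Sequence


def ask_until_valid(prompt: str, choices: Sequence[str],
                    read: Callable[[str], str] = input) -> Optional[str]:
    """Keep asking until the answer is one of `choices` (hypothetical helper).

    `choices` are assumed to be lowercase; answers are stripped and lowercased
    before comparison. Returns None if input ends (EOFError), e.g. when stdin
    is not a terminal.
    """
    # Show the allowed answers after the question, e.g. "Continue? (yes/no) ".
    full_prompt = f"{prompt} ({'/'.join(choices)}) "
    while True:
        try:
            answer = read(full_prompt).strip().lower()
        except EOFError:
            # No more input is available, so there is no valid choice to return.
            return None
        if answer in choices:
            return answer
        # Anything else is ignored and the question is asked again.


# Example with scripted answers instead of a real terminal.
scripted = iter(['maybe', 'YES'])
choice = ask_until_valid('Ceph is not healthy. Do you want to continue anyway?',
                         ['yes', 'no'], read=lambda _prompt: next(scripted))
print(choice)  # yes
```

Passing `read` as a parameter keeps the helper testable without a terminal; by default it falls back to the built-in `input`.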
13 changes: 11 additions & 2 deletions sat/cli/bootsys/mgmt_power.py
@@ -477,10 +477,19 @@ def do_power_on_ncns(args):
 if ncn_group == included_ncn_groups['storage']:
     try:
         do_ceph_unfreeze(included_ncn_groups)
+        LOGGER.info('Ceph unfreeze completed successfully on storage NCNs.')
+
     except FatalPlatformError as err:
         LOGGER.error(f'Failed to unfreeze Ceph on storage NCNs: {err}')
-        sys.exit(1)
-    LOGGER.info('Ceph unfreeze completed successfully on storage NCNs.')
+        # Use pester_choices to prompt the user
+        user_choice = pester_choices('Ceph is not healthy. Do you want to continue anyway?',
+                                     ['yes', 'no'])
+        if user_choice == 'no':
+            LOGGER.info('Exiting as per user\'s decision.')
+            sys.exit(1)
+        else:
+            LOGGER.info('Continuing despite Ceph not being healthy as per user\'s input, '
+                        'make sure to verify it later.')
 
 # Mount Ceph and S3FS filesystems on ncn-m001
 try:
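The new control flow in `do_power_on_ncns` can be pulled out into a small function that is testable without real storage NCNs. This is a minimal sketch under stated assumptions. `unfreeze_or_ask` is a hypothetical name. The local `FatalPlatformError` class only stands in for the real exception. The unfreeze step and the prompt are passed in as callables so tests can substitute fakes.

```python
import logging
import sys
from typing import Callable, Optional

LOGGER = logging.getLogger(__name__)


class FatalPlatformError(Exception):
    """Local stand-in for the exception raised when the Ceph unfreeze fails."""


def unfreeze_or_ask(unfreeze: Callable[[], None],
                    ask: Callable[[str], Optional[str]]) -> bool:
    """Unfreeze Ceph; if that fails, let the user decide whether to go on.

    Returns True if the unfreeze succeeded and False if it failed but the user
    chose to continue. Calls sys.exit(1) if the user does not answer 'yes'.
    """
    try:
        unfreeze()
    except FatalPlatformError as err:
        LOGGER.error('Failed to unfreeze Ceph on storage NCNs: %s', err)
        choice = ask('Ceph is not healthy. Do you want to continue anyway?')
        # Only an explicit 'yes' counts as consent; 'no' or None (no answer) exits.
        if choice != 'yes':
            LOGGER.info("Exiting as per user's decision.")
            sys.exit(1)
        LOGGER.warning('Continuing although Ceph is not healthy; verify it later.')
        return False
    LOGGER.info('Ceph unfreeze completed successfully on storage NCNs.')
    return True


# Example with a fake unfreeze that always fails and a user who answers 'yes'.
def failing_unfreeze():
    raise FatalPlatformError('timed out waiting for Ceph to become healthy')


print(unfreeze_or_ask(failing_unfreeze, lambda _prompt: 'yes'))  # False
```

The sketch deliberately differs from the diff in one respect: it exits unless the answer is exactly `'yes'`. A missing answer, such as `None` from a prompt helper that reaches end of input, is therefore treated as a refusal rather than as consent. In real code, the callables would wrap `do_ceph_unfreeze(included_ncn_groups)` and `pester_choices(prompt, ['yes', 'no'])`, and the real `FatalPlatformError` would be imported in place of the local class.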