Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: #424 udev rule construction #425

Conversation

christopher-horn
Copy link
Contributor

Modifies create-disk-link.sh udev rule construction to key off the UUID as follows:

echo "ENV{DM_UUID}=="$(sudo udevadm info --root --name="$storage_device" | sudo grep DM_UUID | sudo cut -f2 -d'=')" SYMLINK+="$storage_disk_name"" | sudo tee /etc/udev/rules.d/99-custom-ocp.rules;

Resulting in a rule constructed as follows:

ENV{DM_UUID}=="mpath-360050768108002dac800000000000422" SYMLINK+="disk/pv-storage-disk"

Also moves the rule into the user rules.d directory.

@ppc64le-cloud-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: christopher-horn
To complete the pull request process, please assign cs-zhang after the PR has been reviewed.
You can assign the PR to them by writing /assign @cs-zhang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ppc64le-cloud-bot
Copy link
Contributor

Welcome @christopher-horn! It looks like this is your first PR to ocp-power-automation/ocp4-upi-powervs 🎉

@ppc64le-cloud-bot ppc64le-cloud-bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 7, 2022
@ppc64le-cloud-bot
Copy link
Contributor

Hi @christopher-horn. Thanks for your PR.

I'm waiting for a ocp-power-automation member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yussufsh
Copy link
Contributor

yussufsh commented Jun 8, 2022

@christopher-horn I assume this change is supporting dm multipath environment. Will it work on sd non-multipath SCSI disk?

@christopher-horn
Copy link
Contributor Author

AFAIK all PowerVS deploys use virtual fiber channel so multi-pathing should always be enabled and in use. We use vSCSI for our on-premise PowerVC deploys and with dual VIOS again the same applies.

I suppose could test and use the previous method if multi-pathing driver not enabled and active. You would rather see that?

@yussufsh
Copy link
Contributor

yussufsh commented Jun 8, 2022

AFAIK all PowerVS deploys use virtual fiber channel so multi-pathing should always be enabled and in use. We use vSCSI for our on-premise PowerVC deploys and with dual VIOS again the same applies.

Agreed that PowerVS setup is always multipath.

I suppose could test and use the previous method if multi-pathing driver not enabled and active. You would rather see that?

That would be good, a simple if else can be implemented in ocp4-upi-powervm as well.

@ppc64le-cloud-bot
Copy link
Contributor

ppc64le-cloud-bot commented Jun 20, 2022

@christopher-horn: PR is not mergeable.

The PR state is: behind

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yussufsh
Copy link
Contributor

@christopher-horn any update?

@christopher-horn
Copy link
Contributor Author

I still owe you update for this, my apologies as have not had time with my day to day responsibilities. I am going to close this one and open new PR when have time.

@yussufsh
Copy link
Contributor

@christopher-horn do you mind if I take up this change? We need to enable multipath as a requirement on powervs nodes.

@christopher-horn
Copy link
Contributor Author

Yes certainly go ahead. I am sorry I have not had time.

I forget if this automation sets the bootlist or not as well. I use the following on the bastion in ours though could probably be done better:

#!/bin/bash

function set_bootlist
{
    host=$1

    ssh core@${host} uname -a 2> /dev/null | grep -q ppc64le
    (($?!=0)) && echo "Error: Architecture not ppc64le on ${host}" && exit 1

    ssh core@${host} systemctl is-active multipathd 1> /dev/null
    (($?!=0)) && echo "Error: Multipathing not active on ${host}" && exit 1
    
    disk=$(ssh core@${host} sudo bootlist -m normal -o | grep -E "^sd" | head -n1)
    [[ -z ${disk} ]] && echo "Error: Can not identify a boot block device for ${host}" && exit 1

    mapper=$(ssh core@${host} ls /sys/block/${disk}/holders)
    [[ -z ${mapper} ]] && echo "Error: Can not identify device mapper managing ${disk} for ${host}" && exit 1

    devices=$(echo $(ssh core@${host} ls /sys/block/${mapper}/slaves | awk '{print "/dev/"$1}'))
    [[ -z ${devices} ]] && echo "Error: Can not identify block devices managed by ${mapper} for ${host}" && exit 1

    ssh core@${host} sudo bootlist -m normal -o ${devices}
    (($?!=0)) && echo "Error: Problem setting boot list to ${devices} for ${host}" && exit 1
    echo "Set ${host} boot list to ${devices}"
}

for node in $(/usr/local/bin/oc get nodes | awk '!/NAME/ {print $1}') ; do
    set_bootlist ${node}
done
exit 0

@christopher-horn
Copy link
Contributor Author

The above can be done much better, you really want to set the list so it round robins over the paths especially as the boot list can only hold 5 entries and in PowerVS you have 8 split into 4 and 4 since it is vFC and not vSCSI. I just realize that today.

@christopher-horn
Copy link
Contributor Author

This seems to be a better approach for the bootlist:

#!/bin/bash

function set_bootlist
{
    host=$1

    ssh core@${host} uname -a 2> /dev/null | grep -q ppc64le
    (($?!=0)) && echo "Error: Architecture not ppc64le on ${host}" && exit 1

    ssh core@${host} systemctl is-active multipathd 1> /dev/null
    (($?!=0)) && echo "Error: Multipathing not active on ${host}" && exit 1
    
    disk=$(ssh core@${host} sudo bootlist -m normal -o | grep -E "^sd" | head -n1)
    [[ -z ${disk} ]] && echo "Error: Can not identify a boot block device for ${host}" && exit 1

    mpath_list="$(ssh core@${host} sudo multipath -l)"
    mpath_name=$(echo "${mpath_list}" | tr -d '|' | awk -v BLOCK=$disk 'BEGIN {MPATH=""}; /^mpath/ {MPATH=$1; next}; ($3 == BLOCK) {print MPATH; exit 0}')
    [[ -z ${mpath_name} ]] && echo "Error: Can not identify multipath device managing ${disk} for ${host}" && exit 1

    mpath_device="$(ssh core@${host} sudo multipath -l ${mpath_name})"
    boot_list=$(echo "${mpath_device}" | tr -d '|' | awk '/ sd/ {print $2" "$3}' | sort | head -n5 | cut -d ' ' -f 2 | tr '\n' ' ')
    [[ -z ${boot_list} ]] && echo "Error: Can not identify block devices managed by ${mpath_name} for ${host}" && exit 1

    ssh core@${host} sudo bootlist -m normal -o ${boot_list}
    (($?!=0)) && echo "Error: Problem setting boot list to ${boot_list} for ${host}" && exit 1
    echo "Set ${host} boot list to ${boot_list}"
}

for node in $(/usr/local/bin/oc get nodes | awk '!/NAME/ {print $1}') ; do
    set_bootlist ${node}
done
exit 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/non-mergeable needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bastion stuck in emergency shell on boot unable to mount /dev/disk/pv-storage-disk
3 participants