Split Brain when logged in user CWDed into ZFS volume #38
Relevant portion of the log is (sorry for the cut, but I was in a split-screened shell):
It looks like when something is using the filesystem locally, the resource agent is unable to stop the filesystem; it then crashes and triggers a fence event. The fencing itself doesn't happen (I've configured iDRAC, but it doesn't power-cycle the node when I trigger a fence), but that is another story. The same behaviour occurs with a `zfs send` in progress. To mitigate this issue I wrote a helper script, which I put in /usr/lib/ocf/lib/heartbeat/helpers/zfs-helper:

```bash
#!/bin/bash
# Pre-export script for a ZFS pool.
# Checks for processes using files in the zpool and kills them.
# Requires: lsof, ps, awk, sed

zpool_pre_export () {
    # Forcibly terminate all PIDs using the zpool
    ZPOOL=$1
    # Exits gracefully anyway, for now
    RET=0

    # lsof lists open files under the pool's mountpoints; awk picks the
    # PID column and sed drops the header row
    lsof /"$ZPOOL"{*,/*} 2>/dev/null | awk '{print $2}' | sed -e '1d' | \
    while read -r PID
    do
        echo "Terminating PID $PID"
        kill -9 "$PID"
    done

    # Check if some blocking ZFS operations are running, such as
    # "zfs send ..." (grep -v grep excludes the grep process itself)
    ps aux | grep "$ZPOOL" | grep -v grep | awk '{print $2}' | \
    while read -r PID
    do
        echo "Terminating PID $PID"
        kill -9 "$PID"
    done

    exit $RET
}

case $1 in
    pre-export)
        zpool_pre_export "$2"
        ;;
esac
```
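As a side note, the `awk`/`sed` pair in the script extracts the PID column from `lsof` output and drops the header row. A self-contained illustration of that pipeline against a canned `lsof`-style sample (the process names and PIDs below are made up):

```shell
# Made-up two-row lsof output plus its header, to show what the
# pipeline in the helper script does without needing a real pool.
sample='COMMAND  PID USER   FD TYPE DEVICE NODE NAME
bash    1234 root  cwd  DIR   0,42  128 /tank
zfs     5678 root    3r REG   0,42  256 /tank/file'
# awk picks column 2 ("PID", "1234", "5678"); sed -e 1d drops the header.
pids=$(printf '%s\n' "$sample" | awk '{print $2}' | sed -e '1d')
echo "$pids"
```

In the real script each of these PIDs is then fed to `kill -9`; `lsof -t` would produce the PID list directly, but the awk/sed form keeps the dependencies the script already declares.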
Wouldn't using the multihost protection prevent the second host from mounting the pool?
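For reference, multihost (MMP) protection is a pool property, and it relies on each node having a unique, non-zero hostid. A minimal sketch, assuming a pool named `tank` (the pool name is an example):

```shell
# Run on each cluster node: zgenhostid writes a random hostid to
# /etc/hostid if one does not already exist (MMP needs the two nodes
# to have different hostids).
zgenhostid
# Enable multihost protection; the property is stored in the pool
# itself, so it only needs to be set once from the node that
# currently has the pool imported.
zpool set multihost=on tank
# Verify the setting.
zpool get multihost tank
```

With `multihost=on`, a `zpool import` on the second node should refuse a pool that appears to be actively in use on another host.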
Wasn't aware of this feature. I've enabled it and am testing it.
I have put it in /usr/lib/ocf/lib/heartbeat/zfs-helper.sh, as there is no helpers directory on RHEL 8 and there are other scripts in this directory. Does anything else have to be done for this on RHEL 8?
I don't remember, since months have passed, but it is possible that I needed to create the required directory.
Hi,
Consider this scenario:
I observed that a fence action is triggered.
The worst thing that happened is that the fence action doesn't work as expected: the volume stays mounted on both nodes, causing ZFS errors (and file corruption). I assume SCSI reservations are somehow not honored.
I triple-checked the configuration and it looks OK.
Since I'm planning to add sanoid/syncoid for snapshot/replica sends, I would like to avoid a split brain in case of a failover that happens in the middle of a process using the filesystem on a node.
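One possible pre-flight check before allowing a failover would be to look for an in-flight `zfs send` or `zfs receive` on the pool. A minimal sketch of the pattern match, run here against a canned `ps`-style line so it is self-contained (the PID and dataset names are made up); in practice you would feed it `ps -eo pid,args`:

```shell
# Made-up ps-style line representing a replication job in progress.
sample='  4321 zfs send -I tank/data@a tank/data@b'
detected=no
# Match either direction of a replication stream.
if printf '%s\n' "$sample" | grep -Eq 'zfs (send|receive)'; then
    detected=yes
fi
echo "$detected"
```

A resource agent could refuse (or delay) the stop action while such a process is found, instead of killing it mid-stream.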
I think this behaviour is easily reproducible.