Resilience
It is essential that Smart Home systems are reliable.
The following techniques increase the reliability of a single MisterHouse installation:
- MisterHouse Watchdog
- Use a hardware watchdog, e.g. the Raspberry Pi Watchdog
- Use SSDs instead of SD cards
- Have a backup/recovery strategy
With the widespread availability of very inexpensive, powerful computers like the Raspberry Pi, it is now viable to build in redundancy.
This design has two identical Raspberry Pis with identical software, one with a hostname of Master and the other Slave. If MisterHouse stops responding on the Master for any reason, the Slave takes over.
CAVEAT: This technique only works where the MisterHouse instance is not directly connected to the sensors. For example, it would not work with a single CM11 connected to the serial port of the Master, because the connection to that CM11 is lost once the Master dies. You would need a CM11 connected to each MisterHouse instance.
Use the hostname in scripts and MisterHouse to decide how to behave, so that the code on both machines is identical.
Both machines run a script from crontab every few minutes: the Slave checks the Master, and the Master runs the watchdog script to monitor itself, e.g.
THISHOST=$(hostname -s)
case $THISHOST in
    Master )
        #echo "Starting watchcron"
        /home/pi/mh/scripts/watchcron.sh ;;
    Slave )
        #echo "Starting launchslave"
        perl /home/pi/mh/scripts/launchslave.pl ;;
    * )
        echo "ERROR: Invalid hostname"
        exit 1 ;;
esac
The Master's watchcron.sh looks like this; it restarts MisterHouse if the watchdog file is older than 5 minutes:
#!/bin/sh
WATCHDOGFILE=$HOME/mh/data/watchdog
# If the watchdog file hasn't been touched in 5 minutes, restart MisterHouse
MAXDIFFERENCE=300
watchdog=$(stat --format=%Y "$WATCHDOGFILE")
now=$(date +%s)
difference=$(( now - watchdog ))
if [ "$difference" -gt "$MAXDIFFERENCE" ]; then
    echo "MisterHouse needs restart"
    echo "$(date) Restarting MisterHouse" >> "$HOME/mh/data/logs/watchcron.log"
    # Reset the timestamp so the next cron run does not restart again immediately
    touch "$WATCHDOGFILE"
    # restart (rather than start) so a hung but still-active service is also recovered
    sudo systemctl restart misterhouse
fi
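The staleness check can also be factored into a small function, which makes it easier to test and lets it handle a watchdog file that does not exist yet. The following is only a sketch: the function name `watchdog_state` and its "stale"/"fresh" output are hypothetical, and it assumes GNU `stat`, as the script above does.

```shell
#!/bin/sh
# Hypothetical helper (not part of MisterHouse): prints "stale" or "fresh".
# Usage: watchdog_state FILE MAX_AGE_SECONDS
watchdog_state() {
    file=$1
    max_age=$2
    # Assumption: a missing file is treated as stale, since no heartbeat
    # has ever been written.
    if [ ! -e "$file" ]; then
        echo stale
        return 0
    fi
    mtime=$(stat --format=%Y "$file")   # modification time, seconds since epoch
    now=$(date +%s)
    if [ $(( now - mtime )) -gt "$max_age" ]; then
        echo stale
    else
        echo fresh
    fi
}
```

A cron script could then test `[ "$(watchdog_state "$HOME/mh/data/watchdog" 300)" = "stale" ]` before restarting the service.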
The launchslave.pl script does the following:
- Quit if run on the Master machine; otherwise:
- Ping the Master's HTTP port to detect whether MisterHouse is running.
- If MisterHouse is running on the Master and MisterHouse is also running on the Slave, stop MisterHouse on the Slave and email what happened.
- If MisterHouse is running on the Master, copy over the mh_temp.saved_states so that, if the Slave needs to start in the future, it has a recent status.
- If MisterHouse is NOT running on the Master and MisterHouse is NOT running on the Slave, start MisterHouse on the Slave and email what happened.
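The decision logic described above can be sketched as a small shell function. This is not the actual launchslave.pl (which is written in Perl and not shown here); the function names, the action names, and the use of `curl` for the HTTP probe are all assumptions made for illustration. The source does not say what happens when the Master is down and the Slave is already running; the sketch assumes the Slave simply keeps running.

```shell
#!/bin/sh
# Hypothetical probe: succeeds if an HTTP server answers at the given URL.
# The URL of the Master's MisterHouse web interface is an assumption and
# must be supplied by the caller.
http_up() {
    curl --silent --output /dev/null --max-time 5 "$1"
}

# Hypothetical decision table.
# Usage: decide_action MASTER_STATE SLAVE_STATE   (each "up" or "down")
# Prints the action(s) the Slave should take.
decide_action() {
    master=$1
    slave=$2
    case "$master:$slave" in
        up:up)     echo "stop_slave sync_state" ;;  # Master is back: Slave steps down
        up:down)   echo "sync_state" ;;             # normal case: copy saved states
        down:down) echo "start_slave" ;;            # fail over to the Slave
        down:up)   echo "none" ;;                   # assumption: Slave keeps covering
        *)         echo "invalid state: $master:$slave" >&2
                   return 1 ;;
    esac
}
```

Keeping the decision separate from the probing and from the side effects (stopping, starting, copying, emailing) means the table can be checked on its own, e.g. `decide_action down down` prints `start_slave`.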