Resilience
It is essential that Smart Home systems are reliable.
MisterHouse is perfectly capable of running your whole smart home on its own as a monolith. However, if you have hardwired sensors (e.g. wired PIRs, a CM11, or 1-wire temperature sensors), the machine they are wired to becomes a single point of failure, and it is very difficult to have redundant standby instances take over automatically if it fails.
The solution is a distributed architecture with wireless communication between the sensors and MisterHouse. For example, ESP8266 development boards cost less than $5 and have WiFi and up to 5 or 6 usable I/O pins. When loaded with the Tasmota firmware, they have drivers for many devices (PIR, 1-wire, etc.) and provide MQTT and a web server for communicating with MisterHouse.
Unfortunately, WiFi-based systems tend to use too much power to run from batteries all of the time. Sometimes you can overcome this with devices that wake up only when they need to transmit a signal, but usually you need a different wireless protocol for battery-powered 'always-on' devices such as remote controls, plus a mains-powered gateway device to link them to MisterHouse.
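Once sensors report over MQTT instead of a wire, a failed sensor node shows up as missing messages rather than as a failed MisterHouse. The sketch below lists sensors that have dropped off the network. It assumes Tasmota's default topic layout, where each device publishes a retained last-will message on `tele/<topic>/LWT` with the payload `Online` or `Offline`, and it assumes the Mosquitto command-line client is installed. The function name and broker hostname are hypothetical.

```shell
#!/bin/sh
# Print the Tasmota topic names whose last-will (LWT) payload is "Offline".
# Input on stdin: "topic payload" lines, as printed by `mosquitto_sub -v`.
# Assumes Tasmota's default FullTopic (tele/<topic>/LWT) and default Online/Offline payloads.
offline_devices() {
    while read -r topic payload; do
        case $topic in
            tele/*/LWT)
                device=${topic#tele/}   # strip the leading "tele/"
                device=${device%/LWT}   # strip the trailing "/LWT"
                if [ "$payload" = Offline ]; then
                    echo "$device"
                fi ;;
        esac
    done
}

# Usage (hypothetical broker name): read the retained LWT messages for 2 seconds, then report
# mosquitto_sub -h mqtt-broker -v -t 'tele/+/LWT' -W 2 | offline_devices
```

Lines for other topics (e.g. `tele/<topic>/SENSOR`) are ignored, so the same subscription output can be piped in unfiltered.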
The following techniques increase the reliability of a single MisterHouse installation:
- MisterHouse Watchdog
- Use a hardware watchdog, e.g. the Raspberry Pi watchdog
- Use SSDs instead of SD cards. See here for how to minimise disk writes.
- Have a backup/recovery strategy
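As one possible starting point for a backup strategy, the sketch below archives a MisterHouse directory into dated tarballs and keeps only the newest few. The function name `backup_mh`, the source path and the destination mount are hypothetical; point them at wherever your MisterHouse code and data actually live.

```shell
#!/bin/sh
# Hypothetical helper: archive a MisterHouse directory into a dated tarball and keep
# only the newest $3 archives (default 7). Prints the path of the new archive.
backup_mh() {
    src_dir=$1    # e.g. /home/pi/mh/data (assumed location)
    dest_dir=$2   # e.g. a USB drive or NAS mount (assumed)
    keep=${3:-7}
    mkdir -p "$dest_dir" || return 1
    archive="$dest_dir/mh-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C stores paths relative to the parent directory (e.g. data/...)
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    # Timestamped names sort chronologically: reverse-sort and delete everything past the newest $keep
    printf '%s\n' "$dest_dir"/mh-backup-*.tar.gz | sort -r | tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
    echo "$archive"
}

# Usage (hypothetical paths), e.g. from a nightly cron job:
# backup_mh /home/pi/mh/data /mnt/backup/misterhouse 14
```

The archives only help if they live somewhere other than the disk they protect, and a restore is worth rehearsing before it is needed.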
With the widespread availability of very inexpensive, powerful computers like the Raspberry Pi, it is now viable to build in redundancy.
This design has two identical Raspberry Pis with identical software, one with the hostname Master and the other with the hostname Slave. If MisterHouse stops responding on the Master for any reason, the Slave takes over.
CAVEAT: This technique only works where the MisterHouse instance is not directly connected to the sensors. For example, it wouldn't work if you have only one CM11 connected to the serial port of the Master: once the Master dies, the connection to that CM11 is lost. You would need a CM11 connected to each MisterHouse instance.
Use the hostname in scripts and in MisterHouse to decide how to behave, so that the code on both machines is identical.
Both machines run a crontab script every few minutes: the Slave checks the Master, and the Master runs the watchdog script to monitor itself, e.g.
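#!/bin/sh
# Run from cron on both machines; the short hostname decides which role this machine plays.
THISHOST=$(hostname -s)
case $THISHOST in
    Master )
        # Master: check that its own MisterHouse is still touching the watchdog file
        /home/pi/mh/scripts/watchcron.sh ;;
    Slave )
        # Slave: check the Master and take over if it has stopped responding
        perl /home/pi/mh/scripts/launchslave.pl ;;
    * )
        echo "ERROR: Invalid hostname: $THISHOST" >&2
        exit 1 ;;
esac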
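The dispatcher script above has to be scheduled by cron on both machines. A minimal sketch of that setup follows, assuming the dispatcher is saved as `/home/pi/mh/scripts/mhcron.sh` (a hypothetical name, since none is given here) and should run every 5 minutes. Because the dispatcher calls `watchcron.sh` directly rather than through `sh`, that script needs its execute bit as well.

```shell
# Both scripts must be executable: cron runs the dispatcher, which runs watchcron.sh directly
chmod +x /home/pi/mh/scripts/mhcron.sh /home/pi/mh/scripts/watchcron.sh
# Append an entry to the current user's crontab (keeping existing entries); run this once only
( crontab -l 2>/dev/null; echo '*/5 * * * * /home/pi/mh/scripts/mhcron.sh >> /home/pi/mh/data/logs/mhcron.log 2>&1' ) | crontab -
```

Redirecting output to a log file keeps error messages, such as the invalid-hostname message, from being mailed by cron or lost.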
The Master's watchcron.sh looks like this. It restarts MisterHouse if the watchdog file is older than 5 minutes:
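#!/bin/sh
# Restart MisterHouse if its watchdog file hasn't been touched in 5 minutes.
WATCHDOGFILE=$HOME/mh/data/watchdog
LOGFILE=$HOME/mh/data/logs/watchcron.log
MAXDIFFERENCE=300   # seconds
# Last modification time in epoch seconds (GNU stat); a missing file counts as infinitely old
watchdog=$(stat --format=%Y "$WATCHDOGFILE" 2>/dev/null || echo 0)
now=$(date +%s)
difference=$((now - watchdog))
if [ "$difference" -gt "$MAXDIFFERENCE" ]; then
    echo "MisterHouse needs restart"
    echo "$(date) Restarting MisterHouse" >> "$LOGFILE"
    # Reset the timer so the next cron run doesn't restart MisterHouse again while it starts up
    touch "$WATCHDOGFILE"
    # 'restart' rather than 'start': 'start' does nothing if a hung MisterHouse process is still running
    sudo systemctl restart misterhouse
fi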
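Because watchcron.sh runs from cron with no terminal, sudo cannot prompt for a password. If your user does not already have passwordless sudo, one option is a sudoers drop-in that allows only the systemctl commands these scripts use. The sketch below assumes the user is `pi`, the unit is named `misterhouse`, and systemctl lives at `/usr/bin/systemctl` (check with `command -v systemctl`, since sudoers matches the full path). The drop-in file name is hypothetical.

```shell
# Write the candidate rule to a temporary file first
tmp=$(mktemp)
echo 'pi ALL=(root) NOPASSWD: /usr/bin/systemctl start misterhouse, /usr/bin/systemctl stop misterhouse, /usr/bin/systemctl restart misterhouse' > "$tmp"
# Install it only if visudo accepts the syntax: a broken sudoers file can lock you out of sudo
if sudo visudo -c -f "$tmp"; then
    sudo install -m 0440 -o root -g root "$tmp" /etc/sudoers.d/misterhouse
fi
rm -f "$tmp"
```

Limiting the rule to these exact commands keeps the cron jobs working without giving the account blanket root access.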
The launchslave.pl script does the following:
- Quit if run on the Master machine; otherwise:
- Connect to the Master's HTTP port to detect whether MisterHouse is running.
- If MisterHouse is running on both the Master and the Slave, stop MisterHouse on the Slave and email what happened.
- If MisterHouse is running on the Master, copy over mh_temp.saved_states so that, if the Slave needs to start in the future, it has a recent status.
- If MisterHouse is NOT running on the Master and NOT running on the Slave, start MisterHouse on the Slave and email what happened.
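The steps above can be sketched in shell as follows. This is an illustration of the same logic, not the actual launchslave.pl. It assumes the Master answers on MisterHouse's HTTP port 8080, the systemd unit is called `misterhouse`, mh_temp.saved_states lives in `/home/pi/mh/data`, the Slave can `scp` from the Master with key-based SSH, and a `mail` command is installed. The function names and email address are hypothetical.

```shell
#!/bin/sh
# Shell sketch of the Slave-side failover check (hypothetical; adjust names and paths).
MASTER=Master                                      # hostname of the Master
MH_PORT=8080                                       # assumed MisterHouse HTTP port
STATE_FILE=/home/pi/mh/data/mh_temp.saved_states   # assumed location
ALERT_EMAIL=you@example.com                        # placeholder address

# Pure decision logic: $1 = Master status, $2 = Slave status ("up" or "down").
# Prints the actions to take, space separated, in order.
decide_actions() {
    actions=""
    if [ "$1" = up ]; then
        # Master is healthy: make sure the Slave is idle, then refresh its saved state
        if [ "$2" = up ]; then
            actions="stop_slave "
        fi
        actions="${actions}sync_state"
    elif [ "$2" = down ]; then
        # Neither instance is running: the Slave takes over
        actions="start_slave"
    fi
    # Master down and Slave already up: nothing to do
    echo "$actions"
}

master_status() {
    # Any HTTP response within 10 s counts as "up"; a refused connection or timeout counts as "down"
    if curl --silent --output /dev/null --max-time 10 "http://$MASTER:$MH_PORT/"; then
        echo up
    else
        echo down
    fi
}

slave_status() {
    if systemctl is-active --quiet misterhouse; then echo up; else echo down; fi
}

notify() {
    echo "$1" | mail -s "MisterHouse failover on $(hostname -s)" "$ALERT_EMAIL"
}

main() {
    # Quit if run on the Master machine
    if [ "$(hostname -s)" = "$MASTER" ]; then
        return 0
    fi
    for action in $(decide_actions "$(master_status)" "$(slave_status)"); do
        case $action in
            stop_slave)
                sudo systemctl stop misterhouse
                notify "MisterHouse is running on $MASTER again; stopped it on the Slave." ;;
            sync_state)
                scp -q "$MASTER:$STATE_FILE" "$STATE_FILE" ;;
            start_slave)
                sudo systemctl start misterhouse
                notify "MisterHouse is not responding on $MASTER; started it on the Slave." ;;
        esac
    done
}

# Only act when called with "run" (e.g. from cron), so the functions can be sourced and tested safely
if [ "${1:-}" = run ]; then
    main
fi
```

Keeping the decision in `decide_actions`, separate from the commands with side effects, makes the failover rules easy to check without touching either machine. When both instances are found running, the Slave is stopped before the state file is copied, so the copy from the Master is not overwritten by the Slave's own shutdown.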