Resilience

It is essential that Smart Home systems are reliable.

Monolith Vs Distributed

MisterHouse is perfectly capable of running your whole smart home on its own as a monolith. However, if you have hardwired sensors (e.g. wired PIRs, a CM11, or 1-wire temperature sensors), the MisterHouse machine becomes a single point of failure, and it is very difficult to have a redundant standby instance take over automatically when it fails.

The solution is to implement a distributed architecture with wireless communication between the sensors and MisterHouse. For example, ESP8266 developer boards cost less than $5, have Wi-Fi and 5 or 6 usable I/O pins, and when loaded with the Tasmota firmware provide drivers for many devices (PIR, 1-wire, etc.) plus MQTT and a web server for communicating with MisterHouse.

Unfortunately, Wi-Fi devices tend to use too much power to stay on all of the time when running from batteries. Sometimes you can overcome this with devices that wake only when they need to transmit a signal, but battery-powered 'always-on' devices such as remote controls usually need a different wireless technology, with a powered gateway device to link them to MisterHouse.

Increasing Reliability of a single MisterHouse instance

The following techniques increase the reliability of a single MisterHouse installation:

Use the MisterHouse Watchdog
Use a hardware watchdog, e.g. the Raspberry Pi watchdog
Use SSDs instead of SD cards, and minimise disk writes
Have a backup/recovery strategy
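
As one possible starting point for the backup item above, here is a minimal sketch that archives a directory into a timestamped tarball. The function name `backup_dir` and the archive naming scheme are illustrative assumptions, not part of MisterHouse. Which directories are worth backing up depends on your installation.

```shell
#!/bin/sh
# backup_dir SRC_DIR DEST_DIR   (hypothetical helper, not part of MisterHouse)
# Writes DEST_DIR/mh-backup-YYYYmmdd-HHMMSS.tar.gz containing SRC_DIR
# and prints the path of the archive it created.
backup_dir() {
    src=$1
    dest=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/mh-backup-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C makes the paths inside the archive relative to the parent of SRC_DIR
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}
```

For example, `backup_dir "$HOME/mh/data" /mnt/backup` would archive the data directory used by the scripts on this page, assuming `/mnt/backup` is a mounted backup location. Restoring is then a matter of unpacking the archive with `tar -xzf`.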

With the widespread availability of very inexpensive, powerful computers like the Raspberry Pi, it is now viable to build in redundancy.

Simple Redundant design

This design has two identical Raspberry Pis with identical software, one with a hostname of Master and the other Slave. If MisterHouse stops responding on the Master for any reason, the Slave takes over.

CAVEAT: This technique only works where the MisterHouse instance is not directly connected to the sensors. For example, it won't work if you have only one CM11 connected to the serial port of the Master: once the Master dies, the connection to that CM11 is lost. You would need a CM11 connected to each MisterHouse instance.

Use the hostname in scripts and MisterHouse to decide how to behave so the code on both machines is identical.

Both machines run a script from crontab every few minutes. The Slave checks the Master, and the Master runs the watchdog script to monitor itself, e.g.

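#!/bin/sh
# Run from cron on both machines; the hostname selects the role.
THISHOST=$(hostname -s)
case "$THISHOST" in
    Master )
      # The Master monitors its own MisterHouse instance
      #echo "Starting watchcron"
      /home/pi/mh/scripts/watchcron.sh ;;
    Slave )
      # The Slave checks the Master and takes over if needed
      #echo "Starting launchslave"
      perl /home/pi/mh/scripts/launchslave.pl ;;
    * )
      echo "ERROR: Invalid hostname" >&2
      exit 1 ;;
esac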

On the Master, watchcron.sh restarts MisterHouse if the watchdog file is more than 5 minutes old:

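#!/bin/sh
# Restart MisterHouse when its watchdog file has not been touched recently.
WATCHDOGFILE="$HOME/mh/data/watchdog"
# if watchdog hasn't been touched in 5 minutes, restart MisterHouse
MAXDIFFERENCE=300
# Modification time of the watchdog file in seconds since the epoch (GNU stat)
watchdog=$(stat --format=%Y "$WATCHDOGFILE")
now=$(date +%s)
difference=$(( now - watchdog ))

if [ "$difference" -gt "$MAXDIFFERENCE" ]; then
  echo "MisterHouse needs restart"
  echo "$(date) Restarting MisterHouse" >> "$HOME/mh/data/logs/watchcron.log"
  # Reset the timer so the next cron run does not restart again immediately
  touch "$WATCHDOGFILE"
  # restart rather than start, so a hung but still running instance is replaced
  sudo systemctl restart misterhouse
fi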
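
The freshness test in watchcron.sh can also be factored into a small function, which makes it easier to handle the case where the watchdog file does not exist yet (for example on a fresh install). This is a sketch under that assumption; the name `is_stale` is illustrative, and treating a missing file as stale is a design choice rather than documented MisterHouse behaviour.

```shell
#!/bin/sh
# is_stale FILE MAX_AGE_SECONDS   (hypothetical helper)
# Succeeds (exit status 0) when FILE is missing or was last modified more
# than MAX_AGE_SECONDS ago. Uses GNU stat, as watchcron.sh does.
is_stale() {
    file=$1
    max_age=$2
    # A missing watchdog file is treated as stale
    [ -e "$file" ] || return 0
    mtime=$(stat --format=%Y "$file") || return 0
    now=$(date +%s)
    [ $(( now - mtime )) -gt "$max_age" ]
}
```

A cron script could then use `if is_stale "$HOME/mh/data/watchdog" 300; then ...; fi` in place of the inline arithmetic.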

The launchslave.pl script does the following:

    Quit if run on the Master machine; otherwise:

    Ping the Master's HTTP port to detect whether MisterHouse is running.

    If MisterHouse is running on the Master and MisterHouse is also running on the Slave, 
    stop MisterHouse on the Slave and email what happened.

    If MisterHouse is running on the Master, copy over the mh_temp.saved_states file
    so that if we need to start the Slave in the future, it has a recent status.

    If MisterHouse is NOT running on the Master and MisterHouse is NOT running on the Slave, 
    start MisterHouse on the Slave and email what happened.
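
The decision steps above can be sketched in shell as follows. This is not the actual launchslave.pl (which is a Perl script); it only illustrates the logic. The function names, the action labels and the `MASTER_URL` default are assumptions made for illustration, including the port number, which should match the HTTP port configured for your MisterHouse instance.

```shell
#!/bin/sh
# Sketch of the Slave's decision logic; not the real launchslave.pl.
# MASTER_URL is an assumed address: adjust host and port to your setup.
MASTER_URL="${MASTER_URL:-http://Master:8080/}"

# http_alive URL: succeeds if the URL answers within 5 seconds.
http_alive() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# decide_action MASTER_STATE SLAVE_STATE   (each "up" or "down")
# Prints the actions the Slave should take, as space-separated labels.
decide_action() {
    master=$1
    slave=$2
    if [ "$master" = up ] && [ "$slave" = up ]; then
        # Master has recovered: hand back control and refresh saved state
        echo "stop-slave sync-state"
    elif [ "$master" = up ]; then
        # Normal operation: keep a recent copy of mh_temp.saved_states
        echo "sync-state"
    elif [ "$slave" = down ]; then
        # Master is down and nothing is running: take over
        echo "start-slave"
    else
        # Master is down and the Slave is already serving
        echo "none"
    fi
}
```

A caller would derive the two states first, for example with `http_alive "$MASTER_URL"` for the Master and `systemctl is-active --quiet misterhouse` for the Slave (assuming MisterHouse runs as a systemd unit named misterhouse, as in watchcron.sh), and then act on each label that `decide_action` prints.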
