Document high availability for Lita #159
Redis HA can be handled using Sentinel. For Lita itself, I have no idea :/
@brodock we have the same understanding. In the short term, Redis HA can be done using Sentinel. For the Lita server itself, I was thinking it would be nice to add Consul/etcd-based leader election support. That would enable us to run multiple Lita servers in different DCs with only one active at any given time, and the others could take over if the leader dies.
I mentioned this briefly to Tristan the other day, but the way I would approach this is the way the podmaster program works for the Kubernetes scheduler and controller manager. It uses a distributed lock via etcd to ensure that only one instance of the application is running at a time and that another one starts up if the one that was running stops. In short: each host periodically attempts to set the value of some key K to its own hostname with a TTL of T. It does the set via an atomic compare-and-swap to avoid race conditions. For each host H, there are three possible results:

- K is unset or its TTL has expired: H's write succeeds and H becomes the leader.
- K is already set to H's own hostname: H is still the leader, and the write refreshes the TTL.
- K is set to another host's name and has not expired: the compare-and-swap fails and H stands by until the next attempt.
This could probably be implemented with Redis instead of etcd; I think Redis also supports the necessary atomic operations. As far as making it part of Lita itself, that would be possible, but I'd rather see it prototyped as either a separate program or a Lita plugin first. For reference, documentation on the Kubernetes podmaster, which does this, can be found here: https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/admin/high-availability.md#master-elected-components. The code for the podmaster program itself is here: https://github.com/kubernetes/contrib/tree/be436560df6fa839fb92a2f88ae4c4b7da4e58e4/pod-master
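To make the scheme above concrete, here is a minimal Ruby sketch of one election tick. The `CasStore` class is a hypothetical in-memory stand-in for etcd or Redis, so the example is self-contained; with the real `redis` gem, the acquire step would map to `redis.set(key, host, nx: true, ex: ttl)`, and refreshing a key you already hold would need a small Lua script so the compare and the expire happen atomically. None of these names come from Lita or podmaster; they are illustration only.

```ruby
# Hypothetical in-memory compare-and-swap store with TTLs, standing in for
# etcd or Redis. Not production code; a sketch of the election logic only.
class CasStore
  Entry = Struct.new(:value, :expires_at)

  def initialize
    @entries = {}
  end

  # Atomically set key to value unless it is held (unexpired) by someone else.
  # Succeeds when the key is unset, expired, or already holds this value
  # (in which case the TTL is refreshed). Returns true on success.
  def acquire(key, value, ttl, now: Time.now)
    entry = @entries[key]
    return false if entry && entry.expires_at > now && entry.value != value
    @entries[key] = Entry.new(value, now + ttl)
    true
  end

  # Current leader for key, or nil if the key is unset or expired.
  def leader(key, now: Time.now)
    entry = @entries[key]
    entry && entry.expires_at > now ? entry.value : nil
  end
end

# One periodic tick for host: attempt the compare-and-swap and report whether
# this host is leading or standing by. Covers all three cases described above.
def election_tick(store, key, host, ttl, now: Time.now)
  store.acquire(key, host, ttl, now: now) ? :leading : :standby
end

store = CasStore.new
t0 = Time.now
election_tick(store, "lita:leader", "host-a", 30, now: t0)       # => :leading
election_tick(store, "lita:leader", "host-b", 30, now: t0)       # => :standby
election_tick(store, "lita:leader", "host-b", 30, now: t0 + 50)  # host-a's TTL lapsed => :leading
```

The key design point is that leadership is never revoked explicitly: a leader that stops refreshing simply lets its TTL lapse, and the next host to tick takes over.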
If you want to do this with Consul, you can use the
Probably relevant to this discussion: after solving the problem of Lita itself being HA, you'll need a data store that offers replication and failover. Sentinel is probably okay as long as you don't care about losing data as part of the failover process, and possibly ending up with multiple Redis masters: https://aphyr.com/posts/287-asynchronous-replication-with-failover. You'll need a different data store if you need the data to stay consistent while the store fails over.
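As a partial mitigation under Sentinel (a sketch, not a guarantee; asynchronous replication can still lose acknowledged writes during failover), Redis can be told to refuse writes on a master that has lost contact with its replicas:

```
# redis.conf on the master: reject writes if fewer than 1 replica is
# connected or replication lag exceeds 10 seconds. This narrows, but does
# not close, the lost-write window described in the linked post.
min-replicas-to-write 1
min-replicas-max-lag 10
```

This trades availability for safety: during a partition the old master stops accepting writes instead of silently diverging from the side that elected a new master.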
@esigler @ranjib and I have been talking about how we can ensure that our Lita instance keeps running in the event of a host failure. It would be very helpful to have some information regarding a recommended deployment scenario for a high availability setup. I'll let them chime in with specific issues so that we can keep track of this discussion publicly.