LIVY-231: Multi node HA for batch sessions #222
We are trying to get a stable build for 0.3 first for Spark 2.0 and session recovery. Do you mind if we handle HA after 0.3 is released?
That would work.
62e1cde to b6e52fd
Codecov Report
@@            Coverage Diff             @@
##           master     #222      +/-   ##
==========================================
- Coverage   71.53%   70.74%   -0.79%
==========================================
  Files          91       89       -2
  Lines        4697     4601      -96
  Branches      811      780      -31
==========================================
- Hits         3360     3255     -105
- Misses        861      910      +49
+ Partials      476      436      -40
I updated the pull request to fix the merge errors.
1. Stores the session ID in ZooKeeper/Filesystem.
2. Builds a cache on top of ZooKeeper. The cache keeps metadata of all batch sessions across all Livy servers connected to it and notifies all Livy servers of changes in the cache.
3. Adds callback methods from SessionStore to SessionManager. SessionStore watches events in the ZooKeeper cache and calls the appropriate callback methods in SessionManager.
Task-url: https://issues.cloudera.org/browse/LIVY-231
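The flow in steps 2 and 3 of the commit message above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `InMemorySessionStore` is an in-memory stand-in for the ZooKeeper-backed store, and `SessionChangeListener`, `NodeSessionManager`, and the string-valued metadata are hypothetical names and types chosen for the example. A real implementation would receive these events from the ZooKeeper cache rather than from direct method calls.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical callback interface: what a store would call on a session manager. */
interface SessionChangeListener {
    void onSessionAdded(int sessionId, String metadata);
    void onSessionUpdated(int sessionId, String metadata);
    void onSessionRemoved(int sessionId);
}

/** In-memory stand-in for the shared store (assumption: replaces ZooKeeper here). */
class InMemorySessionStore {
    private final Map<Integer, String> shared = new ConcurrentHashMap<>();
    private final List<SessionChangeListener> listeners = new CopyOnWriteArrayList<>();

    /** Each server registers its manager so it is notified of every change. */
    void register(SessionChangeListener listener) {
        listeners.add(listener);
    }

    /** Writes metadata, then notifies all registered listeners of the add or update. */
    void put(int sessionId, String metadata) {
        String previous = shared.put(sessionId, metadata);
        for (SessionChangeListener l : listeners) {
            if (previous == null) {
                l.onSessionAdded(sessionId, metadata);
            } else {
                l.onSessionUpdated(sessionId, metadata);
            }
        }
    }

    /** Deletes metadata and notifies listeners only if the session existed. */
    void remove(int sessionId) {
        if (shared.remove(sessionId) != null) {
            for (SessionChangeListener l : listeners) {
                l.onSessionRemoved(sessionId);
            }
        }
    }
}

/** Hypothetical per-server manager that keeps a local copy of session metadata. */
class NodeSessionManager implements SessionChangeListener {
    private final Map<Integer, String> local = new ConcurrentHashMap<>();

    @Override
    public void onSessionAdded(int sessionId, String metadata) {
        local.put(sessionId, metadata);
    }

    @Override
    public void onSessionUpdated(int sessionId, String metadata) {
        local.put(sessionId, metadata);
    }

    @Override
    public void onSessionRemoved(int sessionId) {
        local.remove(sessionId);
    }

    /** Returns a sorted copy of the local view, safe to inspect or print. */
    Map<Integer, String> snapshot() {
        return new TreeMap<>(local);
    }
}

class MultiNodeSketch {
    public static void main(String[] args) {
        InMemorySessionStore store = new InMemorySessionStore();
        NodeSessionManager nodeA = new NodeSessionManager();
        NodeSessionManager nodeB = new NodeSessionManager();
        store.register(nodeA);
        store.register(nodeB);

        store.put(0, "starting");
        store.put(0, "running");
        store.put(1, "starting");
        store.remove(1);

        // Both servers should now hold the same local view of the batch sessions.
        System.out.println(nodeA.snapshot());
        System.out.println(nodeB.snapshot());
    }
}
```

In this sketch every registered manager sees every change, so each server's local copy converges on the same contents without the servers talking to each other directly.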
Mocking batchSessionsCache in ZooKeeperStateStore. Without mocking, the Livy tests fail to start a ZooKeeperStateStore.
Task-url: https://issues.cloudera.org/browse/LIVY-231
b6e52fd to 6e663a8
Hi @meisam, @alex-the-man, is there any progress on this issue? Livy HA is important to us.
Hi @shenh062326, we enabled multi-node HA at PayPal a long time ago and have been using it for close to a year. We have already submitted 1.3 million Spark jobs through Livy multi-node HA. Just today we presented our updates at Spark Summit and committed to open-sourcing all our enhancements. @meisam will send an updated PR soon, and we will merge it soon. Thanks
Is there any progress on multi-node HA for interactive sessions?
This is a preliminary PR for LIVY-231 (https://issues.cloudera.org/browse/LIVY-231) and it has known issues, but we can use it to discuss the design of multi-node HA for Livy.
The PR uses a cache for each Livy node. The cache keeps sessions' metadata in sync with ZooKeeper. Any change to the data in ZooKeeper signals the cache and updates the local copy of the data on the Livy nodes.
The cache is implemented using Apache Curator's "Path Cache" recipe: http://curator.apache.org/curator-recipes/path-cache.html.
This PR should be revised based on #220 (JIRA ticket: https://issues.cloudera.org/browse/LIVY-239).