add Spark to the cluster #18
base: 2.7
Conversation
I have continued to add changes to my fork: I fiddled with the HDFS replication (so files aren't available on every node, which is more realistic) and updated the versions of the tools (to Hadoop 2.7.1 and other current releases). Feel free to cherry-pick as necessary if these aren't relevant to this project's goals.
…a will be on all nodes that way)
…cluster into hive. Conflicts: manifests/master-single.pp, manifests/master.pp, modules/phoenix/manifests/init.pp
Looks cool! I may fork off this to add parquet-tools (https://github.com/Parquet/parquet-mr/tree/master/parquet-tools).
Greg, this is really great! One thing: HBase has moved from 1.1.1 to 1.1.2. The build only works for me if I make that change in
Fix a bug in vagrant file causing multiple initializations of puppet
This adds Spark 1.4.0 to the cluster setup. I have tested it a little: Spark jobs can access HDFS files (as hdfs://master.local:9000/home/vagrant/...) and jobs can be submitted to the cluster with a command like this:
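The original command was not preserved in this thread, so the following is only a hedged sketch of what a submission against this setup might look like. The install path, the standalone-master port 7077, and the example jar name are assumptions, not taken from the PR:

```shell
# Hypothetical example: submit Spark's bundled SparkPi job to the
# standalone cluster running on master.local. Paths and the jar name
# assume a default Spark 1.4.0 installation under /usr/local/spark.
/usr/local/spark/bin/spark-submit \
  --master spark://master.local:7077 \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/spark/lib/spark-examples-1.4.0-hadoop2.6.0.jar 10
```

Input paths passed to the job would use the HDFS URI scheme mentioned above, e.g. hdfs://master.local:9000/home/vagrant/....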
The download required during provisioning is about 240 MB: I don't know if that's large enough to make leaving the spark manifest commented out in manifests/master-single.pp the wiser choice.
I haven't updated the README: again, I'm not sure whether it's worth advertising there.