Running the kite-sdk commands in mapreduce mode #426

malathit · 2015-12-17T13:15:31Z

Hi,

I had a look at the Kite dataset code and found that Kite internally uses Apache Crunch to run its MapReduce pipelines.

In my case, I invoke the Kite CLI from Oozie to import JSON data. I noticed that, by default, the Apache Crunch job runs MapReduce in LocalRunner mode. How do I run it in distributed MapReduce mode instead?

Regards,
Malathi

rdblue · 2015-12-17T17:17:58Z

Kite will use MR on the cluster if both source and destination datasets are distributed. So Local to HDFS uses the local runner, while HDFS to Hive uses MR.

malathit · 2015-12-18T02:40:29Z

Hi,

Thanks for the reply. In my case, the source data is in HDFS and is being written to a Hive dataset created by Hive. But the program still runs with the local runner. Any idea whether I have missed something obvious?

rdblue · 2015-12-18T16:43:11Z

What command are you running? If you don't specify hdfs:/... then Kite assumes you mean a local path. So if you run hdfs dfs -put file.csv and then run kite-dataset csv-import file.csv ... Kite will find and use the local copy instead of the one you just put in HDFS. You have to use the full URI, like this: kite-dataset csv-import hdfs:/user/me/file.csv ...
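To make the point above concrete, here is a minimal shell sketch of a guard that rejects a bare path before it reaches Kite. The function names `is_hdfs_uri` and `import_csv` are hypothetical helpers, not part of Kite, and the dataset argument is a placeholder because the thread does not give one. The prefix check only illustrates the "use the full URI" advice; it is not how Kite itself resolves paths.

```shell
#!/usr/bin/env bash
# Sketch only: insist on an explicit hdfs: URI so a local file of the
# same name cannot be picked up by accident.

# Hypothetical helper: returns 0 when the path has an explicit hdfs: scheme.
is_hdfs_uri() {
  case "$1" in
    hdfs:*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Hypothetical wrapper around the import command from the thread.
# $1 = source CSV path, $2 = target dataset (placeholder, not from the thread).
import_csv() {
  if ! is_hdfs_uri "$1"; then
    echo "refusing '$1': use a full URI such as hdfs:/user/me/file.csv" >&2
    return 1
  fi
  kite-dataset csv-import "$1" "$2"
}

# Example usage (not executed here; assumes hdfs and kite-dataset are on PATH):
#   hdfs dfs -put file.csv /user/me/file.csv
#   import_csv hdfs:/user/me/file.csv "$DATASET"
```

With this guard in place, a call such as `import_csv file.csv ...` fails immediately instead of quietly importing the local copy with the local runner.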
