I had a look at the Kite dataset code and found that Kite internally uses Apache Crunch to run MapReduce pipelines.
In my case, I invoke the Kite CLI from Oozie to import JSON data. However, I noticed that by default the Crunch program runs MapReduce in LocalRunner mode. How do I make it run in distributed MapReduce mode instead?
Regards,
Malathi
Kite will use MR on the cluster only if both the source and the destination datasets are distributed. So a local-to-HDFS import uses the local runner, while an HDFS-to-Hive import runs as MR on the cluster.
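The rule above can be sketched as a small decision function. This is a hypothetical illustration, not Kite's actual code: the names `storage_scheme` and `choose_runner`, the set of schemes treated as distributed, and the handling of a `dataset:` prefix are all assumptions made for the sketch.

```python
import re

# Hypothetical: schemes this sketch treats as distributed storage.
# Kite's real classification may differ.
DISTRIBUTED_SCHEMES = {"hdfs", "hive"}

# Matches a URI scheme such as "hdfs:" or "hive:" at the start of a string.
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def storage_scheme(uri: str) -> str:
    """Return the storage scheme of a URI; a bare path counts as local."""
    # Assumption: strip a Kite-style "dataset:" prefix, e.g. "dataset:hive:db/table".
    if uri.startswith("dataset:"):
        uri = uri[len("dataset:"):]
    match = _SCHEME.match(uri)
    return match.group(1).lower() if match else "file"


def choose_runner(source: str, target: str) -> str:
    """Pick "mapreduce" only when BOTH ends are distributed, else "local"."""
    both_distributed = (
        storage_scheme(source) in DISTRIBUTED_SCHEMES
        and storage_scheme(target) in DISTRIBUTED_SCHEMES
    )
    return "mapreduce" if both_distributed else "local"
```

With these assumptions, `choose_runner("hdfs:/user/me/file.csv", "dataset:hive:db/table")` selects `"mapreduce"`, while a bare local path as the source falls back to `"local"`, matching the local-to-HDFS versus HDFS-to-Hive distinction described above.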
Thanks for the reply. In my case, I am reading data from HDFS and writing it to a Hive dataset created by Hive, but the program still runs with the LocalRunner. Have I missed something obvious?
What command are you running? If you don't specify hdfs:/..., Kite assumes you mean a local path. So if you run hdfs dfs -put file.csv and then run kite-dataset csv-import file.csv ..., Kite will find and use the local copy instead of the one you just put in HDFS. You have to use the full URI, like this: kite-dataset csv-import hdfs:/user/me/file.csv ...
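The path resolution described above can be sketched as follows, together with a pre-flight check that a wrapper script (for example, one launched from an Oozie workflow) could run before calling the CLI. Both functions, `resolve_source` and `require_hdfs`, are hypothetical helpers for illustration only; they are not part of Kite or Oozie, and they simply encode the rule from the comment above: no scheme means local.

```python
import re
from typing import Tuple

# Matches a URI scheme such as "hdfs:" or "file:" at the start of a string.
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def resolve_source(path: str) -> Tuple[str, str]:
    """Split a CLI source argument into (filesystem, path).

    With no scheme, the argument is treated as a local file, even if a
    file with the same name was just uploaded to HDFS.
    """
    match = _SCHEME.match(path)
    if match is None:
        return ("file", path)
    return (match.group(1).lower(), path[match.end():])


def require_hdfs(path: str) -> str:
    """Hypothetical pre-flight check: fail fast on anything that is not hdfs:."""
    filesystem, _ = resolve_source(path)
    if filesystem != "hdfs":
        raise ValueError(f"expected an hdfs:/ URI, got {path!r}")
    return path
```

Failing fast in the wrapper surfaces the mistake immediately, rather than letting the import silently read a local copy and run under the local runner.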