Output with LZO compression #8
I have to ask: do you really want LZO text rather than Snappy sequence files? I know LZO got some press back in the day, but it feels like people have moved on to Snappy/LZ4. But I see your point: moveOutput assumes it can open whatever was produced with a sequence file reader, which is not right in this case. I will take a look, but any suggestions you can make would be useful.
It seems people are recommending LZO for the splittable compression aspect, particularly for Impala queries. That's what has been motivating this particular attempt. I'll see if I can identify code changes that might fix this.
Can you help me understand the LZO compression technique used here?
LZO seems to almost work; I'm not sure whether it's known to work or whether there is a slight bug. I hope I'm missing something everyone else can see.
When I run this command:
hadoop jar filecrush-2.2.2-SNAPSHOT.jar com.m6d.filecrush.crush.Crush \
  --compress com.hadoop.compression.lzo.LzopCodec \
  --input-format text \
  /user/hive/warehouse/test/actionlog/ \
  /user/hive/warehouse/temp/test/actionlog/ \
  20101121121212
It completes the map and reduce tasks, then throws an exception at the very end. The part I'm curious about is that it seems to be expecting a sequence file, when really it's a text file. I see the same results whether I set --output-format to sequence or to text:
Exception in thread "main" java.io.EOFException: Premature EOF from inputStream
at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1916)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at com.m6d.filecrush.crush.Crush.moveOutput(Crush.java:824)
at com.m6d.filecrush.crush.Crush.run(Crush.java:668)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.m6d.filecrush.crush.Crush.main(Crush.java:1330)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
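
Since the trace shows moveOutput opening the job output with SequenceFile.Reader unconditionally, one possible guard is to look at the leading magic bytes of a file before choosing a reader. The following is only a rough sketch under stated assumptions, not filecrush code: the class OutputFormatSniffer, its Kind enum, and the sniff method are hypothetical names, and it uses plain java.io so it can be tried without Hadoop on the classpath. It relies on two format facts: a Hadoop sequence file starts with the ASCII bytes "SEQ", and an lzop file starts with the nine-byte lzop signature.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of filecrush): guesses a file's container
// format from its first bytes so the caller can pick a matching reader.
class OutputFormatSniffer {

    enum Kind { SEQUENCE_FILE, LZOP, OTHER }

    // Hadoop sequence files begin with the ASCII bytes 'S', 'E', 'Q'.
    private static final byte[] SEQ_MAGIC = {'S', 'E', 'Q'};

    // Files written by lzop begin with this nine-byte signature.
    private static final byte[] LZOP_MAGIC = {
        (byte) 0x89, 'L', 'Z', 'O', 0x00, 0x0d, 0x0a, 0x1a, 0x0a
    };

    // Reads at most LZOP_MAGIC.length bytes from the stream and classifies them.
    static Kind sniff(InputStream in) throws IOException {
        byte[] head = new byte[LZOP_MAGIC.length];
        int filled = 0;
        while (filled < head.length) {
            int n = in.read(head, filled, head.length - filled);
            if (n < 0) {
                break; // stream ended before the buffer was full
            }
            filled += n;
        }
        if (startsWith(head, filled, LZOP_MAGIC)) {
            return Kind.LZOP;
        }
        if (startsWith(head, filled, SEQ_MAGIC)) {
            return Kind.SEQUENCE_FILE;
        }
        return Kind.OTHER;
    }

    // True when the first len bytes of data begin with prefix.
    private static boolean startsWith(byte[] data, int len, byte[] prefix) {
        if (len < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In a real fix the input stream would presumably come from the Hadoop FileSystem for each output file, and only files classified as SEQUENCE_FILE would be handed to SequenceFile.Reader; whether that matches what moveOutput actually needs is an open question here.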