
Output with LZO compression #8

Open
joshuaclausen opened this issue Nov 22, 2014 · 3 comments

Comments

@joshuaclausen

LZO output seems to almost work. I'm not sure whether this is a known limitation or a small bug; I hope I'm missing something obvious.

When I run this command:
hadoop jar filecrush-2.2.2-SNAPSHOT.jar com.m6d.filecrush.crush.Crush \
  --compress com.hadoop.compression.lzo.LzopCodec \
  --input-format text \
  /user/hive/warehouse/test/actionlog/ \
  /user/hive/warehouse/temp/test/actionlog/ \
  20101121121212

it completes the map and reduce tasks, then throws an exception at the very end. What puzzles me is that it seems to be expecting a sequence file, when the data is really a text file. I get the same result whether I set --output-format to sequence or to text:

Exception in thread "main" java.io.EOFException: Premature EOF from inputStream
at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1916)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at com.m6d.filecrush.crush.Crush.moveOutput(Crush.java:824)
at com.m6d.filecrush.crush.Crush.run(Crush.java:668)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.m6d.filecrush.crush.Crush.main(Crush.java:1330)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
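The top frames show `LzopInputStream.readHeader` calling `readFully` and running out of bytes before a complete lzop header arrived. The following is a minimal sketch, not hadoop-lzo's actual code, of how such a header check behaves, assuming the standard 9-byte lzop file magic. `LzopHeaderCheck`, `readFully`, and `hasLzopMagic` are hypothetical names; the exception message is copied from the trace above.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class LzopHeaderCheck {
    // Assumed standard lzop magic: 0x89 'L' 'Z' 'O' 0x00 '\r' '\n' 0x1A '\n'
    static final byte[] LZOP_MAGIC = {
        (byte) 0x89, 'L', 'Z', 'O', 0x00, '\r', '\n', 0x1A, '\n'
    };

    // Fill buf completely or throw EOFException if the stream ends first.
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                throw new EOFException("Premature EOF from inputStream");
            }
            off += n;
        }
    }

    // True if the stream starts with the lzop magic; throws on a short stream.
    static boolean hasLzopMagic(InputStream in) throws IOException {
        byte[] buf = new byte[LZOP_MAGIC.length];
        readFully(in, buf);
        return Arrays.equals(buf, LZOP_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        try {
            // An empty stream cannot supply the 9 header bytes.
            hasLzopMagic(new ByteArrayInputStream(new byte[0]));
        } catch (EOFException e) {
            System.out.println("EOF: " + e.getMessage()); // EOF: Premature EOF from inputStream
        }
    }
}
```

In other words, whatever stream the codec was handed here was empty or shorter than an lzop header.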

@edwardcapriolo
Owner

I have to ask: do you really want LZO text rather than Snappy sequence files? I know LZO got some press back in the day, but it feels like people have moved on to Snappy/LZ4.

But I see your point: moveOutput assumes it can open whatever was produced with a SequenceFile reader, which is not right in this case. I will take a look, but any suggestions you can make are useful.
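One possible direction for `moveOutput` is to check what was actually written before opening it with `SequenceFile.Reader`. A minimal sketch, assuming the output file can be read as a plain `InputStream`: SequenceFiles begin with the bytes `SEQ` followed by a version byte, and lzop files begin with `0x89 'L' 'Z' 'O'`. `OutputFormatSniffer`, `Kind`, `readUpTo`, and `sniff` are hypothetical names, not part of filecrush or Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class OutputFormatSniffer {
    // Hypothetical classification used to decide which reader to open.
    enum Kind { SEQUENCE_FILE, LZOP, OTHER }

    // Read up to buf.length bytes; returns how many were read (short at EOF).
    static int readUpTo(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                break;
            }
            off += n;
        }
        return off;
    }

    static Kind sniff(InputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = readUpTo(in, head);
        // SequenceFile header: 'S' 'E' 'Q' then a version byte.
        if (n >= 3 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q') {
            return Kind.SEQUENCE_FILE;
        }
        // lzop header starts with 0x89 'L' 'Z' 'O'.
        if (n >= 4 && head[0] == (byte) 0x89 && head[1] == 'L'
                && head[2] == 'Z' && head[3] == 'O') {
            return Kind.LZOP;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) throws IOException {
        byte[] seq = {'S', 'E', 'Q', 6};
        byte[] text = "hello\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(sniff(new ByteArrayInputStream(seq)));  // SEQUENCE_FILE
        System.out.println(sniff(new ByteArrayInputStream(text))); // OTHER
    }
}
```

In Hadoop the stream could come from `FileSystem.open(path)`, and only a `SEQUENCE_FILE` result would then be handed to `SequenceFile.Reader`; text or lzop output would need a different path for counting or verifying records.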

@joshuaclausen
Author

People seem to recommend LZO because it is splittable, particularly for Impala queries. That is what motivated this attempt.

I'll see if I can identify code changes that might fix this.


@venkat-phani22

Can you help me understand the LZO compression technique used here?
