Output with LZO compression #8

joshuaclausen · 2014-11-22T00:20:12Z

LZO seems to almost work; I'm not sure whether it's known to work or whether there is a slight bug. I hope I'm missing something everyone else can see.

When I run this command:
hadoop jar filecrush-2.2.2-SNAPSHOT.jar com.m6d.filecrush.crush.Crush
--compress com.hadoop.compression.lzo.LzopCodec
--input-format text
/user/hive/warehouse/test/actionlog/
/user/hive/warehouse/temp/test/actionlog/
20101121121212

It completes the map and reduce tasks, then throws an exception at the very end. The part I'm curious about is that it seems to be expecting a sequence file, when really it's a text file. I see the same results whether I set --output-format to sequence or text:

Exception in thread "main" java.io.EOFException: Premature EOF from inputStream
at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1916)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at com.m6d.filecrush.crush.Crush.moveOutput(Crush.java:824)
at com.m6d.filecrush.crush.Crush.run(Crush.java:668)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.m6d.filecrush.crush.Crush.main(Crush.java:1330)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

edwardcapriolo · 2014-11-22T16:12:16Z

I have to ask: do you really want LZO text rather than Snappy sequence files? I know LZO had some press back in the day, but it feels like people have moved on to Snappy/LZ4.

But I see your point: moveOutput assumes it can open what was produced with a sequence file reader, which is not right in this case. I will take a look, but any suggestions you can make would be useful.
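To illustrate the kind of guard being discussed, here is a minimal sketch that checks whether a stream starts with the SequenceFile magic bytes ("SEQ") before anything tries to open it with a sequence file reader. The class and method names (`SequenceFileSniffer`, `looksLikeSequenceFile`) are hypothetical and not part of filecrush, and this is not confirmed to be the fix for the exception above; it only shows one way to avoid assuming the output format.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of filecrush: sniffs the SequenceFile magic header. */
public final class SequenceFileSniffer {
    // Hadoop SequenceFiles begin with the three bytes 'S', 'E', 'Q' followed by a version byte.
    private static final byte[] MAGIC = {'S', 'E', 'Q'};

    private SequenceFileSniffer() {}

    /**
     * Returns true if the stream begins with the SequenceFile magic bytes.
     * Consumes up to three bytes from the stream; the caller should reopen
     * the file before handing it to a real reader.
     */
    public static boolean looksLikeSequenceFile(InputStream in) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // file is shorter than the magic header
            }
            read += n;
        }
        return Arrays.equals(head, MAGIC);
    }

    public static void main(String[] args) throws IOException {
        byte[] seqHeader = {'S', 'E', 'Q', 6};
        byte[] textData = "plain text line\n".getBytes(StandardCharsets.UTF_8);

        System.out.println(looksLikeSequenceFile(new ByteArrayInputStream(seqHeader)));
        System.out.println(looksLikeSequenceFile(new ByteArrayInputStream(textData)));
    }
}
```

In a Hadoop job the stream would presumably come from `FileSystem.open(path)`, and the sequence file reader would be used only when the check returns true. Where such a check would belong inside moveOutput is an assumption that would need to be verified against the Crush source.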

joshuaclausen · 2014-11-22T19:11:08Z

It seems people are recommending lzo for the splittable compression aspect, particularly for Impala queries. That's what has been motivating this particular attempt.

I'll see if I can identify code changes that might fix this.


venkat-phani22 · 2021-07-30T16:31:09Z

Can you help me understand the LZO compression technique used here?
