Better logging needed when file format is in question #9

Open
alexmc6 opened this issue Apr 10, 2015 · 1 comment

Comments

alexmc6 commented Apr 10, 2015

I ran filecrush for the first time. Everything appeared to finish successfully, but then I got the error below. Although it reported thousands of files eligible for crushing, it did not actually crush any of them.

Exception in thread "main" java.io.IOException: not a gzip file
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.processBasicHeader(BuiltInGzipDecompressor.java:496)
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeHeaderState(BuiltInGzipDecompressor.java:257)
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:186)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:72)
at java.io.DataInputStream.readByte(DataInputStream.java:265)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2281)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2304)
at com.m6d.filecrush.crush.Crush.cloneOutput(Crush.java:769)
at com.m6d.filecrush.crush.Crush.run(Crush.java:666)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.m6d.filecrush.crush.Crush.main(Crush.java:1330)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

My command line:

hadoop jar ./filecrush-2.2.2-SNAPSHOT.jar com.m6d.filecrush.crush.Crush --info --clone --verbose --compress gzip --input-format text --output-format text /user/camus/tests/topics/ /user/camus/tests/topics_orig/ 20101121121212

Why does the trace say "SequenceFile"? My input is gzipped JSON (i.e. text), and it will soon be Snappy-compressed JSON.
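
For what it's worth, the two formats can be told apart from their first few bytes: a gzip stream always begins with the magic bytes 0x1f 0x8b, while a SequenceFile begins with the ASCII bytes SEQ followed by a version byte. Below is a small pre-flight sniffer along those lines; it is my own sketch against the Hadoop FileSystem API, not part of filecrush, and the class and method names are made up:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical pre-flight check, not part of filecrush: report what a file
// actually looks like before handing it to a reader.
public class FormatSniffer {

    public static String sniff(FileSystem fs, Path file) throws IOException {
        // Read the first four bytes; enough to recognize both formats.
        // (Files shorter than four bytes will throw an EOFException.)
        byte[] header = new byte[4];
        try (FSDataInputStream in = fs.open(file)) {
            in.readFully(0, header);
        }
        // A gzip stream always begins with the magic bytes 0x1f 0x8b.
        if ((header[0] & 0xff) == 0x1f && (header[1] & 0xff) == 0x8b) {
            return "gzip";
        }
        // A SequenceFile begins with the ASCII bytes 'S' 'E' 'Q'
        // followed by a single version byte.
        if (header[0] == 'S' && header[1] == 'E' && header[2] == 'Q') {
            return "SequenceFile (version " + header[3] + ")";
        }
        return "unknown (possibly uncompressed text)";
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path(args[0]);
        System.out.println(path + ": " + sniff(fs, path));
    }
}

Running something like this against one of the input files under /user/camus/tests/topics/ would confirm whether the tool is being handed gzipped text, a SequenceFile, or something else entirely.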

alexmc6 commented Apr 10, 2015

Invoking MapReduce:

15/04/10 15:51:46 INFO impl.TimelineClientImpl: Timeline service address: http://bruathdp002.redacted.local:8188/ws/v1/timeline/
15/04/10 15:51:46 INFO client.RMProxy: Connecting to ResourceManager at bruathdp002.redacted.local/10.34.37.2:8050
15/04/10 15:51:46 INFO impl.TimelineClientImpl: Timeline service address: http://bruathdp002.redacted.local:8188/ws/v1/timeline/
15/04/10 15:51:46 INFO client.RMProxy: Connecting to ResourceManager at bruathdp002.redacted.local/10.34.37.2:8050
15/04/10 15:51:46 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 16477 for camus on ha-hdfs:uatcluster
15/04/10 15:51:46 INFO security.TokenCache: Got dt for hdfs://uatcluster; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:uatcluster, Ident: (HDFS_DELEGATION_TOKEN token 16477 for camus)
15/04/10 15:51:46 INFO mapred.FileInputFormat: Total input paths to process : 1
15/04/10 15:51:46 INFO mapred.FileInputFormat: Total input paths to process : 1
15/04/10 15:51:46 INFO mapreduce.JobSubmitter: number of splits:3
15/04/10 15:51:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1427208025863_5476
15/04/10 15:51:46 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:uatcluster, Ident: (HDFS_DELEGATION_TOKEN token 16477 for camus)
15/04/10 15:51:47 INFO impl.YarnClientImpl: Submitted application application_1427208025863_5476
15/04/10 15:51:47 INFO mapreduce.Job: The url to track the job: http://bruathdp002.redacted.local:8088/proxy/application_1427208025863_5476/
15/04/10 15:51:47 INFO mapreduce.Job: Running job: job_1427208025863_5476
15/04/10 15:51:53 INFO mapreduce.Job: Job job_1427208025863_5476 running in uber mode : false
15/04/10 15:51:53 INFO mapreduce.Job: map 0% reduce 0%
15/04/10 15:52:00 INFO mapreduce.Job: map 33% reduce 0%
15/04/10 15:52:01 INFO mapreduce.Job: map 100% reduce 0%
15/04/10 15:52:10 INFO mapreduce.Job: map 100% reduce 68%
15/04/10 15:52:16 INFO mapreduce.Job: map 100% reduce 69%
15/04/10 15:52:28 INFO mapreduce.Job: map 100% reduce 70%
15/04/10 15:52:47 INFO mapreduce.Job: map 100% reduce 71%
15/04/10 15:53:02 INFO mapreduce.Job: map 100% reduce 72%
15/04/10 15:53:05 INFO mapreduce.Job: map 100% reduce 73%
15/04/10 15:53:11 INFO mapreduce.Job: map 100% reduce 74%
15/04/10 15:53:14 INFO mapreduce.Job: map 100% reduce 75%
15/04/10 15:53:17 INFO mapreduce.Job: map 100% reduce 76%
15/04/10 15:53:20 INFO mapreduce.Job: map 100% reduce 77%
15/04/10 15:53:26 INFO mapreduce.Job: map 100% reduce 78%
15/04/10 15:53:29 INFO mapreduce.Job: map 100% reduce 79%
15/04/10 15:53:32 INFO mapreduce.Job: map 100% reduce 80%
15/04/10 15:53:38 INFO mapreduce.Job: map 100% reduce 81%
15/04/10 15:53:51 INFO mapreduce.Job: map 100% reduce 82%
15/04/10 15:54:06 INFO mapreduce.Job: map 100% reduce 83%
15/04/10 15:54:09 INFO mapreduce.Job: map 100% reduce 84%
15/04/10 15:54:15 INFO mapreduce.Job: map 100% reduce 85%
15/04/10 15:54:34 INFO mapreduce.Job: map 100% reduce 86%
15/04/10 15:54:40 INFO mapreduce.Job: map 100% reduce 87%
15/04/10 15:54:46 INFO mapreduce.Job: map 100% reduce 88%
15/04/10 15:54:58 INFO mapreduce.Job: map 100% reduce 89%
15/04/10 15:55:13 INFO mapreduce.Job: map 100% reduce 90%
15/04/10 15:55:22 INFO mapreduce.Job: map 100% reduce 91%
15/04/10 15:55:31 INFO mapreduce.Job: map 100% reduce 92%
15/04/10 15:55:34 INFO mapreduce.Job: map 100% reduce 93%
15/04/10 15:55:37 INFO mapreduce.Job: map 100% reduce 94%
15/04/10 15:55:43 INFO mapreduce.Job: map 100% reduce 95%
15/04/10 15:55:49 INFO mapreduce.Job: map 100% reduce 96%
15/04/10 15:55:55 INFO mapreduce.Job: map 100% reduce 97%
15/04/10 15:56:01 INFO mapreduce.Job: map 100% reduce 98%
15/04/10 15:56:07 INFO mapreduce.Job: map 100% reduce 99%
15/04/10 15:56:10 INFO mapreduce.Job: map 100% reduce 100%
15/04/10 15:56:11 INFO mapreduce.Job: Job job_1427208025863_5476 completed successfully
15/04/10 15:56:11 INFO mapreduce.Job: Counters: 57
    File System Counters
        FILE: Number of bytes read=1835766
        FILE: Number of bytes written=4168355
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=560020818
        HDFS: Number of bytes written=554471025
        HDFS: Number of read operations=24674
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=918
    Job Counters
        Launched map tasks=3
        Launched reduce tasks=1
        Data-local map tasks=3
        Total time spent by all maps in occupied slots (ms)=30502
        Total time spent by all reduces in occupied slots (ms)=995608
        Total time spent by all map tasks (ms)=15251
        Total time spent by all reduce tasks (ms)=248902
        Total vcore-seconds taken by all map tasks=15251
        Total vcore-seconds taken by all reduce tasks=248902
        Total megabyte-seconds taken by all map tasks=62468096
        Total megabyte-seconds taken by all reduce tasks=2039005184
    Map-Reduce Framework
        Map input records=8221
        Map output records=8220
        Map output bytes=1812988
        Map output materialized bytes=1835778
        Input split bytes=816
        Combine input records=0
        Combine output records=0
        Reduce input groups=916
        Reduce shuffle bytes=1835778
        Reduce input records=8220
        Reduce output records=8220
        Spilled Records=16440
        Shuffled Maps =3
        Failed Shuffles=0
        Merged Map outputs=3
        GC time elapsed (ms)=735
        CPU time spent (ms)=158640
        Physical memory (bytes) snapshot=3666952192
        Virtual memory (bytes) snapshot=19778822144
        Total committed heap usage (bytes)=4359979008
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    com.m6d.filecrush.crush.MapperCounter
        DIRS_ELIGIBLE=916
        DIRS_FOUND=1619
        DIRS_SKIPPED=703
        FILES_ELIGIBLE=8220
        FILES_FOUND=8541
        FILES_SKIPPED=321
    com.m6d.filecrush.crush.ReducerCounter
        FILES_CRUSHED=8220
        RECORDS_CRUSHED=5426454
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=83317
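
To make the title of this issue concrete: the stack trace shows the failure happens in Crush.cloneOutput while reading through SequenceFile$Reader, so the clone step apparently tries the file as a SequenceFile and dies with a bare "not a gzip file" that names no file at all. A rough sketch of the kind of logging that would help follows; this is hypothetical code, not filecrush's actual implementation, and copyRecords plus the surrounding class are invented for illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical illustration of the logging this issue asks for: wrap the
// record-reading loop so a format mismatch reports which file failed and
// how it was being read, instead of a context-free codec error.
public class LoggedClone {

    public static void copyRecords(Configuration conf, Path src) throws IOException {
        SequenceFile.Reader reader = null;
        try {
            reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(src));
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                // ... write the record to the crushed output ...
            }
        } catch (IOException e) {
            // The extra context turns "not a gzip file" into something actionable.
            throw new IOException("Failed reading " + src + " as a SequenceFile; "
                    + "check that the file really is a SequenceFile and that its "
                    + "compression codec matches", e);
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }
}

With context like that, the bare "not a gzip file" from BuiltInGzipDecompressor would at least point at the offending file and the format assumption that failed.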
