
Exception when compressing map output in a Hadoop program


In a Hadoop program I am trying to compress the map output, so I wrote the following code:

conf.setBoolean("mapred.compress.map.output",true);
conf.setClass("mapred.map.output.compression.codec",GzipCodec.class,CompressionCodec.class);

When I run it, I get the following exception. Does anyone know the cause?

WARN mapred.LocalJobRunner: job_local1149103367_0001 
java.io.IOException: not a gzip file  
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.processBasicHeader(BuiltInGzipDecompressor.java:495)    
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeHeaderState(BuiltInGzipDecompressor.java:256)
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:185)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:72)   
at java.io.DataInputStream.readByte(DataInputStream.java:265)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
at org.apache.hadoop.mapred.IFile$Reader.positionToNextRecord(IFile.java:400)
at org.apache.hadoop.mapred.IFile$Reader.nextRawKey(IFile.java:425)
at org.apache.hadoop.mapred.Merger$Segment.nextRawKey(Merger.java:323)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:613)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:558)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:70)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:385)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:445)

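Today I tested it again and found that the outcome depends on where those two lines are placed relative to the creation of the job object:

Job job = new Job(conf, "MyCounter");

If the two lines come before this statement, the error occurs; if they come after it, the error does not occur. Why does this happen?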
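
One likely explanation for the difference: the Hadoop `Job` Javadoc states that a `Job` makes its own copy of the `Configuration` it is given, so properties set on `conf` after the job is created never reach the job. In that case compression is simply not enabled, and the error disappears because nothing is being compressed. The sketch below models only that copy-on-construction behavior with `java.util.Properties`; `SnapshotJob` and `ConfigCopyDemo` are hypothetical stand-ins, not Hadoop classes, and the sketch does not explain the gzip error itself.

```java
import java.util.Properties;

public class ConfigCopyDemo {

    /** Hypothetical stand-in for a job that snapshots its configuration when constructed. */
    static final class SnapshotJob {
        private final Properties snapshot = new Properties();

        SnapshotJob(Properties conf) {
            // Defensive copy: later changes to 'conf' are not visible here.
            snapshot.putAll(conf);
        }

        boolean mapOutputCompressed() {
            return Boolean.parseBoolean(
                snapshot.getProperty("mapred.compress.map.output", "false"));
        }
    }

    /** The property is set before the job is created, so the job sees it. */
    public static boolean setBeforeCreate() {
        Properties conf = new Properties();
        conf.setProperty("mapred.compress.map.output", "true");
        return new SnapshotJob(conf).mapOutputCompressed();
    }

    /** The property is set after the job is created, so the job's copy lacks it. */
    public static boolean setAfterCreate() {
        Properties conf = new Properties();
        SnapshotJob job = new SnapshotJob(conf);
        conf.setProperty("mapred.compress.map.output", "true");
        return job.mapOutputCompressed();
    }

    public static void main(String[] args) {
        System.out.println("set before job creation: " + setBeforeCreate());
        System.out.println("set after job creation:  " + setAfterCreate());
    }
}
```

With real Hadoop classes, settings that must change after the job exists can be applied through `job.getConfiguration()` rather than the original `conf` object.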

1 Answer


    Are you using MRv1 or MRv2? If you are using MRv2, use the following job configuration:

    config.setBoolean("mapreduce.output.fileoutputformat.compress", true);
    config.setClass("mapreduce.output.fileoutputformat.compress.codec", GzipCodec.class, CompressionCodec.class);

    You can also set:

    config.set("mapreduce.output.fileoutputformat.compress.type",CompressionType.NONE.toString());

    BLOCK, NONE, and RECORD are the three compression types.
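
Note that the `mapreduce.output.fileoutputformat.compress*` properties above control compression of the job's final output. For intermediate map output, which is what the question asks about, the MRv2 property names are `mapreduce.map.output.compress` and `mapreduce.map.output.compress.codec`. A minimal sketch, assuming Hadoop 2.x or later on the classpath; the class name `MapOutputCompressionConfig` and the method `buildJob` are illustrative only, and mapper, reducer, and I/O paths are left out:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

public class MapOutputCompressionConfig {

    /** Builds a job with gzip compression enabled for intermediate map output. */
    public static Job buildJob() throws IOException {
        Configuration conf = new Configuration();

        // Set these BEFORE creating the Job, because the Job copies the Configuration.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                      GzipCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "MyCounter");

        // Changes made after this point should go through job.getConfiguration().
        return job;
    }
}
```

This only shows where the settings go; whether it avoids the "not a gzip file" error under `LocalJobRunner` depends on the Hadoop version in use.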
