Error reading Avro data files in an Oozie workflow - class is incompatible with new map API mode

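I am trying to run an MR job from an Oozie workflow, using Avro data files as both input and output. The mapper emits Text and IntWritable. I am using the new MR API (mapreduce). My workflow definition is as follows: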

<workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">
<global>
   <job-tracker>${jobTracker}</job-tracker>
   <name-node>${nameNode}</name-node>
   <configuration>
        <property>
            <name>mapreduce.job.queuename</name>
            <value>${queueName}</value>
        </property>
    </configuration>
</global>

    <start to="mr-node"/>

    <action name="mr-node">
        <map-reduce>
            <prepare>
                <delete path="${nameNode}/${outputDir}"/>
            </prepare>
            <configuration>
                <!-- BEGIN: SNIPPET TO ADD IN ORDER TO MAKE USE OF NEW HADOOP API -->
                <property>
                  <name>mapred.reducer.new-api</name>
                  <value>true</value>
                </property>
                <property>
                  <name>mapred.mapper.new-api</name>
                  <value>true</value>
                </property>
                <!-- END: SNIPPET -->
                <property>
                    <name>mapreduce.map.class</name>
                    <value>com.ncr.bigdata.mr.avro.AvroPifDriver$PifMapper</value>
                </property>
                <property>
                    <name>mapreduce.reduce.class</name>
                    <value>com.ncr.bigdata.mr.avro.AvroPifDriver$PifReducer</value>
                </property>
                <property>
                    <name>mapred.map.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${nameNode}/${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${nameNode}/${outputDir}</value>
                </property>                 
                <property>
                    <name>mapred.input.format.class</name>
                    <value>org.apache.avro.mapreduce.AvroKeyInputFormat</value>
                </property>
                <property>
                    <name>avro.schema.input.key</name>
                    <value>{"type":"record","name":"SampleRecord","namespace":"org.co.sample.etl.domain","fields":[{"name":"requiredName","type":"string"},{"name":"optionalName","type":["null","string"]},{"name":"dataItemLong","type":"long"},{"name":"dataItemInt","type":"int"},{"name":"startTime","type":"long"},{"name":"endTime","type":"long"}]}</value>
                </property>


                <property>
                    <name>mapred.output.format.class</name>
                    <value>org.apache.avro.mapreduce.AvroKeyValueOutputFormat</value>
                </property> 
                <property>
                    <name>mapred.output.key.class</name>
                    <value>org.apache.avro.mapred.AvroKey</value>
                </property>   
                <property>
                    <name>mapred.output.value.class</name>
                    <value>org.apache.avro.mapred.AvroValue</value>
                </property> 

                <property>
                    <name>avro.schema.output.key</name>
                    <value>string</value>
                </property> 
                <property>
                    <name>avro.schema.output.value</name>
                    <value>int</value>
                </property> 

            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

My mapper looks like this:

import java.io.IOException;

import org.apache.avro.mapred.AvroKey;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

 public static class PifMapper extends Mapper<AvroKey<PosData>, NullWritable, Text, IntWritable> {

        @Override
        public void map(AvroKey<PosData> key, NullWritable value, Context context)
                throws IOException, InterruptedException {
        ...
        }
}

I get the following error:

140807041959771-oozie-oozi-W@mr-node] Launcher exception: mapred.input.format.class is incompatible with new map API mode.
java.io.IOException: mapred.input.format.class is incompatible with new map API mode.
    at org.apache.hadoop.mapreduce.Job.ensureNotSet(Job.java:1172)
    at org.apache.hadoop.mapreduce.Job.setUseNewAPI(Job.java:1198)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1261)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
...

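I am using Hadoop 2.2.0 (HDP 2.0), Oozie 4.0.0, and Avro 1.7.4.

The same MapReduce job works fine when submitted through the driver class. org.apache.avro.mapreduce.AvroKeyInputFormat should also be an implementation for the new mapreduce API.

To rule out library conflicts, I removed the share lib from Oozie, and all libraries are included in the workflow lib directory.

Any hints?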

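2 Answers

It all came down to finding the correct job configuration properties. The outdated documentation is somewhat misleading. The following worked for us: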

<workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">
<global>
   <job-tracker>${jobTracker}</job-tracker>
   <name-node>${nameNode}</name-node>
   <configuration>
        <property>
            <name>mapreduce.job.queuename</name>
            <value>${queueName}</value>
        </property>
    </configuration>
</global>

    <start to="mr-node"/>

    <action name="mr-node">
        <map-reduce>
            <prepare>
                <delete path="${nameNode}/${outputDir}"/>
            </prepare>
            <configuration>
                <!-- BEGIN: SNIPPET TO ADD IN ORDER TO MAKE USE OF NEW HADOOP API -->
                <property>
                  <name>mapred.reducer.new-api</name>
                  <value>true</value>
                </property>
                <property>
                  <name>mapred.mapper.new-api</name>
                  <value>true</value>
                </property>
                <!-- END: SNIPPET -->
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>com.ncr.bigdata.mr.avro.AvroPifDriver$PifMapper</value>
                </property>
                <property>
                    <name>mapreduce.job.reduce.class</name>
                    <value>com.ncr.bigdata.mr.avro.AvroPifDriver$PifReducer</value>
                </property>
                <property>
                    <name>mapred.map.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>mapreduce.input.fileinputformat.inputdir</name>
                    <value>${nameNode}/${inputDir}</value>
                </property>
                <property>
                    <name>mapreduce.output.fileoutputformat.outputdir</name>
                    <value>${nameNode}/${outputDir}</value>
                </property>                 
                <property>
                    <name>mapreduce.job.inputformat.class</name>
                    <value>org.apache.avro.mapreduce.AvroKeyInputFormat</value>
                </property>

                <property>
                    <name>avro.schema.input.key</name>
                    <value>{"type":"record","name":"SampleRecord","namespace":"org.co.sample.etl.domain","fields":[{"name":"requiredName","type":"string"},{"name":"optionalName","type":["null","string"]},{"name":"dataItemLong","type":"long"},{"name":"dataItemInt","type":"int"},{"name":"startTime","type":"long"},{"name":"endTime","type":"long"}]}</value>
                </property>


                <property>
                    <name>mapreduce.job.outputformat.class</name>
                    <value>org.apache.avro.mapreduce.AvroKeyValueOutputFormat</value>
                </property> 



                <property>
                    <name>mapreduce.map.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>   
                <property>
                    <name>mapreduce.map.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property> 


                <property>
                    <name>mapreduce.job.output.key.class</name>
                    <value>org.apache.avro.mapred.AvroKey</value>
                </property>   
                <property>
                    <name>mapreduce.job.output.value.class</name>
                    <value>org.apache.avro.mapred.AvroValue</value>
                </property> 

                <property>
                    <name>avro.schema.output.key</name>
                    <value>"string"</value>
                </property> 
                <property>
                    <name>avro.schema.output.value</name>
                    <value>"int"</value>
                </property> 

            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
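
For anyone comparing the two workflow definitions by hand, the property renames can be collected into a small lookup. The following is only a sketch: the class name `NewApiPropertyLint` and the method `findRenames` are made up for illustration, the table contains just the pairs that differ between the failing and the working workflow in this thread, and it does not reproduce Hadoop's own validation. Note that the working workflow also adds `mapreduce.map.output.key.class` and `mapreduce.map.output.value.class`, and puts the output schemas in quotes (`"string"`, `"int"`); those changes are not renames and are not covered here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flags property names from the failing workflow and
// suggests the names used in the working workflow of this thread.
final class NewApiPropertyLint {

    // Old name -> replacement, copied from the two workflow definitions above.
    static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put("mapreduce.map.class", "mapreduce.job.map.class");
        RENAMES.put("mapreduce.reduce.class", "mapreduce.job.reduce.class");
        RENAMES.put("mapred.input.dir", "mapreduce.input.fileinputformat.inputdir");
        RENAMES.put("mapred.output.dir", "mapreduce.output.fileoutputformat.outputdir");
        RENAMES.put("mapred.input.format.class", "mapreduce.job.inputformat.class");
        RENAMES.put("mapred.output.format.class", "mapreduce.job.outputformat.class");
        RENAMES.put("mapred.output.key.class", "mapreduce.job.output.key.class");
        RENAMES.put("mapred.output.value.class", "mapreduce.job.output.value.class");
    }

    // Returns, in input order, every given property name that has a known
    // replacement, mapped to that replacement. Unknown names are ignored.
    static Map<String, String> findRenames(Collection<String> propertyNames) {
        Map<String, String> hits = new LinkedHashMap<>();
        for (String name : propertyNames) {
            String replacement = RENAMES.get(name);
            if (replacement != null) {
                hits.put(name, replacement);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // A few names taken from the failing workflow in the question.
        List<String> fromQuestion = Arrays.asList(
                "mapred.mapper.new-api",
                "mapred.input.format.class",
                "mapred.output.dir");
        findRenames(fromQuestion).forEach(
                (oldName, newName) -> System.out.println(oldName + " -> " + newName));
    }
}
```

The `mapred.mapper.new-api` and `mapred.reducer.new-api` switches are deliberately absent from the table, since both workflows set them the same way.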

Answered 2024-04-29T00:01:38+08:00


I think this will help you: how to execute mapreduce programs in oozie with hadoop 2.2

Answered 2024-04-29T00:01:38+08:00
