Why can a reducer have different input and output key/value types in Hadoop MapReduce?

Asked 2024-04-28T21:24:26+08:00

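Because of the nature of MapReduce applications, the reduce function may be called more than once, so its input and output key/value types must be the same, as in MongoDB's MapReduce implementation. Why is this different in the Hadoop implementation? (Or rather, why does Hadoop allow them to differ?)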

org.apache.hadoop.mapreduce.Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

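Second question: how does Hadoop know whether the output of the reduce function should be passed back to reduce on the next run or written to HDFS? For example: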

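import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;
public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Will each key/value be passed back to reduce on the next run, or written to HDFS?
        for (IntWritable value : values) context.write(key, value);
    }
}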

1 Answer

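Consider an example where the input is document names (as keys) and document lines (as values), and the result is the standard deviation (STDDEV) of the line lengths.
To generalize: the type of the aggregate does not have to match the type of the input data, so Hadoop leaves that freedom to the developer.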
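To make the standard-deviation example concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It only mirrors the shape of a `Reducer<Text, IntWritable, Text, DoubleWritable>`: integer line lengths go in, one double comes out per key. The class name `LineLengthStdDev` and the methods `reduce` and `run` are hypothetical, and the sketch assumes the line lengths have already been computed and grouped by document name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch, not Hadoop code. All names here are hypothetical.
class LineLengthStdDev {

    // "Reduce" step: many integer line lengths in, one double out.
    static double reduce(Iterable<Integer> lineLengths) {
        long count = 0;
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (int length : lineLengths) {
            count++;
            sum += length;
            sumOfSquares += (double) length * length;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no values for key");
        }
        double mean = sum / count;
        // Population variance: E[x^2] - (E[x])^2, clamped at 0 against rounding error.
        double variance = Math.max(0.0, sumOfSquares / count - mean * mean);
        return Math.sqrt(variance);
    }

    // Simulates a framework that calls reduce exactly once per key.
    static Map<String, Double> run(Map<String, List<Integer>> grouped) {
        Map<String, Double> output = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            output.put(entry.getKey(), reduce(entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("doc1.txt", Arrays.asList(2, 4, 4, 4, 5, 5, 7, 9));
        grouped.put("doc2.txt", Arrays.asList(10, 10, 10));
        System.out.println(run(grouped)); // prints {doc1.txt=2.0, doc2.txt=0.0}
    }
}
```

The double result could not be passed back into `reduce`, which expects integers. That is acceptable here only because each key is reduced once, which is the model this answer describes for Hadoop.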
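As for your second question: Hadoop has no mechanism similar to MongoDB's incremental MapReduce, so the reducer's output is always saved to HDFS (or another DFS) and is never passed back to reduce.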

Answered 2024-04-28T21:24:26+08:00
