Which Mapper code is more memory-efficient?

I want to build a MapReduce job with the following input and output.

Input:

key1\t3,2|17412|553|15186,19199|15186,3947|15186,5938|15186,15517
key2\t925|10295|65182555,7344|7344,925|10295,7344|3,2
key3\t8747|18466|13289|3,2|13289,5106|12222,5106|5106,6374
........

Output: min\t(2,3), i.e. the intersection across the elements of value1, the elements of value2, ..., and the elements of valueN.

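So I designed my mappers as follows:

mapper1 computes the intersection among the values of key1, key2, and key3,

mapper2 computes the intersection among the values of key4, key5, and key6,

.......

My Reducers then take the results from these mappers and compute the final intersection, so my mapper and reducer use essentially the same code. The code finds the intersection sequentially: it first intersects value1 with value2, then intersects that result with value3, and so on.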
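To make the sequential intersection concrete outside of Hadoop, here is a minimal plain-Java sketch of the idea. The names `IntersectionSketch`, `intersect`, and `intersectAll` are hypothetical and not part of my job. As assumptions for the sketch, it uses `String.join` instead of Guava's `Joiner`, a `TreeSet` so the joined ids come out in a stable order, and a `LinkedHashSet` for de-duplication instead of `List.contains`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class IntersectionSketch {

    // Intersects every group in `pre` with every group in `current`.
    // A group is a comma-separated list of ids such as "3,2".
    static List<String> intersect(List<String> pre, List<String> current) {
        Set<String> result = new LinkedHashSet<>(); // drops duplicates, keeps first-seen order
        for (String p : pre) {
            Set<String> pSet = new TreeSet<>(Arrays.asList(p.split(",")));
            for (String c : current) {
                Set<String> common = new TreeSet<>(Arrays.asList(c.split(",")));
                common.retainAll(pSet); // keep only ids present in both groups
                if (!common.isEmpty()) {
                    result.add(String.join(",", common));
                }
            }
        }
        return new ArrayList<>(result);
    }

    // Folds the values left to right: (value1 ∩ value2) ∩ value3 ...
    // Each value is a pipe-separated list of groups.
    static List<String> intersectAll(List<String> values) {
        List<String> pre = null;
        for (String value : values) {
            List<String> current = Arrays.asList(value.trim().split("\\|"));
            pre = (pre == null) ? current : intersect(pre, current);
            if (pre.isEmpty()) {
                break; // nothing in common, later values cannot change that
            }
        }
        return pre == null ? new ArrayList<>() : pre;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
            "3,2|17412|553|15186,19199|15186,3947|15186,5938|15186,15517",
            "925|10295|65182555,7344|7344,925|10295,7344|3,2",
            "8747|18466|13289|3,2|13289,5106|12222,5106|5106,6374");
        System.out.println(intersectAll(values)); // prints [2,3]
    }
}
```

Only the running result (`pre`) and the current value are held at any time, but each step still compares every group of `pre` with every group of `current`, so the intermediate result can grow when many groups overlap.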

My Mapper:

Mapper-Code1:

public static class MapAPP extends Mapper<Text, Text, Text, Text>{     
    public static int j=0,k=0;
    public static List<String> min_pre = new ArrayList<>();
    public static List<String> min_current = new ArrayList<>();
    public static Set<String> min_p1 = new HashSet<>();
    public static Set<String> min_c1 = new HashSet<>();
    public static List<String> min_result = new ArrayList<>(); 
    public static Boolean no_exist_min=false;

    public void map(Text key, Text value, Context con) throws IOException, InterruptedException
    {
        String[] v=value.toString().split("\t");
        // aggregate min
        if (no_exist_min==false){
            if (j==0){
                    min_pre= Arrays.asList(v[1].toString().trim().split("\\|"));
                    j=1;
                 }else{
                    min_current= Arrays.asList(v[1].toString().trim().split("\\|")); 
                    for (String p: min_pre){                   
                       min_p1 = new HashSet<String>(Arrays.asList(p.split(",")));
                       for (String c: min_current){
                           min_c1 = new HashSet<String>(Arrays.asList(c.split(",")));
                           min_c1.retainAll(min_p1);
                           if (!min_c1.isEmpty()){
                               Joiner m_comma = Joiner.on(",").skipNulls();
                               String buff = m_comma.join(min_c1);
                               if (!min_result.contains(buff))
                                    min_result.add(buff);
                           }                       
                       }                   
                    }
                    if (min_result.isEmpty()){
                        no_exist_min=true;          
                    } else {                    
                        min_pre=new ArrayList<>(min_result);
                        min_result.clear();                       
                    }
            }                   
        }            
    }

    protected void cleanup(Context con) throws IOException, InterruptedException {
        Joiner m_pipe = Joiner.on("|").skipNulls();
        if (no_exist_min==true){
            con.write(new Text("min"), new Text("no_exist"));
        }else {               
            String min_str = m_pipe.join(min_pre);
            con.write(new Text("min"), new Text(min_str)); 
        }            
    }
}

My Reducer (almost identical to the Mapper):

public static class ReduceAPP extends Reducer<Text, Text, Text, Text>
{
    public void reduce(Text key, Iterable<Text> values, Context con) throws IOException, InterruptedException
    {
        List<String> pre = new ArrayList<>();
        List<String> current = new ArrayList<>();
        Set<String> p1 = new HashSet<>();
        Set<String> c1 = new HashSet<>();
        List<String> result = new ArrayList<>();
        Joiner comma = Joiner.on(",").skipNulls(); 
        Joiner pipe = Joiner.on("|").skipNulls(); 
        Boolean no_exist=false;
        int i=0;
        // aggregate
        for(Text value: values){
             // compare string contents with equals(); == only compares references
             if ("no_exist".equals(value.toString().trim())){
                 no_exist=true;
                 break;
                }
             if (i==0){
                    pre= Arrays.asList(value.toString().trim().split("\\|"));
                    i=1;
             }else{
                    current= Arrays.asList(value.toString().trim().split("\\|")); 
                    for (String p: pre){                   
                       p1 = new HashSet<String>(Arrays.asList(p.split(",")));
                       for (String c: current){
                           c1 = new HashSet<String>(Arrays.asList(c.split(",")));
                           c1.retainAll(p1);
                           if (!c1.isEmpty()){
                               String buff = comma.join(c1);
                               if (!result.contains(buff))
                                    result.add(buff);
                           }                       
                       }                   
                    }
                    if (result.isEmpty()){
                        no_exist=true;
                        break;
                    }
                    pre=new ArrayList<>(result);
                    result.clear();                       
             }                   

        }
        if (no_exist==true){
            con.write(key, new Text("no_exist"));
        }
        else{
            String preStr = pipe.join(pre);
            con.write(key, new Text(preStr)); 
        }            
    }
    public static <T> Set<T> union(Set<T> setA, Set<T> setB) {
        Set<T> tmp = new TreeSet<T>(setA);
        tmp.addAll(setB);
        return tmp;
    }
}

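My code runs perfectly on small input files but always runs out of memory on large ones (a ~450 MB text file), so I suspect my Java code is not memory-efficient. In my Reducer I use only local variables, which are released when the reduce function returns, so I am not worried about the Reducers. In my Mapper, however, I have to use static variables. In Mapper-code1 every variable is static, while in Mapper-code2 I tried to use as few static variables as possible.

I have two questions:

1) In my Mapper-code1, is every static variable shared between mappers, or is it exclusive to one mapper? For example, if I have 5 mappers, will a single min_pre list be created and shared among the 5 mappers, or will the 5 mappers have 5 min_pre lists? I want the latter. How should I design my mapper so that 5 mappers get 5 min_pre lists?

2) Which consumes less memory, Mapper-code1 or Mapper-code2?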
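To illustrate what question 1 is asking, here is a small plain-Java sketch of the language-level difference: a static field exists once per loaded class, while an instance field exists once per object. `SharedStateMapper` and `OwnStateMapper` are hypothetical stand-ins, not real Hadoop `Mapper` subclasses, and the sketch does not model how Hadoop assigns map tasks to JVMs, which depends on the cluster configuration.

```java
import java.util.ArrayList;
import java.util.List;

class StaticVsInstanceSketch {

    // Hypothetical stand-in for a mapper that keeps its state in a static field.
    static class SharedStateMapper {
        static List<String> minPre = new ArrayList<>(); // one list for all objects of this class

        void map(String value) {
            minPre.add(value);
        }
    }

    // Hypothetical stand-in for a mapper that keeps its state in an instance field.
    static class OwnStateMapper {
        private final List<String> minPre = new ArrayList<>(); // one list per object

        void map(String value) {
            minPre.add(value);
        }

        List<String> state() {
            return minPre;
        }
    }

    public static void main(String[] args) {
        SharedStateMapper a = new SharedStateMapper();
        SharedStateMapper b = new SharedStateMapper();
        a.map("3,2");
        b.map("7344");
        System.out.println(SharedStateMapper.minPre); // prints [3,2, 7344]

        OwnStateMapper c = new OwnStateMapper();
        OwnStateMapper d = new OwnStateMapper();
        c.map("3,2");
        d.map("7344");
        System.out.println(c.state()); // prints [3,2]
        System.out.println(d.state()); // prints [7344]
    }
}
```

Within one JVM, the two `SharedStateMapper` objects write into the same list, while each `OwnStateMapper` keeps its own. Under that assumption, declaring min_pre as a non-static field is the variant that gives 5 mapper objects 5 separate lists regardless of how they are placed in JVMs.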

Mapper-Code2:

public static class MapAPP extends Mapper<Text, Text, Text, Text>{     
    public static int j=0,k=0;
    public static List<String> min_pre = new ArrayList<>();
    public static List<String> min_result = new ArrayList<>(); 
    public static Boolean no_exist_min=false;

    public void map(Text key, Text value, Context con) throws IOException, InterruptedException
    {
        String[] v=value.toString().split("\t");
        // aggregate min
        if (no_exist_min==false){
            if (j==0){
                    min_pre= Arrays.asList(v[1].toString().trim().split("\\|"));
                    j=1;
                 }else{
                    List<String> min_current= Arrays.asList(v[1].toString().trim().split("\\|")); 
                    for (String p: min_pre){                   
                       Set<String> min_p1 = new HashSet<String>(Arrays.asList(p.split(",")));
                       for (String c: min_current){
                           Set<String> min_c1 = new HashSet<String>(Arrays.asList(c.split(",")));
                           min_c1.retainAll(min_p1);
                           if (!min_c1.isEmpty()){
                               Joiner m_comma = Joiner.on(",").skipNulls();
                               String buff = m_comma.join(min_c1);
                               if (!min_result.contains(buff))
                                    min_result.add(buff);
                           }                       
                       }                   
                    }
                    if (min_result.isEmpty()){
                        no_exist_min=true;          
                    } else {                    
                        min_pre=new ArrayList<>(min_result);
                        min_result.clear();                       
                    }
            }                   
        }            
    }

    protected void cleanup(Context con) throws IOException, InterruptedException {
        Joiner m_pipe = Joiner.on("|").skipNulls();
        if (no_exist_min==true){
            con.write(new Text("min"), new Text("no_exist"));
        }else {               
            String min_str = m_pipe.join(min_pre);
            con.write(new Text("min"), new Text(min_str)); 
        }            
    }
}

Which Mapper code is more memory-efficient?
