
MPI: Fox's algorithm with non-blocking send and receive

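I'm new to MPI and I'm trying to write an implementation of Fox's algorithm (A x B = C, where A and B are n x n matrices). My program works correctly, but I'd like to see whether I can speed it up by overlapping the communication that shifts the blocks of matrix B with the computation of the product matrix (in the algorithm, the blocks of B are shifted cyclically upward). Each process in the 2D Cartesian grid holds one block each of matrices A, B and C, as the algorithm prescribes. What I have now is this, inside Fox's algorithm: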

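if (stage > 0) {
    // Broadcast this stage's block of A along the process row
    MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);

    // Shift the blocks of B up by one process in every column.
    // Note: b is both the send and the receive buffer here, which MPI does
    // not allow for a send and a receive that are pending at the same time.
    MPI_Isend(b, n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
    MPI_Irecv(b, n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
    MPI_Wait(&my_request1, &status);
    MPI_Wait(&my_request2, &status);
    multiplyMatrix(a_temp, b, c, n_local);
}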

The submatrices a_temp, b and b_temp are pointers to double that point to blocks of size n/numprocesses * n/numprocesses (this is the size of a block matrix; for example, b = (double *) calloc(n/numprocesses * n/numprocesses, sizeof(double))).

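I would like to call multiplyMatrix before the MPI_Wait calls (that is what would make the communication and the computation overlap), but I don't know how to do it. Do I need two separate buffers and alternate between them in successive stages?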

(I know I could use MPI_Sendrecv_replace, but that doesn't help with overlapping because it uses a blocking send and receive. The same goes for MPI_Sendrecv.)

1 Answer

  • 0

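    I actually figured out how to do this. This question should probably be deleted, but since I'm new to MPI I'll post my solutions here, and I'd be glad if anyone with suggestions for improvement shared them. Method 1: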

    // Fox's algorithm with two alternating buffers for the blocks of B
    double *b_buffers[2];
    b_buffers[0] = (double *) malloc(n_local*n_local*sizeof(double));
    b_buffers[1] = b;
    for (stage = 0; stage < q; stage++) {
        // Copy a into a_temp, then broadcast the root's a_temp to every
        // other process in its row
        for (i = 0; i < n_local*n_local; i++)
            a_temp[i] = a[i];
        MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);

        if (stage == 0) {
            multiplyMatrix(a_temp, b, c, n_local);
            // Blocking in-place shift of b: a nonblocking send and receive
            // that are pending together must not share one buffer
            MPI_Sendrecv_replace(b, n_local*n_local, MPI_DOUBLE, nbrs[UP], 111,
                                 nbrs[DOWN], 111, grid_comm, &status);
        } else {
            // Send the current block up and receive the next block from below
            // into the other buffer, multiplying while both are in flight
            MPI_Isend(b_buffers[stage%2], n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
            MPI_Irecv(b_buffers[(stage+1)%2], n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
            multiplyMatrix(a_temp, b_buffers[stage%2], c, n_local);
            MPI_Wait(&my_request2, &status);
            MPI_Wait(&my_request1, &status);
        }
    }
    

    Method 2:

    // Fox's algorithm with a scratch copy b_temp (memcpy needs <string.h>)
    for (stage = 0; stage < q; stage++) {
        // Copy a into a_temp, then broadcast the root's a_temp to every
        // other process in its row
        for (i = 0; i < n_local*n_local; i++)
            a_temp[i] = a[i];
        MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);

        if (stage == 0) {
            multiplyMatrix(a_temp, b, c, n_local);
            // Blocking in-place shift of b: a nonblocking send and receive
            // that are pending together must not share one buffer
            MPI_Sendrecv_replace(b, n_local*n_local, MPI_DOUBLE, nbrs[UP], 111,
                                 nbrs[DOWN], 111, grid_comm, &status);
        } else {
            // Copy the current block, send the copy up and receive the next
            // block into b, multiplying with the copy in the meantime
            memcpy(b_temp, b, n_local*n_local*sizeof(double));
            MPI_Isend(b_temp, n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
            MPI_Irecv(b, n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
            multiplyMatrix(a_temp, b_temp, c, n_local);
            MPI_Wait(&my_request2, &status);
            MPI_Wait(&my_request1, &status);
        }
    }
    

    Both of these seem to work, but as I said, I'm new to MPI, so if you have any comments or suggestions, please share them.
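
One way to convince yourself that the buffer alternation and the broadcast root are right, without launching any processes, is to replay the schedule serially. The sketch below is only a stand-in under stated assumptions: it contains no MPI calls, it stores the q x q grid of blocks in one array, and the names `fox_simulate_check`, `multiply_block` and `elem` are hypothetical helpers, not part of the code above. The "shift" is a plain memcpy from the block one row below into the other buffer, which mirrors sending up and receiving from below.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* c += a * b for nb x nb row-major blocks */
static void multiply_block(const double *a, const double *b, double *c, int nb) {
    for (int i = 0; i < nb; i++)
        for (int k = 0; k < nb; k++)
            for (int j = 0; j < nb; j++)
                c[i * nb + j] += a[i * nb + k] * b[k * nb + j];
}

/* Element (gi, gj) of a matrix stored block by block on a q x q grid */
static double elem(const double *m, int q, int nb, int gi, int gj) {
    const double *blk = m + (size_t)((gi / nb) * q + gj / nb) * nb * nb;
    return blk[(gi % nb) * nb + gj % nb];
}

/* Replays Fox's schedule serially with two alternating B buffers.
   Returns 1 if C equals the direct product, 0 if not, -1 if malloc fails. */
int fox_simulate_check(int q, int nb) {
    assert(q > 0 && nb > 0);
    int n = q * nb, ok = -1;
    size_t bsz = (size_t)nb * nb, total = bsz * q * q;
    double *a = malloc(total * sizeof *a);
    double *c = calloc(total, sizeof *c);
    double *bbuf[2] = { malloc(total * sizeof(double)),
                        malloc(total * sizeof(double)) };

    if (a && c && bbuf[0] && bbuf[1]) {
        /* Small integer entries keep the double arithmetic exact */
        for (int gi = 0; gi < n; gi++)
            for (int gj = 0; gj < n; gj++) {
                size_t idx = (size_t)((gi / nb) * q + gj / nb) * bsz
                           + (size_t)((gi % nb) * nb + gj % nb);
                a[idx] = (3 * gi + gj) % 7;
                bbuf[0][idx] = (gi + 2 * gj) % 5;
            }

        int cur = 0;
        for (int stage = 0; stage < q; stage++) {
            int next = 1 - cur;
            for (int r = 0; r < q; r++)
                for (int col = 0; col < q; col++) {
                    int root = (r + stage) % q;   /* column that "broadcasts" A */
                    size_t me = (size_t)(r * q + col) * bsz;
                    size_t below = (size_t)(((r + 1) % q) * q + col) * bsz;
                    multiply_block(a + (size_t)(r * q + root) * bsz,
                                   bbuf[cur] + me, c + me, nb);
                    /* "Receive" the block from the process below into next */
                    memcpy(bbuf[next] + me, bbuf[cur] + below,
                           bsz * sizeof(double));
                }
            cur = next;
        }

        /* After q shifts every block of B is back where it started */
        ok = 1;
        for (int gi = 0; gi < n; gi++)
            for (int gj = 0; gj < n; gj++) {
                double sum = 0.0;
                for (int gk = 0; gk < n; gk++)
                    sum += elem(a, q, nb, gi, gk) * elem(bbuf[cur], q, nb, gk, gj);
                if (sum != elem(c, q, nb, gi, gj))
                    ok = 0;
            }
    }
    free(a); free(c); free(bbuf[0]); free(bbuf[1]);
    return ok;
}
```

Calling `fox_simulate_check(q, nb)` from a small `main` for a few grid sizes checks the indexing only. It says nothing about how much overlap a real MPI library achieves, which depends on the implementation and the network.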
