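I am implementing a matrix multiplication algorithm in C using MPI with non-blocking communication. I have tried the example code from http://siber.cankaya.edu.tr/ozdogan/GraduateParallelComputing.old/ceng505/node133.html, but I cannot get it to work.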

It works fine when executed on a single node, but with more than one node it never finishes.

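I am testing on 4 nodes with random 4 * 4 matrices. From what I can gather with no debugging tools other than printf, the process with rank 0 finishes executing the function, but the other processes are stuck in MPI_Wait on line 68. I am new to C and to parallel programming, and I cannot figure out what is wrong.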
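The setup above (4 processes, 4 * 4 matrices) implies a 2 x 2 process grid with 2 x 2 local blocks. A minimal sketch of a check for those preconditions follows, written in plain C so it can be tried without MPI. The function name `grid_side` is hypothetical and not part of the linked example; it only assumes that the process count must be a perfect square and that the matrix dimension must be divisible by the grid side.

```c
#include <assert.h>

/* Hypothetical helper: returns the side q of a square process grid when
 * nprocs == q * q and the matrix dimension n splits evenly into q blocks.
 * Returns 0 when either precondition fails. */
int grid_side(int nprocs, int n)
{
    assert(nprocs > 0 && n > 0);

    /* Integer search for the square root, avoiding floating point. */
    int q = 1;
    while (q * q < nprocs) {
        q++;
    }
    if (q * q != nprocs) {
        return 0; /* process count is not a perfect square */
    }
    if (n % q != 0) {
        return 0; /* blocks would not have equal size */
    }
    return q;
}
```

With 4 processes and a 4 * 4 matrix this returns 2, so the sizes used here satisfy both preconditions and are probably not the reason for the hang.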

Edit: while debugging, I found something incredibly confusing: calling MPI_Barrier seems to hang the whole program.

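I call MPI_Barrier in main after the multiplication function has finished. To make debugging easier, I decided to split the function's code into parts and separate them with barriers. Placing a barrier at the end of the main computation loop (so that every process must finish an iteration before any process starts the next one) resulted in the loop never even reaching the second iteration.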
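When debugging with printf alone, output that passes through a pipe may be buffered, so a missing line does not necessarily prove that a process never reached that statement. A minimal sketch of a trace helper that tags each line with the rank and flushes immediately is shown here. The names `format_trace` and `trace` are hypothetical, and writing to stderr is an assumption about what is convenient, not something the linked example does.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "[rank N] <formatted message>" into buf.
 * Returns the length written, or -1 if the line does not fit. */
int format_trace(char *buf, size_t len, int rank, const char *fmt, ...)
{
    assert(buf != NULL && len > 0);

    int prefix = snprintf(buf, len, "[rank %d] ", rank);
    if (prefix < 0 || (size_t)prefix >= len) {
        return -1;
    }

    va_list args;
    va_start(args, fmt);
    int body = vsnprintf(buf + prefix, len - (size_t)prefix, fmt, args);
    va_end(args);
    if (body < 0 || (size_t)body >= len - (size_t)prefix) {
        return -1;
    }
    return (int)strlen(buf);
}

/* Hypothetical helper: prints one tagged line and flushes right away,
 * so the line is not held back in a buffer if the process later hangs. */
void trace(int rank, const char *msg)
{
    char line[256];
    if (format_trace(line, sizeof line, rank, "%s", msg) >= 0) {
        fprintf(stderr, "%s\n", line);
        fflush(stderr);
    }
}
```

A call such as `trace(my2drank, "before MPI_Waitall")` placed before each communication call would narrow down where each rank stops.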

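It seems that when a single process reaches the barrier after the non-blocking sends/receives and the subsequent MPI_Wait call, the other processes wait indefinitely and never reach the barrier.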
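One way to reason about who is waiting for whom is to reproduce the neighbor arithmetic of MPI_Cart_shift in plain C and check whether every send has a matching receive. The sketch below assumes a fully periodic 2-D grid with row-major rank numbering, which is what MPI uses for Cartesian communicators. The names `grid_rank`, `cart_shift_2d`, and `alignment_is_paired` are hypothetical helpers, not MPI calls.

```c
#include <assert.h>

/* Row-major rank of (row, col) on a periodic rows x cols grid. */
int grid_rank(int row, int col, int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    row = ((row % rows) + rows) % rows; /* wrap around, also for negatives */
    col = ((col % cols) + cols) % cols;
    return row * cols + col;
}

/* Mimics MPI_Cart_shift: dest is disp steps ahead along the chosen
 * direction, source is disp steps behind. */
void cart_shift_2d(int rank, int rows, int cols, int direction, int disp,
                   int *source, int *dest)
{
    int row = rank / cols;
    int col = rank % cols;
    if (direction == 0) {
        *source = grid_rank(row - disp, col, rows, cols);
        *dest = grid_rank(row + disp, col, rows, cols);
    } else {
        *source = grid_rank(row, col - disp, rows, cols);
        *dest = grid_rank(row, col + disp, rows, cols);
    }
}

/* Checks a shift where every rank uses disp = -coords[coord_index] along
 * the given direction. Returns 1 if each rank's destination expects to
 * receive from that rank, 0 if some send has no matching receive. */
int alignment_is_paired(int rows, int cols, int direction, int coord_index)
{
    int nprocs = rows * cols;
    for (int rank = 0; rank < nprocs; rank++) {
        int disp = -(coord_index == 0 ? rank / cols : rank % cols);
        int source, dest;
        cart_shift_2d(rank, rows, cols, direction, disp, &source, &dest);

        /* The receiver computes its shift from its own coordinate. */
        int peer_disp = -(coord_index == 0 ? dest / cols : dest % cols);
        int peer_source, peer_dest;
        cart_shift_2d(dest, rows, cols, direction, peer_disp,
                      &peer_source, &peer_dest);
        if (peer_source != rank) {
            return 0;
        }
    }
    return 1;
}
```

On a 2 x 2 grid, `alignment_is_paired(2, 2, 0, 0)` returns 0, while `alignment_is_paired(2, 2, 1, 0)` returns 1. The first call corresponds to shifting along dimension 0 by `-mycoords[0]`, where ranks along the shifted dimension use different displacements, so that pairing may be worth examining in any code that uses this pattern.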

Here is a sample code:

#include <math.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Fill vector with random integers in the range 1..10. */
void generateArray(int *vector, int size)
{
    int i;
    for (i = 0; i < size; i++) {
        vector[i] = rand() % 10 + 1;
    }
}

int test_size = 4; /* dimension of the global matrices */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int comm_size, myrank;
    int my2drank, mycoords[2];
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    MPI_Request requests[4];
    MPI_Status status[4];

    /* Create a periodic square process grid. */
    MPI_Comm comm_2d;
    int dims[2], periods[2];
    dims[0] = dims[1] = (int)sqrt(comm_size);
    periods[0] = periods[1] = 1;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &comm_2d);

    /* Side length of the local block held by each process. */
    int sizelocal = test_size / dims[0];

    srand(time(NULL));
    int a[sizelocal * sizelocal], b[sizelocal * sizelocal];
    generateArray(a, sizelocal * sizelocal);
    generateArray(b, sizelocal * sizelocal);

    /* Rank and coordinates inside the Cartesian communicator. */
    MPI_Comm_rank(comm_2d, &my2drank);
    MPI_Cart_coords(comm_2d, my2drank, 2, mycoords);

    /* Neighbors for the single-step shifts in the main loop. */
    int uprank, downrank, leftrank, rightrank;
    MPI_Cart_shift(comm_2d, 0, -1, &rightrank, &leftrank);
    MPI_Cart_shift(comm_2d, 1, -1, &downrank, &uprank);

    /* Two buffers per matrix: one is sent while the other is received. */
    int *buffersA[2], *buffersB[2];

    buffersA[0] = a;
    buffersA[1] = (int *)malloc(sizelocal * sizelocal * sizeof(int));
    buffersB[0] = b;
    buffersB[1] = (int *)malloc(sizelocal * sizelocal * sizeof(int));

    /* Initial alignment of A: shift along dimension 0 by -mycoords[0]. */
    int shiftsource, shiftdest;
    MPI_Cart_shift(comm_2d, 0, -mycoords[0], &shiftsource, &shiftdest);
    MPI_Sendrecv_replace(buffersA[0], sizelocal * sizelocal, MPI_INT, shiftdest, 1, shiftsource, 1, comm_2d, &status[0]);

    /* Initial alignment of B: shift along dimension 1 by -mycoords[1]. */
    MPI_Cart_shift(comm_2d, 1, -mycoords[1], &shiftsource, &shiftdest);
    MPI_Sendrecv_replace(buffersB[0], sizelocal * sizelocal, MPI_INT, shiftdest, 1, shiftsource, 1, comm_2d, &status[0]);

    printf("Rank %d at point 1\n", my2drank);

    int i;
    for (i = 0; i < 2; i++) {
        printf("Rank %d started iteration %d\n", my2drank, i);
        /* Send the current blocks and receive the next ones without blocking. */
        MPI_Isend(buffersA[i % 2], sizelocal * sizelocal, MPI_INT, leftrank, 1, comm_2d, &requests[0]);
        MPI_Isend(buffersB[i % 2], sizelocal * sizelocal, MPI_INT, uprank, 1, comm_2d, &requests[1]);
        MPI_Irecv(buffersA[(i + 1) % 2], sizelocal * sizelocal, MPI_INT, rightrank, 1, comm_2d, &requests[2]);
        MPI_Irecv(buffersB[(i + 1) % 2], sizelocal * sizelocal, MPI_INT, downrank, 1, comm_2d, &requests[3]);

        /* Wait for all four transfers of this iteration. */
        MPI_Waitall(4, requests, status);
        printf("Rank %d stopped waiting in iteration %d\n", my2drank, i);
        MPI_Barrier(comm_2d);
    }

    printf("Rank %d at point 2\n", my2drank);

    MPI_Barrier(MPI_COMM_WORLD);

    free(buffersA[1]);
    free(buffersB[1]);
    MPI_Finalize();
    return 0;
}

Here is the output:

Rank 0 at point 1
Rank 0 started iteration 0
Rank 1 at point 1
Rank 1 started iteration 0
Rank 2 at point 1
Rank 2 started iteration 0
Rank 3 at point 1
Rank 3 started iteration 0
Rank 0 stopped waiting in iteration 0