
openmpi: how to send non-contiguous blocks of data from one rank to all other ranks?


I am looking for an MPI function/approach that lets me send several blocks of data from one process to all other processes, similar to MPI_Bcast but with multiple blocks at once.

On the root rank I have a fragmented block of data:

#define BLOCKS 5
#define BLOCKSIZE 10000

char *datablock[BLOCKS];
int i;
for (i=0; i<BLOCKS; i++) datablock[i] = (char*)malloc(BLOCKSIZE*sizeof(char));

This is just an example; obviously the BLOCKS do not have to be adjacent in memory. I want to get this data to all other ranks (they have already allocated the memory needed to store it).

I noticed that there are routines such as MPI_Gatherv or MPI_Scatterv that gather or scatter fragmented data with the help of a displacement array. The problem is that scatter sends each fragment to a different rank, whereas I need to send all fragments to all other ranks, something like an MPI_Bcast with displacement information, i.e. an MPI_Bcastv.

One solution would be to issue multiple MPI_Bcast calls (one per block), but I am not sure whether that is the best way to do it.
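
For example, a minimal sketch of this "one blocking MPI_Bcast per block" idea (the block contents and the check at the end are only there to illustrate; every rank allocates the same layout beforehand):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCKS 5
#define BLOCKSIZE 10000

int main(int argc, char *argv[])
{
    int rank, i;
    char *datablock[BLOCKS];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // every rank prepares the storage; only the root fills it with data
    for (i=0; i<BLOCKS; i++) {
        datablock[i] = (char*)malloc(BLOCKSIZE*sizeof(char));
        if (datablock[i] == NULL) { fprintf(stderr, "malloc failed\n"); exit(1); }
        if (rank == 0) datablock[i][0] = i;
    }

    // one broadcast per (possibly non-adjacent) block
    for (i=0; i<BLOCKS; i++)
        MPI_Bcast(datablock[i], BLOCKSIZE, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (rank != 0 && datablock[0][0] != 0)
        printf("unexpected data on rank %d\n", rank);

    for (i=0; i<BLOCKS; i++) free(datablock[i]);
    MPI_Finalize();
    return 0;
}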

UPDATE: I am going to try the MPI_Ibcast approach here, which I think should work:

int rank; // rank id
int i;
int blocksize = 10000;
int blocknum = 200;
char **datablock = NULL;
char *recvblock = NULL;
MPI_Request *request;
request = (MPI_Request *)malloc(blocknum*sizeof(MPI_Request));
if(rank == 0) {
    // this is just an example; in practice the blocks are created on the fly, as soon as the last block is filled
    datablock = (char**)malloc(blocknum*sizeof(char*));
    for (i=0; i<blocknum; i++) datablock[i] = (char*)malloc(blocksize*sizeof(char));
    for (i=0; i<blocknum; i++)
        MPI_Ibcast(datablock[i], blocksize, MPI_CHAR, 0, MPI_COMM_WORLD, &request[i]);
} else {
    // for this example the other ranks already know how many blocks rank 0 has created; in practice this information is broadcast via MPI before the MPI_Ibcast calls
    recvblock = (char*)malloc(blocksize*blocknum*sizeof(char));
    for (i=0; i<blocknum; i++)
        MPI_Ibcast(recvblock+i*blocksize, blocksize, MPI_CHAR, 0, MPI_COMM_WORLD, &request[i]);
}
MPI_Waitall(blocknum, request, MPI_STATUSES_IGNORE);

So an MPI_Waitall is needed at the end, and I am not sure how to use it: it takes a count, an array of requests, and an array of statuses!?

The reason I issue different MPI_Ibcast calls for the root and the other ranks is that the send buffer is not the same as the receive buffer.

Another question: do I need a separate request for each MPI_Ibcast in the for loop, or can I reuse the MPI_Request variable, as I did in the example above?

UPDATE2: I have updated the example and now use an MPI_Request pointer, which I initialize with a malloc call right after the definition. That seems a bit odd to me, but this is only an example; in practice the number of required requests is only known at runtime. I am particularly worried about whether I can use sizeof(MPI_Request) here, or whether that is problematic because MPI_Request is not a standard data type?
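
For reference, a stripped-down sketch of just the runtime allocation I am asking about (my understanding is that MPI_Request is an opaque handle type declared in mpi.h, so sizeof(MPI_Request) should be well defined, but please correct me if that is wrong):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int blocknum = 200; // in practice only known at runtime
    MPI_Request *request = (MPI_Request *)malloc(blocknum*sizeof(MPI_Request));
    if (request == NULL) { fprintf(stderr, "malloc failed\n"); exit(1); }

    // ... one MPI_Ibcast per block, each storing its handle in request[i] ...
    // every non-blocking call gets its own handle, and a single
    // MPI_Waitall(blocknum, request, MPI_STATUSES_IGNORE); completes them all

    free(request);
    MPI_Finalize();
    return 0;
}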

Apart from that, is the example correct? Is this a good solution if I want to use MPI_Ibcast?

1 Answer

    • Would serialization be a good option? You could copy the multiple buffers into a single buffer, broadcast that, and then unpack it on the receiving side. This is the way boost.mpi handles complex objects (in C++); a rough sketch of this idea is given at the end of this answer.

    • Alternatively, you can use several calls to the non-blocking version of MPI_Bcast(), namely MPI_Ibcast(), followed by a call to MPI_Waitall().

    • Note that the data you describe looks like a 2D array. There is a way to allocate it differently, so that the whole data is contiguous in memory:

    int block=42;
    int blocksize=42;
    char **array=malloc(block*sizeof(char*));
    if(array==NULL){fprintf(stderr,"malloc failed\n");exit(1);}
    array[0]=malloc(block*blocksize*sizeof(char));
    if(array[0]==NULL){fprintf(stderr,"malloc failed\n");exit(1);}
    int i;
    for(i=1;i<block;i++){
        array[i]=&array[0][i*blocksize];
    }
    

    Then a single call to MPI_Bcast() is enough to broadcast the whole array:

    MPI_Bcast(array[0], block*blocksize, MPI_CHAR,0, MPI_COMM_WORLD);
    

    EDIT: here is a solution based on your code, compiled with mpicc main.c -o main -Wall and run with mpirun -np 4 main:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main(int argc,char *argv[])
    {
    
        int  size, rank;
        MPI_Init(&argc,&argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        MPI_Comm_size(MPI_COMM_WORLD,&size);    
    
        int i;
        int blocksize = 10000;
        int blocknum = 200;
        char **datablock = NULL;
        char *recvblock = NULL;
        MPI_Request requests[blocknum];
        MPI_Status status[blocknum];
    
        if(rank == 0) {
            // this is just an example; in practice the blocks are created on the fly, as soon as the last block is filled
            datablock = malloc(blocknum*sizeof(char*));
            if(datablock==NULL){fprintf(stderr,"malloc failed\n"); exit(1);}
            for (i=0; i<blocknum; i++){
                datablock[i] = (char*)malloc(blocksize*sizeof(char));
                if(datablock[i]==NULL){fprintf(stderr,"malloc failed\n"); exit(1);}
                datablock[i][0]=i%64;
            }
            for (i=0; i<blocknum; i++)
                MPI_Ibcast(datablock[i], blocksize, MPI_CHAR, 0, MPI_COMM_WORLD, &requests[i]);
    
    
        } else {
            // for this example the other ranks already know how many blocks rank 0 has created; in practice this information is broadcast via MPI before the MPI_Ibcast calls
            recvblock = malloc(blocksize*blocknum*sizeof(char));
            if(recvblock==NULL){fprintf(stderr,"malloc failed\n"); exit(1);}
            for (i=0; i<blocknum; i++)
                MPI_Ibcast(recvblock+i*(blocksize), blocksize, MPI_CHAR, 0, MPI_COMM_WORLD, &requests[i]);
        }
    
        int ierr=MPI_Waitall(blocknum, requests, status); 
        if(ierr!=MPI_SUCCESS){fprintf(stderr,"MPI_Waitall() failed rank %d\n",rank);exit(1);}
    
    
        if(rank==0){
            for(i=0;i<blocknum;i++){
                free(datablock[i]);
            }
            free(datablock);
        }else{
            for(i=0;i<blocknum;i++){
                if(recvblock[i*(blocksize)]!=i%64){
                    printf("communication problem! %d %d %d\n",rank,i, recvblock[i*(blocksize)]);
                }
            }
            free(recvblock);
        }
    
        MPI_Finalize();
        return 0;
    }
    

    I believe the best implementation would be a mix between serialization and MPI_Ibcast(), in order to limit both the memory footprint and the number of messages.
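
    For completeness, here is a rough sketch of the serialization idea from the first bullet, using MPI_Pack()/MPI_Unpack(); blocknum and blocksize are the example values from your code and are assumed to be known on every rank (in practice you would broadcast them first):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc,char *argv[])
    {
        int rank, i;
        const int blocknum = 200, blocksize = 10000;
        MPI_Init(&argc,&argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    
        // upper bound on the packed size of all blocks together
        int packsize;
        MPI_Pack_size(blocknum*blocksize, MPI_CHAR, MPI_COMM_WORLD, &packsize);
        char *packed = malloc(packsize);
        if(packed==NULL){fprintf(stderr,"malloc failed\n"); exit(1);}
    
        int position = 0;
        if(rank == 0){
            // the root packs its (non-contiguous) blocks into the single buffer
            for(i=0;i<blocknum;i++){
                char *block = malloc(blocksize); // stand-in for datablock[i]
                if(block==NULL){fprintf(stderr,"malloc failed\n"); exit(1);}
                block[0]=i%64;
                MPI_Pack(block, blocksize, MPI_CHAR, packed, packsize, &position, MPI_COMM_WORLD);
                free(block);
            }
        }
    
        // a single broadcast moves the whole packed buffer
        MPI_Bcast(packed, packsize, MPI_PACKED, 0, MPI_COMM_WORLD);
    
        if(rank != 0){
            // the receivers unpack block by block into their own storage
            char *recvblock = malloc((size_t)blocknum*blocksize);
            if(recvblock==NULL){fprintf(stderr,"malloc failed\n"); exit(1);}
            position = 0;
            for(i=0;i<blocknum;i++)
                MPI_Unpack(packed, packsize, &position, recvblock+(size_t)i*blocksize, blocksize, MPI_CHAR, MPI_COMM_WORLD);
            for(i=0;i<blocknum;i++)
                if(recvblock[(size_t)i*blocksize]!=i%64)
                    printf("unpack problem ! %d %d\n", rank, i);
            free(recvblock);
        }
    
        free(packed);
        MPI_Finalize();
        return 0;
    }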
