Fortran OpenACC: derived types with allocatable components

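I have read that manual deep copy of Fortran derived types is possible, but the simple test program below fails at run time. The program compiles cleanly with PGI v16.10. What is going wrong?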

program Test

    implicit none

    type dt
        integer :: n
        real, dimension(:), allocatable :: xm
    end type dt

    type(dt) :: grid
    integer :: i

    grid%n = 10
    allocate(grid%xm(grid%n))

!$acc enter data copyin(grid)
!$acc enter data pcreate(grid%xm)

!$acc kernels
   do i = 1, grid%n
      grid%xm(i) = i * i
   enddo
!$acc end kernels

   print*,grid%xm

end program Test

The error I get is:

call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

1 Answer

  • 1

    You only need to add a "present(grid)" clause to the kernels directive.

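    Here is a version of the program with the fix, plus a few other additions, such as an update directive so that the data can be printed on the host.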

    % cat test.f90
    program Test
    
        implicit none
    
        type dt
            integer :: n
            real, dimension(:), allocatable :: xm
        end type dt
    
        type(dt) :: grid
        integer :: i
    
        grid%n = 10
        allocate(grid%xm(grid%n))
    
    !$acc enter data copyin(grid)
    !$acc enter data create(grid%xm)
    !$acc kernels present(grid)
       do i = 1, grid%n
          grid%xm(i) = i * i
       enddo
    !$acc end kernels
    !$acc update host(grid%xm)
       print*,grid%xm
    
    !$acc exit data delete(grid%xm, grid)
       deallocate(grid%xm)
    
    end program Test
    
    % pgf90 -acc test.f90 -Minfo=accel -ta=tesla -V16.10; a.out
    test:
         16, Generating enter data copyin(grid)
         17, Generating enter data create(grid%xm(:))
         18, Generating present(grid)
         19, Loop is parallelizable
             Accelerator kernel generated
             Generating Tesla code
             19, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
         23, Generating update self(grid%xm(:))
        1.000000        4.000000        9.000000        16.00000
        25.00000        36.00000        49.00000        64.00000
        81.00000        100.0000
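
    If several routines need the same data on the device, the manual deep copy above could be wrapped in a pair of helper routines. The following is only a sketch: the module name grid_mod and the routine names grid_to_device and grid_release_device are made up for illustration and are not part of PGI or OpenACC. It keeps the same ordering as the example above: the parent is copied in before its member is created, and the member is deleted before the parent.

    ```fortran
    ! Sketch only: grid_mod, grid_to_device and grid_release_device are
    ! hypothetical names chosen for this example.
    module grid_mod
        implicit none

        type dt
            integer :: n
            real, dimension(:), allocatable :: xm
        end type dt

    contains

        subroutine grid_to_device(g)
            type(dt), intent(inout) :: g
            ! Parent first, so the device copy of g exists
            ! before its allocatable member is created.
            !$acc enter data copyin(g)
            !$acc enter data create(g%xm)
        end subroutine grid_to_device

        subroutine grid_release_device(g)
            type(dt), intent(inout) :: g
            ! Reverse order: member first, then the parent.
            !$acc exit data delete(g%xm)
            !$acc exit data delete(g)
        end subroutine grid_release_device

    end module grid_mod

    program Test
        use grid_mod
        implicit none

        type(dt) :: grid
        integer :: i

        grid%n = 10
        allocate(grid%xm(grid%n))

        call grid_to_device(grid)

        ! The data was placed on the device by the helper,
        ! so the compute region only asserts that it is present.
        !$acc kernels present(grid)
        do i = 1, grid%n
            grid%xm(i) = i * i
        enddo
        !$acc end kernels

        ! Bring the member array back before printing on the host.
        !$acc update host(grid%xm)
        print*, grid%xm

        call grid_release_device(grid)
        deallocate(grid%xm)
    end program Test
    ```

    It should build with the same pgf90 flags used for test.f90 above.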
    

    Note that PGI 17.7 will include beta support for true deep copy in Fortran, as opposed to the manual deep copy shown above. Here is an example that uses true deep copy:

    % cat test_deep.f90
    program Test
    
        implicit none
    
        type dt
            integer :: n
            real, dimension(:), allocatable :: xm
        end type dt
    
        type(dt) :: grid
        integer :: i
    
        grid%n = 10
        allocate(grid%xm(grid%n))
    
    !$acc enter data copyin(grid)
    !$acc kernels present(grid)
       do i = 1, grid%n
          grid%xm(i) = i * i
       enddo
    !$acc end kernels
    !$acc update host(grid)
       print*,grid%xm
    
    !$acc exit data delete(grid)
       deallocate(grid%xm)
    
    end program Test
    
    % pgf90 -acc test_deep.f90 -Minfo=accel -ta=tesla:deepcopy -V17.7 ; a.out
    test:
         16, Generating enter data copyin(grid)
         17, Generating present(grid)
         18, Loop is parallelizable
             Accelerator kernel generated
             Generating Tesla code
             18, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
         22, Generating update self(grid)
        1.000000        4.000000        9.000000        16.00000
        25.00000        36.00000        49.00000        64.00000
        81.00000        100.0000
    
