
CUDA memcpy not working when copying back to the host


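My problem is that CUDA memcpy fails when copying from the device to the host. My program has a GUI written in C# that uses a CUDA wrapper class, and the core CUDA logic is written in CUDA C.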

This is the main C# code that launches everything:

int[] imgData = srcImg.RgbData8bitInt;
int[] patData = pattern.PatternData;
int[] maskData = pattern.MaskData;
int[] Accumulator = new int[srcImg.Width * srcImg.Height];

IntPtr A_dev = CUDA.MallocInt(srcImg.Width * srcImg.Height);

IntPtr Img_dev = CUDA.MallocInt(imgData.Length);
CUDA.MemcpyToDevice(imgData, Img_dev, imgData.Length);

IntPtr Pat_dev = CUDA.MallocInt(patData.Length);
CUDA.MemcpyToDevice(patData, Pat_dev, patData.Length);

IntPtr Mask_dev = CUDA.MallocInt(maskData.Length);
CUDA.MemcpyToDevice(maskData, Mask_dev, maskData.Length);

int gridSizeX = (srcImg.Width - pattern.Image.Width) / 256 + 1;
int gridSizeY = srcImg.Height - pattern.Image.Width;
int imageWidth = srcImg.Width;

CUDA.Execute(status, gridSizeX, gridSizeY, A_dev, Img_dev, Pat_dev, Mask_dev, imageWidth);
CUDA.SynchronizeContext();

CUDA.MemcpyToHost(Accumulator, A_dev, Accumulator.Length);

By the way, CUDA.SynchronizeContext() is a wrapper around cudaThreadSynchronize().
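cudaThreadSynchronize() is deprecated; cudaDeviceSynchronize() replaces it. A minimal sketch of what the native side of such a wrapper could look like, assuming you control the DLL. The name synchronizeContext is hypothetical (the question does not show this function), and the Windows `__declspec(dllexport) __stdcall` decorations from the question are left out so the sketch compiles on any platform. Checking cudaGetLastError() first means a failed kernel launch is reported right after CUDA.Execute instead of at some later call.

```cuda
#include <cuda_runtime.h>

// Hypothetical native function behind CUDA.SynchronizeContext().
// Returns 0 (cudaSuccess) on success, otherwise the raw cudaError_t value.
extern "C" int synchronizeContext(void)
{
    // Launch-configuration errors (e.g. an invalid grid size) are reported
    // immediately after the launch and can be fetched here.
    cudaError_t status = cudaGetLastError();
    if (status != cudaSuccess)
        return (int)status;

    // Waits for all preceding work on the device; errors raised while the
    // kernel was running surface here.
    status = cudaDeviceSynchronize();
    return (int)status;
}
```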

The problematic part is the last line, which copies the values from the device back to the host:

// C# side: P/Invoke declaration of the native export
[DllImport(dllPath, CharSet = CharSet.Ansi, SetLastError = true, CallingConvention = CallingConvention.StdCall)]
private static extern int memcpyToHost(int[] srcPtr, IntPtr devPtr, int size);

// Native side (CUDA C, compiled into the DLL): copies `size` ints from device to host
extern "C" int __declspec(dllexport) __stdcall memcpyToHost(int* host, int* dev, int size)
{
    if (dev == 0) return 1;  // reject a null device pointer
    cudaError_t status = cudaMemcpy(host, dev, size * sizeof(int), cudaMemcpyDeviceToHost);
    if (status == cudaSuccess)
        return 0;
    else
        return 1;  // every CUDA error is collapsed into 1
}
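The wrapper above maps every failure to 1, so the C# caller cannot tell which error occurred. A sketch of a more diagnosable variant that passes the raw cudaError_t value through (cudaSuccess is 0), plus a hypothetical errorString export built on cudaGetErrorString(). On the C# side, errorString could be declared as returning IntPtr and converted with Marshal.PtrToStringAnsi; that declaration is an assumption and is not shown. As before, the Windows export decorations are omitted.

```cuda
#include <cuda_runtime.h>

// Copies `size` ints from device memory `dev` into host memory `host`.
// Returns 0 on success, otherwise the raw cudaError_t value, so the caller
// can distinguish e.g. cudaErrorInvalidValue from other failures.
extern "C" int memcpyToHost(int* host, const int* dev, int size)
{
    if (!host || !dev || size < 0)
        return (int)cudaErrorInvalidValue;
    cudaError_t status = cudaMemcpy(host, dev, (size_t)size * sizeof(int),
                                    cudaMemcpyDeviceToHost);
    return (int)status;
}

// Hypothetical companion export: turns a code returned above into text.
extern "C" const char* errorString(int code)
{
    return cudaGetErrorString((cudaError_t)code);
}
```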

The error status I get while debugging is cudaErrorInvalidValue.

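Allocating memory and copying to the device seem to work; I have checked both in the debugger. I'm completely at a loss here. Has anyone run into a similar problem?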

EDIT: SOLVED, see the comments.

1 Answer

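The problem was that cudaDeviceReset() was called after the kernel launch, before the results were copied back. cudaDeviceReset() destroys all resources associated with the device in the current process, including its memory allocations, so A_dev no longer pointed to valid device memory when cudaMemcpy ran, which matches the cudaErrorInvalidValue error.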
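A minimal, self-contained sketch of the ordering that avoids the problem. fillKernel and runPipeline are hypothetical stand-ins for the question's kernel and its C# call sequence; the point is only where cudaDeviceReset() belongs: after every copy back to the host and after the buffers are freed.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder kernel: each thread writes its global index.
__global__ void fillKernel(int* acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        acc[i] = i;
}

// Allocate, launch, synchronize, copy back, free, and only then reset.
// Returns 0 on success or the first cudaError_t encountered.
int runPipeline(int* host, int n)
{
    if (!host || n <= 0)
        return (int)cudaErrorInvalidValue;

    int* dev = 0;
    cudaError_t status = cudaMalloc((void**)&dev, (size_t)n * sizeof(int));
    if (status != cudaSuccess)
        return (int)status;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fillKernel<<<blocks, threads>>>(dev, n);
    status = cudaGetLastError();
    if (status == cudaSuccess)
        status = cudaDeviceSynchronize();

    // The copy must run while the allocation behind `dev` still exists.
    if (status == cudaSuccess)
        status = cudaMemcpy(host, dev, (size_t)n * sizeof(int),
                            cudaMemcpyDeviceToHost);

    cudaFree(dev);

    // Destroys all device allocations for this process, so it goes last,
    // never between the kernel launch and cudaMemcpy.
    cudaDeviceReset();
    return (int)status;
}
```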
