Java Learning Path

7 votes

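How to asynchronously copy memory from host to device using Thrust and CUDA streams

I want to copy memory from the host to the device using Thrust: thrust::host_vector<float> h_vec(1 << 28); thrust::device_vector<float> d_vec(1 << 28); thrust::copy(h_vec.begin(), h_vec.end(), d_vec.begin()); with CUDA streams, analogous to using...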
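One way to approach the question above is to keep the Thrust container on the device side and issue the transfer yourself with cudaMemcpyAsync on a stream. The following is a minimal sketch, not the accepted answer: the helper name copy_async, the buffer size, and the use of cudaMallocHost for pinned host memory are assumptions made for illustration. Pinned memory matters because a copy from pageable host memory is generally not asynchronous with respect to the host.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <cstdio>

// Hypothetical helper: enqueue a host-to-device copy of d_vec.size() floats
// on `stream`. h_pinned is assumed to point to page-locked host memory.
cudaError_t copy_async(const float* h_pinned,
                       thrust::device_vector<float>& d_vec,
                       cudaStream_t stream) {
    return cudaMemcpyAsync(thrust::raw_pointer_cast(d_vec.data()), h_pinned,
                           d_vec.size() * sizeof(float),
                           cudaMemcpyHostToDevice, stream);
}

int main() {
    const size_t n = 1 << 20;  // smaller than the 1 << 28 in the question

    // Page-locked host buffer so the copy can overlap with host work.
    float* h_pinned = nullptr;
    if (cudaMallocHost(reinterpret_cast<void**>(&h_pinned),
                       n * sizeof(float)) != cudaSuccess) {
        return 1;
    }
    for (size_t i = 0; i < n; ++i) h_pinned[i] = 1.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaError_t err;
    {
        thrust::device_vector<float> d_vec(n);
        err = copy_async(h_pinned, d_vec, stream);
        // Independent host work could run here while the copy is in flight.
        cudaStreamSynchronize(stream);  // wait before touching d_vec
        std::printf("first element on device: %f\n",
                    static_cast<float>(d_vec[0]));
    }  // d_vec is released here, before the stream is destroyed

    cudaStreamDestroy(stream);
    cudaFreeHost(h_pinned);
    return err == cudaSuccess ? 0 : 1;
}
```

The device_vector is only used for ownership of the device allocation; the stream-ordered copy itself goes through the CUDA runtime.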

c++ asynchronous cuda thrust
0 votes

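Unable to call Thrust on CUDA memory

I am trying to use the Thrust library to find the sum of an array that already resides in CUDA memory. A few replies here say this can be done by wrapping the pointer with thrust::device_ptr, but that gives me an error. Initial code: cudaMemcpy((void *)(data + stride), (void *)d_output, sizeof(unsigned int) * rows * cols, cudaMemcpyDeviceToHo...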
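The wrapping technique mentioned in the question above can be sketched as follows. The excerpt is truncated, so this is not the asker's code: the function name sum_device_array and the sample data are assumptions. The sketch wraps a raw pointer obtained from cudaMalloc in a thrust::device_ptr so that thrust::reduce runs on the device instead of dereferencing the pointer on the host.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
#include <cstdio>

// Hypothetical helper: sum n unsigned ints that already live in device memory.
unsigned int sum_device_array(unsigned int* d_raw, size_t n) {
    // Wrapping tells Thrust the pointer refers to device memory.
    thrust::device_ptr<unsigned int> first = thrust::device_pointer_cast(d_raw);
    return thrust::reduce(first, first + n, 0u);
}

int main() {
    const unsigned int h_data[4] = {1, 2, 3, 4};
    unsigned int* d_data = nullptr;
    if (cudaMalloc(reinterpret_cast<void**>(&d_data), sizeof(h_data)) != cudaSuccess) {
        return 1;
    }
    cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);

    std::printf("sum = %u\n", sum_device_array(d_data, 4));

    cudaFree(d_data);
    return 0;
}
```

Code that uses Thrust device algorithms like this is normally placed in a .cu file and compiled with nvcc.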

cuda thrust
0 votes

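No instance of overloaded function "thrust::remove_if" matches the argument list

I wrote a program that uses remove_if. It operates on an array allocated with cudaMalloc and filled by an earlier part of the program (on the device). After the removal, the array is used by the next part (on the device, without Thrust). I want to avoid any device-to-host or host-to-device copies. I used this example: https://github.com/thrust/thrust/blob/master/examples/cuda/wrap_pointer.cu nvcc reports: *...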
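As a rough illustration of the setup described above, the sketch below runs thrust::remove_if in place on a cudaMalloc'd array without copying the data to the host. The predicate is_negative, the helper compact_in_place, and the sample values are hypothetical; the compiler error in the question is truncated, so this does not claim to reproduce or fix it.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/remove.h>
#include <cstdio>

// Hypothetical predicate: elements for which this returns true are removed.
struct is_negative {
    __host__ __device__ bool operator()(int x) const { return x < 0; }
};

// Hypothetical helper: compacts d_raw in place and returns how many
// elements remain at the front of the array.
size_t compact_in_place(int* d_raw, size_t n) {
    thrust::device_ptr<int> first = thrust::device_pointer_cast(d_raw);
    thrust::device_ptr<int> new_end =
        thrust::remove_if(first, first + n, is_negative());
    return static_cast<size_t>(new_end - first);
}

int main() {
    const int h_data[6] = {3, -1, 4, -1, 5, -9};
    int* d_data = nullptr;
    if (cudaMalloc(reinterpret_cast<void**>(&d_data), sizeof(h_data)) != cudaSuccess) {
        return 1;
    }
    cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);

    size_t kept = compact_in_place(d_data, 6);
    std::printf("elements kept: %zu\n", kept);

    // d_data (raw pointer) can now be handed to non-Thrust device code;
    // only the first `kept` elements are meaningful.
    cudaFree(d_data);
    return 0;
}
```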

cuda thrust
1 votes

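How to use CUDA Thrust execution policies to override Thrust's low-level device memory allocator

I want to override the low-level CUDA device memory allocator (implemented as thrust::system::cuda::detail::malloc()) so that it uses a custom allocator instead of calling cudaMalloc() directly when invoked from a host (CPU) thread. Is this possible? If so, can it be done with Thrust's "execution policy" mechanism? I tried a model like this: struct eptCGA : thrust::system::cuda::...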

c++ templates cuda malloc thrust
0 votes

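Unexplained errors in namespace thrust::system::cuda::thrust, specifically in "system_error" and "cuda_category"

I am trying to use thrust::raw_pointer_cast to cast a raw pointer in order to capture output in a functor. I have tried several ways of passing a pointer to a float, but I keep getting a memory conflict and two IntelliSense errors: thrust::system::cuda::thrust has no member "system_error" and no member "cuda_category". Oddly, it appears to be an error in throw_on_error.hpp, which...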

c++ cuda thrust
1 votes

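Getting a device memory pointer and passing it to a Thrust function

I am doing some computation on the GPU that leaves me with an array. I work in Cudafy and want to use the Thrust library for sorting and reduction. I was able to write a function in CUDA C and import it into my Cudafy code as a DLL in order to use Thrust, since it is only available in CUDA C. However, Thrust functions can only be called from the host. I do not want to copy all the data from the device to the host to perform these Thrust operations; I want to use Thrust without copying the data. I know that it is possible to use, from the device, thrust::...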

c# pointers cuda thrust cudafy.net
-2 votes

CUDA Thrust device pointers crash with transform and copy

In CUDA 9.2 I have something like this: #ifdef __CUDA_ARCH__ struct Context { float n[4]; } context; #else typedef __m128 Context; #endif struct A { float k[2]; }; struct B { float q[4]; }; struct FTransform : t...

cuda thrust
10 votes

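Maximum number of threads that can be launched in a single CUDA kernel

I am confused about the maximum number of threads that can be launched on a Fermi GPU. The device query for my GTX 570 reports the following. Maximum number of threads per block: 1024 Maximum sizes of each dimension of a block: 1024 x 1024 x 64 Maximum sizes of each dimensio...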
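The arithmetic behind the question above can be sketched with a small program. The per-dimension block sizes (1024 x 1024 x 64) are separate limits on each dimension; the product of the three block dimensions still may not exceed the "threads per block" limit. An upper bound on the threads in one launch is therefore threads per block times the number of blocks in the grid. The helper name max_launchable_threads is hypothetical, and the result is returned as a double because the product can exceed 64 bits on recent GPUs. The grid limits are read from the device at run time rather than filled in from the truncated excerpt.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: upper bound on threads in one kernel launch,
// i.e. (threads per block) * (blocks in x) * (blocks in y) * (blocks in z).
double max_launchable_threads(double threads_per_block,
                              double grid_x, double grid_y, double grid_z) {
    return threads_per_block * grid_x * grid_y * grid_z;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device available\n");
        return 1;
    }

    std::printf("threads per block: %d\n", prop.maxThreadsPerBlock);
    std::printf("grid size limits:  %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    double total = max_launchable_threads(prop.maxThreadsPerBlock,
                                          prop.maxGridSize[0],
                                          prop.maxGridSize[1],
                                          prop.maxGridSize[2]);
    std::printf("upper bound on threads per launch: %.0f\n", total);
    return 0;
}
```

This is a theoretical ceiling on a single launch; it says nothing about how many threads are resident on the GPU at the same time.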

cuda gpu thrust
1 votes

Thrust (CUDA library) compile errors such as "'vectorize_from_shared_kernel__entry': is not a member of 'thrust::detail::device::cuda'"

I created a VS project with the CUDA VS Wizard and tried to build a CUDA program that uses Thrust. The test program is very simple: // ignore headers int main(void) { thrust::device_vector<double> X; X.resize(100); } I get compile errors such as: 1> C:\DOCUME~1\ADMINI~1\LO...

compiler-errors cuda thrust
1 votes

NVidia Thrust device_vector of strings

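I have started using the NVidia Thrust library that ships with the CUDA 4.0 toolkit and would like to verify a few things before digging deeper. I can do the following without any problems during the build: thrust::host_vector <int> iVec; thrust::device_vector <int> iVec2; thrust::host_vector <std::string> sV...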

c++ cuda thrust
1 votes

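Thrust CUDA: unusual runtime error

I am trying to implement some algorithms with the Thrust CUDA library. It worked fine for the first few runs but now reports a thrust::system::detail bad_alloc error. What does that mean? My GPU has 4 GB of global memory, so I am not running out of memory (my application needs barely 200 MB). I use free and cudaFree wherever necessary. Here is the system configuration. OS: Linux Card: Tesla C20...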

memory-leaks cuda runtime-error thrust
0 votes

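Compile error when changing the Thrust backend system with CUDA 5

I recently installed CUDA 5 and found that my existing Thrust-based code no longer compiles. The error occurs only if I switch to OMP or TBB. So I experimented with monte_carlo.cpp from the Thrust examples. When I use the include path of CUDA 5.0, I get this error: g++ -O2 -o monte_carlo monte_carlo.cpp -DTHRUST_DEVICE_SYSTEM=THRUST_D...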

cuda thrust
1 votes

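Thrust: how to get the number of elements copied by copy_if when using device_ptr

I am using the Thrust library's thrust::copy_if function together with a counting iterator to obtain the indices of the nonzero elements in an array. I also need the number of copied elements. I am using the code from the 'counting_iterator.cu' example, except that in my application I need to reuse preallocated arrays, so I wrap them with thrust::device_ptr and then pass them to thrust::copy_if. Here is the code: using n...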
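A minimal sketch of the situation described above: thrust::copy_if returns an iterator to the end of the output range, so the number of copied elements is the distance between that iterator and the start of the output. The asker's code is truncated, so the names nonzero_indices and is_nonzero and the sample data are assumptions, not the original code.

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_ptr.h>
#include <thrust/iterator/counting_iterator.h>
#include <cstdio>

// Hypothetical predicate applied to the stencil (the data values).
struct is_nonzero {
    __host__ __device__ bool operator()(int x) const { return x != 0; }
};

// Hypothetical helper: writes the indices of nonzero entries of d_values
// into the preallocated d_indices and returns how many were written.
int nonzero_indices(int* d_values, int* d_indices, int n) {
    thrust::device_ptr<int> values = thrust::device_pointer_cast(d_values);
    thrust::device_ptr<int> out = thrust::device_pointer_cast(d_indices);
    thrust::counting_iterator<int> index(0);

    // Stencil form: copy index i when is_nonzero(values[i]) is true.
    thrust::device_ptr<int> out_end =
        thrust::copy_if(index, index + n, values, out, is_nonzero());

    return static_cast<int>(out_end - out);
}

int main() {
    const int h_values[5] = {0, 5, 0, 7, 9};
    int* d_values = nullptr;
    int* d_indices = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&d_values), sizeof(h_values));
    cudaMalloc(reinterpret_cast<void**>(&d_indices), sizeof(h_values));
    cudaMemcpy(d_values, h_values, sizeof(h_values), cudaMemcpyHostToDevice);

    int count = nonzero_indices(d_values, d_indices, 5);
    std::printf("nonzero elements: %d\n", count);

    cudaFree(d_indices);
    cudaFree(d_values);
    return 0;
}
```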

cuda thrust
0 votes

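Error: identifier "atomicOr" is undefined in a Thrust program

I found that the CUDA atomicOr function is not recognized in a Thrust program compiled in Visual Studio 2012. I have read that all headers should already be included when the NVidia nvcc compiler is invoked. Most posts on this issue state that this must mean the architecture is set incorrectly. I tried these settings based on other posts: How to set CUDA compiler flags in Visual Studio 2010? ... and...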

thrust atomicity
0 votes

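Reduce by key on a device array

I use reduce_by_key to count the elements of an int2 array that share the same first value. For example, for the array <1,2> <1,3> <1,4> <2,5> <2,7>, the number of elements whose first value is 1 is 3, and for 2 it is 2. Code: struct compare_int2 : public thrust::binary_function<in...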
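The counting idea in the question above can be sketched in a simplified form. This is not the asker's int2 code, which is truncated: it assumes the first components have already been extracted into a plain int key array, and the helper name count_by_key is hypothetical. Each key is paired with a constant 1, and thrust::reduce_by_key sums those ones per run of equal keys. Note that reduce_by_key only merges consecutive equal keys, so unsorted input would need sorting first.

```cuda
#include <thrust/device_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <cstdio>

// Hypothetical helper: for each run of equal consecutive keys, store the key
// in unique_keys and the run length in counts. Returns the number of runs.
int count_by_key(const thrust::device_vector<int>& keys,
                 thrust::device_vector<int>& unique_keys,
                 thrust::device_vector<int>& counts) {
    unique_keys.resize(keys.size());
    counts.resize(keys.size());

    auto ends = thrust::reduce_by_key(keys.begin(), keys.end(),
                                      thrust::make_constant_iterator(1),
                                      unique_keys.begin(), counts.begin());

    int runs = static_cast<int>(ends.first - unique_keys.begin());
    unique_keys.resize(runs);
    counts.resize(runs);
    return runs;
}

int main() {
    // First components of <1,2> <1,3> <1,4> <2,5> <2,7> from the question.
    const int h_keys[5] = {1, 1, 1, 2, 2};
    thrust::device_vector<int> keys(h_keys, h_keys + 5);
    thrust::device_vector<int> unique_keys, counts;

    int runs = count_by_key(keys, unique_keys, counts);
    for (int i = 0; i < runs; ++i) {
        std::printf("key %d occurs %d times\n",
                    static_cast<int>(unique_keys[i]),
                    static_cast<int>(counts[i]));
    }
    return 0;
}
```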

cuda parallel-processing thrust
1 votes

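Thrust: compiler error when switching from CUDA to OpenMP

I am working with some programs that compile successfully for CUDA (v6.5). However, when I switch to OpenMP, I get the following error: Error 9 error C2668: 'thrust::raw_reference_cast': ambiguous call to overloaded function C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include\thrust\detail\fun...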

openmp thrust
1 votes

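Thrust copy_if: incomplete type is not allowed

I am trying to use thrust::copy_if to compact an array, with a predicate that checks for positive numbers. Header file, file.h: struct is_positive { __host__ __device__ bool operator()(const int x) { return (x >= 0); } }; and file.cu: #include "../headers/fi...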

c++ cuda thrust
0 votes

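Thrust set operation does not compile [duplicate]

This question is an exact duplicate of: thrust set difference fails to compile with calling a host function from a host device function is not allowed (1 answer). I tried a simple program that uses thrust::set. It finds the difference of two sets, but I get a compile error. #include <th...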

compiler-errors set difference thrust
0 votes

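CUDA multiple device problem, thrust::system_error

I developed an algorithm using thrust. My office computer has one CUDA-capable card with this architecture: --- General information for device 0 Name: Quadro 2000 Compute capability: 2.1 Clock rate: 1251000 kHz Device copy overlap: Enabled Kernel execution timeout: Disabled. On this machine, my algorithm runs without errors. However, a clean build on the lab computer throws a nasty error when trying to create a device_vector. Both machines run RedHa...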

linux cuda thrust
7 votes

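How to normalize matrix columns in CUDA with maximum performance?

How can the columns of a matrix be normalized efficiently in CUDA? My matrix is stored in column-major order, with a typical size of 2000x200. The operation can be expressed by the following MATLAB code. A = rand(2000,200); A = exp(A); A = A./repmat(sum(A,1), [size(A,1) 1]); Can this be done efficiently with Thrust, cuBLAS and/or cuNPP? A quick implementation consisting of 4 kernels is shown below...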

performance matrix cuda thrust cublas
-1 votes

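Thrust set difference fails to compile with "calling a __host__ function from a __host__ __device__ function is not allowed"

I have two sets, A and B, of 20 and 10 integers respectively. B is a subset of A. I need to find the complement of B. I use thrust::set_difference to compute the set difference, but it fails to compile with the message: warning: calling a __host__ function from a __host__ __device__ function is not allowed. My code is below. I don't know why this simple...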
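For reference, a device-side set difference along the lines described above might look like the sketch below. The asker's code is truncated, so this does not diagnose the quoted warning; the helper name complement_of and the sample sets (A = 0..19, B = the even numbers in A) are assumptions. thrust::set_difference requires both inputs to be sorted, and it returns an iterator to the end of the output, which gives the size of the result.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/set_operations.h>
#include <cstdio>

// Hypothetical helper: returns the elements of sorted range a that are not
// in sorted range b.
thrust::device_vector<int> complement_of(const thrust::device_vector<int>& a,
                                         const thrust::device_vector<int>& b) {
    thrust::device_vector<int> result(a.size());  // at most |a| elements
    auto result_end = thrust::set_difference(a.begin(), a.end(),
                                             b.begin(), b.end(),
                                             result.begin());
    result.resize(result_end - result.begin());
    return result;
}

int main() {
    thrust::device_vector<int> a(20);
    thrust::device_vector<int> b(10);
    thrust::sequence(a.begin(), a.end());        // 0, 1, ..., 19
    thrust::sequence(b.begin(), b.end(), 0, 2);  // 0, 2, ..., 18

    thrust::device_vector<int> c = complement_of(a, b);
    std::printf("complement has %d elements\n", static_cast<int>(c.size()));
    return 0;
}
```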

compiler-errors cuda set thrust
4 votes

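CUDA Thrust: copying from device to device

I allocated an array in CUDA using the standard CUDA malloc and pass it to a function as follows: void MyClass::run(uchar4 * input_data) I also have a class member that is a Thrust device_ptr, declared as: thrust::device_ptr<uchar4> data = thrust::device_malloc<uchar4>(num_pts); ...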
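A device-to-device copy between a raw cudaMalloc'd array and a thrust::device_ptr, as in the setup described above, could be sketched like this. The question is truncated, so the free function copy_device_to_device stands in for the body of MyClass::run and is purely illustrative, as are the buffer size and fill value. Wrapping the raw pointer lets thrust::copy treat both ranges as device memory.

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_free.h>
#include <thrust/device_malloc.h>
#include <thrust/device_ptr.h>
#include <cstdio>

// Hypothetical helper: copy n elements from a raw device pointer into
// memory owned through a thrust::device_ptr, without touching the host.
void copy_device_to_device(uchar4* d_input,
                           thrust::device_ptr<uchar4> d_dst, size_t n) {
    thrust::device_ptr<uchar4> src = thrust::device_pointer_cast(d_input);
    thrust::copy(src, src + n, d_dst);
}

int main() {
    const size_t num_pts = 256;

    uchar4* input_data = nullptr;
    if (cudaMalloc(reinterpret_cast<void**>(&input_data),
                   num_pts * sizeof(uchar4)) != cudaSuccess) {
        return 1;
    }
    cudaMemset(input_data, 7, num_pts * sizeof(uchar4));  // every byte = 7

    thrust::device_ptr<uchar4> data = thrust::device_malloc<uchar4>(num_pts);
    copy_device_to_device(input_data, data, num_pts);

    // Read one element back through the raw pointer to check the copy.
    uchar4 first;
    cudaMemcpy(&first, thrust::raw_pointer_cast(data), sizeof(uchar4),
               cudaMemcpyDeviceToHost);
    std::printf("first element: %d %d %d %d\n",
                first.x, first.y, first.z, first.w);

    thrust::device_free(data);
    cudaFree(input_data);
    return 0;
}
```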

c++ cuda gpgpu thrust
