
NUMBA - How to generate random numbers in @guvectorize with the 'cuda' target?


In this (dumb) example, I am trying to compute pi by counting how many points, chosen at random in (0,1)x(0,1), fall inside the unit circle.

@guvectorize(['void(float64[:], int32, float64[:])'], '(n),()->(n)', target='cuda')
def guvec_compute_pi(arr, iters, res):
    n = arr.shape[0]
    for t in range(n):
        inside = 0
        for i in range(iters):
            x = np.random.random()
            y = np.random.random()
            if x ** 2 + y ** 2 <= 1.0:
                inside += 1
        res[t] = 4.0 * inside / iters

This exception is raised during compilation:

numba.errors.UntypedAttributeError: Failed at nopython (nopython frontend)
Unknown attribute 'random' of type Module(<module 'numpy.random' from '...'>)
File "scratch.py", line 34
[1] During: typing of get attribute at /.../scratch.py (34)

I naively thought that using the RNG described here would solve the problem. My modified code looks like this:

@guvectorize(['void(float64[:], int32, float64[:])'], '(n),()->(n)', target='cuda')
def guvec_compute_pi(arr, iters, res):
    n = arr.shape[0]
    rng = create_xoroshiro128p_states(n, seed=1)
    for t in range(n):
        inside = 0
        for i in range(iters):
            x = xoroshiro128p_uniform_float64(rng, t)
            y = xoroshiro128p_uniform_float64(rng, t)
            if x ** 2 + y ** 2 <= 1.0:
                inside += 1
        res[t] = 4.0 * inside / iters

But a similar error is raised:

numba.errors.TypingError: Failed at nopython (nopython frontend)
Untyped global name 'create_xoroshiro128p_states': cannot determine Numba type of <class 'function'>
File "scratch.py", line 28

When I change to target='parallel', the original code using numpy.random.random works fine with nopython=True. What causes the problem with target='cuda', and is there a way to get random numbers inside a @guvectorize-d block?

1 Answer


    The function create_xoroshiro128p_states is intended to be run on the CPU, as shown in this example from the Numba documentation, repeated below:

    from __future__ import print_function, absolute_import
    
    from numba import cuda
    from numba.cuda.random import (create_xoroshiro128p_states,
                                   xoroshiro128p_uniform_float32)
    import numpy as np
    
    @cuda.jit
    def compute_pi(rng_states, iterations, out):
        """Estimate pi in each thread and store the estimate in out[thread_id]"""
        thread_id = cuda.grid(1)
    
        # Compute pi by drawing random (x, y) points and finding what
        # fraction lie inside a unit circle
        inside = 0
        for i in range(iterations):
            x = xoroshiro128p_uniform_float32(rng_states, thread_id)
            y = xoroshiro128p_uniform_float32(rng_states, thread_id)
            if x**2 + y**2 <= 1.0:
                inside += 1
    
        out[thread_id] = 4.0 * inside / iterations
    
    threads_per_block = 64
    blocks = 24
    rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)
    out = np.zeros(threads_per_block * blocks, dtype=np.float32)
    
    compute_pi[blocks, threads_per_block](rng_states, 10000, out)
    print('pi:', out.mean())
    

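    It generates an array of random initialization data, one state per thread, so that random number generation on the GPU is independent across threads. The function is called from host code, yet the data it returns ends up on the device side, which is a little confusing. That is what lets you pass the random state data to the GPU kernel as an argument, rather than trying to create it inside the compiled function.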
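
The pattern described above (build one generator state per thread on the host, then have each thread index only its own state) can be modelled without a GPU. The following is a minimal CPU-only sketch in plain Python, not Numba's xoroshiro128+ implementation: the names create_states, compute_pi_kernel and estimate_pi are hypothetical, and seeding each stream with seed + index is an assumption made only for illustration.

```python
import random


def create_states(n, seed):
    """Host-side step: build one independent generator state per simulated thread."""
    # Assumption: seeding stream i with seed + i is good enough for a toy model;
    # this is NOT how Numba derives its per-thread states.
    return [random.Random(seed + i) for i in range(n)]


def compute_pi_kernel(states, thread_id, iterations, out):
    """Body of one simulated thread; it only ever touches states[thread_id]."""
    rng = states[thread_id]
    inside = 0
    for _ in range(iterations):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    out[thread_id] = 4.0 * inside / iterations


def estimate_pi(n_threads=16, iterations=2000, seed=1):
    """Create the states once, outside the kernel, then run every thread."""
    states = create_states(n_threads, seed)
    out = [0.0] * n_threads
    for thread_id in range(n_threads):  # stands in for the GPU kernel launch
        compute_pi_kernel(states, thread_id, iterations, out)
    return sum(out) / n_threads, out
```

Calling estimate_pi() returns the mean estimate together with the per-thread estimates. The point of the design is that the state array is created once by the caller and passed in, so the kernel body never has to construct a generator itself.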
