Home Articles

Numpy对所有人有效率

Asked
Viewed 99 times
5

我有一个3xN的3d坐标数组,我想有效地计算所有条目的距离矩阵 . 是否有任何有效的循环策略而不是可以应用的嵌套循环?

当前的伪代码实现:

for i,coord in enumerate(coords):
    for j,coords2 in enumerate(coords):
        if i != j:
             dist[i,j] = numpy.norm(coord - coord2)

1 Answer

  • 5

    要完全重现您的结果:

    >>> import scipy.spatial as sp
    >>> import numpy as np
    >>> a=np.random.rand(5,3) #Note this is the transpose of your array.
    >>> a
    array([[ 0.83921304,  0.72659404,  0.50434178],  #0
           [ 0.99883826,  0.91739731,  0.9435401 ],  #1
           [ 0.94327962,  0.57665875,  0.85853404],  #2
           [ 0.30053567,  0.44458829,  0.35677649],  #3
           [ 0.01345765,  0.49247883,  0.11496977]]) #4
    >>> sp.distance.cdist(a,a)
    array([[ 0.        ,  0.50475862,  0.39845025,  0.62568048,  0.94249268],
           [ 0.50475862,  0.        ,  0.35554966,  1.02735895,  1.35575051],
           [ 0.39845025,  0.35554966,  0.        ,  0.82602847,  1.1935422 ],
           [ 0.62568048,  1.02735895,  0.82602847,  0.        ,  0.3783884 ],
           [ 0.94249268,  1.35575051,  1.1935422 ,  0.3783884 ,  0.        ]])
    

    要在不重复计算的情况下更有效地执行此操作,并仅计算唯一对:

    >>> sp.distance.pdist(a)
    array([ 0.50475862,  0.39845025,  0.62568048,  0.94249268,  0.35554966,
            1.02735895,  1.35575051,  0.82602847,  1.1935422 ,  0.3783884 ])
    #pairs: [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3),
    #         (2, 4), (3, 4)]
    

    注意两个数组之间的关系 . cdist 数组可以通过以下方式复制:

    >>> out=np.zeros((a.shape[0],a.shape[0]))
    >>> dists=sp.distance.pdist(a)
    >>> out[np.triu_indices(a.shape[0],1)]=dists
    >>> out+=out.T
    
    >>> out
    array([[ 0.        ,  0.50475862,  0.39845025,  0.62568048,  0.94249268],
           [ 0.50475862,  0.        ,  0.35554966,  1.02735895,  1.35575051],
           [ 0.39845025,  0.35554966,  0.        ,  0.82602847,  1.1935422 ],
           [ 0.62568048,  1.02735895,  0.82602847,  0.        ,  0.3783884 ],
           [ 0.94249268,  1.35575051,  1.1935422 ,  0.3783884 ,  0.        ]])
    

    一些有些令人惊讶的时间 -

    设置:

    def pdist_toarray(a):
        out=np.zeros((a.shape[0],a.shape[0]))
        dists=sp.distance.pdist(a)
    
        out[np.triu_indices(a.shape[0],1)]=dists
        return out+out.T
    
    def looping(a):
        out=np.zeros((a.shape[0],a.shape[0]))
        for i in xrange(a.shape[0]):
            for j in xrange(a.shape[0]):
                out[i,j]=np.linalg.norm(a[i]-a[j])
        return out
    

    时序:

    arr=np.random.rand(1000,3)
    
    %timeit sp.distance.pdist(arr)
    100 loops, best of 3: 4.26 ms per loop
    
    %timeit sp.distance.cdist(arr,arr)
    100 loops, best of 3: 9.31 ms per loop
    
    %timeit pdist_toarray(arr)
    10 loops, best of 3: 66.2 ms per loop
    
    %timeit looping(arr)
    1 loops, best of 3: 16.7 s per loop
    

    因此,如果你想要回到正方形阵列,你应该使用 cdist ,如果你只想使用 pdist . 对于具有1000个元素的数组,循环慢约4000倍,对于具有10个元素的数组,与 cdist 相比,循环慢约70倍 .

Related