Multiplying the distances in a distance matrix before histogram binning

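I am using scipy.spatial.distance.pdist to calculate the distances between an array of coordinates, followed by numpy.histogram to bin the results. Currently this treats each coordinate as one object, but I have multiple objects at the same coordinate. One option is to change the array so that each coordinate occurs multiple times, once for each object at that coordinate, but this would significantly increase the size of the array and the computation time of pdist, which scales as N^2. That is too expensive, because speed matters in this application.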

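A second approach is to process the resulting distance matrix so that each distance is repeated ni*nj times, where ni is the number of objects at coordinate i and nj is the number of objects at coordinate j. This would turn the original MxM distance matrix into an NxN distance matrix, where M is the total number of coordinates in the array and N is the total number of objects. But again, this seems unnecessarily costly, since all I really need is some way to tell the histogram function to multiply the number of events at distance ij by ni*nj. In other words, is there any way to tell numpy.histogram that there is not just one object at distance ij, but that there are ni*nj objects instead?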

Other ideas are obviously welcome.

Edit:

Here is an example of the first approach.

import numpy as np
from scipy.spatial import distance
import matplotlib.pyplot as plt

#create array of 5 coordinates in 3D
coords = np.random.random(15).reshape(5,3)
'''array([[ 0.66500534,  0.10145476,  0.92528492],
       [ 0.52677892,  0.07756804,  0.50976737],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.81564621,  0.82750694,  0.53083443]])'''

#number of objects at each coordinate
objects = np.random.randint(1,10,5)
#array([5, 3, 8, 5, 1])

#create new array with coordinates for each individual object
new_coords = np.zeros((objects.sum(),3))

#there's surely a simpler way to do this
j = 0
for coord in range(coords.shape[0]):
    for i in range(objects[coord]):
        new_coords[j] = coords[coord]
        j += 1

'''new_coords
array([[ 0.66500534,  0.10145476,  0.92528492],
       [ 0.66500534,  0.10145476,  0.92528492],
       [ 0.66500534,  0.10145476,  0.92528492],
       [ 0.66500534,  0.10145476,  0.92528492],
       [ 0.66500534,  0.10145476,  0.92528492],
       [ 0.52677892,  0.07756804,  0.50976737],
       [ 0.52677892,  0.07756804,  0.50976737],
       [ 0.52677892,  0.07756804,  0.50976737],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.81564621,  0.82750694,  0.53083443]])''' 

#calculate distance matrix of old and new arrays
distances_old = distance.pdist(coords)
distances_new = distance.pdist(new_coords)

#calculate and plot normalized histograms (typically just use np.histogram without plotting)
#density=True replaces the normed=True argument of older matplotlib versions
plt.hist(distances_old, range=(0,1), alpha=.5, density=True)
#(array([ 0.,  0.,  0.,  0.,  2.,  1.,  2.,  2.,  2.,  1.]), array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ]), <a list of 10 Patch objects>)

plt.hist(distances_new, range=(0,1), alpha=.5, density=True)
#(array([ 2.20779221,  0.        ,  0.        ,  0.        ,  1.68831169,
#        0.64935065,  2.07792208,  2.81385281,  0.34632035,  0.21645022]), array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ]), <a list of 10 Patch objects>)

plt.show()

[Image: histograms]

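The second approach would instead process the distance matrix rather than the coordinate matrix, but I have not figured out the code for that yet.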

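Both approaches seem inefficient to me. I think manipulating the binning step of np.histogram is more likely to be efficient, since it is just basic multiplication, but I do not know how to tell np.histogram to treat each coordinate as having a variable number of objects to count.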
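For reference, the second approach described above (repeating each distance ni*nj times) could be sketched roughly as follows. This is only an illustration under assumptions: the seeded generator and the names expanded, n_same and distances_from_matrix are made up here and are not part of the original code. It also makes the cost visible, since the expanded array again holds one entry per pair of objects.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)          # seeded only to make the sketch repeatable
coords = rng.random((5, 3))             # 5 coordinates in 3D
objects = rng.integers(1, 10, size=5)   # number of objects at each coordinate

# Condensed distances between distinct coordinates: M*(M-1)/2 entries
distances_old = distance.pdist(coords)

# Row and column index of every condensed entry, in the order pdist uses
i, j = np.triu_indices(len(coords), k=1)

# Repeat the distance between coordinates i and j exactly n_i * n_j times
expanded = np.repeat(distances_old, objects[i] * objects[j])

# Objects sharing a coordinate are at distance zero from each other;
# there are n_i * (n_i - 1) / 2 such pairs per coordinate
n_same = int((objects * (objects - 1) // 2).sum())
distances_from_matrix = np.concatenate([np.zeros(n_same), expanded])
```

The result contains the same values as pdist applied to the fully repeated coordinate array, only in a different order, which does not matter for a histogram.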

1 Answer

Something like this might work:

import numpy as np
from scipy.spatial import distance

positions = np.random.rand(10, 2)
counts = np.random.randint(1, 5, len(positions))

distances = distance.pdist(positions)
i, j = np.triu_indices(len(positions), 1)

bins = np.linspace(0, 1, 10)
h, b = np.histogram(distances, bins=bins, weights=counts[i]*counts[j])
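The weights only line up with the distances if np.triu_indices(n, 1) enumerates pairs in the same order as the condensed output of pdist. A small check of that assumption, using squareform to rebuild the full matrix (the names square and same_order and the seeded generator are illustrative only):

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(1)
positions = rng.random((6, 2))

condensed = distance.pdist(positions)      # length n*(n-1)/2
square = distance.squareform(condensed)    # full symmetric n x n matrix

# Upper-triangle indices, excluding the diagonal, walked row by row
i, j = np.triu_indices(len(positions), k=1)

# If the orders agree, indexing the square matrix reproduces the condensed vector
same_order = np.array_equal(square[i, j], condensed)
print(same_order)  # True
```

Because entry k of the condensed vector is the distance between points i[k] and j[k], counts[i]*counts[j] gives the number of object pairs at that distance.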

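Except for the 0-distances, it checks out against the repetition approach. The first bin differs because the repeated array also contains the zero distances between objects that share a coordinate: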

repeated = np.repeat(positions, counts, 0)
rdistances = distance.pdist(repeated)

hr, br = np.histogram(rdistances, bins=bins)

In [83]: h
Out[83]: array([11, 22, 27, 43, 67, 46, 40,  0, 19,  0])

In [84]: hr
Out[84]: array([36, 22, 27, 43, 67, 46, 40,  0, 19,  0])
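If the zero distances between objects at the same coordinate should be counted as well, one possible fix is to add them to the bin that contains 0. Each coordinate with n objects contributes n*(n-1)/2 such pairs. The sketch below assumes the first bin starts at 0; the bin range, the seeded generator and the names n_zero and h_fixed are illustrative choices and are not taken from the answer.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(2)
positions = rng.random((10, 2))
counts = rng.integers(1, 5, size=len(positions))

# Bins from 0 to 1.5 cover every distance in the unit square (max is sqrt(2))
bins = np.linspace(0, 1.5, 16)

# Weighted histogram over distances between distinct coordinates
i, j = np.triu_indices(len(positions), k=1)
h, _ = np.histogram(distance.pdist(positions), bins=bins,
                    weights=counts[i] * counts[j])

# Pairs of objects at the same coordinate: n*(n-1)/2 per coordinate, all at distance 0
n_zero = (counts * (counts - 1) // 2).sum()
h_fixed = h.copy()
h_fixed[0] += n_zero

# Reference: histogram of the fully repeated array
repeated = np.repeat(positions, counts, axis=0)
hr, _ = np.histogram(distance.pdist(repeated), bins=bins)

print(np.array_equal(h_fixed, hr))  # True
```

For normalized histograms like those in the question, np.histogram also accepts density=True together with weights.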

Answered 2024-04-28T00:33:34+08:00
