首页 文章

如何用NumPy计算欧几里德距离?

提问于
浏览
341

我在3D中有两点:

(xa, ya, za)
(xb, yb, zb)

我想计算距离:

dist = sqrt((xa-xb)^2 + (ya-yb)^2 + (za-zb)^2)

使用NumPy或Python的最佳方法是什么?我有:

a = numpy.array((xa ,ya, za))
b = numpy.array((xb, yb, zb))

17 回答

  • 5

    使用numpy.linalg.norm

    dist = numpy.linalg.norm(a-b)
    
  • 3
    import math
    
    dist = math.hypot(math.hypot(xa-xb, ya-yb), za-zb)
    
  • 17

    计算多维空间的欧几里德距离:

    import math
    
     x = [1, 2, 6] 
     y = [-2, 3, 2]
    
     dist = math.sqrt(sum([(xi-yi)**2 for xi,yi in zip(x, y)]))
     5.0990195135927845
    
  • 597

    那里's a function for that in SciPy. It'被称为Euclidean .

    例:

    from scipy.spatial import distance
    a = (1, 2, 3)
    b = (4, 5, 6)
    dst = distance.euclidean(a, b)
    
  • 54

    一个漂亮的单行:

    dist = numpy.linalg.norm(a-b)
    

    但是,如果速度是一个问题,我建议您在您的机器上进行试验 . 我发现在我的机器上使用 math 库的 sqrt** 运算符比单行NumPy解决方案快得多 .

    我使用这个简单的程序运行测试:

    #!/usr/bin/python
    import math
    import numpy
    from random import uniform
    
    def fastest_calc_dist(p1,p2):
        return math.sqrt((p2[0] - p1[0]) ** 2 +
                         (p2[1] - p1[1]) ** 2 +
                         (p2[2] - p1[2]) ** 2)
    
    def math_calc_dist(p1,p2):
        return math.sqrt(math.pow((p2[0] - p1[0]), 2) +
                         math.pow((p2[1] - p1[1]), 2) +
                         math.pow((p2[2] - p1[2]), 2))
    
    def numpy_calc_dist(p1,p2):
        return numpy.linalg.norm(numpy.array(p1)-numpy.array(p2))
    
    TOTAL_LOCATIONS = 1000
    
    p1 = dict()
    p2 = dict()
    for i in range(0, TOTAL_LOCATIONS):
        p1[i] = (uniform(0,1000),uniform(0,1000),uniform(0,1000))
        p2[i] = (uniform(0,1000),uniform(0,1000),uniform(0,1000))
    
    total_dist = 0
    for i in range(0, TOTAL_LOCATIONS):
        for j in range(0, TOTAL_LOCATIONS):
            dist = fastest_calc_dist(p1[i], p2[j]) #change this line for testing
            total_dist += dist
    
    print total_dist
    

    在我的机器上, math_calc_dist 的运行速度比 numpy_calc_dist 快得多: 1.5 seconds23.5 seconds 相比 .

    要在 fastest_calc_distmath_calc_dist 之间获得可衡量的差异,我必须将 TOTAL_LOCATIONS 上升到6000.然后 fastest_calc_dist 需要 ~50 secondsmath_calc_dist 需要 ~60 seconds .

    您也可以尝试使用 numpy.sqrtnumpy.square ,尽管两者都比我机器上的 math 替代品慢 .

    我的测试是使用Python 2.6.6运行的 .

  • 2

    根据您的定义 ab ,您还可以使用:

    distance = np.sqrt(np.sum((a-b)**2))
    
  • 0
    import numpy as np
    from scipy.spatial import distance
    input_arr = np.array([[0,3,0],[2,0,0],[0,1,3],[0,1,2],[-1,0,1],[1,1,1]]) 
    test_case = np.array([0,0,0])
    dst=[]
    for i in range(0,6):
        temp = distance.euclidean(test_case,input_arr[i])
        dst.append(temp)
    print(dst)
    
  • 101

    您可以轻松使用该公式

    distance = np.sqrt(np.sum(np.square(a-b)))
    

    这实际上只不过是使用毕达哥拉斯定理来计算距离,通过加上Δx,Δy和Δz的平方并使结果生根 .

  • 27

    我在matplotlib.mlab中找到了'dist'函数,但我认为它不够方便 .

    我在这里发帖仅供参考 .

    import numpy as np
    import matplotlib as plt
    
    a = np.array([1, 2, 3])
    b = np.array([2, 3, 4])
    
    # Distance between a and b
    dis = plt.mlab.dist(a, b)
    
  • 9

    对于有兴趣一次计算多个距离的人,我使用perfplot(我的一个小项目)进行了一些比较 . 事实证明

    a_min_b = a - b
    numpy.sqrt(numpy.einsum('ij,ij->i', a_min_b, a_min_b))
    

    计算 ab 中行的距离最快 . 这实际上只适用于一行!

    enter image description here


    重现情节的代码:

    import matplotlib
    import numpy
    import perfplot
    from scipy.spatial import distance
    
    
    def linalg_norm(data):
        a, b = data
        return numpy.linalg.norm(a-b, axis=1)
    
    
    def sqrt_sum(data):
        a, b = data
        return numpy.sqrt(numpy.sum((a-b)**2, axis=1))
    
    
    def scipy_distance(data):
        a, b = data
        return list(map(distance.euclidean, a, b))
    
    
    def mpl_dist(data):
        a, b = data
        return list(map(matplotlib.mlab.dist, a, b))
    
    
    def sqrt_einsum(data):
        a, b = data
        a_min_b = a - b
        return numpy.sqrt(numpy.einsum('ij,ij->i', a_min_b, a_min_b))
    
    
    perfplot.show(
        setup=lambda n: numpy.random.rand(2, n, 3),
        n_range=[2**k for k in range(20)],
        kernels=[linalg_norm, scipy_distance, mpl_dist, sqrt_sum, sqrt_einsum],
        logx=True,
        logy=True,
        xlabel='len(x), len(y)'
        )
    
  • 8

    你可以减去向量然后减去内积 .

    按照你的例子,

    a = numpy.array((xa, ya, za))
    b = numpy.array((xb, yb, zb))
    
    tmp = a - b
    sum_squared = numpy.dot(tmp.T, tmp)
    result sqrt(sum_squared)
    

    它代码简单,易于理解 .

  • 5

    它可以像下面这样完成 . 我不知道它有多快,但它没有使用NumPy .

    from math import sqrt
    a = (1, 2, 3) # Data point 1
    b = (4, 5, 6) # Data point 2
    print sqrt(sum( (a - b)**2 for a, b in zip(a, b)))
    
  • 5

    下面是Python中欧几里德距离的一些简洁代码,给出了在Python中表示为列表的两个点 .

    def distance(v1,v2): 
        return sum([(x-y)**2 for (x,y) in zip(v1,v2)])**(0.5)
    
  • 1

    我喜欢np.dot(点积):

    a = numpy.array((xa,ya,za))
    b = numpy.array((xb,yb,zb))
    
    distance = (np.dot(a-b,a-b))**.5
    
  • 0

    this problem solving method的另一个例子:

    def dist(x,y):   
        return numpy.sqrt(numpy.sum((x-y)**2))
    
    a = numpy.array((xa,ya,za))
    b = numpy.array((xb,yb,zb))
    dist_a_b = dist(a,b)
    
  • 1

    我想用各种性能说明来阐述简单的答案 . np.linalg.norm可能比你需要的更多:

    dist = numpy.linalg.norm(a-b)
    

    首先 - 此功能旨在处理列表并返回所有值,例如比较从 pA 到点集 sP 的距离:

    sP = set(points)
    pA = point
    distances = np.linalg.norm(sP - pA, ord=2, axis=1.)  # 'distances' is a list
    

    记住几件事:

    • Python函数调用很昂贵 .

    • [Regular] Python不缓存名称查找 .

    所以

    def distance(pointA, pointB):
        dist = np.linalg.norm(pointA - pointB)
        return dist
    

    并不像看起来那么无辜 .

    >>> dis.dis(distance)
      2           0 LOAD_GLOBAL              0 (np)
                  2 LOAD_ATTR                1 (linalg)
                  4 LOAD_ATTR                2 (norm)
                  6 LOAD_FAST                0 (pointA)
                  8 LOAD_FAST                1 (pointB)
                 10 BINARY_SUBTRACT
                 12 CALL_FUNCTION            1
                 14 STORE_FAST               2 (dist)
    
      3          16 LOAD_FAST                2 (dist)
                 18 RETURN_VALUE
    

    首先 - 每当我们调用它时,我们必须对"np"进行全局查找,对"linalg"进行范围查找,并对"norm"进行范围查找,仅调用该函数的开销可以等同于几十个python指令 .

    最后,我们浪费了两个操作来存储结果并重新加载它以便返回...

    第一步改进:使查找更快,跳过商店

    def distance(pointA, pointB, _norm=np.linalg.norm):
        return _norm(pointA - pointB)
    

    我们得到了更加精简:

    >>> dis.dis(distance)
      2           0 LOAD_FAST                2 (_norm)
                  2 LOAD_FAST                0 (pointA)
                  4 LOAD_FAST                1 (pointB)
                  6 BINARY_SUBTRACT
                  8 CALL_FUNCTION            1
                 10 RETURN_VALUE
    

    但是,函数调用开销仍然相当于一些工作 . 你会想做基准来确定你自己是否可以更好地做数学:

    def distance(pointA, pointB):
        return (
            ((pointA.x - pointB.x) ** 2) +
            ((pointA.y - pointB.y) ** 2) +
            ((pointA.z - pointB.z) ** 2)
        ) ** 0.5  # fast sqrt
    

    在某些平台上, **0.5math.sqrt 快 . 你的旅费可能会改变 .

    ****高级演奏说明 .

    你为什么计算距离?如果唯一的目的是显示它,

    print("The target is %.2fm away" % (distance(a, b)))
    

    向前走 . 但如果您要比较距离,进行范围检查等,我想添加一些有用的性能观察 .

    我们来看两种情况:按距离排序或剔除列表到满足范围约束的项目 .

    # Ultra naive implementations. Hold onto your hat.
    
    def sort_things_by_distance(origin, things):
        return things.sort(key=lambda thing: distance(origin, thing))
    
    def in_range(origin, range, things):
        things_in_range = []
        for thing in things:
            if distance(origin, thing) <= range:
                things_in_range.append(thing)
    

    我们需要记住的第一件事是我们正在使用Pythagoras来计算距离( dist = sqrt(x^2 + y^2 + z^2) ),因此我们正在进行大量的 sqrt 调用 . 数学101:

    dist = root ( x^2 + y^2 + z^2 )
    :.
    dist^2 = x^2 + y^2 + z^2
    and
    sq(N) < sq(M) iff M > N
    and
    sq(N) > sq(M) iff N > M
    and
    sq(N) = sq(M) iff N == M
    

    简而言之:直到我们实际需要以X为单位而不是X ^ 2的距离,我们才能消除计算中最难的部分 .

    # Still naive, but much faster.
    
    def distance_sq(left, right):
        """ Returns the square of the distance between left and right. """
        return (
            ((left.x - right.x) ** 2) +
            ((left.y - right.y) ** 2) +
            ((left.z - right.z) ** 2)
        )
    
    def sort_things_by_distance(origin, things):
        return things.sort(key=lambda thing: distance_sq(origin, thing))
    
    def in_range(origin, range, things):
        things_in_range = []
    
        # Remember that sqrt(N)**2 == N, so if we square
        # range, we don't need to root the distances.
        range_sq = range**2
    
        for thing in things:
            if distance_sq(origin, thing) <= range_sq:
                things_in_range.append(thing)
    

    伟大的,这两个功能不再做任何昂贵的平方根 . 那会更快 . 我们还可以通过将in_range转换为生成器来改进in_range:

    def in_range(origin, range, things):
        range_sq = range**2
        yield from (thing for thing in things
                    if distance_sq(origin, thing) <= range_sq)
    

    如果您正在执行以下操作,这尤其有益:

    if any(in_range(origin, max_dist, things)):
        ...
    

    但如果你要做的下一件事需要距离,

    for nearby in in_range(origin, walking_distance, hotdog_stands):
        print("%s %.2fm" % (nearby.name, distance(origin, nearby)))
    

    考虑产生元组:

    def in_range_with_dist_sq(origin, range, things):
        range_sq = range**2
        for thing in things:
            dist_sq = distance_sq(origin, thing)
            if dist_sq <= range_sq: yield (thing, dist_sq)
    

    如果您可以链接范围检查('找到靠近X且在Nm内的东西',因为您不必再次计算距离),这可能特别有用 .

    但是,如果我们正在搜索一个非常大的 things 和我们预计其中很多都不值得考虑?

    实际上有一个非常简单的优化:

    def in_range_all_the_things(origin, range, things):
        range_sq = range**2
        for thing in things:
            dist_sq = (origin.x - thing.x) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.y - thing.y) ** 2
                if dist_sq <= range_sq:
                    dist_sq += (origin.z - thing.z) ** 2
                    if dist_sq <= range_sq:
                        yield thing
    

    这是否有用将取决于“事物”的大小 .

    def in_range_all_the_things(origin, range, things):
        range_sq = range**2
        if len(things) >= 4096:
            for thing in things:
                dist_sq = (origin.x - thing.x) ** 2
                if dist_sq <= range_sq:
                    dist_sq += (origin.y - thing.y) ** 2
                    if dist_sq <= range_sq:
                        dist_sq += (origin.z - thing.z) ** 2
                        if dist_sq <= range_sq:
                            yield thing
        elif len(things) > 32:
            for things in things:
                dist_sq = (origin.x - thing.x) ** 2
                if dist_sq <= range_sq:
                    dist_sq += (origin.y - thing.y) ** 2 + (origin.z - thing.z) ** 2
                    if dist_sq <= range_sq:
                        yield thing
        else:
            ... just calculate distance and range-check it ...
    

    再次,考虑产生dist_sq . 我们的热狗示例然后变成:

    # Chaining generators
    info = in_range_with_dist_sq(origin, walking_distance, hotdog_stands)
    info = (stand, dist_sq**0.5 for stand, dist_sq in info)
    for stand, dist in info:
        print("%s %.2fm" % (stand, dist))
    
  • 1

    首先找出两个矩阵的差异 . 然后,使用numpy的multiply命令应用元素乘法 . 之后,找到元素乘以新矩阵的总和 . 最后,找到求和的平方根 .

    def findEuclideanDistance(a, b):
        euclidean_distance = a - b
        euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
        euclidean_distance = np.sqrt(euclidean_distance)
        return euclidean_distance
    

相关问题