
Find nearest value in numpy array


Is there a numpy-thonic way, e.g. a function, to find the nearest value in an array?

Example:

np.find_nearest( array, value )

14 Answers

  • 388

    If you don't want to use numpy, this will do it:

    def find_nearest(array, value):
        n = [abs(i-value) for i in array]
        idx = n.index(min(n))
        return array[idx]
    
  • 3

    Here is an extension to find the nearest vector in an array of vectors.

    import numpy as np
    
    def find_nearest_vector(array, value):
      # Euclidean distance from each row of `array` to `value`, then the closest row
      idx = np.linalg.norm(array - value, axis=1).argmin()
      return array[idx]
    
    A = np.random.random((10,2))*100
    """ A = array([[ 34.19762933,  43.14534123],
       [ 48.79558706,  47.79243283],
       [ 38.42774411,  84.87155478],
       [ 63.64371943,  50.7722317 ],
       [ 73.56362857,  27.87895698],
       [ 96.67790593,  77.76150486],
       [ 68.86202147,  21.38735169],
       [  5.21796467,  59.17051276],
       [ 82.92389467,  99.90387851],
       [  6.76626539,  30.50661753]])"""
    pt = [6, 30]
    print(find_nearest_vector(A, pt))
    # array([  6.76626539,  30.50661753])
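
    When there are several query points, the same distance computation can be broadcast over all of them at once. The following is a minimal sketch, not part of the original answer; the function name `find_nearest_vectors` and the small test grid are assumptions made for illustration.

```python
import numpy as np

def find_nearest_vectors(array, points):
    """For each row of `points`, return the closest row of `array` (Euclidean)."""
    array = np.asarray(array, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    # Pairwise differences, shape (n_points, n_rows, n_dims)
    diffs = points[:, None, :] - array[None, :, :]
    # Distance from every query point to every row, shape (n_points, n_rows)
    dists = np.linalg.norm(diffs, axis=-1)
    return array[dists.argmin(axis=1)]

# Hypothetical data: the four corners of a square
A = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
nearest = find_nearest_vectors(A, [[1, 1], [9, 8]])
print(nearest)  # rows [0, 0] and [10, 10]
```

    The intermediate array holds one entry per (point, row, dimension) triple, so this is best suited to moderate sizes.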
    
  • 16

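    Summary of answer: If you have a sorted array, the bisection code (given below) performs the fastest: ~100-1000 times faster for large arrays and ~2-100 times faster for small arrays. It does not require numpy either. If you have an unsorted array and it is large, consider first using an O(n log n) sort and then bisection; if the array is small, method 2 seems to be the fastest.

    First you should clarify what you mean by nearest value. Often one wants the interval in an abscissa, e.g. array = [0, 0.7, 2.1], value = 1.95, where the answer would be idx = 1. This is the case that I suspect you need (otherwise the following can be modified very easily with a follow-up conditional statement once you find the interval). I will note that the optimal way to perform this is with bisection (which I will provide first; note that it does not require numpy at all and is faster than using numpy functions because they perform redundant operations). Then I will provide a timing comparison against the methods presented here by other users.

    Bisection: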

    def bisection(array,value):
        '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
        and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
        to indicate that ``value`` is out of range below and above respectively.'''
        n = len(array)
        if (value < array[0]):
            return -1
        elif (value > array[n-1]):
            return n
        jl = 0  # Initialize lower
        ju = n-1  # and upper limits.
        while (ju-jl > 1):  # If we are not yet done,
            jm = (ju+jl) >> 1  # compute a midpoint with a bitshift
            if (value >= array[jm]):
                jl = jm  # and replace either the lower limit
            else:
                ju = jm  # or the upper limit, as appropriate.
            # Repeat until the test condition is satisfied.
        if (value == array[0]):  # edge cases at bottom
            return 0
        elif (value == array[n-1]):  # and top
            return n-1
        else:
            return jl
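
    The same interval convention can also be sketched with the standard library's `bisect` module, together with the follow-up conditional mentioned above that turns the interval into the nearest index. This is an illustrative sketch rather than the answer's own code; the names `interval_index` and `nearest_index` are hypothetical, and the input is assumed to be sorted in increasing order.

```python
import bisect

def interval_index(array, value):
    """Index j with array[j] <= value < array[j+1]; -1 below range, len(array) above."""
    n = len(array)
    if value < array[0]:
        return -1
    if value > array[n - 1]:
        return n
    # bisect_right gives the insertion point to the right of any equal entries,
    # so subtracting 1 yields the left endpoint of the enclosing interval.
    return min(bisect.bisect_right(array, value) - 1, n - 1)

def nearest_index(array, value):
    """Follow-up conditional: pick whichever endpoint of the interval is closer."""
    j = interval_index(array, value)
    if j < 0:
        return 0
    if j >= len(array) - 1:
        return len(array) - 1
    return j if value - array[j] <= array[j + 1] - value else j + 1

xs = [0, 0.7, 2.1]
print(interval_index(xs, 1.95))  # 1: 1.95 lies between xs[1] and xs[2]
print(nearest_index(xs, 1.95))   # 2: 2.1 is closer than 0.7
```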
    

    Now I'll define the code from the other answers; each of them returns an index:

    import math
    import numpy as np
    
    def find_nearest1(array,value):
        idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
        return idx
    
    def find_nearest2(array, values):
        indices = np.abs(np.subtract.outer(array, values)).argmin(0)
        return indices
    
    def find_nearest3(array, values):
        values = np.atleast_1d(values)
        indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
        out = array[indices]
        return indices
    
    def find_nearest4(array,value):
        idx = (np.abs(array-value)).argmin()
        return idx
    
    
    def find_nearest5(array, value):
        idx_sorted = np.argsort(array)
        sorted_array = np.array(array[idx_sorted])
        idx = np.searchsorted(sorted_array, value, side="left")
        if idx >= len(array):
            idx_nearest = idx_sorted[len(array)-1]
        elif idx == 0:
            idx_nearest = idx_sorted[0]
        else:
            if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
                idx_nearest = idx_sorted[idx-1]
            else:
                idx_nearest = idx_sorted[idx]
        return idx_nearest
    
    def find_nearest6(array,value):
        xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
        return xi
    

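    Now I'll time the codes. Note that methods 1, 2, 4 and 5 don't correctly give the interval. Methods 1, 2 and 4 round to the nearest point in the array (e.g. >= 1.5 -> 2), and method 5 always rounds up (e.g. 1.45 -> 2). Only methods 3 and 6, and of course bisection, give the interval properly.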

    array = np.arange(100000)
    val = array[50000]+0.55
    print( bisection(array,val))
    %timeit bisection(array,val)
    print( find_nearest1(array,val))
    %timeit find_nearest1(array,val)
    print( find_nearest2(array,val))
    %timeit find_nearest2(array,val)
    print( find_nearest3(array,val))
    %timeit find_nearest3(array,val)
    print( find_nearest4(array,val))
    %timeit find_nearest4(array,val)
    print( find_nearest5(array,val))
    %timeit find_nearest5(array,val)
    print( find_nearest6(array,val))
    %timeit find_nearest6(array,val)
    
    (50000, 50000)
    100000 loops, best of 3: 4.4 µs per loop
    50001
    1 loop, best of 3: 180 ms per loop
    50001
    1000 loops, best of 3: 267 µs per loop
    [50000]
    1000 loops, best of 3: 390 µs per loop
    50001
    1000 loops, best of 3: 259 µs per loop
    50001
    1000 loops, best of 3: 1.21 ms per loop
    [50000]
    1000 loops, best of 3: 746 µs per loop
    

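    For large arrays, bisection takes about 4 µs, compared with 259 µs for the next best method and 180 ms for the slowest in the timings above, so it is well over an order of magnitude faster. For smaller arrays it is ~2-100 times faster.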

  • 0
    import numpy as np
    def find_nearest(array, value):
        array = np.asarray(array)
        idx = (np.abs(array - value)).argmin()
        return array[idx]
    
    array = np.random.random(10)
    print(array)
    # [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
    #   0.17104965  0.56874386  0.57319379  0.28719469]
    
    value = 0.5
    
    print(find_nearest(array, value))
    # 0.568743859261
    
  • 1

    If your array is sorted and very large, this is a much faster solution:

    import math
    import numpy as np

    def find_nearest(array,value):
        idx = np.searchsorted(array, value, side="left")
        if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
            return array[idx-1]
        else:
            return array[idx]
    

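    This scales to very large arrays. If you can't assume that the array is already sorted, you can easily modify the above to sort inside the method. It's overkill for small arrays, but once they get large this is much faster.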

  • 39

    With a slight modification, the answer above works with arrays of arbitrary dimension (1d, 2d, 3d, ...):

    def find_nearest(a, a0):
        "Element in nd array `a` closest to the scalar value `a0`"
        idx = np.abs(a - a0).argmin()
        return a.flat[idx]
    

    Or, written as a single line:

    a.flat[np.abs(a - a0).argmin()]
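
    `argmin()` without an axis returns a position in the flattened array. If the n-dimensional position is wanted as well, `np.unravel_index` can convert it. A small sketch with made-up data:

```python
import numpy as np

a = np.array([[1.0, 5.0], [9.0, 13.0]])  # hypothetical 2d input
a0 = 8.2

flat_idx = np.abs(a - a0).argmin()            # index into the flattened array
nd_idx = np.unravel_index(flat_idx, a.shape)  # (row, column) tuple

# The nearest value is 9.0, found at row 1, column 0
print(a.flat[flat_idx], nd_idx)
```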
    
  • 55

    Here is a scipy version of @Ari Onasafari's answer, "to find the nearest vector in an array of vectors":

    In [1]: from scipy import spatial
    
    In [2]: import numpy as np
    
    In [3]: A = np.random.random((10,2))*100
    
    In [4]: A
    Out[4]:
    array([[ 68.83402637,  38.07632221],
           [ 76.84704074,  24.9395109 ],
           [ 16.26715795,  98.52763827],
           [ 70.99411985,  67.31740151],
           [ 71.72452181,  24.13516764],
           [ 17.22707611,  20.65425362],
           [ 43.85122458,  21.50624882],
           [ 76.71987125,  44.95031274],
           [ 63.77341073,  78.87417774],
           [  8.45828909,  30.18426696]])
    
    In [5]: pt = [6, 30]  # <-- the point to find
    
    In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
    Out[6]: array([  8.45828909,  30.18426696])
    
    #how it works!
    In [7]: distance,index = spatial.KDTree(A).query(pt)
    
    In [8]: distance # <-- The distances to the nearest neighbors
    Out[8]: 2.4651855048258393
    
    In [9]: index # <-- The locations of the neighbors
    Out[9]: 9
    
    #then 
    In [10]: A[index]
    Out[10]: array([  8.45828909,  30.18426696])
    
  • 8

    Here is a version that handles a non-scalar "values" array:

    import numpy as np
    
    def find_nearest(array, values):
        indices = np.abs(np.subtract.outer(array, values)).argmin(0)
        return array[indices]
    

    Or a version that returns a numeric type (e.g. int, float) if the input is scalar:

    def find_nearest(array, values):
        values = np.atleast_1d(values)
        indices = np.abs(np.subtract.outer(array, values)).argmin(0)
        out = array[indices]
        return out if len(out) > 1 else out[0]
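
    To make the mechanics concrete: `np.subtract.outer` builds a full table of differences with one row per element of `array` and one column per element of `values`, and `argmin(0)` then picks the best row for each column. A short sketch with made-up numbers:

```python
import numpy as np

array = np.array([1.0, 4.0, 9.0])
values = np.array([0.0, 3.0, 7.0])

# One row per array element, one column per query value
diffs = np.abs(np.subtract.outer(array, values))
indices = diffs.argmin(0)  # best row for each column

print(diffs.shape)     # (3, 3)
print(array[indices])  # nearest array element for each query value
```

    Because the table has `len(array) * len(values)` entries, memory use grows quickly when both inputs are large.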
    
  • 2

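    For large arrays, the (excellent) answer given by @Demitri is far faster than the answer currently marked as best. I've adapted his exact algorithm in the following two ways:

    • The function below works whether or not the input array is sorted.

    • The function below returns the index of the input array corresponding to the closest value, which is somewhat more general.

    Note that the function below also handles a specific edge case that would lead to a bug in the original function written by @Demitri. Otherwise, my algorithm is identical to his.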

    def find_idx_nearest_val(array, value):
        idx_sorted = np.argsort(array)
        sorted_array = np.array(array[idx_sorted])
        idx = np.searchsorted(sorted_array, value, side="left")
        if idx >= len(array):
            idx_nearest = idx_sorted[len(array)-1]
        elif idx == 0:
            idx_nearest = idx_sorted[0]
        else:
            if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
                idx_nearest = idx_sorted[idx-1]
            else:
                idx_nearest = idx_sorted[idx]
        return idx_nearest
    
  • 8

    Here is a fast vectorized version of @Dimitri's solution for when you have many values to search for (values can be a multi-dimensional array):

    # `array` should be sorted
    def get_closest(array, values):
        #make sure array is a numpy array
        array = np.array(array)
    
        # get insert positions
        idxs = np.searchsorted(array, values, side="left")
    
        # find indexes where previous index is closer
        prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
        idxs[prev_idx_is_less] -= 1
    
        return array[idxs]
    

    Benchmarks

    More than 100 times faster than using a for loop with @Demitri's solution:

    >>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
    139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    >>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
    took 21.4 seconds
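
    The two ideas in this thread, sorting an unsorted input and searching many values at once, can be combined. The following is a hedged sketch rather than anyone's posted solution; the name `nearest_indices` is hypothetical, and ties are resolved toward the larger neighbor, as in the strict `<` comparisons used above.

```python
import numpy as np

def nearest_indices(array, values):
    """Indices into the (possibly unsorted) `array` of the elements nearest to `values`."""
    array = np.asarray(array)
    values = np.asarray(values)
    order = np.argsort(array)      # positions that would sort the array
    sorted_array = array[order]
    pos = np.searchsorted(sorted_array, values, side="left")
    # Candidate neighbors on each side, clipped so they stay in range
    right = np.clip(pos, 0, len(array) - 1)
    left = np.clip(pos - 1, 0, len(array) - 1)
    use_left = np.abs(values - sorted_array[left]) < np.abs(values - sorted_array[right])
    # Map positions in the sorted array back to positions in the original array
    return order[np.where(use_left, left, right)]

data = np.array([9.0, 1.0, 4.0])
queries = np.array([0.0, 3.0, 7.0, 100.0])
idx = nearest_indices(data, queries)
print(idx, data[idx])
```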
    
  • 14

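    Here is a vectorized version of unutbu's answer:

    import numpy as np
    import matplotlib.pyplot as plt

    def find_nearest(array, values):
        array = np.asarray(array)
    
        # the last dim must be 1 to broadcast in (array - values) below.
        values = np.expand_dims(values, axis=-1) 
    
        # cast to float first so unsigned inputs (e.g. uint8 images) do not wrap around
        indices = np.abs(array.astype(np.float64) - values).argmin(axis=-1)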
    
        return array[indices]
    
    
    image = plt.imread('example_3_band_image.jpg')
    
    print(image.shape) # should be (nrows, ncols, 3)
    
    quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)
    
    quantiled_image = find_nearest(quantiles, image)
    
    print(quantiled_image.shape) # should be (nrows, ncols, 3)
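
    Unsigned integer inputs, such as `uint8` image data, need care in a difference-based search, because subtraction wraps around instead of going negative. The sketch below uses made-up quantization levels and pixel values to compare a plain `uint8` subtraction with one done after casting to a signed type.

```python
import numpy as np

levels = np.array([0, 85, 170, 255], dtype=np.uint8)
pixels = np.array([100, 200], dtype=np.uint8)

# uint8 - uint8 stays uint8, so 85 - 100 becomes 241 rather than -15
wrapped = levels - pixels[:, None]
# Casting to a signed type first keeps the true differences
signed = levels.astype(np.int16) - pixels[:, None]

naive = levels[np.abs(wrapped).argmin(axis=-1)]
safe = levels[np.abs(signed).argmin(axis=-1)]
print(naive)  # 170 and 255: wrong neighbors caused by wraparound
print(safe)   # 85 and 170: the actual nearest levels
```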
    
  • 8

    I think the most pythonic way would be:

    import numpy as np

    num = 65  # Input number
    array = np.random.random(10) * 100  # Given array
    nearest_idx = np.where(abs(array - num) == abs(array - num).min())[0]  # Index (or indices, on ties) of the element of array nearest to num
    nearest_val = array[abs(array - num) == abs(array - num).min()]  # The element(s) of array nearest to num
    

    This is the basic code. You can wrap it in a function if you want.

  • 1

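    All the answers are useful for gathering the information needed to write efficient code. However, I have written a small Python script to optimize for various cases. The best case is when the provided array is sorted. If you search for the index of the nearest point to a single specified value, the bisect module is the most time efficient. If you search for the indices corresponding to a whole array of values, numpy's searchsorted is the most efficient.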

    import numpy as np
    import bisect
    xarr = np.random.rand(int(1e7))
    
    srt_ind = xarr.argsort()
    xar = xarr.copy()[srt_ind]
    xlist = xar.tolist()
    bisect.bisect_left(xlist, 0.3)
    

    In [63]: %time bisect.bisect_left(xlist, 0.3)
    CPU times: user 0 ns, sys: 0 ns, total: 0 ns
    Wall time: 22.2 µs

    np.searchsorted(xar, 0.3, side="left")
    

    In [64]: %time np.searchsorted(xar, 0.3, side="left")
    CPU times: user 0 ns, sys: 0 ns, total: 0 ns
    Wall time: 98.9 µs

    randpts = np.random.rand(1000)
    np.searchsorted(xar, randpts, side="left")
    

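    %time np.searchsorted(xar, randpts, side="left")
    CPU times: user 4 ms, sys: 0 ns, total: 4 ms
    Wall time: 1.2 ms

    If we follow the multiplicative rule, 1000 separate numpy calls at 98.9 µs each should take ~100 ms, which means the single vectorized call is ~83 times faster.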
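
    The `%time` magic only works inside IPython. Outside it, a comparison along the same lines can be sketched with the standard `timeit` module. The array size, seed and repeat count below are arbitrary assumptions, and the timings will vary by machine, so no expected numbers are shown.

```python
import bisect
import timeit
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen arbitrarily
xar = np.sort(rng.random(100_000))      # sorted numpy array
xlist = xar.tolist()                    # same data as a plain list for bisect
randpts = rng.random(1000)              # query points

def loop_bisect():
    # One bisect call per query point
    return [bisect.bisect_left(xlist, p) for p in randpts]

def vectorized_searchsorted():
    # A single call that handles all query points
    return np.searchsorted(xar, randpts, side="left")

t_loop = timeit.timeit(loop_bisect, number=5)
t_vec = timeit.timeit(vectorized_searchsorted, number=5)
print(f"bisect loop: {t_loop:.4f} s, searchsorted: {t_vec:.4f} s")
```

    Both functions return the same insertion indices, so the comparison measures only the call overhead.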

  • 6
    import numpy as np
    def find_nearest(array, value):
        array = np.array(array)
        z=np.abs(array-value)
        y= np.where(z == z.min())
        m=np.array(y)
        x=m[0,0]
        y=m[1,0]
        near_value=array[x,y]
    
        return near_value
    
    array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
    print(array)
    value = 0
    print(find_nearest(array, value))
    
