I have a one-dimensional NumPy array and want to find the index of the first element whose value exceeds a given value.

For example:

aa = range(-10,10)

Find the position in aa where the value 5 is first exceeded.

7 Answers

This is a little faster (and looks nicer):

np.argmax(aa>5)

This works because argmax stops at the first True ("In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.") and does not build another list of indices.
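A small sketch of that behaviour, assuming Python 3 and NumPy imported as np. A plain range cannot be compared with > 5 there, so the example converts it to an array first. The names mask and first are only for illustration.

```python
import numpy as np

# A plain Python range does not support `aa > 5`, so build an array first.
aa = np.array(range(-10, 10))

# Element-wise comparison gives a boolean array.
mask = aa > 5

# True is the maximum of a boolean array, and argmax reports the first
# occurrence of the maximum, i.e. the position of the first True.
first = int(np.argmax(mask))

print(first, aa[first])
```

Note that argmax also returns 0 when no element satisfies the condition, so the result is only meaningful if at least one match exists.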

In [2]: N = 10000

In [3]: aa = np.arange(-N,N)

In [4]: timeit np.argmax(aa>N/2)
100000 loops, best of 3: 52.3 us per loop

In [5]: timeit np.where(aa>N/2)[0][0]
10000 loops, best of 3: 141 us per loop

In [6]: timeit np.nonzero(aa>N/2)[0][0]
10000 loops, best of 3: 142 us per loop

Answered 2024-04-28T09:31:26+08:00

Given that the contents of your array are sorted, there is an even faster method: searchsorted.

import numpy as np
N = 10000
aa = np.arange(-N,N)
%timeit np.searchsorted(aa, N/2)+1
%timeit np.argmax(aa>N/2)
%timeit np.where(aa>N/2)[0][0]
%timeit np.nonzero(aa>N/2)[0][0]

# Output
100000 loops, best of 3: 5.97 µs per loop
10000 loops, best of 3: 46.3 µs per loop
10000 loops, best of 3: 154 µs per loop
10000 loops, best of 3: 154 µs per loop
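The +1 in the searchsorted call above relies on N/2 actually being present in the array. As a hedged alternative sketch: side='right' makes np.searchsorted return the first index whose value is strictly greater, whether or not the searched value occurs in the array. The small array and the values 5 and 5.5 are chosen only for illustration.

```python
import numpy as np

aa = np.arange(-10, 10)  # sorted, values -10 .. 9

results = {}
for val in (5, 5.5):
    # Default side='left': first index with aa[i] >= val, then shifted by one.
    plus_one = int(np.searchsorted(aa, val)) + 1
    # side='right': first index with aa[i] > val.
    right = int(np.searchsorted(aa, val, side='right'))
    results[val] = (plus_one, right)

print(results)
```

For 5 both variants agree on index 16. For 5.5 the +1 variant gives 17, one past the first larger element, while side='right' still gives 16.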

Answered 2024-04-28T09:31:26+08:00

In [34]: a=np.arange(-10,10)

In [35]: a
Out[35]:
array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1,   0,   1,   2,
         3,   4,   5,   6,   7,   8,   9])

In [36]: np.where(a>5)
Out[36]: (array([16, 17, 18, 19]),)

In [37]: np.where(a>5)[0][0]
Out[37]: 16

Answered 2024-04-28T09:31:26+08:00

I was also interested in this and compared all the suggested answers with perfplot. (Disclaimer: I am the author of perfplot.)

If you know that the array you are looking through is already sorted, then

numpy.searchsorted(a, alpha)

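is what you want. searchsorted uses binary search, so this is an O(log n) operation: the runtime barely depends on the size of the array, and it is hard to do better. Note that with the default side='left' it returns the first index with a[i] >= alpha; pass side='right' for a strict a[i] > alpha.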

If you know nothing about your array, you cannot go wrong with

numpy.argmax(a > alpha)

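Already sorted:

[Plot: runtime versus len(array) on sorted input; image not available]

Unsorted:

[Plot: runtime versus len(array) on unsorted input; image not available]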

Code to reproduce the plot:

import numpy
import perfplot


alpha = 0.5

def argmax(data):
    return numpy.argmax(data > alpha)

def where(data):
    return numpy.where(data > alpha)[0][0]

def nonzero(data):
    return numpy.nonzero(data > alpha)[0][0]

def searchsorted(data):
    return numpy.searchsorted(data, alpha)

out = perfplot.show(
    # setup=numpy.random.rand,
    setup=lambda n: numpy.sort(numpy.random.rand(n)),
    kernels=[
        argmax, where,
        nonzero,
        searchsorted
        ],
    n_range=[2**k for k in range(2, 20)],
    logx=True,
    logy=True,
    xlabel='len(array)'
    )

Answered 2024-04-28T09:31:26+08:00

Arrays that have a constant step between elements

For a range or any other linearly increasing array you can simply calculate the index programmatically, with no need to iterate over the array at all:

def first_index_calculate_range_like(val, arr):
    if len(arr) == 0:
        raise ValueError('no value greater than {}'.format(val))
    elif len(arr) == 1:
        if arr[0] > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    first_value = arr[0]
    step = arr[1] - first_value
    # For linearly decreasing arrays or constant arrays we only need to check
    # the first element, because if that does not satisfy the condition
    # no other element will.
    if step <= 0:
        if first_value > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    calculated_position = (val - first_value) / step

    if calculated_position < 0:
        return 0
    # >= because a position exactly on the last element means nothing is greater
    elif calculated_position >= len(arr) - 1:
        raise ValueError('no value greater than {}'.format(val))

    return int(calculated_position) + 1

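This could probably be improved a bit. I made sure it works correctly for a few sample arrays and values, but that does not mean there are no mistakes in it, especially considering that it uses floating-point arithmetic.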

>>> import numpy as np
>>> first_index_calculate_range_like(5, np.arange(-10, 10))
16
>>> np.arange(-10, 10)[16]  # double check
6

>>> first_index_calculate_range_like(4.8, np.arange(-10, 10))
15

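Given that it can calculate the position without any iteration, it runs in constant time (O(1)) and can probably beat all the other approaches mentioned. However, it requires a constant step in the array, otherwise it will produce wrong results.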
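As a rough cross-check of the idea, assuming an increasing array with a constant step: the first index whose value exceeds val is floor((val - first_value) / step) + 1. The helper name calculated_index is hypothetical, and it deliberately skips the bounds handling of the function above.

```python
import math
import numpy as np

def calculated_index(val, first_value, step):
    # Position of val measured in steps from the first element; the next
    # whole step after it is the first element strictly greater than val.
    return math.floor((val - first_value) / step) + 1

arr = np.arange(-10, 10)  # first_value = -10, step = 1

for val in (-3, 4.8, 5, 8.5):
    expected = int(np.argmax(arr > val))  # brute-force reference
    print(val, calculated_index(val, -10, 1), expected)
```

For these in-range values the formula and the brute-force argmax agree, which is the property the O(1) approach depends on.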

General solution using numba

A more general approach is to use a numba function:

import numba as nb

@nb.njit
def first_index_numba(val, arr):
    for idx in range(len(arr)):
        if arr[idx] > val:
            return idx
    return -1

This works for any array, but it has to iterate over the array, so in the average case it is O(n):

>>> first_index_numba(4.8, np.arange(-10, 10))
15
>>> first_index_numba(5, np.arange(-10, 10))
16
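Where numba is not available, the same early-exit loop can be sketched in plain Python with next over a generator. This only illustrates the logic and will be much slower than the compiled version on large arrays; first_index_python is a hypothetical name.

```python
import numpy as np

def first_index_python(val, arr):
    # Stops at the first element greater than val; -1 signals "no match",
    # mirroring the return convention of the numba function.
    return next((idx for idx, item in enumerate(arr) if item > val), -1)

print(first_index_python(4.8, np.arange(-10, 10)))
print(first_index_python(5, np.arange(-10, 10)))
print(first_index_python(100, np.arange(-10, 10)))
```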

Benchmark

Even though Nico Schlömer already provided some benchmarks, I thought it might be useful to include my new solutions and to test with different values.

The test setup:

import numpy as np
import math
import numba as nb

def first_index_using_argmax(val, arr):
    return np.argmax(arr > val)

def first_index_using_where(val, arr):
    return np.where(arr > val)[0][0]

def first_index_using_nonzero(val, arr):
    return np.nonzero(arr > val)[0][0]

def first_index_using_searchsorted(val, arr):
    return np.searchsorted(arr, val) + 1

def first_index_using_min(val, arr):
    return np.min(np.where(arr > val))

def first_index_calculate_range_like(val, arr):
    if len(arr) == 0:
        raise ValueError('empty array')
    elif len(arr) == 1:
        if arr[0] > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    first_value = arr[0]
    step = arr[1] - first_value
    if step <= 0:
        if first_value > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    calculated_position = (val - first_value) / step

    if calculated_position < 0:
        return 0
    # >= because a position exactly on the last element means nothing is greater
    elif calculated_position >= len(arr) - 1:
        raise ValueError('no value greater than {}'.format(val))

    return int(calculated_position) + 1

@nb.njit
def first_index_numba(val, arr):
    for idx in range(len(arr)):
        if arr[idx] > val:
            return idx
    return -1

funcs = [
    first_index_using_argmax, 
    first_index_using_min, 
    first_index_using_nonzero,
    first_index_calculate_range_like, 
    first_index_numba, 
    first_index_using_searchsorted, 
    first_index_using_where
]

from simple_benchmark import benchmark, MultiArgument

The plots were generated using:

%matplotlib notebook
b.plot()

Item is at the beginning

b = benchmark(
    funcs,
    {2**i: MultiArgument([0, np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

[Plot: benchmark results, runtime versus array size; image not available]

The numba function performs best, followed by the calculate function and the searchsorted function. The other solutions perform much worse.

Item is at the end

b = benchmark(
    funcs,
    {2**i: MultiArgument([2**i-2, np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

[Plot: benchmark results, runtime versus array size; image not available]

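For small arrays the numba function is amazingly fast, but for larger arrays it is outperformed by the calculate function and the searchsorted function, because it has to scan the whole array before reaching the item.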

Item is at sqrt(len)

b = benchmark(
    funcs,
    {2**i: MultiArgument([np.sqrt(2**i), np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

[Plot: benchmark results, runtime versus array size; image not available]

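This is more interesting. Again numba and the calculate function perform very well, but this case actually triggers the worst case of searchsorted, which does not work well here.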

Comparison of the functions when no value satisfies the condition

Another interesting point is how these functions behave if there is no value whose index should be returned:

arr = np.ones(100)
value = 2

for func in funcs:
    print(func.__name__)
    try:
        print('-->', func(value, arr))
    except Exception as e:
        print('-->', e)

With this result:

first_index_using_argmax
--> 0
first_index_using_min
--> zero-size array to reduction operation minimum which has no identity
first_index_using_nonzero
--> index 0 is out of bounds for axis 0 with size 0
first_index_calculate_range_like
--> no value greater than 2
first_index_numba
--> -1
first_index_using_searchsorted
--> 101
first_index_using_where
--> index 0 is out of bounds for axis 0 with size 0

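searchsorted, argmax and numba simply return a wrong value. At least searchsorted and numba return an index that is not a valid index for the array, whereas the 0 returned by argmax looks like a legitimate result.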

The functions where, min, nonzero and calculate throw an exception. However, only the exception from calculate actually says anything helpful.

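That means these calls have to be wrapped in an appropriate wrapper function that catches exceptions or invalid return values and handles them appropriately, at least if you are not sure whether a matching value is in the array.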
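One possible shape for such a wrapper, shown here around the argmax approach. The name first_index_greater and the choice to return None when nothing matches are assumptions for illustration; raising an exception would work just as well.

```python
import numpy as np

def first_index_greater(val, arr):
    """Return the first index with arr[idx] > val, or None if there is none."""
    arr = np.asarray(arr)
    if arr.size == 0:
        # argmax raises on empty input, so handle that case up front.
        return None
    mask = arr > val
    idx = int(np.argmax(mask))
    # argmax returns 0 both for "first element matches" and for "nothing
    # matches", so check the mask at that position to tell them apart.
    return idx if mask[idx] else None

print(first_index_greater(5, np.arange(-10, 10)))
print(first_index_greater(2, np.ones(100)))
```

The extra mask lookup costs one element access, so the wrapper keeps the performance profile of the plain argmax call.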

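Note: the calculate and searchsorted options only work under special conditions. The calculate function requires a constant step and searchsorted requires the array to be sorted. So these can be useful in the right circumstances but are not general solutions to this problem. If you are dealing with sorted Python lists, you might want to look at the bisect module instead of using NumPy's searchsorted.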
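For the sorted-list case mentioned in the note, a minimal sketch using the standard-library bisect module. Here bisect_right plays the role of searchsorted with side='right'. The list values is made up for illustration.

```python
from bisect import bisect_right

values = list(range(-10, 10))  # a sorted plain Python list

# bisect_right returns the insertion point to the right of any entries equal
# to 5, which is the index of the first element strictly greater than 5.
idx = bisect_right(values, 5)

# idx == len(values) would mean that no element is greater.
print(idx, values[idx] if idx < len(values) else None)
```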

Answered 2024-04-28T09:31:26+08:00

I would like to propose
```
np.min(np.append(np.where(aa>5)[0],np.inf))
```
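This returns the smallest index where the condition is met, and returns infinity if the condition is never met (in which case where returns an empty array).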
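One thing to keep in mind with this variant, as a hedged sketch: appending np.inf turns the index array into floats, so the result has to be checked and converted before it can be used as an index. The array aa matches the example from the question; the threshold 50 is chosen only to force the no-match case.

```python
import numpy as np

aa = np.arange(-10, 10)

found = np.min(np.append(np.where(aa > 5)[0], np.inf))
missing = np.min(np.append(np.where(aa > 50)[0], np.inf))

print(found, missing)  # both are floating-point values

# Convert to int only after ruling out the "never met" case.
idx = None if np.isinf(found) else int(found)
print(idx)
```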
Answered 2024-04-28T09:31:26+08:00
I would go with
```
i = np.min(np.where(V >= x))
```
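where V is the vector (1D array), x is the value and i is the resulting index. Note that this uses >=, so it also matches elements equal to x.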
Answered 2024-04-28T09:31:26+08:00
