How do I get the indices of the N largest values in a NumPy array?

323

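NumPy provides np.argmax to get the index of the maximum value of an array.

I would like something similar, but returning the indices of the N largest values.

For example, if I have the array [1, 3, 2, 4, 5], then function(array, n=3) would return the indices [4, 3, 1], which correspond to the elements [5, 4, 3].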

15 Answers

2

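bottleneck has a partial sort function, which helps if the cost of sorting the entire array just to get the N largest values is too high.

I know nothing about this module; I just googled "numpy partial sort".

Answered 2024-04-29T13:17:17+08:00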
2
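Below is a simple way to see the maximum elements and their positions. Here axis is the axis being reduced: for a 2D array, axis=0 gives the maximum of each column and axis=1 gives the maximum of each row. For higher dimensions, choose the axis that suits your case.
```
import numpy as np

M = np.random.random((3, 4))
print(M)
print(M.max(axis=1), M.argmax(axis=1))
```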
Answered 2024-04-29T13:17:17+08:00
396
Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do
```
>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])
>>> a[ind]
array([4, 9, 6, 9])
```
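Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need them sorted as well, sort them afterwards: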
```
>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])
```
Getting the top-k elements in sorted order this way takes O(n + k log k) time.
Answered 2024-04-29T13:17:17+08:00
0
The simplest I have been able to come up with is:
```
In [1]: import numpy as np

In [2]: arr = np.array([1, 3, 2, 4, 5])

In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])
```
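This involves a complete sort of the array. I wonder if numpy provides a built-in way to do a partial sort; so far I have not been able to find one.

If this solution turns out to be too slow (especially for small n), it may be worth writing something in Cython.
Answered 2024-04-29T13:17:17+08:00
1
I found it most intuitive to use np.unique.

The idea is that np.unique, called with return_inverse=True, also returns the index of each input value within the sorted array of unique values. From the largest unique value and those indices, the positions of the maximum in the original array can be recovered.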
```
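import numpy as np

multi_max = [1, 1, 2, 2, 4, 0, 0, 4]
uniques, idx = np.unique(multi_max, return_inverse=True)
# uniques is sorted, so argmax(uniques) is the position of the largest value
print(np.squeeze(np.argwhere(idx == np.argmax(uniques))))
# [4 7]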
```
Answered 2024-04-29T13:17:17+08:00

Use:

>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]

For regular Python lists:

>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]

If you use Python 2, use xrange instead of range.

Source: heapq (Heap queue algorithm)

Answered 2024-04-29T13:17:17+08:00

4
This will be faster than a full sort, depending on the size of your original array and the size of your selection:
```
>>> A = np.random.randint(0,10,10)
>>> A
array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
>>> B = np.zeros(3, int)
>>> for i in range(3):
...     idx = np.argmax(A)
...     B[i] = idx; A[idx] = -1  # something smaller than A.min()
...     
>>> B
array([0, 2, 3])
```
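Of course, it involves tampering with your original array, which you could fix (if needed) by making a copy or by putting the original values back, whichever is cheaper for your use case.
Answered 2024-04-29T13:17:17+08:00
226
If you do not care about the order of the K largest elements, you can use argpartition, which should perform better than a full sort through argsort.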
```
import numpy as np

K = 4  # We want the indices of the four largest values
a = np.array([0, 8, 0, 4, 5, 8, 8, 0, 4, 2])
np.argpartition(a, -K)[-K:]
# array([4, 1, 5, 6])  (the order within the result is not guaranteed)
```
Credit goes to this question.

I ran a few tests, and it looks like argpartition outperforms argsort as the size of the array and the value of K increase.
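A comparison like the one described could be set up roughly as follows. This is only a sketch: the array size, the value of K, and the helper names top_k_argsort and top_k_argpartition are arbitrary choices made for illustration, and the timings will vary with the machine and the NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.random(200_000)  # arbitrary test size, chosen for illustration
K = 100                  # arbitrary number of largest values to select


def top_k_argsort(a, k):
    """Indices of the k largest values via a full sort."""
    return np.argsort(a)[-k:]


def top_k_argpartition(a, k):
    """Indices of the k largest values via a partial sort (order not guaranteed)."""
    return np.argpartition(a, -k)[-k:]


# Time each approach over a few repetitions and report the average per call.
for fn in (top_k_argsort, top_k_argpartition):
    seconds = timeit.timeit(lambda: fn(a, K), number=5)
    print(f"{fn.__name__}: {seconds / 5:.5f} s per call")
```

Both helpers select the same set of values; only the order of the returned indices may differ.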
Answered 2024-04-29T13:17:17+08:00

Use:

def max_indices(arr, k):
    '''
    Returns the indices of the k first largest elements of arr
    (in descending order in values)
    '''
    assert k <= arr.size, 'k should be smaller or equal to the array size'
    arr_ = arr.astype(float)  # make a copy of arr
    max_idxs = []
    for _ in range(k):
        max_element = np.max(arr_)
        if np.isinf(max_element):
            break
        else:
            idx = np.where(arr_ == max_element)
        max_idxs.append(idx)
        arr_[idx] = -np.inf
    return max_idxs

It also works with 2D arrays. For example,

In [0]: A = np.array([[ 0.51845014,  0.72528114],
                     [ 0.88421561,  0.18798661],
                     [ 0.89832036,  0.19448609],
                     [ 0.89832036,  0.19448609]])
In [1]: max_indices(A, 8)
Out[1]:
    [(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
     (array([1], dtype=int64), array([0], dtype=int64)),
     (array([0], dtype=int64), array([1], dtype=int64)),
     (array([0], dtype=int64), array([0], dtype=int64)),
     (array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
     (array([1], dtype=int64), array([1], dtype=int64))]

In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])

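Answered 2024-04-29T13:17:17+08:00

If you happen to be working with a multidimensional array, then you will need to flatten the array and unravel the indices: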

def largest_indices(ary, n):
    """Returns the n largest indices from a numpy array."""
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)

For example:

>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0.        ,  0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825,  0.90929743,  0.84147098])

Answered 2024-04-29T13:17:17+08:00

For multidimensional arrays you can use the axis keyword to apply the partitioning along the intended axis.

# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

And to grab the items:

x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

But note that this will not return a sorted result. If you need one, you can use np.argsort() along the intended axis instead:

indices = np.argsort(arr, axis=1)[:, -N:]

# Result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

Here is an example:

In [42]: a = np.random.randint(0, 20, (10, 10))

In [44]: a
Out[44]:
array([[ 7, 11, 12,  0,  2,  3,  4, 10,  6, 10],
       [16, 16,  4,  3, 18,  5, 10,  4, 14,  9],
       [ 2,  9, 15, 12, 18,  3, 13, 11,  5, 10],
       [14,  0,  9, 11,  1,  4,  9, 19, 18, 12],
       [ 0, 10,  5, 15,  9, 18,  5,  2, 16, 19],
       [14, 19,  3, 11, 13, 11, 13, 11,  1, 14],
       [ 7, 15, 18,  6,  5, 13,  1,  7,  9, 19],
       [11, 17, 11, 16, 14,  3, 16,  1, 12, 19],
       [ 2,  4, 14,  8,  6,  9, 14,  9,  1,  5],
       [ 1, 10, 15,  0,  1,  9, 18,  2,  2, 12]])

In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:] # 1 is because the first item is the minimum one.
Out[45]:
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
       [2, 7, 5, 9, 6, 8, 1, 0, 4],
       [5, 8, 1, 9, 7, 3, 6, 2, 4],
       [4, 5, 2, 6, 3, 9, 0, 8, 7],
       [7, 2, 6, 4, 1, 3, 8, 5, 9],
       [2, 3, 5, 7, 6, 4, 0, 9, 1],
       [4, 3, 0, 7, 8, 5, 1, 2, 9],
       [5, 2, 0, 8, 4, 6, 3, 1, 9],
       [0, 1, 9, 4, 3, 7, 5, 2, 6],
       [0, 4, 7, 8, 5, 1, 9, 2, 6]])

In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
Out[46]:
array([[9, 1, 2],
       [1, 0, 4],
       [6, 2, 4],
       [0, 8, 7],
       [8, 5, 9],
       [0, 9, 1],
       [1, 2, 9],
       [3, 1, 9],
       [5, 2, 6],
       [9, 2, 6]])

In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)
Out[89]:
array([[10, 11, 12],
       [16, 16, 18],
       [13, 15, 18],
       [14, 18, 19],
       [16, 18, 19],
       [14, 14, 19],
       [15, 18, 19],
       [16, 17, 19],
       [ 9, 14, 14],
       [12, 15, 18]])
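For the row-wise case described in this answer, np.take_along_axis (available in NumPy 1.15 and later) may be a more readable way to gather the values than building the repeat/ravel index by hand. The following is a minimal sketch, not the answer author's code; the array shape, the seed, and N are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 20, size=(5, 8))  # arbitrary 2D test data
N = 3

# Column indices of the N largest entries in each row (unordered within a row).
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

# Gather the corresponding values row by row.
top_values = np.take_along_axis(arr, indices, axis=1)

# Optional: order each row's picks from largest to smallest.
order = np.argsort(-top_values, axis=1)
sorted_indices = np.take_along_axis(indices, order, axis=1)
sorted_values = np.take_along_axis(arr, sorted_indices, axis=1)

print(sorted_indices)
print(sorted_values)
```

Sorting only the N selected columns per row keeps the extra cost small compared with a full argsort along the axis.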

Answered 2024-04-29T13:17:17+08:00

6
Even simpler:
```
idx = (-arr).argsort()[:n]
```
where n is the number of largest values you want.
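Applied to the array from the question, the one-liner behaves as follows. This is a small worked sketch; the variant named alt is an addition for illustration and is meant for dtypes where negation is not safe (unsigned integers wrap around, and NumPy does not support negating booleans).

```python
import numpy as np

arr = np.array([1, 3, 2, 4, 5])
n = 3

# Negating the values turns "largest first" into an ordinary ascending sort.
idx = (-arr).argsort()[:n]
print(idx)       # [4 3 1]
print(arr[idx])  # [5 4 3]

# Variant without negation: sort ascending, then read the result backwards.
alt = arr.argsort()[::-1][:n]
print(alt)       # [4 3 1]
```

Both forms still sort the whole array, so they trade speed for brevity.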
Answered 2024-04-29T13:17:17+08:00
6
Use:
```
from operator import itemgetter
from heapq import nlargest
result = nlargest(N, enumerate(your_list), itemgetter(1))
```
Now the result list will contain N tuples (index, value), ordered from the largest value downwards.
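As a quick illustration with the array from the question (the list contents and N are simply the question's example values):

```python
from heapq import nlargest
from operator import itemgetter

your_list = [1, 3, 2, 4, 5]
N = 3

# Each item is an (index, value) pair; itemgetter(1) ranks the pairs by value.
result = nlargest(N, enumerate(your_list), key=itemgetter(1))
print(result)   # [(4, 5), (3, 4), (1, 3)]

# Keep only the positions if the values are not needed.
indices = [index for index, _ in result]
print(indices)  # [4, 3, 1]
```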
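Answered 2024-04-29T13:17:17+08:00
2

The np.argpartition method only returns the indices of the k largest elements; it performs a partial sort and is faster than np.argsort (which performs a full sort) when the array is very large. But the returned indices are NOT in ascending/descending order. Consider an example: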

We can see that if you want the top-k indices in strictly ascending order, np.argpartition will not return what you want.
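The behaviour described above can be seen in a minimal sketch. The array and k below are arbitrary values chosen for illustration, not the answer author's data.

```python
import numpy as np

x = np.array([3, 10, 7, 1, 8, 5])
k = 3

# argpartition guarantees which indices come back, not the order they come in.
unordered = np.argpartition(x, -k)[-k:]
print(sorted(unordered.tolist()))  # [1, 2, 4]

# Sorting just those k indices by value gives a strict ordering.
ascending = unordered[np.argsort(x[unordered])]
descending = ascending[::-1]
print(ascending)   # [2 4 1]
print(descending)  # [1 4 2]
```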

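Besides sorting manually after np.argpartition, my solution is to use torch.topk from PyTorch, a tool for building neural networks that provides a NumPy-like API with both CPU and GPU support. It is as fast as NumPy with MKL, and it offers a GPU boost if you need large matrix/vector computations.

The code for strictly ascending/descending top-k indices would be:

Note that torch.topk accepts a torch tensor and returns both the top k values and the top k indices as torch.Tensor objects. As in NumPy, torch.topk also lets you choose the axis to work along (the argument is named dim in PyTorch), so you can handle multidimensional arrays/tensors.

Answered 2024-04-29T13:17:17+08:00
0
I think the most time-efficient way is to iterate through the array manually and keep a min-heap of size k, as other people have mentioned.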

I also came up with a brute-force approach:
```
top_k_index_list = [ ]
for i in range(k):
    top_k_index_list.append(np.argmax(my_array))
    my_array[top_k_index_list[-1]] = -float('inf')
```
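After using argmax to get the index of the largest element, set that element to a large negative value. The next call to argmax will then return the second largest element. You can record the original values of these elements and restore them afterwards if you want. Note that assigning -float('inf') requires a floating-point array.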
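The min-heap idea mentioned at the start of this answer could be sketched as follows. The function name top_k_indices_heap is hypothetical, and the sketch assumes k is at least 1; it does not modify the input array.

```python
import heapq

import numpy as np


def top_k_indices_heap(values, k):
    """Indices of the k largest values, largest first, using a size-k min-heap."""
    heap = []  # holds (value, index) pairs; heap[0] is the smallest kept value
    for index, value in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (value, index))
        elif value > heap[0][0]:
            # The new value beats the weakest kept candidate, so swap it in.
            heapq.heapreplace(heap, (value, index))
    # Report the kept indices from the largest value downwards.
    return [index for _, index in sorted(heap, reverse=True)]


my_array = np.array([1, 3, 2, 4, 5])
print(top_k_indices_heap(my_array, 3))  # [4, 3, 1]
```

This runs in O(n log k) time, but the loop is plain Python, so for large arrays np.argpartition is likely to be faster in practice.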
Answered 2024-04-29T13:17:17+08:00
