
How to add an extra column to a NumPy array


Suppose I have a NumPy array, a:

import numpy as np

a = np.array([
    [1, 2, 3],
    [2, 3, 4]
    ])

I want to add a column of zeros to get the array b:

b = np.array([
    [1, 2, 3, 0],
    [2, 3, 4, 0]
    ])

How can I do this easily in NumPy?

14 Answers

  • 38

    A bit late to the party, but nobody has posted this answer yet, so for completeness: you can do this with a list comprehension on plain Python lists:

    source = a.tolist()
    result = [row + [0] for row in source]
    b = np.array(result)
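A minimal sketch of the same list-comprehension idea wrapped in a reusable helper. The name append_column_via_lists and its fill parameter are hypothetical, not NumPy API. Because the data round-trips through Python lists and NumPy re-infers the dtype at the end, this is convenient for small arrays but likely much slower than the vectorized approaches in the other answers for large ones.

```python
import numpy as np

def append_column_via_lists(a, fill=0):
    """Hypothetical helper: append one column of `fill` via Python lists."""
    rows = a.tolist()                          # nested Python lists, one per row
    extended = [row + [fill] for row in rows]  # add one element to every row
    return np.array(extended)                  # NumPy infers the dtype again

a = np.array([[1, 2, 3],
              [2, 3, 4]])
b = append_column_via_lists(a)
print(b)
```

Because the dtype is re-inferred, a float fill value such as 0.5 turns an integer array into a float array.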
    
  • 4

    I think a more straightforward solution, and faster to boot, is the following:

    import numpy as np
    N = 10
    a = np.random.rand(N,N)
    b = np.zeros((N,N+1))
    b[:,:-1] = a
    

    And the timings:

    In [23]: N = 10
    
    In [24]: a = np.random.rand(N,N)
    
    In [25]: %timeit b = np.hstack((a,np.zeros((a.shape[0],1))))
    10000 loops, best of 3: 19.6 us per loop
    
    In [27]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
    100000 loops, best of 3: 5.62 us per loop
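The preallocate-and-copy idea also works for positions other than the end. Here is a minimal sketch, using a hypothetical helper prepend_zero_column, that puts the zero column first. It passes dtype=a.dtype because np.zeros defaults to float64 and would otherwise turn an integer a into floats.

```python
import numpy as np

def prepend_zero_column(a):
    """Hypothetical helper: zeros one column wider than 2-D `a`, with `a` copied to the right."""
    b = np.zeros((a.shape[0], a.shape[1] + 1), dtype=a.dtype)  # keep a's dtype
    b[:, 1:] = a   # column 0 keeps its zeros
    return b

a = np.array([[1, 2, 3],
              [2, 3, 4]])
r = prepend_zero_column(a)
print(r)
```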
    
  • 110

    np.r_[ ... ] and np.c_[ ... ] are useful alternatives to vstack and hstack, with square brackets [] instead of round ().
    A couple of examples:

    : import numpy as np
    : N = 3
    : A = np.eye(N)
    
    : np.c_[ A, np.ones(N) ]              # add a column
    array([[ 1.,  0.,  0.,  1.],
           [ 0.,  1.,  0.,  1.],
           [ 0.,  0.,  1.,  1.]])
    
    : np.c_[ np.ones(N), A, np.ones(N) ]  # or two
    array([[ 1.,  1.,  0.,  0.,  1.],
           [ 1.,  0.,  1.,  0.,  1.],
           [ 1.,  0.,  0.,  1.,  1.]])
    
    : np.r_[ A, [A[1]] ]              # add a row
    array([[ 1.,  0.,  0.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  1.],
           [ 0.,  1.,  0.]])
    : # not np.r_[ A, A[1] ]
    
    : np.r_[ A[0], 1, 2, 3, A[1] ]    # mix vecs and scalars
      array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])
    
    : np.r_[ A[0], [1, 2, 3], A[1] ]  # lists
      array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])
    
    : np.r_[ A[0], (1, 2, 3), A[1] ]  # tuples
      array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])
    
    : np.r_[ A[0], 1:4, A[1] ]        # same, 1:4 == arange(1,4) == 1,2,3
      array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])
    

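    (The reason for square brackets [] instead of round () is that Python only expands slice syntax such as 1:4 inside square brackets. The wonders of overloading.)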
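Applied to the array from the question, np.c_ can append the zero column in one expression. This is a minimal sketch: it relies on np.c_ treating a 1-D array as a column, and it passes dtype=a.dtype so that the integer type of a is kept.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# np.c_ treats a 1-D array as a column, so one zero per row is enough
b = np.c_[a, np.zeros(a.shape[0], dtype=a.dtype)]
print(b)
```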

  • 27

    Use numpy.append:

    >>> a = np.array([[1,2,3],[2,3,4]])
    >>> a
    array([[1, 2, 3],
           [2, 3, 4]])
    
    >>> z = np.zeros((2,1), dtype=a.dtype)  # same integer dtype as a
    >>> z
    array([[0],
           [0]])
    
    >>> np.append(a, z, axis=1)
    array([[1, 2, 3, 0],
           [2, 3, 4, 0]])
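The axis=1 argument matters here. Without an axis, np.append flattens both inputs before joining them, so the result is a 1-D array rather than an extra column. The following short check illustrates this:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])
z = np.zeros((2, 1), dtype=a.dtype)

with_axis = np.append(a, z, axis=1)   # shape (2, 4): the extra column
without_axis = np.append(a, z)        # no axis: both inputs are flattened first
print(with_axis.shape, without_axis.shape)
```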
    
  • 8

    One way, using hstack, is:

    b = np.hstack((a, np.zeros((a.shape[0], 1), dtype=a.dtype)))
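The answer above uses a zeros array of shape (a.shape[0], 1), not a 1-D vector. For a 2-D first argument, hstack concatenates along axis 1, and that requires every input to have the same number of dimensions. The following sketch shows the failure with a 1-D vector and the reshape that fixes it:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# A 1-D vector has the wrong number of dimensions to hstack with a 2-D array
raised = False
try:
    np.hstack((a, np.zeros(a.shape[0], dtype=a.dtype)))
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Reshaping it to an (n, 1) column fixes that
col = np.zeros(a.shape[0], dtype=a.dtype).reshape(-1, 1)
b = np.hstack((a, col))
print(b)
```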
    
  • 21

    I think:

    np.column_stack((a, np.zeros(np.shape(a)[0])))
    

    is more elegant.
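What makes column_stack convenient is that it turns each 1-D input into a column before stacking. That is why no reshape is needed in the answer above. The following minimal sketch uses two made-up vectors x and y to show this behavior:

```python
import numpy as np

x = np.array([6, 7, 8])   # illustrative values
y = np.array([1, 2, 3])

# Each 1-D input becomes one column of the result
pairs = np.column_stack((x, y))
print(pairs)
```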

  • 11

    I find the following the most elegant:

    b = np.insert(a, 3, values=0, axis=1) # Insert values before column 3
    

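    An advantage of insert is that it also lets you insert columns (or rows) at other positions inside the array. You can also insert a whole vector instead of a single value, for example to duplicate the last column: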

    insert_index = 3  # position of the new column
    b = np.insert(a, insert_index, values=a[:,2], axis=1)
    

    which results in:

    array([[1, 2, 3, 3],
           [2, 3, 4, 4]])
    

    As for timings, insert may be slower than JoshAdel's solution:

    In [1]: N = 10
    
    In [2]: a = np.random.rand(N,N)
    
    In [3]: %timeit b = np.hstack((a, np.zeros((a.shape[0], 1))))
    100000 loops, best of 3: 7.5 µs per loop
    
    In [4]: %timeit b = np.zeros((a.shape[0], a.shape[1]+1)); b[:,:-1] = a
    100000 loops, best of 3: 2.17 µs per loop
    
    In [5]: %timeit b = np.insert(a, 3, values=0, axis=1)
    100000 loops, best of 3: 10.2 µs per loop
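Here is a short sketch of the "other positions" point from the answer above. It inserts a scalar as a new first column, and inserts a per-row vector before column 1. With a scalar index and axis=1, a 1-D values argument supplies one value per row.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

front = np.insert(a, 0, values=0, axis=1)        # new first column of zeros
middle = np.insert(a, 1, values=[9, 8], axis=1)  # one value per row, before column 1
print(front)
print(middle)
```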
    
  • 130

    I was also interested in this question and compared the speed of

    numpy.c_[a, a]
    numpy.stack([a, a]).T
    numpy.vstack([a, a]).T
    numpy.ascontiguousarray(numpy.stack([a, a]).T)               
    numpy.ascontiguousarray(numpy.vstack([a, a]).T)
    numpy.column_stack([a, a])
    numpy.concatenate([a[:,None], a[:,None]], axis=1)
    numpy.concatenate([a[None], a[None]], axis=0).T
    

    all of which do the same thing for any input vector a. Timings for growing a:

    [Plot: runtime of each variant vs. len(a), log-log axes]

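    Note that all non-contiguous variants (in particular stack/vstack) are eventually faster than all contiguous variants. column_stack, for its clarity and speed, appears to be a good option if you require contiguity.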


    Code to reproduce the plot:

    import numpy
    import perfplot
    
    perfplot.show(
        setup=lambda n: numpy.random.rand(n),
        kernels=[
            lambda a: numpy.c_[a, a],
            lambda a: numpy.ascontiguousarray(numpy.stack([a, a]).T),
            lambda a: numpy.ascontiguousarray(numpy.vstack([a, a]).T),
            lambda a: numpy.column_stack([a, a]),
            lambda a: numpy.concatenate([a[:, None], a[:, None]], axis=1),
            lambda a: numpy.ascontiguousarray(numpy.concatenate([a[None], a[None]], axis=0).T),
            lambda a: numpy.stack([a, a]).T,
            lambda a: numpy.vstack([a, a]).T,
            lambda a: numpy.concatenate([a[None], a[None]], axis=0).T,
            ],
        labels=[
            'c_', 'ascont(stack)', 'ascont(vstack)', 'column_stack', 'concat',
            'ascont(concat)', 'stack (non-cont)', 'vstack (non-cont)',
            'concat (non-cont)'
            ],
        n_range=[2**k for k in range(20)],
        xlabel='len(a)',
        logx=True,
        logy=True,
        )
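The "non-contiguous" label in the plot refers to memory layout. np.stack([a, a]) builds a (2, n) array, and .T only returns a transposed view of it, so the result is not C-contiguous. The following sketch checks the flags of three variants (the input length 5 is arbitrary):

```python
import numpy as np

a = np.random.rand(5)

stacked_t = np.stack([a, a]).T           # (5, 2) view of a (2, 5) array: not C-contiguous
fixed = np.ascontiguousarray(stacked_t)  # copies the data into C order
col = np.column_stack([a, a])            # builds a C-contiguous (5, 2) array directly

print(stacked_t.flags["C_CONTIGUOUS"], fixed.flags["C_CONTIGUOUS"], col.flags["C_CONTIGUOUS"])
```

All three hold the same values. Only the layout, and therefore the cost of later row-wise access, differs.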
    
  • 4

    np.concatenate also works:

    >>> a = np.array([[1,2,3],[2,3,4]])
    >>> a
    array([[1, 2, 3],
           [2, 3, 4]])
    >>> z = np.zeros((2,1))
    >>> z
    array([[ 0.],
           [ 0.]])
    >>> np.concatenate((a, z), axis=1)
    array([[ 1.,  2.,  3.,  0.],
           [ 2.,  3.,  4.,  0.]])
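The output above is float because np.zeros defaults to float64, and concatenating it with an integer array promotes the result. The following minimal sketch compares that with passing dtype=a.dtype to keep the integer type:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# Default float64 zeros promote the whole result to float
as_float = np.concatenate((a, np.zeros((a.shape[0], 1))), axis=1)

# Matching the dtype keeps the integer type of a
as_int = np.concatenate((a, np.zeros((a.shape[0], 1), dtype=a.dtype)), axis=1)
print(as_float.dtype, as_int.dtype)
```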
    
  • 1

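    I like JoshAdel's answer because of its focus on performance. A minor performance improvement is to avoid the overhead of initializing with zeros that are only overwritten afterwards. This makes a measurable difference when N is large: use empty instead of zeros, and write the column of zeros as a separate step: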

    In [1]: import numpy as np
    
    In [2]: N = 10000
    
    In [3]: a = np.ones((N,N))
    
    In [4]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
    1 loops, best of 3: 492 ms per loop
    
    In [5]: %timeit b = np.empty((a.shape[0],a.shape[1]+1)); b[:,:-1] = a; b[:,-1] = np.zeros((a.shape[0],))
    1 loops, best of 3: 407 ms per loop
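The same empty-then-fill pattern can be generalized to several extra columns and any fill value. Here is a minimal sketch with a hypothetical helper append_filled_columns (not NumPy API). It assigns the fill value as a scalar, which broadcasts, instead of building a temporary zeros array.

```python
import numpy as np

def append_filled_columns(a, k=1, fill=0):
    """Hypothetical helper: copy of 2-D `a` with `k` extra columns set to `fill`."""
    n, m = a.shape
    b = np.empty((n, m + k), dtype=a.dtype)  # uninitialized; every cell is written below
    b[:, :m] = a
    b[:, m:] = fill                          # scalar broadcast over the new columns
    return b

a = np.array([[1, 2, 3],
              [2, 3, 4]])
r1 = append_filled_columns(a)
r2 = append_filled_columns(a, k=2, fill=-1)
print(r1)
print(r2)
```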
    
  • 8

    Assuming M is a (100, 3) ndarray and y is a (100,) ndarray, append can be used as follows:

    M=numpy.append(M,y[:,None],1)
    

    The trick is to use

    y[:, None]
    

    which turns y into a (100, 1) 2D array.

    M.shape
    

    now gives

    (100, 4)
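Indexing with None is the same as indexing with np.newaxis, and reshape(-1, 1) gives the same column shape. The following sketch makes these equivalent spellings explicit. M and y are filled with placeholder values here, since the answer does not give their contents.

```python
import numpy as np

M = np.zeros((100, 3))            # placeholder contents
y = np.arange(100, dtype=float)   # placeholder contents

# Three equivalent ways to turn a (100,) vector into a (100, 1) column
assert y[:, None].shape == y[:, np.newaxis].shape == y.reshape(-1, 1).shape == (100, 1)

M = np.append(M, y[:, None], 1)
print(M.shape)
```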
    
  • 25

    np.insert also serves the purpose.

    matA = np.array([[1,2,3], 
                     [2,3,4]])
    idx = 3
    new_col = np.array([0, 0])
    np.insert(matA, idx, new_col, axis=1)
    
    array([[1, 2, 3, 0],
           [2, 3, 4, 0]])
    

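    It inserts the values new_col before the given index, here idx, along one axis. In other words, the newly inserted values occupy column idx, and the original columns from idx onward are shifted one position to the right.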
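The following sketch shows the valid range of idx in this call. idx equal to the number of columns appends at the end, a negative idx counts from the end (so -1 means "before the last column"), and anything past the end raises IndexError.

```python
import numpy as np

matA = np.array([[1, 2, 3],
                 [2, 3, 4]])
new_col = np.array([0, 0])

appended = np.insert(matA, matA.shape[1], new_col, axis=1)  # same as idx = 3
before_last = np.insert(matA, -1, new_col, axis=1)          # before the last column

raised = False
try:
    np.insert(matA, matA.shape[1] + 1, new_col, axis=1)     # idx = 4 is out of bounds
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```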

  • 0

    There is a function specifically for this. It is called numpy.pad:

    a = np.array([[1,2,3], [2,3,4]])
    b = np.pad(a, ((0, 0), (0, 1)), mode='constant', constant_values=0)
    print(b)
    # [[1 2 3 0]
    #  [2 3 4 0]]
    

    Here is what the docstring says:

    Pads an array.
    
    Parameters
    ----------
    array : array_like of rank N
        Input array
    pad_width : {sequence, array_like, int}
        Number of values padded to the edges of each axis.
        ((before_1, after_1), ... (before_N, after_N)) unique pad widths
        for each axis.
        ((before, after),) yields same before and after pad for each axis.
        (pad,) or int is a shortcut for before = after = pad width for all
        axes.
    mode : str or function
        One of the following string values or a user supplied function.
    
        'constant'
            Pads with a constant value.
        'edge'
            Pads with the edge values of array.
        'linear_ramp'
            Pads with the linear ramp between end_value and the
            array edge value.
        'maximum'
            Pads with the maximum value of all or part of the
            vector along each axis.
        'mean'
            Pads with the mean value of all or part of the
            vector along each axis.
        'median'
            Pads with the median value of all or part of the
            vector along each axis.
        'minimum'
            Pads with the minimum value of all or part of the
            vector along each axis.
        'reflect'
            Pads with the reflection of the vector mirrored on
            the first and last values of the vector along each
            axis.
        'symmetric'
            Pads with the reflection of the vector mirrored
            along the edge of the array.
        'wrap'
            Pads with the wrap of the vector along the axis.
            The first values are used to pad the end and the
            end values are used to pad the beginning.
        <function>
            Padding function, see Notes.
    stat_length : sequence or int, optional
        Used in 'maximum', 'mean', 'median', and 'minimum'.  Number of
        values at edge of each axis used to calculate the statistic value.
    
        ((before_1, after_1), ... (before_N, after_N)) unique statistic
        lengths for each axis.
    
        ((before, after),) yields same before and after statistic lengths
        for each axis.
    
        (stat_length,) or int is a shortcut for before = after = statistic
        length for all axes.
    
        Default is ``None``, to use the entire axis.
    constant_values : sequence or int, optional
        Used in 'constant'.  The values to set the padded values for each
        axis.
    
        ((before_1, after_1), ... (before_N, after_N)) unique pad constants
        for each axis.
    
        ((before, after),) yields same before and after constants for each
        axis.
    
        (constant,) or int is a shortcut for before = after = constant for
        all axes.
    
        Default is 0.
    end_values : sequence or int, optional
        Used in 'linear_ramp'.  The values used for the ending value of the
        linear_ramp and that will form the edge of the padded array.
    
        ((before_1, after_1), ... (before_N, after_N)) unique end values
        for each axis.
    
        ((before, after),) yields same before and after end values for each
        axis.
    
        (constant,) or int is a shortcut for before = after = end value for
        all axes.
    
        Default is 0.
    reflect_type : {'even', 'odd'}, optional
        Used in 'reflect', and 'symmetric'.  The 'even' style is the
        default with an unaltered reflection around the edge value.  For
        the 'odd' style, the extended part of the array is created by
        subtracting the reflected values from two times the edge value.
    
    Returns
    -------
    pad : ndarray
        Padded array of rank equal to `array` with shape increased
        according to `pad_width`.
    
    Notes
    -----
    .. versionadded:: 1.7.0
    
    For an array with rank greater than 1, some of the padding of later
    axes is calculated from padding of previous axes.  This is easiest to
    think about with a rank 2 array where the corners of the padded array
    are calculated by using padded values from the first axis.
    
    The padding function, if used, should return a rank 1 array equal in
    length to the vector argument with padded values replaced. It has the
    following signature::
    
        padding_func(vector, iaxis_pad_width, iaxis, kwargs)
    
    where
    
        vector : ndarray
            A rank 1 array already padded with zeros.  Padded values are
            vector[:pad_tuple[0]] and vector[-pad_tuple[1]:].
        iaxis_pad_width : tuple
            A 2-tuple of ints, iaxis_pad_width[0] represents the number of
            values padded at the beginning of vector where
            iaxis_pad_width[1] represents the number of values padded at
            the end of vector.
        iaxis : int
            The axis currently being calculated.
        kwargs : dict
            Any keyword arguments the function requires.
    
    Examples
    --------
    >>> a = [1, 2, 3, 4, 5]
    >>> np.pad(a, (2,3), 'constant', constant_values=(4, 6))
    array([4, 4, 1, 2, 3, 4, 5, 6, 6, 6])
    
    >>> np.pad(a, (2, 3), 'edge')
    array([1, 1, 1, 2, 3, 4, 5, 5, 5, 5])
    
    >>> np.pad(a, (2, 3), 'linear_ramp', end_values=(5, -4))
    array([ 5,  3,  1,  2,  3,  4,  5,  2, -1, -4])
    
    >>> np.pad(a, (2,), 'maximum')
    array([5, 5, 1, 2, 3, 4, 5, 5, 5])
    
    >>> np.pad(a, (2,), 'mean')
    array([3, 3, 1, 2, 3, 4, 5, 3, 3])
    
    >>> np.pad(a, (2,), 'median')
    array([3, 3, 1, 2, 3, 4, 5, 3, 3])
    
    >>> a = [[1, 2], [3, 4]]
    >>> np.pad(a, ((3, 2), (2, 3)), 'minimum')
    array([[1, 1, 1, 2, 1, 1, 1],
           [1, 1, 1, 2, 1, 1, 1],
           [1, 1, 1, 2, 1, 1, 1],
           [1, 1, 1, 2, 1, 1, 1],
           [3, 3, 3, 4, 3, 3, 3],
           [1, 1, 1, 2, 1, 1, 1],
           [1, 1, 1, 2, 1, 1, 1]])
    
    >>> a = [1, 2, 3, 4, 5]
    >>> np.pad(a, (2, 3), 'reflect')
    array([3, 2, 1, 2, 3, 4, 5, 4, 3, 2])
    
    >>> np.pad(a, (2, 3), 'reflect', reflect_type='odd')
    array([-1,  0,  1,  2,  3,  4,  5,  6,  7,  8])
    
    >>> np.pad(a, (2, 3), 'symmetric')
    array([2, 1, 1, 2, 3, 4, 5, 5, 4, 3])
    
    >>> np.pad(a, (2, 3), 'symmetric', reflect_type='odd')
    array([0, 1, 1, 2, 3, 4, 5, 5, 6, 7])
    
    >>> np.pad(a, (2, 3), 'wrap')
    array([4, 5, 1, 2, 3, 4, 5, 1, 2, 3])
    
    >>> def pad_with(vector, pad_width, iaxis, kwargs):
    ...     pad_value = kwargs.get('padder', 10)
    ...     vector[:pad_width[0]] = pad_value
    ...     vector[-pad_width[1]:] = pad_value
    ...     return vector
    >>> a = np.arange(6)
    >>> a = a.reshape((2, 3))
    >>> np.pad(a, 2, pad_with)
    array([[10, 10, 10, 10, 10, 10, 10],
           [10, 10, 10, 10, 10, 10, 10],
           [10, 10,  0,  1,  2, 10, 10],
           [10, 10,  3,  4,  5, 10, 10],
           [10, 10, 10, 10, 10, 10, 10],
           [10, 10, 10, 10, 10, 10, 10]])
    >>> np.pad(a, 2, pad_with, padder=100)
    array([[100, 100, 100, 100, 100, 100, 100],
           [100, 100, 100, 100, 100, 100, 100],
           [100, 100,   0,   1,   2, 100, 100],
           [100, 100,   3,   4,   5, 100, 100],
           [100, 100, 100, 100, 100, 100, 100],
           [100, 100, 100, 100, 100, 100, 100]])
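For the question's use case, the second pair in pad_width controls the columns: (before, after). The following minimal sketch pads one constant column on the left, and pads a mix of left and right columns, using the array from the question:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# pad_width is ((rows_before, rows_after), (cols_before, cols_after))
front = np.pad(a, ((0, 0), (1, 0)), mode='constant', constant_values=7)
both = np.pad(a, ((0, 0), (1, 2)), mode='constant', constant_values=0)
print(front)
print(both)
```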
    
  • 241

    In my case, I had to add a column of ones to a NumPy array:

    X = np.array([ 6.1101, 5.5277, ... ])
    X.shape  # => (97,)
    m = X.shape[0]  # number of samples, 97 here
    X = np.concatenate((np.ones((m, 1), dtype=int), X.reshape(m, 1)), axis=1)
    

    Afterwards, X.shape => (97, 2):

    array([[ 1. , 6.1101],
           [ 1. , 5.5277],
    ...
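Prepending a column of ones like this is common when building a design matrix with a bias term. Here is a minimal sketch comparing the concatenate route with column_stack, using only the two sample values shown above (the rest of the 97 values are elided in the answer). Both routes give the same (m, 2) float array.

```python
import numpy as np

X = np.array([6.1101, 5.5277])   # the two values shown in the answer
m = X.shape[0]

X_concat = np.concatenate((np.ones((m, 1)), X.reshape(m, 1)), axis=1)
X_stack = np.column_stack((np.ones(m), X))   # no reshape needed for 1-D inputs
print(X_stack)
```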
    
