
Converting dense matrix code to sparse matrix code


I'm trying to convert this code to use scipy sparse matrices, because the actual matrices are very large, but I'm running into trouble. Can anyone help?

import numpy as np
G = np.array([[0., 50., 50., 0.],
              [10., 0., 10., 0.],
              [0., 0., 0., 10.],
              [2., 0., 2., 0.]])
s = G.sum(axis=0)
m = np.minimum(G, 1).transpose()
sm = s * m
sm_rnorm = (sm / sm.sum(axis=0))
smm = sm * sm_rnorm
G += smm.transpose()
print(G)

I tried the following:

import numpy as np
from scipy.sparse import csc_matrix
G = np.array([[0.,50.,50.,0.],
              [10.,0.,10.,0.],
              [0.,0.,0.,10.],
              [2.,0.,2.,0.]])
G = csc_matrix(G, dtype=np.float64)  # np.float was removed in NumPy 1.24
s = csc_matrix(G.sum(axis=0))
m = csc_matrix.minimum(G, 1).transpose()
sm = s * m
sm_rnorm = (sm / csc_matrix(sm.sum(axis=0)))
smm = sm * sm_rnorm
G += smm.transpose()
print(G)

...but I get ValueError: dimension mismatch

1 Answer

  • 1

    Running your code:

    In [224]: G = np.array([[0., 50., 50., 0.],
         ...:               [10., 0., 10., 0.],
         ...:               [0., 0., 0., 10.],
         ...:               [2., 0., 2., 0.]])
         ...: s = G.sum(axis=0)
         ...: m = np.minimum(G, 1).transpose()
         ...: sm = s * m
         ...: sm_rnorm = (sm / sm.sum(axis=0))
         ...: smm = sm * sm_rnorm
         ...:               
    In [225]: s
    Out[225]: array([12., 50., 62., 10.])
    In [226]: m
    Out[226]: 
    array([[0., 1., 0., 1.],
           [1., 0., 0., 0.],
           [1., 1., 0., 1.],
           [0., 0., 1., 0.]])
    In [227]: sm
    Out[227]: 
    array([[ 0., 50.,  0., 10.],
           [12.,  0.,  0.,  0.],
           [12., 50.,  0., 10.],
           [ 0.,  0., 62.,  0.]])
    

    Then start the sparse version:

    In [192]: from scipy import sparse
    In [228]: Gm = sparse.csr_matrix(G)
    In [229]: Gm
    Out[229]: 
    <4x4 sparse matrix of type '<class 'numpy.float64'>'
        with 7 stored elements in Compressed Sparse Row format>
    In [230]: s_m = Gm.sum(axis=0)
    In [231]: s_m
    Out[231]: matrix([[12., 50., 62., 10.]])
    In [233]: m_m = Gm.minimum(1).T
    In [234]: m_m.A
    Out[234]: 
    array([[0., 1., 0., 1.],
           [1., 0., 0., 0.],
           [1., 1., 0., 1.],
           [0., 0., 1., 0.]])
    

    Oops:

    In [236]: s_m * m_m
    Out[236]: matrix([[112.,  74.,  10.,  74.]])
    

    `*` is matrix multiplication for both np.matrix and sparse matrices, so this is the same as the dense dot product:

    In [237]: s.dot(m)
    Out[237]: array([112.,  74.,  10.,  74.])
    

    Sparse matrix element-wise multiplication:

    In [242]: sm_m = m_m.multiply(s_m)
    In [243]: sm_m.A
    Out[243]: 
    array([[ 0., 50.,  0., 10.],
           [12.,  0.,  0.,  0.],
           [12., 50.,  0., 10.],
           [ 0.,  0., 62.,  0.]])
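
To make the contrast concrete, here is a minimal side-by-side sketch of the two operations on the same inputs. It assumes the legacy `scipy.sparse` matrix classes (`csr_matrix`), where `*` means a matrix product; the variable names `matrix_product` and `elementwise` are only illustrative.

```python
import numpy as np
from scipy import sparse

G = np.array([[0., 50., 50., 0.],
              [10., 0., 10., 0.],
              [0., 0., 0., 10.],
              [2., 0., 2., 0.]])
Gm = sparse.csr_matrix(G)

s_m = Gm.sum(axis=0)      # dense column sums, shape (1, 4)
m_m = Gm.minimum(1).T     # sparse 0/1 pattern of G, transposed, shape (4, 4)

# `*` is a matrix product here: (1, 4) times (4, 4) gives a (1, 4) result.
matrix_product = np.asarray(s_m * m_m)

# .multiply is element-wise: the (1, 4) row is broadcast over the rows of m_m,
# so column j of m_m is scaled by s_m[0, j] and the shape stays (4, 4).
elementwise = m_m.multiply(s_m).toarray()

print(matrix_product.shape, elementwise.shape)
```

The dense code's `s * m` corresponds to the second form, which is why the sparse attempt with `*` ends up with mismatched shapes later on.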
    

    Now to match sm_rnorm:

    In [244]: sm_m.sum(axis=0)
    Out[244]: matrix([[ 24., 100.,  62.,  20.]])
    In [250]: sm_m / sm_m.sum(axis=0)
    Out[250]: 
    matrix([[0. , 0.5, 0. , 0.5],
            [0.5, 0. , 0. , 0. ],
            [0.5, 0.5, 0. , 0.5],
            [0. , 0. , 1. , 0. ]])
    

    sparse/dense works element-wise, but sparse/sparse has problems:

    In [252]: sm_m / sparse.csr_matrix(sm_m.sum(axis=0))
    ----> 1 sm_m / sparse.csr_matrix(sm_m.sum(axis=0))
    --> 576         return self._divide(other, true_divide=True)
        568             if true_divide and np.can_cast(self.dtype, np.float_):
    ValueError: inconsistent shapes
    

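    I think this is a matrix division issue, but I'd have to dig into it further.

    sm_m.multiply(1 / sm_m.sum(axis=0)) gives a sparse matrix with the correct values, but it is slower (at least for this example).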

    smm_m = sm_m.multiply(sm_m / sm_m.sum(axis=0)) matches smm, and Gm += smm_m works. The sparse += doesn't raise an efficiency warning because it doesn't change the sparsity.

    So the key issue is keeping matrix multiplication and element-wise multiplication (and the corresponding division) straight.
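
Putting the pieces together, the whole dense calculation might be translated roughly as follows. This is a sketch rather than a tuned implementation: the function name `sparse_update` is made up for illustration, and the guard against empty columns is an added assumption (the dense code would produce NaN there instead).

```python
import numpy as np
from scipy import sparse

def sparse_update(Gm):
    """Sparse counterpart of the dense steps s, m, sm, sm_rnorm, smm (hypothetical helper)."""
    s = np.asarray(Gm.sum(axis=0)).ravel()        # column sums as a 1-D dense array
    m = Gm.minimum(1).T.tocsr()                   # clipped, transposed pattern
    sm = m.multiply(s).tocsr()                    # element-wise: scale column j by s[j]
    col_sums = np.asarray(sm.sum(axis=0)).ravel()
    # Assumption: columns that sum to zero are left as zeros instead of dividing by 0.
    inv = np.divide(1.0, col_sums, out=np.zeros_like(col_sums), where=col_sums != 0)
    sm_rnorm = sm.multiply(inv).tocsr()           # multiply by reciprocal instead of sparse/sparse
    smm = sm.multiply(sm_rnorm)                   # element-wise sparse * sparse
    return Gm + smm.T                             # sparse result

G = np.array([[0., 50., 50., 0.],
              [10., 0., 10., 0.],
              [0., 0., 0., 10.],
              [2., 0., 2., 0.]])
updated = sparse_update(sparse.csr_matrix(G))
print(updated.toarray())
```

Every product here goes through `.multiply`, and the division is replaced by multiplying with a dense reciprocal, so no step relies on `*` or on sparse/sparse division.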

    With sklearn

    sklearn.utils.sparsefuncs has some sparse utility functions.

    The sm_m above is a coo format matrix (I'm not sure why):

    In [366]: sm_m
    Out[366]: 
    <4x4 sparse matrix of type '<class 'numpy.float64'>'
        with 7 stored elements in COOrdinate format>
    In [367]: sm_m.A
    Out[367]: 
    array([[ 0., 50.,  0., 10.],
           [12.,  0.,  0.,  0.],
           [12., 50.,  0., 10.],
           [ 0.,  0., 62.,  0.]])
    

    Convert it to csr:

    In [368]: sm_m1 = sm_m.tocsr()
    In [369]: sm_m1
    Out[369]: 
    <4x4 sparse matrix of type '<class 'numpy.float64'>'
        with 7 stored elements in Compressed Sparse Row format>
    

    Derive the column scaling array:

    In [370]: x = sm_m1.sum(axis=0)
    In [371]: x
    Out[371]: matrix([[ 24., 100.,  62.,  20.]])
    In [372]: x = 1/x.A1      # .A1 makes a 1d array from np.matrix
    

    Apply the scaling in place:

    In [373]: sklearn.utils.sparsefuncs.inplace_csr_column_scale(sm_m1,x)
    In [374]: sm_m1.A
    Out[374]: 
    array([[0. , 0.5, 0. , 0.5],
           [0.5, 0. , 0. , 0. ],
           [0.5, 0.5, 0. , 0.5],
           [0. , 0. , 1. , 0. ]])
    

    The in-place column scale itself is simple:

    def inplace_csr_column_scale(X, scale):
        # ....
        X.data *= scale.take(X.indices, mode='clip')
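
The reason that one-liner works is that in CSR format `X.indices` stores the column index of each stored value, so `scale.take(X.indices)` lines up one scale factor with each entry of `X.data`. A small sketch without sklearn, reusing the values of `sm_m` from above:

```python
import numpy as np
from scipy import sparse

X = sparse.csr_matrix(np.array([[0., 50., 0., 10.],
                                [12., 0., 0., 0.],
                                [12., 50., 0., 10.],
                                [0., 0., 62., 0.]]))

# Reciprocal of the column sums (assumes no column sums to zero).
scale = 1.0 / np.asarray(X.sum(axis=0)).ravel()

# X.indices[k] is the column of X.data[k]; pick the matching factor for each value.
per_value_scale = scale.take(X.indices, mode='clip')
X.data *= per_value_scale      # in place, sparsity pattern untouched

print(X.toarray())
```

Only the stored values are touched, so the cost is proportional to the number of nonzeros rather than to the full matrix size.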
    

    The m_m.multiply(s_m) step can be done the same way:

    In [380]: m1_m = m_m.tocsr()
    In [381]: sklearn.utils.sparsefuncs.inplace_csr_column_scale(m1_m,s_m.A1)
    In [382]: m1_m.A
    Out[382]: 
    array([[ 0., 50.,  0., 10.],
           [12.,  0.,  0.,  0.],
           [12., 50.,  0., 10.],
           [ 0.,  0., 62.,  0.]])
    

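    I suspect the code could be cleaned up, removing the transposes, etc.

    Is G inherently square? I like to use non-square arrays to keep better track of shapes, transposes, and sums over dimensions. I tried expanding G to (5,4), and ran into problems at the s*m step.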
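
The shape problem can be reproduced with a small dense example. The (5, 4) array below is a made-up stand-in, not the array used in the answer; the point is only that `s` has one entry per column of G while the columns of `m` correspond to rows of G, so the two only line up when G is square.

```python
import numpy as np

G5 = np.arange(20, dtype=float).reshape(5, 4)   # hypothetical non-square G

s = G5.sum(axis=0)            # shape (4,): one sum per column of G5
m = np.minimum(G5, 1).T       # shape (4, 5): columns now index rows of G5

try:
    s * m                     # broadcasting (4,) against (4, 5) fails
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

print(s.shape, m.shape, broadcast_ok)
```

So, as far as these shapes show, the original formula implicitly relies on G being square.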
