Correlation between two matrices of different sizes in Python [duplicate]

This question already has answers here:

Computing the correlation coefficient between two multi-dimensional arrays (2 answers)
Efficient pairwise correlation for two matrices of features (2 answers)

I have two matrices, p (500x10000) and h (500x256), and I need to compute the correlation between them in Python.

In Matlab, I used the corr() function without any problem: myCorrelation = corr(p, h);

In numpy, I tried np.corrcoef( p, h ):

File "/usr/local/lib/python2.7/site-packages/numpy/core/shape_base.py", line 234, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

I also tried np.correlate( p, h ):

File "/usr/local/lib/python2.7/site-packages/numpy/core/numeric.py", line 975, in correlate
    return multiarray.correlate2(a, v, mode)
ValueError: object too deep for desired array

Input:

pw.shape = (500, 10000)
hW.shape = (500, 256)

First, I tried this:

myCorrelationMatrix, _ = scipy.stats.pearsonr( pw, hW )

Result:

myCorrelationMatrix, _ = scipy.stats.pearsonr( pw, hW )
  File "/usr/local/lib/python2.7/site-packages/scipy/stats/stats.py", line 3019, in pearsonr
    r_num = np.add.reduce(xm * ym)
ValueError: operands could not be broadcast together with shapes (500,10000) (500,256)

I also tried this:

myCorrelationMatrix = corr2_coeff( pw, hW )

where corr2_coeff, taken from 1, is:

def corr2_coeff(A,B) :
    # Row-wise mean of input arrays, subtracted from the input arrays themselves
    A_mA = A - A.mean(1)[:,None]
    B_mB = B - B.mean(1)[:,None]

    # Sum of squares across rows
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)

    # Finally get corr coeff
    return np.dot(A_mA,B_mB.T)/np.sqrt(np.dot(ssA[:,None],ssB[None]))

The result was:

myCorrelationMatrix, _ = corr2_coeff( powerTraces, hW )
  File "./myScript.py", line 175, in corr2_coeff
    return np.dot(A_mA,B_mB.T)/np.sqrt(np.dot(ssA[:,None],ssB[None]))
ValueError: shapes (500,10000) and (256,500) not aligned: 10000 (dim 1) != 256 (dim 0)

Finally, I tried this:

myCorrelationMatrix = corr_coeff( pw, hW )

where corr_coeff, taken from 2, is:

def corr_coeff(A,B) :
    # Get number of rows in either A or B
    N = B.shape[0]

    # Store column-wise sums of A and B, as they are used in a few places
    sA = A.sum(0)
    sB = B.sum(0)

    # Basically there are four parts in the formula. We would compute them one-by-one
    p1 = N*np.einsum('ij,ik->kj',A,B)
    p2 = sA*sB[:,None]
    p3 = N*((B**2).sum(0)) - (sB**2)
    p4 = N*((A**2).sum(0)) - (sA**2)

    # Finally compute Pearson Correlation Coefficient as 2D array
    pcorr = ((p1 - p2)/np.sqrt(p4*p3[:,None]))

    # Get the element corresponding to absolute argmax along the columns
#   out = pcorr[np.nanargmax(np.abs(pcorr),axis=0),np.arange(pcorr.shape[1])]

    return pcorr

The result is:

RuntimeWarning: invalid value encountered in sqrt
  pcorr = ((p1 - p2)/np.sqrt(p4*p3[:,None]))
RuntimeWarning: invalid value encountered in divide
  pcorr = ((p1 - p2)/np.sqrt(p4*p3[:,None]))

Update

This is not a duplicate. I have already tried the two approaches you pointed to in Computing the correlation coefficient between two multi-dimensional arrays and Efficient pairwise correlation for two matrices of features, but neither of them worked.

2 Answers

0

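In a matrix product, the matching dimensions must be on the "inside" of the product: A [m x n] * B [n x k]. Since a correlation is a sum of element-wise products, it is similar to a matrix product preceded by a normalization. You can try transposing the first or the second matrix.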
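
The idea above can be sketched as follows. This is a minimal illustration, not code from the answer: the function name `column_corr` and the small random test matrices are assumptions made for the example. It centers each column, takes the product of the transposed first matrix with the second, and divides by the column norms, which gives one Pearson coefficient per pair of columns, as Matlab's corr(p, h) does. The cast to float64 is a precaution in case the inputs are integer arrays, where squaring can overflow.

```python
import numpy as np

def column_corr(p, h):
    """Pearson correlation between every column of p and every column of h.

    Hypothetical helper written for this illustration.
    """
    # Assumption: inputs may be integer arrays, so work in float64.
    p = np.asarray(p, dtype=np.float64)
    h = np.asarray(h, dtype=np.float64)
    if p.shape[0] != h.shape[0]:
        raise ValueError("p and h must have the same number of rows")

    # Center each column (rows are observations, columns are variables).
    p_c = p - p.mean(axis=0)
    h_c = h - h.mean(axis=0)

    # Euclidean norm of each centered column.
    p_norm = np.sqrt((p_c ** 2).sum(axis=0))
    h_norm = np.sqrt((h_c ** 2).sum(axis=0))

    # (n_p x rows) times (rows x n_h) gives an n_p x n_h matrix;
    # the shared dimension (rows) is on the "inside" of the product.
    return np.dot(p_c.T, h_c) / np.outer(p_norm, h_norm)

# Small stand-ins for the (500, 10000) and (500, 256) matrices.
rng = np.random.RandomState(0)
p = rng.normal(size=(500, 40))
h = rng.normal(size=(500, 8))

corr = column_corr(p, h)
print(corr.shape)
```

With the shapes from the question the result would be a (10000, 256) array. A constant column has zero norm, so its entries come out as NaN with a runtime warning.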

Answered 2024-05-19T06:29:36+08:00
1

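You can concatenate the two data frames into a single array of size (500, 10256), then run np.corrcoef() on the combined array and take the subset that holds the correlations between the variables of interest.

It is not very efficient, but it will work.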
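
A rough sketch of this approach, assuming the columns are the variables and the rows are the observations. The function name `corr_via_concat` and the small random matrices are made up for the example. Because np.corrcoef treats rows as variables by default, `rowvar=False` is passed; the block that relates columns of p to columns of h is then the upper-right corner of the full matrix.

```python
import numpy as np

def corr_via_concat(p, h):
    """Cross-correlation of the columns of p and h via one np.corrcoef call.

    Hypothetical helper written for this illustration.
    """
    n_p = p.shape[1]

    # Stack side by side: shape (rows, n_p + n_h).
    combined = np.hstack([p, h])

    # Full correlation matrix of all columns: (n_p + n_h) x (n_p + n_h).
    full = np.corrcoef(combined, rowvar=False)

    # Rows belong to columns of p, columns belong to columns of h.
    return full[:n_p, n_p:]

# Small stand-ins for the (500, 10000) and (500, 256) matrices.
rng = np.random.RandomState(1)
p = rng.normal(size=(500, 40))
h = rng.normal(size=(500, 8))

cross = corr_via_concat(p, h)
print(cross.shape)
```

At the sizes in the question the intermediate matrix is 10256 x 10256, roughly 0.8 GB in float64, and most of it is discarded, which is why this route is less efficient than computing only the cross block.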

Answered 2024-05-19T06:29:36+08:00
