Vectorized intersection of NumPy arrays

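I have a 3D NumPy array of shape (N x K x M) holding the user IDs obtained from my classifier, and I want to compute the Jaccard index (len(intersection) / len(union)) to check how much the folds overlap during my GroupKFold. I need to compute this over the N (number of hyperparameters) and M (number of iterations) axes, while K is the number of CV folds. I would like something like this: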

For the first level, A[0][0][:] is compared with A[1:][:][:], A[0][1][:] is compared with A[1:][:][:], and A[0][2][:] is compared with A[1:][:][:], and so on (in the case of K = 3).

I tried nested for loops, but of course the code is very slow. This is what I have so far:

import numpy as np

# b is assumed to hold the same data as total_users_splits, indexed as b[row][fold]
X = []  # Jaccard index of every compared pair of folds
for elem in range(len(total_users_splits)):
    for subelem in range(elem + 1, len(total_users_splits)):
        for i in range(n_splits):
            for j in range(n_splits):
                first = b[elem][i]
                second = b[subelem][j]
                total_num = len(np.union1d(first, second))
                intersect_len = len(np.intersect1d(first, second))
                X.append(intersect_len / total_num)
overlap = {'overlap_ratio_mean_uids': np.nanmean(X),
           'overlap_ratio_std_uids': np.nanstd(X),
           'overlap_ratio_max_uids': np.max(X),
           'overlap_ratio_min_uids': np.min(X)}

total_users_splits is a list of dimension N x M, and n_splits is K.

The code is very slow, but I don't know how to apply np.intersect1d in a vectorized way.
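One possible direction, sketched here under assumptions rather than offered as a definitive answer: instead of vectorizing np.intersect1d itself, encode every fold as a binary membership row over the distinct user IDs, so that all intersection sizes come out of a single matrix product. The function name pairwise_jaccard and the toy array below are hypothetical. The sketch assumes the IDs sit in an integer array b of shape (N, K, M), and that the dense (N*K) x (number of distinct IDs) membership matrix fits in memory.

```python
import numpy as np

def pairwise_jaccard(b):
    """Jaccard index between every pair of folds taken from different outer rows.

    b: integer array of shape (N, K, M) holding user IDs (assumed layout).
    Returns a 1-D array with one ratio per compared pair of folds.
    """
    b = np.asarray(b)
    n, k, m = b.shape
    rows = b.reshape(n * k, m)  # one row per (outer index, fold)

    # Map arbitrary user IDs to column indices 0..U-1.
    uniq, inverse = np.unique(rows, return_inverse=True)
    inverse = inverse.reshape(rows.shape)

    # Binary membership matrix: member[r, u] == 1 if ID u occurs in row r.
    member = np.zeros((n * k, uniq.size), dtype=np.int64)
    member[np.arange(n * k)[:, None], inverse] = 1

    inter = member @ member.T                 # pairwise intersection sizes
    sizes = member.sum(axis=1)                # number of distinct IDs per row
    union = sizes[:, None] + sizes[None, :] - inter

    # Keep only pairs whose outer indices satisfy elem < subelem.
    outer = np.repeat(np.arange(n), k)
    mask = outer[:, None] < outer[None, :]
    return inter[mask] / union[mask]

# Made-up toy data with N=2, K=2, M=3.
b = np.array([[[1, 2, 3], [4, 5, 6]],
              [[1, 2, 7], [4, 8, 9]]])
ratios = pairwise_jaccard(b)
overlap = {'overlap_ratio_mean_uids': np.nanmean(ratios),
           'overlap_ratio_std_uids': np.nanstd(ratios),
           'overlap_ratio_max_uids': np.max(ratios),
           'overlap_ratio_min_uids': np.min(ratios)}
```

The ratios come back in a different order than the nested loops produce them, which does not affect the mean, standard deviation, minimum, or maximum. If the number of distinct IDs is large, a sparse membership matrix would probably be a better fit than the dense one used here.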
