
Parameter not passed to R via rpy2


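I'm having some trouble using rpy2 with the R library "e1071". I'm trying to retrieve probability data from SVM predictions, but it is never included in the returned object.

Building the model by calling "svm" with "probability = TRUE" tells the model to include the extra data when predictions are requested. The prediction data is then returned by the "predict" command, also called with "probability = TRUE", which should return a complex data structure holding the labels plus a "probabilities" attribute. My problem is that the probabilities attribute is not included in the result. It is as if the probability argument never makes it into the predict call.

Here is some sample code (the e1071 R library must be installed):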

import numpy
import rpy2
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
from rpy2.robjects.packages import importr
importr('e1071')


# configure the data set
SAMPLES = 50
trainingDataClassless = numpy.random.random((SAMPLES, 7))
trainingDataClasses = numpy.where(numpy.random.random((SAMPLES, 1)) > 0.5, 0.0, 1.0)
trainingDataFactorClasses = rpy2.robjects.FactorVector(trainingDataClasses)

# create the args for the svm
svmargs = {"x": trainingDataClassless, "y": trainingDataFactorClasses, "probability": True,
           "kernel": "linear", "type": "C-classification"}

print("Starting SVM with parameters: %s" % (svmargs,))
svmObj = rpy2.robjects.r['svm'](**svmargs)

print("SVM Analysis")
predictOutcomes = rpy2.robjects.r['predict'](svmObj, trainingDataClassless, probability=True)
print("outcomes: %s" % (predictOutcomes,))
probs = rpy2.robjects.r['attr'](predictOutcomes, "probabilities")
print("probs: %s" % (probs,)) # should NOT be NULL!

More information on the predict function in R (with a working probability example) can be found on page 39 of the e1071 documentation.

2 Answers

  • 0

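    The attribute gets lost somewhere, possibly during the conversion between the low-level and high-level representations of the resulting R object (a factor).

    Calling through the low-level interface is a workaround (see below), but it would be great if you could report the problem on the rpy2 issue tracker on bitbucket.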

    r_predict = rpy2.robjects.rinterface.globalenv.get('predict')
    r_traindata = rpy2.robjects.Matrix(trainingDataClassless)
    r_true = rpy2.robjects.BoolVector([True])
    predictOutcomes = r_predict(svmObj,
                                r_traindata,
                                probability=r_true)
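
With the low-level workaround above, `predictOutcomes` is a low-level R object, so the attribute also has to be read without the high-level wrappers. A rough sketch of that step, assuming `do_slot` (which reads an R attribute by name) is available on the low-level result and that "probabilities" is a plain numeric matrix. The helper names are made up for illustration. R stores matrices column by column, hence `order="F"`.

```python
import numpy


def column_major_to_array(values, nrow, ncol):
    """Rebuild an R matrix (stored column by column) as a 2-D numpy array."""
    flat = numpy.array(list(values), dtype=float)
    return flat.reshape((nrow, ncol), order="F")


def low_level_probabilities(predict_outcomes):
    # Assumption: `predict_outcomes` is the low-level result of r_predict(...)
    # and carries a "probabilities" attribute with a "dim" attribute of its own.
    probs = predict_outcomes.do_slot("probabilities")
    nrow, ncol = probs.do_slot("dim")
    return column_major_to_array(probs, nrow, ncol)
```

Calling `low_level_probabilities(predictOutcomes)` should then give one row per sample and one column per class, provided the attribute is present on the object.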
    

    edit: an issue was opened... and closed (the bug is fixed - https://bitbucket.org/rpy2/rpy2/issues/299)

  • 2

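    Your R functions (svm, predict) need to run on the R side rather than in Python, because Python does not see or know about those specialized functions. Python can handle the numpy sample calculations, act as a conduit for calling the functions, and print the results: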

    # PASS PYTHON DATASET OBJECTS INTO R
    # numpy objects => R matrices
    tdClassless_row, tdClassless_col = trainingDataClassless.shape
    rmatrix_tdClassless = rpy2.robjects.r.matrix(trainingDataClassless,
                                nrow=tdClassless_row, ncol=tdClassless_col)
    rpy2.robjects.r.assign("tdClassless", rmatrix_tdClassless)

    # the class labels are already an R factor, so assign them as they are
    rpy2.robjects.r.assign("tdFactorClasses", trainingDataFactorClasses)

    # FIT THE SVM ON THE R SIDE; svmObj stays in R's global environment
    rpy2.robjects.r('''
        svmObj <- svm(x = tdClassless, y = tdFactorClasses,
                      probability = TRUE, kernel = "linear",
                      type = "C-classification")
    ''')

    # RUN PREDICT ON THE R SIDE
    predictOutcomes = rpy2.robjects.r(
        'predict(svmObj, tdClassless, probability = TRUE)'
    )
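
The snippet above stops before the probabilities are read. A minimal sketch of that last step, assuming `svmObj` and `tdClassless` already exist in R's global environment under those names: the R expression is built as a string and evaluated with `rpy2.robjects.r`, so `attr` runs inside R before anything is converted to Python. The helper names `probabilities_expr` and `fetch_probabilities` are hypothetical.

```python
def probabilities_expr(model="svmObj", data="tdClassless"):
    """Build R code that predicts and extracts the attribute inside R."""
    return ('attr(predict(%s, %s, probability = TRUE), "probabilities")'
            % (model, data))


def fetch_probabilities(model="svmObj", data="tdClassless"):
    # Imported here so that the embedded R is only started on first use.
    import rpy2.robjects
    # Only the probability matrix crosses from R into Python.
    return rpy2.robjects.r(probabilities_expr(model, data))
```

Keeping the whole expression in R means the factor returned by `predict` never passes through the Python conversion layer, which is where the first answer suspects the attribute is dropped.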
    
