
Classification report with nested cross-validation in scikit-learn


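Is there a workaround to get a classification report out of cross_val_score? I am using nested cross-validation and can already get various scores for a model, but I would also like to see the classification report for the outer loop. Any suggestions?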

# Choose cross-validation techniques for the inner and outer loops,
# independently of the dataset.
# E.g "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

# Non_nested parameter search and scoring
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

# Nested CV with parameter optimization
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)

I would like to see the classification report alongside the score values: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
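
The snippet in the question relies on names defined elsewhere (`X_iris`, `y_iris`, `svr`, `p_grid`, `i`). A minimal self-contained sketch of the same nested setup could look like the following. The data loading, the parameter grid, the RBF `SVC`, and the seed value are assumptions modelled on the scikit-learn nested cross-validation iris example, not details given in the question.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

# Assumed setup (not part of the question's snippet)
iris = load_iris()
X_iris, y_iris = iris.data, iris.target
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}  # assumed grid
svr = SVC(kernel="rbf")  # called "svr" in the question, but it is a classifier
i = 0  # assumed seed

# Inner loop tunes hyperparameters, outer loop estimates generalization
inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

# GridSearchCV runs the inner loop each time it is fitted
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

# cross_val_score runs the outer loop and returns one score per outer fold
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
print(nested_score)
```

`cross_val_score` returns only one number per outer fold, which is why a classification report needs one of the workarounds discussed in the answers.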

2 Answers

  • 7

    We can define our own scoring function, as follows:

    from sklearn.metrics import classification_report, accuracy_score, make_scorer
    
    def classification_report_with_accuracy_score(y_true, y_pred):
        # Side effect: print the classification report for this fold
        print(classification_report(y_true, y_pred))
        # A scorer must return a single number, so return the accuracy
        return accuracy_score(y_true, y_pred)
    

    Now call cross_val_score with the new scoring function, wrapped in make_scorer:

    # Nested CV with parameter optimization
    nested_score = cross_val_score(
        clf, X=X_iris, y=y_iris, cv=outer_cv,
        scoring=make_scorer(classification_report_with_accuracy_score))
    print(nested_score)
    

    This prints the classification report for each outer fold as text, while nested_score still holds the numeric scores.

    For example, when http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html is run with this new scoring function, the last few lines of the output look like this:

    #             precision    recall  f1-score   support
    #          0       1.00      1.00      1.00        14
    #          1       1.00      1.00      1.00        14
    #          2       1.00      1.00      1.00         9
    
    #avg / total       1.00      1.00      1.00        37
    
    #[ 0.94736842  1.          0.97297297  1. ]
    
    #Average difference of 0.007742 with std. dev. of 0.007688.
    
  • 8

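    This is just an addition to Sandipan's answer, since I could not edit it. If we want an averaged classification report for the complete cross-validation run (computed from the predictions of all folds pooled together) rather than one report per fold, we can use the following code: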

    from sklearn.metrics import accuracy_score, classification_report, make_scorer
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
    
    # Collect the true and predicted labels across all outer folds
    originalclass = []
    predictedclass = []
    
    # Custom scoring function: record this fold's labels, return its accuracy
    def classification_report_with_accuracy_score(y_true, y_pred):
        originalclass.extend(y_true)
        predictedclass.extend(y_pred)
        return accuracy_score(y_true, y_pred)
    
    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
    
    # Non-nested parameter search and scoring
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
    
    # Nested CV with parameter optimization
    nested_score = cross_val_score(
        clf, X=X_iris, y=y_iris, cv=outer_cv,
        scoring=make_scorer(classification_report_with_accuracy_score))
    
    # One classification report over the pooled predictions of all outer folds
    print(classification_report(originalclass, predictedclass))
    

    The result for the example in Sandipan's answer now looks like this:

                 precision    recall  f1-score   support
    
              0       1.00      1.00      1.00        50
              1       0.96      0.94      0.95        50
              2       0.94      0.96      0.95        50
    
    avg / total       0.97      0.97      0.97       150
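
Both answers rely on a scoring function with side effects (printing, or appending to module-level lists). The same two results can also be sketched with an explicit outer loop, which avoids the side effects. This is only an illustrative alternative, not the method either answer uses; the data loading, the parameter grid, the RBF `SVC`, and the fixed seed are assumptions modelled on the scikit-learn nested cross-validation iris example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Assumed setup (not taken from the answers)
X, y = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_scores, y_true_all, y_pred_all = [], [], []
for fold, (train_idx, test_idx) in enumerate(outer_cv.split(X, y)):
    # Inner loop: tune hyperparameters on the outer training split only
    clf = GridSearchCV(SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)
    clf.fit(X[train_idx], y[train_idx])

    # Outer loop: evaluate the refitted best model on the held-out split
    y_pred = clf.predict(X[test_idx])
    print(f"Outer fold {fold}")
    print(classification_report(y[test_idx], y_pred))

    fold_scores.append(accuracy_score(y[test_idx], y_pred))
    y_true_all.extend(y[test_idx])
    y_pred_all.extend(y_pred)

# Per-fold accuracies, comparable to nested_score in the answers
print(np.array(fold_scores))

# One report over the pooled predictions of all outer folds
pooled_report = classification_report(y_true_all, y_pred_all)
print(pooled_report)
```

Because every sample lands in exactly one outer test split, the pooled report covers the full dataset, so its support column sums to the number of samples.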
    
