Plotting a Precision-Recall curve when using cross-validation in scikit-learn

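I am using cross-validation to evaluate the performance of a classifier with scikit-learn, and I want to plot the Precision-Recall curve. I found an example on scikit-learn's website for plotting a PR curve, but it doesn't use cross-validation for the evaluation.

How can I plot a Precision-Recall curve in scikit-learn when using cross-validation?

I did the following, but I'm not sure whether this is the right approach (pseudocode):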

```
for each fold in the k folds:
    precision, recall, _ = precision_recall_curve(y_test, probs)
    mean_precision += precision
    mean_recall += recall

mean_precision /= num_folds
mean_recall /= num_folds

plt.plot(mean_recall, mean_precision)
```

What do you think?

Edit:

It doesn't work, because the precision and recall arrays have a different size for each fold.

Anyone?
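The size mismatch in the edit is expected: `precision_recall_curve` returns roughly one point per distinct score threshold in the data it is given (and `precision` always has one more entry than `thresholds`), so folds whose scores have different numbers of distinct values yield arrays of different lengths. A minimal sketch with two made-up folds; the labels and scores are hypothetical:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Two hypothetical folds: labels and classifier scores on each test fold
fold_a_true = np.array([0, 1, 1])
fold_a_score = np.array([0.3, 0.9, 0.2])
fold_b_true = np.array([0, 1, 0, 1, 1])
fold_b_score = np.array([0.5, 0.8, 0.6, 0.7, 0.3])

pa, ra, ta = precision_recall_curve(fold_a_true, fold_a_score)
pb, rb, tb = precision_recall_curve(fold_b_true, fold_b_score)

# The curve length follows the number of distinct thresholds in each fold,
# so `mean_precision += precision` cannot line the folds up element-wise.
print(len(pa), len(pb))
```

Even when two folds happened to produce equal-length arrays, the i-th entries would correspond to different thresholds, so element-wise averaging would still not be meaningful.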

2 Answers

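Instead of recording the precision and recall values after each fold, store the predictions on the test samples after each fold. Next, collect all the test (i.e., out-of-bag) predictions and compute precision and recall once.
```
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data; replace with your own X, y
X, y = make_classification(n_samples=500, weights=[0.8], random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Out-of-fold scores, written back at each sample's original index
# (this takes care of rearranging predictions into the original order)
scores = np.empty(len(y))
for train_idx, test_idx in cv.split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores[test_idx] = model.predict_proba(X[test_idx])[:, 1]

# precision_recall_curve does the TP/FP/FN bookkeeping at every threshold,
# giving one precision-recall curve for this run of k-fold cross-validation
precision, recall, thresholds = precision_recall_curve(y, scores)
```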
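In a single, complete run of k-fold cross-validation, the predictor makes one and only one prediction for each sample. Given n samples, you should have n test predictions.

(Note: these predictions differ from training predictions, because the predictor makes each prediction without having seen that sample during training.)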
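In scikit-learn, `cross_val_predict` does this bookkeeping for you: it returns exactly one out-of-fold prediction per sample, already in the original sample order. A sketch assuming a binary problem; the synthetic `make_classification` data and the `LogisticRegression` model are placeholders for your own data and classifier:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced stand-in for your own X, y
X, y = make_classification(n_samples=300, weights=[0.8], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# Each score comes from the one model whose training folds excluded that sample
scores = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                           cv=cv, method="predict_proba")[:, 1]

# n samples in, n out-of-fold scores out, one PR curve for the whole run
precision, recall, thresholds = precision_recall_curve(y, scores)
```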

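Unless you use leave-one-out cross-validation, k-fold cross-validation generally requires a random partitioning of the data. Ideally, you would do repeated (and stratified) k-fold cross-validation. However, combining precision-recall curves from different rounds is not straightforward, because, unlike with ROC, you cannot use simple linear interpolation between precision-recall points (see Davis and Goadrich 2006).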
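To see why straight lines are wrong in PR space, here is a small sketch of the interpolation rule as I read it in Davis and Goadrich (2006): between two points A and B, interpolate in confusion-matrix space, letting TP grow one unit at a time while FP grows by the local skew (FP_B - FP_A) / (TP_B - TP_A). The function name `dg_interpolate` and the example counts are hypothetical:

```python
import numpy as np

def dg_interpolate(tp_a, fp_a, tp_b, fp_b, n_pos):
    """Hypothetical helper: PR points between A and B, Davis-Goadrich style.

    TP grows one unit at a time; FP grows by the local skew per unit of TP.
    Assumes tp_b > tp_a and tp_a + fp_a > 0.
    """
    skew = (fp_b - fp_a) / (tp_b - tp_a)
    x = np.arange(0, tp_b - tp_a + 1)
    tp = tp_a + x
    fp = fp_a + skew * x
    return tp / n_pos, tp / (tp + fp)  # (recall, precision)

# Hypothetical example with 10 positives in total
recall, precision = dg_interpolate(tp_a=2, fp_a=0, tp_b=10, fp_b=40, n_pos=10)

# Straight line in PR space between the same two endpoints, for comparison
linear = np.interp(recall, [recall[0], recall[-1]], [precision[0], precision[-1]])
print(np.round(precision, 3))
print(np.round(linear, 3))
```

With these counts, at recall 0.6 the interpolated precision is 6/26 (about 0.23), while a straight line between the endpoints would claim 0.6, so linear interpolation overstates precision here.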

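Personally, I compute AUC-PR using the Davis-Goadrich method for interpolation in PR space (followed by numerical integration), and I compare classifiers using the AUC-PR estimates from repeated stratified 10-fold cross-validation.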
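A rough sketch of the repeated, stratified 10-fold comparison. Note one substitution: each round's pooled out-of-fold predictions are summarized with scikit-learn's `average_precision_score`, a step-wise summary of the PR curve, not the Davis-Goadrich interpolation plus numerical integration described above. The dataset, the two classifiers, the helper name `auc_pr_per_round`, and `n_rounds=5` are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=400, weights=[0.85], random_state=1)

def auc_pr_per_round(clf, X, y, n_rounds=5, n_splits=10):
    """One AUC-PR estimate (average precision) per repetition of stratified k-fold CV."""
    estimates = []
    for r in range(n_rounds):
        # A different shuffle seed per round gives a different random partition
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=r)
        scores = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
        estimates.append(average_precision_score(y, scores))
    return np.array(estimates)

results = {
    "logreg": auc_pr_per_round(LogisticRegression(max_iter=1000), X, y),
    "gnb": auc_pr_per_round(GaussianNB(), X, y),
}
for name, est in results.items():
    print(f"{name}: mean AUC-PR {est.mean():.3f} (sd {est.std():.3f})")
```

Pooling within each round keeps one estimate per complete k-fold run, which matches the "one prediction per sample per run" idea above; the spread across rounds then reflects the randomness of the partitioning.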

For a nice plot, I show a representative PR curve from one of the cross-validation rounds.
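One way to pick that round, assuming "representative" means the round whose AUC-PR (again approximated here by average precision) is closest to the median across rounds; that criterion is my assumption, not something stated above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=400, weights=[0.85], random_state=1)
clf = LogisticRegression(max_iter=1000)

# Pooled out-of-fold scores for each of several 10-fold CV rounds
rounds = []
for r in range(5):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=r)
    rounds.append(cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1])

# Assumed criterion: the round whose average precision is closest to the median
ap = np.array([average_precision_score(y, s) for s in rounds])
best = int(np.argmin(np.abs(ap - np.median(ap))))
precision, recall, _ = precision_recall_curve(y, rounds[best])
```

`plt.plot(recall, precision)` then draws the chosen round's curve.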

Of course, there are many other ways to assess classifier performance, depending on the nature of the dataset.

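For example, if the proportion of (binary) labels in your dataset is not skewed (i.e., it is roughly 50-50), you could use the simpler ROC analysis with cross-validation: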

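Collect the predictions from each fold and build the ROC curves (as before), collect all the TPR-FPR points (i.e., take the union of all TPR-FPR tuples), then plot the combined set of points, possibly with smoothing. Optionally, compute AUC-ROC using simple linear interpolation and the composite trapezoid method for numerical integration.
Answered 2024-05-04T13:44:51+08:00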
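A sketch of that ROC procedure, under a few assumptions: balanced synthetic data, five rounds of stratified 10-fold cross-validation, and the AUC computed per round with `sklearn.metrics.auc` (which applies the trapezoidal rule, i.e. linear interpolation between ROC points) rather than over the combined point cloud:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Roughly balanced synthetic data (the case this paragraph is about)
X, y = make_classification(n_samples=400, random_state=2)
clf = LogisticRegression(max_iter=1000)

all_points = []  # union of (FPR, TPR) tuples over all rounds
round_aucs = []
for r in range(5):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=r)
    scores = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
    fpr, tpr, _ = roc_curve(y, scores)
    all_points.extend(zip(fpr, tpr))
    # Trapezoidal rule over this round's ROC points
    round_aucs.append(auc(fpr, tpr))

# Combined point set, de-duplicated and sorted by FPR, ready to scatter or smooth
points = np.array(sorted(set(all_points)))
print(f"AUC-ROC {np.mean(round_aucs):.3f} +/- {np.std(round_aucs):.3f}")
```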
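Here is the best way to plot a Precision-Recall curve for an sklearn classifier using cross-validation. The best part is that it plots the PR curves for all classes, so you get several neat curves as well.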
```
from scikitplot.classifiers import plot_precision_recall_curve
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt

# X, y: your feature matrix and labels
clf = LogisticRegression()
plot_precision_recall_curve(clf, X, y)
plt.show()
```
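The function automatically takes care of cross-validating the given dataset, concatenating all the out-of-fold predictions, and computing the PR curve for each class plus an averaged PR curve. It's a one-line function that does all of this for you.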
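If you'd rather not add a dependency, the same ingredients (out-of-fold predictions, one PR curve per class, and an averaged curve) can be assembled from scikit-learn alone. This is a sketch on the iris dataset, not a reimplementation of scikit-plot; the micro-average shown is one common choice of "averaged" curve and may differ from what scikit-plot computes:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Out-of-fold class probabilities; columns follow the sorted class labels
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=5, method="predict_proba")
Y = label_binarize(y, classes=classes)  # one-vs-rest indicator matrix

curves = {}
for i, c in enumerate(classes):
    p, r, _ = precision_recall_curve(Y[:, i], proba[:, i])
    curves[f"class {c}"] = (r, p)

# Micro-average: treat every (sample, class) pair as one binary decision
p, r, _ = precision_recall_curve(Y.ravel(), proba.ravel())
curves["micro-average"] = (r, p)
```

Each entry of `curves` is a `(recall, precision)` pair, so `for name, (r, p) in curves.items(): plt.plot(r, p, label=name)` draws them all on one figure.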

[Figure: Precision Recall Curves]

Disclaimer: note that this uses the scikit-plot library, which I built.
Answered 2024-05-04T13:44:51+08:00
