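roc_auc_score() and auc() give different results

I can't understand the difference (if there is one) between roc_auc_score() and auc() in scikit-learn.

I am trying to predict a binary output with imbalanced classes (about 1.5% of samples have Y = 1).

Classifier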

model_logit = LogisticRegression(class_weight='auto')
model_logit.fit(X_train_ridge, Y_train)

ROC curve

false_positive_rate, true_positive_rate, thresholds = roc_curve(Y_test, clf.predict_proba(xtest)[:,1])

AUC

auc(false_positive_rate, true_positive_rate)
Out[490]: 0.82338034042531527

and

roc_auc_score(Y_test, clf.predict(xtest))
Out[493]: 0.75944737191205602

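Can somebody explain this difference? I thought both were just calculating the area under the ROC curve. It might be because of the imbalanced dataset, but I could not figure out why.

Thanks!

3 Answers

AUC is not always the area under a ROC curve. Area Under the Curve is the (abstract) area under some curve, so it is a more general notion than AUROC. With imbalanced classes, it may be better to compute the AUC of a precision-recall curve.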
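To make the precision-recall suggestion concrete, here is a minimal sketch. The toy labels and scores are made up for illustration (2 positives out of 10 samples) and are not the asker's data. It computes the trapezoidal area under the precision-recall curve with auc(), alongside average_precision_score(), which summarises the same curve as a step-wise sum and so generally gives a slightly different number.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

# Hypothetical toy data: only 2 of the 10 samples are positive.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_score = np.array([0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.8, 0.7, 0.9])

# Precision and recall at every score threshold.
precision, recall, _ = precision_recall_curve(y_true, y_score)

# auc() is a generic trapezoidal integrator: here x is recall, y is precision.
pr_auc = auc(recall, precision)

# Step-wise summary of the same curve (no linear interpolation).
ap = average_precision_score(y_true, y_score)

print("PR AUC (trapezoidal):", pr_auc)
print("Average precision:   ", ap)
```

Both numbers describe how well the positives are ranked, but neither is comparable to a ROC AUC value.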

See the sklearn source for roc_auc_score:

def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
    # <...> docstring <...>
    def _binary_roc_auc_score(y_true, y_score, sample_weight=None):
            # <...> bla-bla <...>

            fpr, tpr, tresholds = roc_curve(y_true, y_score,
                                            sample_weight=sample_weight)
            return auc(fpr, tpr, reorder=True)

    return _average_binary_score(
        _binary_roc_auc_score, y_true, y_score, average,
        sample_weight=sample_weight)

As you can see, this first gets the ROC curve and then calls auc() to get the area.
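A quick way to check this yourself is to feed the same scores to both code paths. The sketch below uses a small hand-made example rather than the asker's data. It leaves out the reorder argument from the quoted source, which, as far as I know, recent scikit-learn releases no longer accept.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Hand-made labels and continuous scores (illustrative only).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Path 1: build the ROC curve explicitly, then integrate it.
fpr, tpr, _ = roc_curve(y_true, y_score)
area_from_curve = auc(fpr, tpr)

# Path 2: let roc_auc_score do both steps internally.
area_direct = roc_auc_score(y_true, y_score)

print(area_from_curve, area_direct)  # both 0.75
```

As long as both calls receive the same second argument, the two numbers agree. They only diverge when one call gets probabilities and the other gets hard labels.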

I guess your problem is the predict_proba() call. With a plain predict(), the two outputs are always the same:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Newer scikit-learn releases use 'balanced' in place of the old 'auto' option
est = LogisticRegression(class_weight='balanced')
X = np.random.rand(10, 2)
y = np.random.randint(2, size=10)
est.fit(X, y)

# Hard class labels are passed to both functions
false_positive_rate, true_positive_rate, thresholds = roc_curve(y, est.predict(X))
print(auc(false_positive_rate, true_positive_rate))
# 0.857142857143 (example output; varies with the random data)
print(roc_auc_score(y, est.predict(X)))
# 0.857142857143 (same value as above)

If you change the above to the following, you will sometimes get different outputs:

false_positive_rate, true_positive_rate, thresholds = roc_curve(y, est.predict_proba(X)[:,1])
# may differ
print(auc(false_positive_rate, true_positive_rate))
print(roc_auc_score(y, est.predict(X)))

Answered 2024-05-18T23:42:12+08:00

21
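predict returns only one class or the other. If you then compute a ROC from the results of predict on a classifier, there are only three thresholds (trivially all one class, trivially all the other class, and one in between). Your ROC curve looks like this: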
```
..............................
      |
      |
      |
......|
|
|
|
|
|
|
|
|
|
|
|
```
Meanwhile, predict_proba() returns an entire range of probabilities, so now you can put more than three thresholds on your data.
```
.......................
             |
             |
             |
          ...|
          |
          |
     .....|
     |
     |
 ....|
.|
|
|
|
|
```
Hence the different areas.
Answered 2024-05-18T23:42:12+08:00
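The threshold count can be checked directly from the third value returned by roc_curve(). This is a small sketch with made-up probabilities, where the hard labels are assumed to come from a 0.5 cut-off. drop_intermediate=False is passed so that scikit-learn does not prune any points from the probability curve.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Made-up ground truth and predicted probabilities (illustrative only).
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])

# Hard labels, assuming the usual 0.5 cut-off that predict() would apply.
y_label = (y_prob >= 0.5).astype(int)

# ROC from hard labels: only two distinct "scores" (0 and 1).
fpr_l, tpr_l, thr_l = roc_curve(y_true, y_label)

# ROC from probabilities: one threshold per distinct score, plus a start point.
fpr_p, tpr_p, thr_p = roc_curve(y_true, y_prob, drop_intermediate=False)

auc_label = auc(fpr_l, tpr_l)
auc_prob = auc(fpr_p, tpr_p)

print(len(thr_l), auc_label)  # 3 thresholds, area 0.75
print(len(thr_p), auc_prob)   # 9 thresholds, area 0.9375
```

The labels collapse the curve to a single interior point, so its area is smaller here than the area of the full staircase built from the probabilities.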
4
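When you use y_pred (class labels), you have already decided on the threshold. When you use y_prob (positive-class probability), you leave the threshold open, and the ROC curve should help you decide on it.

In the first case you are using the probabilities: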
```
y_probs = clf.predict_proba(xtest)[:,1]
fp_rate, tp_rate, thresholds = roc_curve(y_true, y_probs)
auc(fp_rate, tp_rate)
```
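When you do that, you are considering the AUC 'before' deciding on the threshold you will be using.

In the second case you are using the predictions (not the probabilities). If you use 'predict' instead of 'predict_proba' for both calls, you should get the same result.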
```
y_pred = clf.predict(xtest)
fp_rate, tp_rate, thresholds = roc_curve(y_true, y_pred)
print(auc(fp_rate, tp_rate))
# 0.857142857143

print(roc_auc_score(y_true, y_pred))
# 0.857142857143
```
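One way to read the label-based number: with a single fixed threshold, the ROC curve is just the two segments through (0, 0), (FPR, TPR) and (1, 1), and the area under them works out to (1 + TPR - FPR) / 2, which is the mean of sensitivity and specificity. The sketch below checks that identity on made-up labels; the arrays are hypothetical and not taken from the question.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth and hard predictions (illustrative only).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

# Sensitivity: fraction of actual positives predicted positive (3 of 4).
sensitivity = ((y_pred == 1) & (y_true == 1)).sum() / (y_true == 1).sum()
# Specificity: fraction of actual negatives predicted negative (4 of 6).
specificity = ((y_pred == 0) & (y_true == 0)).sum() / (y_true == 0).sum()

mean_of_rates = (sensitivity + specificity) / 2

# "AUC" computed from hard labels instead of scores.
area_from_labels = roc_auc_score(y_true, y_pred)

print(mean_of_rates, area_from_labels)
```

So roc_auc_score(Y_test, clf.predict(xtest)) measures the classifier at one operating point, while the probability-based AUC measures how well it ranks samples across all thresholds.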
Answered 2024-05-18T23:42:12+08:00
