
Cannot get probability predictions when running elastic-net logistic regression with glmnet in the caret package


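I noticed that when I run a penalized logistic regression through caret with the glmnet method, the saved model predictions are already classified as 0 or 1 outcomes: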

library(caret)

mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
train_control <- trainControl(method="cv", number=10, savePredictions = TRUE)
glmnetGrid <- expand.grid(alpha=c(0, .5, 1), lambda=c(.1, 1, 10))
model <- train(as.factor(admit) ~ ., data=mydata, trControl=train_control, method="glmnet", family="binomial", tuneGrid=glmnetGrid, metric="Accuracy", preProcess=c("center","scale"))
model

glmnet 

400 samples
  3 predictor
  2 classes: '0', '1' 

Pre-processing: centered (3), scaled (3) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 360, 360, 361, 359, 360, 361, ... 
Resampling results across tuning parameters:

  alpha  lambda  Accuracy      Kappa          Accuracy SD     Kappa SD     
  0.0     0.1    0.6923233271  0.09027099758  0.018975211636  0.06988057154
  0.0     1.0    0.6825703565  0.00000000000  0.007557700521  0.00000000000
  0.0    10.0    0.6825703565  0.00000000000  0.007557700521  0.00000000000
  0.5     0.1    0.6825703565  0.00000000000  0.007557700521  0.00000000000
  0.5     1.0    0.6825703565  0.00000000000  0.007557700521  0.00000000000
  0.5    10.0    0.6825703565  0.00000000000  0.007557700521  0.00000000000
  1.0     0.1    0.6825703565  0.00000000000  0.007557700521  0.00000000000
  1.0     1.0    0.6825703565  0.00000000000  0.007557700521  0.00000000000
  1.0    10.0    0.6825703565  0.00000000000  0.007557700521  0.00000000000

Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were alpha = 0 and lambda = 0.1. 
> head(model$pred)
  pred obs rowIndex alpha lambda Resample
1    0   0       16     0     10   Fold01
2    0   0       17     0     10   Fold01
3    0   0       24     0     10   Fold01
4    0   1       46     0     10   Fold01
5    0   0       69     0     10   Fold01
6    0   0       84     0     10   Fold01

> summary(model$pred)
 pred     obs         rowIndex          alpha         lambda       Resample        
 0:3576   0:2457   Min.   :  1.00   Min.   :0.0   Min.   : 0.1   Length:3600       
 1:  24   1:1143   1st Qu.:100.75   1st Qu.:0.0   1st Qu.: 0.1   Class :character  
                   Median :200.50   Median :0.5   Median : 1.0   Mode  :character  
                   Mean   :200.50   Mean   :0.5   Mean   : 3.7                     
                   3rd Qu.:300.25   3rd Qu.:1.0   3rd Qu.:10.0                     
                   Max.   :400.00   Max.   :1.0   Max.   :10.0

Is it possible to get the raw predicted probabilities = exp(logit(y)) instead of the 0/1 predicted outcomes?

1 Answer

  • 1

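    You have to set the option classProbs = TRUE in trainControl. The levels of the factor admit also need to be character labels that are valid R names, because they are used as the column names of the class probabilities. See the example below.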

    library(caret)
    
    mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
    mydata$admit <- as.factor(mydata$admit)
    
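    # Rename the levels so the class-probability columns get valid names.
    # Levels are assigned in order, so here 0 becomes "yes" and 1 becomes "no";
    # use c("no", "yes") instead if "yes" should mean admit == 1.
    levels(mydata$admit) = c("yes", "no")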
    
    train_control <- trainControl(method="cv", number=10, classProbs = TRUE, savePredictions = TRUE)
    glmnetGrid <- expand.grid(alpha=c(0, .5, 1), lambda=c(.1, 1, 10))
    set.seed(4242)
    model<- train(admit ~ ., 
                  data=mydata, 
                  trControl = train_control, 
                  method="glmnet", 
                  family="binomial", 
                  tuneGrid=glmnetGrid, 
                  metric="Accuracy", 
                  preProcess=c("center","scale"))
    
    head(model$pred)
      pred obs rowIndex       yes        no alpha lambda Resample
    1  yes  no        4 0.6856383 0.3143617     0     10   Fold01
    2  yes  no        6 0.6796251 0.3203749     0     10   Fold01
    3  yes yes       10 0.6764742 0.3235258     0     10   Fold01
    4  yes yes       71 0.6795685 0.3204315     0     10   Fold01
    5  yes  no       78 0.6774003 0.3225997     0     10   Fold01
    6  yes yes       82 0.6812158 0.3187842     0     10   Fold01
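
    Note that model$pred holds the held-out predictions for every alpha/lambda combination in the grid. The sketch below shows one way to keep only the rows for the selected tuning values and to get probabilities from the final model with predict(..., type = "prob"). It is an illustration under assumptions: the data frame sim is simulated stand-in data (not the UCLA file), and the names sim, ctrl, grid, fit, best_cv and probs are made up for this example.

    ```r
    library(caret)  # the glmnet package must also be installed

    # Simulated stand-in for the admissions data (hypothetical values)
    set.seed(4242)
    n <- 200
    sim <- data.frame(gre = rnorm(n, 580, 115), gpa = rnorm(n, 3.4, 0.4))
    p <- plogis(-0.5 + 0.004 * (sim$gre - 580) + 0.8 * (sim$gpa - 3.4))
    sim$admit <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

    # classProbs = TRUE makes caret compute and save class probabilities
    ctrl <- trainControl(method = "cv", number = 5,
                         classProbs = TRUE, savePredictions = TRUE)
    grid <- expand.grid(alpha = c(0, 0.5, 1), lambda = c(0.01, 0.1))
    fit <- train(admit ~ ., data = sim, method = "glmnet", family = "binomial",
                 trControl = ctrl, tuneGrid = grid,
                 preProcess = c("center", "scale"))

    # Held-out probabilities for the selected alpha/lambda only
    best_cv <- subset(fit$pred,
                      alpha == fit$bestTune$alpha & lambda == fit$bestTune$lambda)

    # Probabilities from the final model, one column per class level
    probs <- predict(fit, newdata = sim, type = "prob")
    head(probs)
    ```

    With type = "prob" the result is a data frame with one column per level of the outcome, so the probability of the class of interest is just that column, for example probs$yes.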
    
