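I am trying to run code from the book "Applied Predictive Modeling", specifically the section on training an SVM with a radial kernel using the `train` function from caret.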

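I copied the code without adding anything. It runs without errors, but the results do not match those in the book: the predicted probabilities are almost identical for every sample, and all samples are assigned to a single class. Here is the code: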

library(caret)
data("GermanCredit")
GermanCredit <- GermanCredit[, -nearZeroVar(GermanCredit)]
# remove some other columns that do not add useful information
GermanCredit$CheckingAccountStatus.lt.0 <- NULL
GermanCredit$SavingsAccountBonds.lt.100 <- NULL
GermanCredit$EmploymentDuration.lt.1 <- NULL
GermanCredit$EmploymentDuration.Unemployed <- NULL
GermanCredit$Personal.Male.Married.Widowed <- NULL
GermanCredit$Property.Unknown <- NULL
GermanCredit$Housing.ForFree <- NULL

#Split the data into training (80%) and test sets (20%)
set.seed(100)
inTrain <- createDataPartition(GermanCredit$Class, p = .8)[[1]]
GermanCreditTrain <- GermanCredit[ inTrain, ]
GermanCreditTest  <- GermanCredit[-inTrain, ]

set.seed(1056)
svmFit <- train(Class ~ .,
           data = GermanCreditTrain,
           method = "svmRadial",
           preProcess = c("center", "scale"),
           tuneLength = 10,
           trControl = trainControl(method = "repeatedcv",
                                    repeats = 5,
                                    classProbs = TRUE))

The output of the model is as follows:

> svmFit
Support Vector Machines with Radial Basis Function Kernel 

800 samples
 41 predictor
  2 classes: 'Bad', 'Good' 

Pre-processing: centered (41), scaled (41) 
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 720, 720, 720, 720, 720, 720, ... 
Resampling results across tuning parameters:

  C       Accuracy  Kappa      
    0.25  0.70025   0.006361713
    0.50  0.70025   0.006372290
    1.00  0.70025   0.006372290
    2.00  0.70075   0.008001058
    4.00  0.70100   0.009101928
    8.00  0.69950   0.004902168
   16.00  0.70050   0.006864093
   32.00  0.70025   0.006361713
   64.00  0.70050   0.007509254
  128.00  0.70050   0.007472237

Tuning parameter 'sigma' was held constant at a value of 0.01390712
Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were sigma = 0.01390712 and C = 4.

So the accuracy barely changes across the values of C. I tried different parameter sets, but the result was the same.

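The probabilities are almost the same for all samples: ~0.304 for the "Bad" class and ~0.695 for "Good" (they differ only in the fourth decimal place).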
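For anyone who wants to reproduce this check, here is a minimal sketch of one way to look at the spread of the class probabilities. The helper `summarise_probs` is a hypothetical name introduced only for illustration; the commented usage assumes the `svmFit` and `GermanCreditTest` objects created by the code above.

```r
# Hypothetical helper: per-class minimum, maximum and standard deviation
# of the predicted probabilities (one column per class).
summarise_probs <- function(probs) {
  sapply(probs, function(col) c(min = min(col), max = max(col), sd = sd(col)))
}

# Usage, assuming svmFit and GermanCreditTest exist as defined above:
# probs <- predict(svmFit, newdata = GermanCreditTest, type = "prob")
# summarise_probs(probs)                               # spread per class
# table(predict(svmFit, newdata = GermanCreditTest))   # predicted class counts
```

A standard deviation close to zero in both columns, together with a single non-empty cell in the class table, would correspond to the behaviour described here.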

The book's results can be found here: https://github.com/cran/AppliedPredictiveModeling/blob/master/inst/chapters/04_Over_Fitting.Rout

They have:

> svmFit
Support Vector Machines with Radial Basis Function Kernel 

800 samples
 41 predictors
  2 classes: 'Bad', 'Good' 

Pre-processing: centered, scaled 
Resampling: Cross-Validated (10 fold, repeated 5 times) 

Summary of sample sizes: 720, 720, 720, 720, 720, 720, ... 

Resampling results across tuning parameters:

  C     Accuracy  Kappa  Accuracy SD  Kappa SD
  0.25  0.744     0.362  0.0499       0.113   
  0.5   0.74      0.35   0.0516       0.117   
  1     0.746     0.348  0.0522       0.125   
  2     0.743     0.325  0.0467       0.116   
  4     0.744     0.322  0.0477       0.12    
  8     0.75      0.323  0.0464       0.13    
  16    0.745     0.302  0.0457       0.13    
  32    0.739     0.28   0.0451       0.126   
  64    0.743     0.284  0.0444       0.135   
  128   0.734     0.265  0.0445       0.124   

Tuning parameter 'sigma' was held constant at a value of 0.008918477
Accuracy was used to select the optimal model using  the largest     value.
The final values used for the model were sigma = 0.00892 and C = 8.

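Moreover, the whole class got the same results as I did, but the teacher, whose computer has an older version of R, got the correct results. So my question is: is this caused by some change in newer versions of R, caret, kernlab, etc., or am I doing something else wrong? How can I change this code to get the correct results? My caret version is 6.0-77.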
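Since the difference seems to depend on the machine, a short sketch like the following could be run on both computers to compare the software versions. It is only an illustration: the list of packages is my assumption about what is relevant, and a package that is not installed is reported as `NA`.

```r
# Report the R version and the versions of the packages involved in the fit.
pkgs <- c("caret", "kernlab")  # assumed to be the relevant packages
versions <- vapply(pkgs, function(p) {
  if (requireNamespace(p, quietly = TRUE)) {
    as.character(packageVersion(p))
  } else {
    NA_character_  # package not installed on this machine
  }
}, character(1))

print(R.version.string)
print(versions)
```

The output of `sessionInfo()` gives the same information in more detail, including the attached packages.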

Thanks in advance.