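I ran into the error given in the title. I went through the solutions posted online for similar problems and tried them: checking for null values, converting the predictors to numeric, and preprocessing the variables with center and scale. None of them worked.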

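With the same data I can run the model in caret for RF, and also for GBM when the tuneGrid holds a range of values for each GBM parameter, but not when I specify a single best value for each GBM parameter.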

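My training data has a regression target variable (Gross.Salary0), and my predictors are either factors (binary) or numeric. There are no missing values in my data. A subset of the data, without the full set of variables, is shown below: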

structure(list(Gross.Salary0 = c(3043.7, 4170, 3148.4, 3678.4, 3586.4,
3126.4), Gender.MALE. = structure(c(1L, 2L, 1L, 1L, 2L, 1L), .Label =  
c("0", "1"), class = "factor"), Certificate...HQA.MASTER.S.DEGREE....Outflow.Date..2. 
= c(0L, 1875929344L, 0L, 1706185636L, 0L, 0L), Certificate...HQA.HONS.I.
= structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = 
"factor"), Year.of.Inflow... = c(2009L, 2009L, 2009L, 2009L, 2009L, 
2009L), Year.of.Inflow..2. = c(4036081L, 4036081L, 4036081L, 4036081L,
4036081L, 4036081L), Age...5....Agency.10. = c(0, 0, 0, 0, 0, 0), 
Inflow.Date..2....Agency.10. = c(0L, 0L, 0L, 0L, 0L, 0L)), row.names = 
c(NA, 6L), class = "data.frame")

To find the best tuning parameters for caret GBM in R, I was able to run the following code:

fit_control2 <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")

grid <- expand.grid(n.trees=c(10,20,50,100,500,1000),shrinkage=c(0.01,0.05,0.1,0.5),n.minobsinnode = c(3,5,10),interaction.depth=c(1,5,10))

gbm_model2 <-train(Gross.Salary0 ~ ., data=train, method='gbm',trControl=fit_control2, tuneGrid=grid)

gbm_model2

The lowest RMSE was obtained with n.trees = 1000, interaction.depth = 1, shrinkage = 0.05 and n.minobsinnode = 3.

I then ran the final GBM model with these best tuning parameters, but it returned model fit errors and all RMSE values were missing.

fit_control3 <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")

tunegrid <- expand.grid(n.trees=1000, interaction.depth = 1, shrinkage = 0.05, n.minobsinnode = 3)

gbm_model3 <- train(Gross.Salary0 ~ ., data=train, method="gbm", tunegrid =tunegrid, trControl=fit_control3)

Besides the missing RMSE values, there were 50 or more warnings, for example:

50: model fit failed for Fold07.Rep2: shrinkage=0.1, interaction.depth=2, n.minobsinnode=10, n.trees=150 Error in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli",  : 
unused argument (tunegrid = list(1000, 1, 0.05, 3))

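Although my target variable is numeric (regression), I noticed that the distribution shown in the warning is bernoulli, so I specified the distribution in the model as gaussian, but R still returned the same error.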
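
Two details in the warning may be worth checking, offered as a guess rather than a confirmed diagnosis. First, the parameter values it reports (shrinkage=0.1, interaction.depth=2, n.minobsinnode=10, n.trees=150) are not the ones in my grid, and the "unused argument" is named tunegrid, in lower case, while the first call used tuneGrid. R argument names are case sensitive, so a lower-case tunegrid would not match tuneGrid and would instead be forwarded through `...` to the underlying fitting function. Second, distribution = "bernoulli" appears inside the printed function signature, so it is probably just that function's default value and not the distribution actually used. The base R sketch below reproduces the mechanism with two hypothetical stand-in functions, inner_fit() and outer_train(); neither is part of caret or gbm.

```r
# Hypothetical stand-ins, not caret or gbm code:
# inner_fit() plays the role of the underlying fitting function,
# outer_train() plays the role of a wrapper that forwards `...` to it.
inner_fit <- function(x, distribution = "bernoulli") {
  paste("fitting with", distribution)
}

outer_train <- function(x, ..., tuneGrid = NULL) {
  # Arguments that do not match a named formal exactly end up in `...`
  # and are passed on to inner_fit().
  inner_fit(x, ...)
}

grid <- data.frame(n.trees = 1000, shrinkage = 0.05)

# Correct capitalisation: matched to the wrapper's own formal, nothing forwarded.
ok <- outer_train(1:3, tuneGrid = grid)

# Lower-case name: not matched, so it travels through `...` and inner_fit()
# rejects it as an unused argument.
msg <- tryCatch(
  outer_train(1:3, tunegrid = grid),
  error = function(e) conditionMessage(e)
)

print(ok)
print(msg)
```

If this is what is happening in my call, the last train() call would need tuneGrid = tunegrid (capital G in the argument name) for the single-row grid to be picked up.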

Any help would be appreciated. Thanks.