How can I optimize caret's speed and memory usage with parallel computing?

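I am using caret to find the best tuning parameters over a large grid. When I use doSNOW with caret, memory runs out very quickly. Are there strategies to optimize this?

The model is C5.0, and the search grid is as follows: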

mygrid <- expand.grid(trials=c(1, 1:4*10),
                  model=c('rules', 'tree'),
                  winnow=FALSE,
                  fuzzy=c(TRUE, FALSE),
                  cutoff=c(0.01, seq(0.025, 0.5, by=0.025)))
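
To get a sense of scale before training, the grid can be sized up front. This is a minimal sketch in base R only; it assumes 3-fold cross-validation repeated 3 times, and the fit count is an upper bound because caret may derive several trials values from a single C5.0 fit:

```r
# Grid from the question, rebuilt with base R only.
mygrid <- expand.grid(trials = c(1, 1:4 * 10),
                      model = c("rules", "tree"),
                      winnow = FALSE,
                      fuzzy = c(TRUE, FALSE),
                      cutoff = c(0.01, seq(0.025, 0.5, by = 0.025)))

n_candidates <- nrow(mygrid)                # parameter combinations: 5 * 2 * 1 * 2 * 21
n_resamples  <- 3 * 3                       # assumed: 3 folds repeated 3 times
n_fits_upper <- n_candidates * n_resamples  # upper bound on resampled model fits

cat(n_candidates, "candidates,", n_fits_upper, "fits at most\n")
# prints: 420 candidates, 3780 fits at most
```

Coarsening cutoff or trials shrinks both numbers multiplicatively, which is usually the cheapest first step.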

1) To save memory, I set trim = TRUE and returnData = FALSE to reduce the size of each model, as follows:

mycontrol2 <- trainControl(method = "repeatedcv",
                      number = 3,#10,
                      repeats = 3,#5,
                      classProbs = TRUE,
                      summaryFunction = fiveStats,
                      verboseIter=TRUE,
                      trim=TRUE, returnData = FALSE #to make the model size smaller.
                      )

2) To improve speed, I use parallel computing with doSNOW, as follows:

library(doSNOW)
library(parallel)
cores <- detectCores()
cl <- makeCluster(cores-2, outfile="")
registerDoSNOW(cl)
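
On the memory side, each node in a snow-style cluster is a separate R process with its own copy of the training data and fitted models, so memory use grows roughly in proportion to the number of workers. One rough way to pick the cluster size is to cap it by memory as well as by cores. The helper pick_workers and all the numbers below are hypothetical, for illustration only; the per-worker footprint would have to be measured, for example by watching a single-worker run in top:

```r
# Hypothetical helper (not part of caret or parallel): cap the number of
# workers by both the core count and a memory budget.
pick_workers <- function(n_cores, free_mem_gb, mem_per_worker_gb,
                         reserve_cores = 1) {
  by_cpu <- n_cores - reserve_cores                  # keep some cores for the OS
  by_mem <- floor(free_mem_gb / mem_per_worker_gb)   # workers that fit in RAM
  max(1L, as.integer(min(by_cpu, by_mem)))           # never fewer than one worker
}

# Assumed figures: 16 cores, 32 GB free, about 6 GB peak per worker.
n_workers <- pick_workers(n_cores = 16, free_mem_gb = 32, mem_per_worker_gb = 6)
n_workers
```

The result would then replace cores-2 in makeCluster(). Calling stopCluster(cl) once training is done releases the memory held by the workers.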

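However, when I monitor the run with the "top" command, I can see that Linux kills the process because it runs out of memory. The error message is shown below.

Execution halted
Error in unserialize(node$con) : error reading from connection

Is there a better way to manage caret training while keeping good speed? Here are some of my thoughts, open for discussion: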

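- Split the search grid into several smaller grids and train on each one separately. Questions:
  - Is there a systematic way to decide the optimal grid size?
  - After splitting, is there a better way to combine the models? I would still like to use plot.train(mod) to view the performance plot.
- Make the number of parallel nodes (the size of cl) smaller. Question:
  - Is there a strategy for setting the optimal number of nodes?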
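
The grid-splitting idea could be sketched roughly as follows. The helper split_grid and the chunk size of 50 are hypothetical choices, not anything caret provides, and the train() call is left as a comment because it depends on your data:

```r
# Hypothetical helper: split a tuning grid into chunks of at most
# `chunk_size` rows, returned as a list of data frames.
split_grid <- function(grid, chunk_size) {
  chunk_id <- ceiling(seq_len(nrow(grid)) / chunk_size)
  split(grid, chunk_id)
}

mygrid <- expand.grid(trials = c(1, 1:4 * 10),
                      model = c("rules", "tree"),
                      winnow = FALSE,
                      fuzzy = c(TRUE, FALSE),
                      cutoff = c(0.01, seq(0.025, 0.5, by = 0.025)))

chunks <- split_grid(mygrid, chunk_size = 50)
length(chunks)  # 9 chunks: eight of 50 rows and one of 20

# Each chunk would go to its own train() call, keeping only the small
# per-candidate summary and discarding the fitted model:
# res <- lapply(chunks, function(g) train(..., tuneGrid = g)$results)
# all_results <- do.call(rbind, res)
```

For the chunks to be comparable, every train() call should see the same resamples, for example by passing a fixed index to trainControl. The combined all_results is a plain data frame rather than a train object, so plot.train will not work on it directly and the performance plot would have to be drawn from the data frame.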

1 Answer

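I'll try to answer some of your questions based on my recent experience, since you raise several topics at once.

I run caret in parallel like this: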
```
library(doParallel)
n_cores <- detectCores(); n_cores
# Leave one core free for the system and use the rest:
registerDoParallel(cores = n_cores - 1)
```
Then, when you set up trainControl, you can use:
```
mycontrol2 <- trainControl(
    method = "repeatedcv",
    number = 3,
    repeats = 3,
    classProbs = TRUE,
    verboseIter=TRUE,
    allowParallel = TRUE)
```
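Then you can train your model. I'll give a few lines you can adapt, since it apparently wasn't clear which model you intend to use. The most important things to define are the formula (T1 ~ .), the method, the data and trControl, with something like: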
```
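modFit_1 <- train(T1 ~ .,
    method = "nnet",
    trControl = mycontrol2, # the control object defined above; it carries "allowParallel"
    data = train,
    metric = "RMSE",        # for a numeric outcome; use e.g. "Accuracy" for classification
    preProcess = "scale",
    tuneGrid = mygrid)      # the grid must hold the tuning parameters of the chosen method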
```
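You have other questions that I don't think belong in this thread. Briefly, though, regarding "systematic way to decide the optimal grid size" and "models to compare", you may be interested in this link: https://rpubs.com/chidungkt/389013, where the author lays out a strategy, using the caret R package, for comparing the performance of five models.

Hey, if you get it working, please share your feedback! Take care!
Answered on 2024-04-19T11:44:53+08:00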
