caret method = "treebag"

This is the output from running the train function:

Bagged CART 


1251 samples
  30 predictors
   2 classes: 'N', 'Y' 


No pre-processing
Resampling: Bootstrapped (25 reps) 


Summary of sample sizes: 1247, 1247, 1247, 1247, 1247, 1247, ... 


Resampling results


  Accuracy  Kappa  Accuracy SD  Kappa SD
  0.806     0.572  0.0129       0.0263

Here is my confusion matrix:

Bootstrapped (25 reps) Confusion Matrix 


(entries are percentages of table totals)

          Reference
Prediction    N       Y
         N    24.8   7.9
         Y    11.5  55.8

After partitioning the data set (80% training, 20% test), I train the model, then run predict on my test partition and get ~65% accuracy.

Questions:

(1) Does this mean my model is not very good?
(2) Is 'treebag' the proper method since I only have 2 classes: 'N', 'Y' ?  Would a Logistic Regression method be better?
(3) Finally, my 1251 samples are roughly 67% 'Y' and 33% 'N'.  Could this be "skewing" my training / results?  Do I need a ratio closer to 50 - 50?

Any help would be greatly appreciated!!

1 Answer

Code and a reproducible example would help here.

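Assuming the confusion matrix came from running confusionMatrix.train, I would say your model looks pretty good. The difference in accuracy is a little puzzling. I have seen test set results look worse than the resampling results on a regular basis, but the bootstrap can be very pessimistic in measuring performance, and here it looks a lot better than the test set. Try a different training/test split and see if you get something similar (or try repeated 10-fold CV).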
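As a quick sanity check that the posted table and the resampling summary describe the same thing, the accuracy and kappa can be recomputed from the four cell percentages. This is only a sketch in plain Python (not caret code); the numbers are copied from the table in the question, and the variable names are made up for illustration.

```python
# Cell percentages copied from the posted table: rows = Prediction, columns = Reference.
table = {
    ("N", "N"): 24.8, ("N", "Y"): 7.9,
    ("Y", "N"): 11.5, ("Y", "Y"): 55.8,
}
classes = ["N", "Y"]
total = sum(table.values())

# Observed agreement: share of cases on the diagonal.
accuracy = sum(table[(c, c)] for c in classes) / total

# Agreement expected by chance, from the row (prediction) and column (reference) marginals.
pred_share = {c: sum(table[(c, r)] for r in classes) / total for c in classes}
ref_share = {c: sum(table[(p, c)] for p in classes) / total for c in classes}
expected = sum(pred_share[c] * ref_share[c] for c in classes)

# Cohen's kappa: agreement beyond chance, scaled by the maximum possible.
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy = {accuracy:.3f}")
print(f"kappa    = {kappa:.3f}")
```

The diagonal gives an accuracy of 0.806, matching the summary, and kappa comes out at about 0.571 against the reported 0.572. A gap that small is what you would expect from the table entries being rounded to one decimal, and possibly from the reported kappa being averaged over the resamples rather than computed from the pooled table.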

(a) Again, it is hard to say from what you have posted.

(b) That model is a very good one, and there are no general rules about which model will be better or worse (google the "no free lunch" theorem).

(c) The imbalance isn't too bad, so I don't think it is an issue (unless the class percentages differ between the training and test sets).
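To make the caveat in (c) concrete, here is a rough sketch of a stratified 80/20 split in plain Python, repeated over a few seeds, that reports the share of 'Y' in each partition. Everything in it is an assumption for illustration: the label counts (838 'Y', 413 'N') are invented to match the roughly 67% / 33% mix of 1251 samples described in the question, and `stratified_split` and `share` are hypothetical helpers, not caret functions.

```python
import random

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), drawing test_frac of each class for the test set."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def share(labels, idx, cls="Y"):
    """Fraction of the rows in idx whose label equals cls."""
    return sum(labels[i] == cls for i in idx) / len(idx)

# Hypothetical labels: 838 'Y' and 413 'N' (about 67% / 33% of 1251 samples).
labels = ["Y"] * 838 + ["N"] * 413

# Repeat the split with different seeds and compare class shares per partition.
for seed in range(3):
    train_idx, test_idx = stratified_split(labels, test_frac=0.2, seed=seed)
    print(seed, len(train_idx), len(test_idx),
          round(share(labels, train_idx), 3), round(share(labels, test_idx), 3))
```

If the two shares differ noticeably in your own split, that alone could explain part of the gap. It is also worth keeping the baseline in mind: with about 67% 'Y', always predicting 'Y' would already score about 67% accuracy, so a ~65% test result is worth re-checking on a few different splits.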

Max

Answered 2024-05-13T09:07:32+08:00
