Dear machine learning and R friends,

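I have noticed that, with the same model tuning parameters, training a gbm model directly with gbm and training it through the train function of the caret package give different results for classification models (see the code below, which uses the iris dataset for demonstration):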

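1) gbm with distribution = "multinomial" predicts class probabilities that differ from those of the model trained with caret's train function. At best, they look strangely compressed around 0.5 (roughly 0.4 to 0.6). What is going on?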

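2) The class order of the gbm predictions differs from the caret predictions (in a model with 2 classes, class 1 in gbm appears to be class 2 in caret). Why is that?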

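3) Does caret's train function not support a binomial response variable of class numeric together with distribution = "bernoulli"? It warns that a factor should be used for two classes. Could this also lead to different predictions?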
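Points 2 and 3 both depend on which class ends up as the first factor level. The base-R sketch below only shows how as.factor() orders the 0/1 outcome used in the script; the descriptive labels "other" and "virginica" are my own illustrative choice, not something caret or gbm requires. My understanding (not verified for every caret version) is that caret treats the first factor level as the event in two-class models, which would make "0" the event here.

```r
data(iris)

# 0/1 numeric outcome, as in the script below (1 = virginica)
y.num <- as.numeric(iris$Species == "virginica")

# as.factor() sorts the unique values, so "0" becomes the first level
y.fac <- as.factor(y.num)
levels(y.fac)   # "0" "1"
table(y.fac)    # 100 zeros, 50 ones

# Hypothetical descriptive labels make the level order visible in every output
y.lab <- factor(y.num, levels = c(0, 1), labels = c("other", "virginica"))
levels(y.lab)   # "other" "virginica"
```

Reordering `levels` in the last call (for example `levels = c(1, 0)` with matching labels) is one way to control which class comes first.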

These mismatches do not seem to occur with randomForest on the same dataset, whether I call it directly or through caret.

library(caret)
library(gbm)

data(iris)
iris$Species=as.numeric(iris$Species=="virginica")

###caret

ctrl.cv <- trainControl(method="cv", number=3)  # 3-fold CV control; not used below (both fits use method="none")

set.seed(123)
gbm.c <- train(as.factor(Species) ~ ., data=iris, distribution="multinomial",
               method="gbm", trControl=trainControl(method="none"), verbose=FALSE)

pr1=predict(gbm.c, newdata=iris, type="prob")
pr1=data.frame(pr1)
max(pr1[,1])
min(pr1[,1])  # here the probabilities range from 0 to 1, as expected

###GBM
set.seed(123)
gbm.g <- gbm(as.factor(Species) ~ ., data=iris,distribution = "multinomial", verbose=FALSE)

pr2 <- predict(gbm.g, newdata=iris, n.trees=100, type="response")
pr2=data.frame(pr2)

max(pr2[,1])
min(pr2[,1])  # strange: the whole range for predict.gbm lies between roughly 0.4 and 0.6; is some unclear scaling happening?

cor(pr1[,1], pr2[,1])  # even though the correlation (and R^2) look good, the correlation is negative: why are classes 1 and 2 swapped in one of the two?
plot(pr1[,1], pr2[,1])


# which.max() returns a single index per row, even when probabilities tie
class.c=apply(pr1, 1, which.max)
class.g=apply(pr2, 1, which.max)


class.c==class.g 
class.g2=rep(1, length(class.g))
class.g2[class.g==1]=2
class.c==class.g2  # class prediction seems to work, even though the scaling is puzzling and one has to know about the swapped order

###random forest
library(randomForest)
set.seed(123)
rf.c <- train(as.factor(Species) ~ ., data=iris, method="rf", trControl=trainControl(method="none"), verbose=FALSE)

pr.rf1=predict(rf.c, newdata=iris, type="prob")
pr.rf1=data.frame(pr.rf1)
set.seed(123)
rf.r=randomForest(as.factor(Species) ~ . , data=iris)
pr.rf2=predict(rf.r, newdata=iris, type="prob")

cor(pr.rf1[,1], pr.rf2[,1])
plot(pr.rf1[,1], pr.rf2[,1])
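One way to check whether the gbm differences come from differing defaults rather than from the two interfaces is to pin every hyperparameter explicitly in both fits. This is a sketch under assumptions: the values in `params` are arbitrary illustrative choices, I assume caret's gbm method forwards extra arguments such as `bag.fraction` to gbm, and `bag.fraction = 1` is used only to switch off row subsampling so that random-number usage cannot explain any remaining gap.

```r
library(caret)
library(gbm)

data(iris)
# Same two-class recoding as above, kept as a factor with levels "0", "1"
iris$Species <- as.factor(as.numeric(iris$Species == "virginica"))

# Illustrative hyperparameters (arbitrary choices), used for BOTH fits
params <- data.frame(n.trees = 100, interaction.depth = 1,
                     shrinkage = 0.1, n.minobsinnode = 10)

# caret: one fixed parameter set, no resampling; bag.fraction = 1 is
# assumed to be passed through to gbm and disables row subsampling
set.seed(123)
fit.c <- train(Species ~ ., data = iris, method = "gbm",
               distribution = "multinomial", bag.fraction = 1,
               tuneGrid = params, trControl = trainControl(method = "none"),
               verbose = FALSE)

# gbm directly, with the same settings spelled out
set.seed(123)
fit.g <- gbm(Species ~ ., data = iris, distribution = "multinomial",
             n.trees = params$n.trees,
             interaction.depth = params$interaction.depth,
             shrinkage = params$shrinkage,
             n.minobsinnode = params$n.minobsinnode,
             bag.fraction = 1, verbose = FALSE)

p.c <- predict(fit.c, newdata = iris, type = "prob")
# predict.gbm returns an n x classes x n.trees array; keep the only slice
p.g <- predict(fit.g, newdata = iris, n.trees = params$n.trees,
               type = "response")[, , 1]

# Column 2 is assumed to be class "1" (virginica) in both outputs
cor(p.c[, 2], p.g[, 2])
max(abs(p.c[, 2] - p.g[, 2]))
```

If the two probability columns still disagree with matched settings, that narrows the search to how each interface encodes the outcome rather than to the tuning parameters.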