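I know my question is related to statistics, but I'm looking for a solution in R, so I believe it is appropriate for SO.
I built a generalized linear mixed-effects model (GLMM) with the glmer function from the lme4 package in R to model species richness around aquaculture sites based on the significant explanatory variables, following Zuur et al. (2009), Mixed Effects Models and Extensions in Ecology with R. The model is: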
Mod1 <- glmer(Richness ~ Distance + Depth + Substrate + Beggiatoa +
Distance*Beggiatoa + (1|Site/transect), family = poisson, data = mydata)
Now I have a complete dataset collected at different sites, and I want to assess how this model performs on the new dataset.
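Following a question on CV, it was suggested that I look at the median absolute deviation (mad) on the new dataset. I tried the mad function from the stats package in R, but I get the following error message: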
Error in x[!is.na(x)] : object of type 'S4' is not subsettable
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'
Does anybody know what's going wrong here? Is it that mad in stats can't be calculated for GLMMs? If so, is there another R package to calculate mad from GLMMs?
Edit:
To give you an idea of my data, here is the output of dput(head(mydata)). Also note that there is no "Substrate" category in the new dataset, and "S" refers to "Richness":
structure(list(S = c(0, 1, 2, 3, 3, 2), Site = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("BC", "BH", "GC", "IS", "Ref"
), class = "factor"), Transect = structure(c(4L, 4L, 4L, 4L,
4L, 4L), .Label = c("10GC", "10IS", "10N", "10S", "11IS", "12IS",
"13E", "1GC", "1N", "1W", "2E", "2GC", "2IS", "2N", "2W", "2WA",
"3E", "3GC", "3IS", "3N", "3S", "4E", "4GC", "4IS", "4S", "4W",
"5GC", "5IS", "5S", "6GC", "6IS", "6N", "6S", "6W", "7E", "7GC",
"7IS", "8GC", "8IS", "8W", "9E", "9GC", "9IS", "9N", "RefBC1",
"RefBC10", "RefBC11", "RefBC12", "RefBC2", "RefBC3", "RefBC4",
"RefBC5", "RefBC6", "RefBC7", "RefBC8", "RefBC9", "X1", "X2"), class = "factor"),
Distance = c(2, 20, 40, 80, 120, 160), Depth = c(40L, 40L,
50L, 40L, 40L, 40L), Beggiatoa = c(2, 1, 1, 0, 0, 0)), .Names = c("S",
"Site", "Transect", "Distance", "Depth", "Beggiatoa"), row.names = c(NA,
6L), class = "data.frame")
1 Answer
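For in-sample error, the median absolute deviation is just mad() applied to the model's residuals, not to the fitted model itself: mad expects a numeric vector, and passing it the S4 model object is what produces the "not subsettable" error. You probably want residuals(fitted_model,type="response"), because residuals gives you deviance residuals by default (see ?residuals.merMod).
If you want to look at out-of-sample error, you can call predict() on the new data with re.form=~0 and compare the predictions with the observed richness. (re.form=~0 specifies that you want to omit the random effects from the prediction, which is your only choice unless the new data come from sites for which you also have training data.)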
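Here is a minimal sketch of both calculations. It assumes simulated data in place of the real survey: make_data, train, newdf, the site labels, and the coefficients used to simulate are all hypothetical, and the model is cut down to two predictors and a single Site random effect so the example runs on its own. Note that mad() centers on the median and multiplies by 1.4826 by default.

```r
library(lme4)

set.seed(1)

# Hypothetical simulated data standing in for the real survey.
make_data <- function(sites, n_per_site = 30) {
  n <- length(sites) * n_per_site
  d <- data.frame(
    Site      = factor(rep(sites, each = n_per_site)),
    Distance  = runif(n, 0, 1.6),               # distance in units of 100 m
    Beggiatoa = sample(0:2, n, replace = TRUE)
  )
  # One random intercept per site, looked up by factor level
  site_eff <- rnorm(length(sites), sd = 0.3)[as.integer(d$Site)]
  eta <- 0.5 + 0.5 * d$Distance - 0.3 * d$Beggiatoa + site_eff
  d$S <- rpois(n, exp(eta))
  d
}

train <- make_data(paste0("A", 1:8))   # sites used for fitting
newdf <- make_data(paste0("B", 1:4))   # new sites, unseen by the model

fitted_model <- glmer(S ~ Distance + Beggiatoa + (1 | Site),
                      family = poisson, data = train)

# In-sample: mad() on response-scale residuals, not on the model object
mad_in <- mad(residuals(fitted_model, type = "response"))

# Out-of-sample: fixed-effects-only predictions for the new sites
pred <- predict(fitted_model, newdata = newdf,
                type = "response", re.form = ~0)
mad_out <- mad(newdf$S - pred)

c(in_sample = mad_in, out_of_sample = mad_out)
```

If you would rather have the plain median absolute error around zero, median(abs(newdf$S - pred)) avoids both the centering and the scaling constant.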