首页 文章

如何在没有模型对象的情况下从ns样条参数进行预测

提问于
浏览
4

我有一个适合R的glm的系数,我想预测一组新数据的预期值 . 如果我有模型对象,这将很简单,使用predict() . 但是,我现在不在现场,出于数据机密性的原因,我不再拥有模型对象 . 我只有使用summary(model)生成的摘要对象,它包含模型系数 .

使用系数来预测简单模型的预期值很容易 . 但是,当模型包含三次样条ns()时,我想知道如何执行此操作 . 当模型还包括分类变量时的任何快捷方式也将受到赞赏 .

这是一个简单的例子 .

library(splines)
dat <- data.frame(x=1:500, z=runif(500), k=as.factor(sample(c("a","b"), size=500, replace=TRUE)))
kvals <- data.frame(kn=c("a","b"),kv=c(20,30))
dat$y = dat$x + (40*dat$z)^2 + kvals$kv[match(dat$k,kvals$kn)] + rnorm(500,0,30)
# Fit model
library(splines)
mod <- glm(y ~ x + ns(z,df=2) + k,data=dat)
# Create new dataset
dat.new <- expand.grid(x=1:3,z=seq(0.2,0.4,0.1),k="b")
# Predict expected values in the usual way
predict(mod,newdata=dat.new)
summ <- summary(mod)
rm(mod)
# Now, how do I predict using just the summary object and dat.new?

1 回答

  • 2

    有一个起点可以帮助你实现Roland简要建议的策略 . summ 对象具有定义样条函数所需的信息,但它有点埋没:

    names(summ)
         [1] "call"           "terms"          "family"         "deviance"       "aic"           
         [6] "contrasts"      "df.residual"    "null.deviance"  "df.null"        "iter"          
        [11] "deviance.resid" "coefficients"   "aliased"        "dispersion"     "df"            
        [16] "cov.unscaled"   "cov.scaled"
    

    看看 terms 叶子的结构,我们发现样条细节仍然埋藏在 predvars subleaf内部:

    str(summ$terms)
    Classes 'terms', 'formula'  language y ~ x + ns(z, df = 2) + k
      ..- attr(*, "variables")= language list(y, x, ns(z, df = 2), k)
      ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
      .. ..- attr(*, "dimnames")=List of 2
      .. .. ..$ : chr [1:4] "y" "x" "ns(z, df = 2)" "k"
      .. .. ..$ : chr [1:3] "x" "ns(z, df = 2)" "k"
      ..- attr(*, "term.labels")= chr [1:3] "x" "ns(z, df = 2)" "k"
      ..- attr(*, "order")= int [1:3] 1 1 1
      ..- attr(*, "intercept")= int 1
      ..- attr(*, "response")= int 1
      ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
      ..- attr(*, "predvars")= language list(y, x, ns(z, knots = structure(0.514993450604379, .Names = "50%"), Boundary.knots = c(0.00118412892334163,  0.99828373757191), intercept = FALSE), k)
      ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "nmatrix.2" "factor"
      .. ..- attr(*, "names")= chr [1:4] "y" "x" "ns(z, df = 2)" "k"
    

    所以拉出属性:

    str(attributes(summ$terms)$predvars)
     language list(y, x, ns(z, knots = structure(0.514993450604379, .Names = "50%"),
                   Boundary.knots = c(0.00118412892334163,  0.99828373757191), intercept = FALSE), k)
    

    您可以看到,如果提供所需的x,y,z和k值,则可以恢复样条曲线:

    with(dat, ns(z, knots = 0.514993450604379, Boundary.knots = c(0.00118412892334163, 
     0.99828373757191), intercept = FALSE) )
    #---
                     1             2
      [1,] 5.760419e-01 -1.752762e-01
      [2,] 2.467001e-01 -1.598936e-01
      [3,] 4.392684e-01  4.799757e-01
    snipping ....
    [498,] 4.965628e-01 -2.576437e-01
    [499,] 5.627389e-01  1.738909e-02
    [500,] 2.393920e-02 -1.611872e-02
    attr(,"degree")
    [1] 3
    attr(,"knots")
    [1] 0.5149935
    attr(,"Boundary.knots")
    [1] 0.001184129 0.998283738
    attr(,"intercept")
    [1] FALSE
    attr(,"class")
    [1] "ns"     "basis"  "matrix"
    

    如果您知道数据的极端情况,则可以构建替代 dat . 请参阅 ?ns 以及它链接到的其他帮助页面 .

相关问题