首页 文章

循环或应用多个回归,将系数和p值提取到数据框中

提问于
浏览
0

我有一个包含3个相关(LHS)变量和4个独立(RHS)变量的数据框 . 我想在每个RHS变量上运行每个LHS变量的线性回归,并将每个回归的结果作为一行存储在数据框中,列为:lhs,rhs,Estimate,Std . 错误,t值,Pr(> | t |) .

例如,使用mtcars,我考虑了一个嵌套循环:

lhs <- c('mpg', 'cyl', 'disp')
rhs <- c('hp', 'drat', 'wt', 'qsec')

reg_count <- 1
for (i in lhs){
  for (j in rhs){
    model <- lm(i ~ j, data = mtcars)
    results[reg_count] <- coef(summary(model))
    reg_count <- reg_count + 1
  }
}

然而,由于许多原因,这失败了 . 有一种简单的方法可以做到这一点吗?理想情况下使用apply()函数而不是循环?

2 回答

  • 2

    我就是这样做的 . 我稍微缩短了你的例子,但这无关紧要:

    lhs <- c('mpg', 'cyl', 'disp')
    rhs <- c('hp', 'drat')
    
    models = list()
    for (i in lhs){
      for (j in rhs){
        models[[paste(i, "vs", j)]] <- lm(as.formula(paste(i, "~", j)), data = mtcars)
      }
    }
    

    如果你想使用 apply ,你需要以 matrix 开头 . 运行时间的差异可以忽略不计 .

    # with apply:
    coefs_mat = expand.grid(lhs, rhs)
    mods = apply(coefs_mat, 1, function(row) {
      lm(as.formula(paste(row[1], "~", row[2])), data = mtcars)
    })
    names(mods) = with(coefs_mat, paste(Var1, "vs", Var2))
    

    两种方法都给出了相同的结果 . 现在我们可以用 broom::tidy 拉取系数等

    # get coefs
    library(broom)
    coefs = lapply(mods, tidy, simplify = F)
    # combine
    dplyr::bind_rows(coefs, .id = "mod")
    #             mod        term      estimate    std.error  statistic      p.value
    # 1     mpg vs hp (Intercept)   30.09886054 1.633921e+00 18.4212465 6.642736e-18
    # 2     mpg vs hp          hp   -0.06822828 1.011930e-02 -6.7423885 1.787835e-07
    # 3     cyl vs hp (Intercept)    3.00679525 4.254852e-01  7.0667442 7.405351e-08
    # 4     cyl vs hp          hp    0.02168354 2.635142e-03  8.2286042 3.477861e-09
    # 5    disp vs hp (Intercept)   20.99248341 3.260662e+01  0.6438104 5.245902e-01
    # 6    disp vs hp          hp    1.42977003 2.019414e-01  7.0801224 7.142679e-08
    # 7   mpg vs drat (Intercept)   -7.52461844 5.476663e+00 -1.3739423 1.796391e-01
    # 8   mpg vs drat        drat    7.67823260 1.506705e+00  5.0960421 1.776240e-05
    

    我们还可以提取模型摘要统计信息:

    # get summary stats
    summ = lapply(mods, glance, simplify = F)
    dplyr::bind_rows(summ, .id = "mod")
    #            mod r.squared adj.r.squared     sigma statistic      p.value df     logLik
    # 1    mpg vs hp 0.6024373     0.5891853  3.862962  45.45980 1.787835e-07  2  -87.61931
    # 2    cyl vs hp 0.6929688     0.6827344  1.005944  67.70993 3.477861e-09  2  -44.56307
    # 3   disp vs hp 0.6255997     0.6131197 77.089503  50.12813 7.142679e-08  2 -183.41236
    # 4  mpg vs drat 0.4639952     0.4461283  4.485409  25.96964 1.776240e-05  2  -92.39996
    # 5  cyl vs drat 0.4899134     0.4729105  1.296596  28.81354 8.244636e-06  2  -52.68517
    # 6 disp vs drat 0.5044038     0.4878839 88.693360  30.53315 5.282022e-06  2 -187.89934
    #         AIC       BIC     deviance df.residual
    # 1 181.23863 185.63584    447.67431          30
    # 2  95.12614  99.52335     30.35771          30
    # 3 372.82473 377.22194 178283.74604          30
    # 4 190.79993 195.19714    603.56673          30
    # 5 111.37033 115.76754     50.43482          30
    # 6 381.79868 386.19588 235995.36410          30
    
  • 2

    您可以从 expand.grid 开始,为相关/自变量对提供一个很好的数据帧 . 然后将公式和模型添加到数据中 .

    pairings <- expand.grid(
      lhs = c('mpg', 'cyl', 'disp'),
      rhs = c('hp', 'drat', 'wt', 'qsec')
    )
    
    pairings[["formula"]] <- lapply(
      X   = paste(pairings[["lhs"]], "~", pairings[["rhs"]]),
      FUN = as.formula
    )
    
    pairings[["model"]] <- lapply(
      X    = pairings[["formula"]],
      FUN  = lm,
      data = mtcars
    )
    

    结果:

    str(pairings, max.level = 1)
    # 'data.frame': 12 obs. of  4 variables:
    #  $ lhs    : Factor w/ 3 levels "mpg","cyl","disp": 1 2 3 1 2 3 1 2 3 1 ...
    #  $ rhs    : Factor w/ 4 levels "hp","drat","wt",..: 1 1 1 2 2 2 3 3 3 4 ...
    #  $ formula:List of 12
    #  $ model  :List of 12
    #  - attr(*, "out.attrs")=List of 2
    
    pairings[["model"]][[1]]
    # Call:
    # FUN(formula = X[[i]], data = ..1)
    # 
    # Coefficients:
    # (Intercept)           hp
    #    30.09886     -0.06823
    

相关问题