首页 文章

使用group by和tidy运行多个模型并将结果提取到数据帧

提问于
浏览
8

我想使用 group_by %>% do(tidy(*)) 运行几个线性回归模型并将模型结果提取到数据框 . 每个模型的数据框应包括以下内容:结果变量,暴露变量,样本大小,β系数,SE和p值 .

library(tidyverse)
data("mtcars")
outcomes <- c("wt, mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

模型应该针对所有协变量调整的每次暴露的每个结果回归,例如:

lm(wt ~ factor(gear)+drat+qsec, mtcars, na.action = na.omit)
lm(wt ~ factor(vs)+drat+qsec, mtcars, na.action = na.omit)
etc...

最终的代码可能看起来像这样?

models <- (mtcars %>%
gather(x_var, x_value, -c(y_var, y_i, cv1:cv3)) %>%
group_by(y_var, x_var) %>%
do(broom::tidy(lm(y_i ~ x_value + cv1 + cv2 + cv3, data = .))))

1 回答

  • 5

    这是一个解决方案,首先为您要运行的每个模型创建公式,然后从您要分析的数据集中调用正确的变量,而不是重新整形数据集本身并应用模型:

    library(tidyverse)
    library(broom)
    
    outcomes <- c("wt", "mpg", "hp", "disp")
    exposures <- c("gear", "vs", "am")
    covariates <- c("drat", "qsec")
    
    expand.grid(outcomes, exposures, covariates) %>%
      group_by(Var1, Var2) %>%
      summarise(Var3 = paste0(Var3, collapse = "+")) %>%
      rowwise() %>%
      summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
      group_by(model_id = row_number(),
               frm) %>%
      do(tidy(lm(.$frm, data = mtcars))) %>%
      ungroup()
    
    # # A tibble: 52 x 7
    #   model_id frm                       term          estimate std.error statistic     p.value
    #      <int> <chr>                     <chr>            <dbl>     <dbl>     <dbl>       <dbl>
    # 1        1 wt~factor(gear)+drat+qsec (Intercept)      9.25     2.17       4.27  0.000218   
    # 2        1 wt~factor(gear)+drat+qsec factor(gear)4   -0.187    0.493     -0.378 0.708      
    # 3        1 wt~factor(gear)+drat+qsec factor(gear)5   -0.703    0.518     -1.36  0.186      
    # 4        1 wt~factor(gear)+drat+qsec drat            -1.03     0.425     -2.42  0.0227     
    # 5        1 wt~factor(gear)+drat+qsec qsec            -0.121    0.0912    -1.32  0.196      
    # 6        2 wt~factor(vs)+drat+qsec   (Intercept)      4.35     2.28       1.91  0.0663     
    # 7        2 wt~factor(vs)+drat+qsec   factor(vs)1     -1.04     0.416     -2.49  0.0189     
    # 8        2 wt~factor(vs)+drat+qsec   drat            -0.918    0.263     -3.49  0.00160    
    # 9        2 wt~factor(vs)+drat+qsec   qsec             0.147    0.106      1.39  0.175      
    # 10        3 wt~factor(am)+drat+qsec   (Intercept)      8.29     1.31       6.33  0.000000766
    # # ... with 42 more rows
    

    非常类似的过程,如果您更喜欢使用 purrr 来自 purrr 包而不是 do

    expand.grid(outcomes, exposures, covariates) %>%
      group_by(Var1, Var2) %>%
      summarise(Var3 = paste0(Var3, collapse = "+")) %>%
      rowwise() %>%
      summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
      group_by(model_id = row_number()) %>%
      mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
      unnest() %>%
      ungroup()
    

    请记住,这种方法的关键是创建公式 . 因此,如果您设法以稍微不同的方式指定变量并使用比以前更少的代码创建公式,代码将变得更简单:

    outcomes <- c("wt", "mpg", "hp", "disp")
    exposures <- c("gear", "vs", "am")
    covariate1 <- "drat"
    covariate2 <- "qsec"
    
    expand.grid(outcomes, exposures, covariate1, covariate2) %>%
      transmute(frm = paste0(Var1, "~factor(", Var2, ")+", Var3, "+", Var4)) %>%
      group_by(model_id = row_number()) %>%
      mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
      unnest() %>%
      ungroup()
    

相关问题