
Summarise by combinations of multiple group_by variables and by each variable individually


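I am using dplyr's group_by and summarise to get a mean for each combination of the group_by variables, but I also want the mean for each group_by variable on its own.

For example, if I run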

mtcars %>% 
  group_by(cyl, vs) %>% 
  summarise(new = mean(wt))

I get

    cyl    vs      new
  <dbl> <dbl>    <dbl>
     4     0 2.140000
     4     1 2.300300
     6     0 2.755000
     6     1 3.388750
     8     0 3.999214

but I would like to get

    cyl    vs      new
  <dbl> <dbl>    <dbl>
     4     0 2.140000
     4     1 2.300300
     4    NA 2.285727
     6     0 2.755000
     6     1 3.388750
     6    NA 3.117143
     8     0 3.999214
    NA     0 3.688556
    NA     1 2.611286

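i.e. the means for the combinations as well as for each variable individually.

Edit: Jaap marked this as a duplicate and pointed me towards "Using aggregate to apply several functions on several variables in one call". I looked at Jaap's answer there, which references dplyr, but I can't see how it answers my question. You say to use summarise_each, but I still don't see how I can use it to get the mean of each group for each variable separately. Apologies if I am being stupid...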

1 Answer

  • 1

    Here is an idea using bind_rows:

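    library(dplyr)
    
    # Means for each cyl/vs combination, then for cyl alone and for vs alone.
    # In the single-variable summaries the other variable is filled with NA.
    mtcars %>% 
      group_by(cyl, vs) %>% 
      summarise(new = mean(wt)) %>% 
      bind_rows(., 
                mtcars %>% group_by(cyl) %>% summarise(new = mean(wt)) %>% mutate(vs = NA), 
                mtcars %>% group_by(vs) %>% summarise(new = mean(wt)) %>% mutate(cyl = NA)) %>% 
      arrange(cyl) %>% 
      ungroup()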
    
    # A tibble: 10 × 3
    #     cyl    vs      new
    #   <dbl> <dbl>    <dbl>
    #1      4     0 2.140000
    #2      4     1 2.300300
    #3      4    NA 2.285727
    #4      6     0 2.755000
    #5      6     1 3.388750
    #6      6    NA 3.117143
    #7      8     0 3.999214
    #8      8    NA 3.999214
    #9     NA     0 3.688556
    #10    NA     1 2.611286
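
    The same idea can be written without repeating the pipeline for each grouping. The following is only a sketch: it assumes dplyr 1.0.0 or later (for across(), all_of() and the .groups argument), and the names mean_wt_by and grouping_sets are made up for illustration, not part of dplyr. Each set of grouping columns is summarised separately, and bind_rows fills the columns missing from a piece with NA.

    ```r
    library(dplyr)

    # Hypothetical helper: mean of wt for one set of grouping columns,
    # given as a character vector of column names.
    mean_wt_by <- function(data, vars) {
      data %>%
        group_by(across(all_of(vars))) %>%
        summarise(new = mean(wt), .groups = "drop")
    }

    # The combination first, then each variable on its own.
    grouping_sets <- list(c("cyl", "vs"), "cyl", "vs")

    result <- grouping_sets %>%
      lapply(mean_wt_by, data = mtcars) %>%  # one summary table per grouping set
      bind_rows() %>%                        # absent grouping columns become NA
      arrange(cyl)

    result
    ```

    Adding another entry to grouping_sets (for example "gear") would add the matching block of rows without any other change to the code.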
    
