COUNTIF equivalent in dplyr summarise

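I have a data frame that lists, for each group (ID), the total number of students (Stu) and the number of students taking part in an activity (Sub):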

     ID   Stu   Sub
  (int) (int) (int)
1   101    80    NA
2   102   130    NA
3   103    10    NA
4   104   210    20
5   105   180    NA
6   106   150    NA

I would like to know how many groups in each size band (> 400, > 200, > 100, > 0) take part in the activity (Sub > 0) or do not take part (Sub is NA).

output <- structure(list(ID = c(101L, 102L, 103L, 104L, 105L, 106L), 
                       Stu = c(80L, 130L, 10L, 210L, 180L, 150L), 
                       Sub = c(NA,NA, NA, 20L, NA, NA)), 
                  .Names = c("ID", "Stu", "Sub"), 
                  class = c("tbl_df", "data.frame"), 
                  row.names = c(NA, -6L))

temp <- output %>% 
  mutate(Stu = ifelse(Stu >= 400, 400,
                      ifelse(Stu >= 200, 200,
                             ifelse(Stu >= 100, 100, 0
                             )))) %>%
  group_by(Stu) %>%
  summarise(entries = length(!is.na(Sub)),
            noentries = length(is.na(Sub)))

The result should be:

    Stu entries noentries
  (dbl)   (int)     (int)
1     0       0         2
2   100       0         3
3   200       1         0

But I get:

    Stu entries noentries
  (dbl)   (int)     (int)
1     0       2         2
2   100       3         3
3   200       1         1

How can I make the length function inside summarise behave like COUNTIF?

3 Answers

  • 3

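    summarise needs a single value per group, and sum rather than length does the job: length() returns the number of elements in the logical vector whether they are TRUE or FALSE, while sum() counts only the TRUE values: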

    output %>% 
      mutate(Stu = ifelse(Stu >= 400, 400,
                          ifelse(Stu >= 200, 200,
                                 ifelse(Stu >= 100, 100, 0
                                 )))) %>%
      group_by(Stu) %>% 
      summarise(entries = sum(!is.na(Sub)),
                noentries = sum(is.na(Sub)))
    
    Source: local data frame [3 x 3]
    
        Stu entries noentries
      (dbl)   (int)     (int)
    1     0       0         2
    2   100       0         3
    3   200       1         0
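
    The difference between length and sum can be seen on the Sub column alone, outside of any pipeline. The following is a minimal base R sketch using the Sub values from the question's sample data; the variable names are only for illustration:

    ```r
    # Sub values for the six groups in the question's sample data
    sub <- c(NA, NA, NA, 20L, NA, NA)

    # length() reports the size of the logical vector, whatever its contents
    length(!is.na(sub))   # 6
    length(is.na(sub))    # 6

    # sum() treats TRUE as 1 and FALSE as 0, so it counts the matches,
    # which is what COUNTIF does in a spreadsheet
    entries   <- sum(!is.na(sub))   # 1
    noentries <- sum(is.na(sub))    # 5

    # A COUNTIF with a value condition such as Sub > 0 needs na.rm = TRUE,
    # because comparing NA with 0 yields NA rather than FALSE
    positive <- sum(sub > 0, na.rm = TRUE)   # 1
    ```

    Inside summarise the same expressions are evaluated once per group, which is why replacing length with sum gives the expected per-band counts.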
    
  • 1

    This follows the same idea as @eipi10's answer, but cuts to the chase with count() instead of group_by() %>% tally(), and shows that tidyr::spread can mimic reshape2::dcast:

    output %>%
      count(Sub = ifelse(is.na(Sub), 'No Entries', 'Entries'),
            # right = FALSE makes the bands [0,100), [100,200), ... to match the >= tests
            Stu = cut(Stu, c(0, 100, 200, 400, Inf), right = FALSE,
                      labels = c(0, 100, 200, 400))) %>%
      tidyr::spread(Sub, n, fill = 0)
    
  • 3

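    Another option is to group by both Stu and Sub, but to do that we first need to recode the values of Sub and Stu to match the groupings we want in the output. We also use cut rather than nested ifelse to set the break points for Stu: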

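    library(dplyr)
    library(reshape2)
    
    output %>% 
      group_by(Sub = ifelse(is.na(Sub), "No Entries", "Entries"),
               # right = FALSE makes the bands [0,100), [100,200), ... to match the >= tests
               Stu = cut(Stu, c(0, 100, 200, 400, Inf), right = FALSE,
                         labels = c(0, 100, 200, 400))) %>%
      tally() %>%
      # name the value column explicitly instead of letting dcast guess it
      dcast(Stu ~ Sub, value.var = "n", fill = 0)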
    

      Stu Entries No Entries
    1   0       0          2
    2 100       0          3
    3 200       1          0
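
    One detail worth checking when swapping nested ifelse for cut is how values that sit exactly on a break point are handled. By default cut closes each interval on the right, so a group of exactly 100 students would land in band 0, whereas the ifelse version with >= puts it in band 100. The sketch below uses the Stu values from the question plus one hypothetical boundary value (100) that is not in the original data:

    ```r
    # Stu values from the question, plus a hypothetical group of exactly 100
    stu <- c(80L, 130L, 10L, 210L, 180L, 150L, 100L)
    breaks <- c(0, 100, 200, 400, Inf)
    labels <- c(0, 100, 200, 400)

    # Default: intervals are (0,100], (100,200], (200,400], (400,Inf]
    right_closed <- cut(stu, breaks, labels = labels)

    # right = FALSE: intervals are [0,100), [100,200), [200,400), [400,Inf)
    left_closed <- cut(stu, breaks, labels = labels, right = FALSE)

    as.character(right_closed)  # "0" "100" "0" "200" "100" "100" "0"
    as.character(left_closed)   # "0" "100" "0" "200" "100" "100" "100"
    ```

    For the six groups in the question both settings give the same bands, so the counts shown above are unaffected; the choice only matters once a value equals one of the break points.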
