Why doesn't group_by work with max(colSums) in dplyr?

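For each country, I want to find the longest total duration of primary, middle, and high school combined (the durations can differ from year to year). I first group_by country and then use colSums, but the value I get is the max(colSums) over the whole data set, which means group_by has no effect at all.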

I did some research and have already detached 'plyr'. In fact, if I try

df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(
    newvar = sum(wt)
  )

it works fine. But here I am not summing within a single column; I need to sum across many columns. Do you know what I should do to solve this?

Thank you very much.

data1 = data.frame(
  country = c("A", 'A', "A", 'A', "B", "B", "B", "B"),
  item = c("Age for primary school", "Duration for primary school",
           "Duration for middle school", "duration for high school",
           "Age for primary school", "Duration for primary school",
           "Duration for middle school", "duration for high school"),
  '2008' = c(6, 6, 4, 3, 7, 5, 4, 3),
  '2009' = c(6, 6, 4, 3, 6, 6, 4, 3),
  '2010' = c(7, 5, 4, 3, 6, 6, 4, 3),
  '2011' = c(7, 5, 4, 3, 7, 5, 4, 3))

temp1 <- dplyr::filter(data1, item != 'Age for primary school') %>%
  dplyr::group_by(country) %>%
  dplyr::mutate(n_grade = max(colSums(.[,-c(1:2)], na.rm = TRUE)))

1 Answer


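    If you use . inside mutate, it refers to the left-hand side of the pipe, i.e. the whole data.frame / tibble, not the individual group. You can use do instead, which evaluates its expression once per group with . bound to that group's rows.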

    temp1 <- dplyr::filter(data1, item != 'Age for primary school') %>%
      dplyr::group_by(country) %>%
      # inside do(), . is the data for the current group only
      dplyr::do(dplyr::mutate(., n_grade = max(colSums(.[, -c(1:2)], na.rm = TRUE))))
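
As a further illustration of the per-group idea, here is a minimal sketch using group_modify, which is available in more recent dplyr releases (an assumption about your installed version; it is not part of the original answer). The helper name add_max_total and the result name temp2 are hypothetical. Note that group_modify passes each group's rows without the grouping column, so the sketch selects the numeric year columns by type rather than by position.

```r
library(dplyr)

# Sample data from the question; data.frame() renames '2008' to X2008, etc.
data1 <- data.frame(
  country = c("A", "A", "A", "A", "B", "B", "B", "B"),
  item = c("Age for primary school", "Duration for primary school",
           "Duration for middle school", "duration for high school",
           "Age for primary school", "Duration for primary school",
           "Duration for middle school", "duration for high school"),
  "2008" = c(6, 6, 4, 3, 7, 5, 4, 3),
  "2009" = c(6, 6, 4, 3, 6, 6, 4, 3),
  "2010" = c(7, 5, 4, 3, 6, 6, 4, 3),
  "2011" = c(7, 5, 4, 3, 7, 5, 4, 3)
)

# Hypothetical helper: called once per group with that group's rows
# (grouping column excluded) and a one-row key, which is unused here.
add_max_total <- function(group_rows, key) {
  # Total duration per year within this group only
  year_totals <- colSums(select(group_rows, where(is.numeric)), na.rm = TRUE)
  mutate(group_rows, n_grade = max(year_totals))
}

temp2 <- data1 %>%
  filter(item != "Age for primary school") %>%
  group_by(country) %>%
  group_modify(add_max_total)
```

With the sample data, the yearly totals peak at 13 for both country A and country B, whereas the ungrouped computation from the question yields 26 because it sums over both countries at once.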
    

    As a side note, this is how you could do it with data.table.

    library(data.table)
    library(magrittr)  # provides %>%

    setDT(data1)  # convert data1 to a data.table by reference
    temp1 <-
      data1[item != 'Age for primary school'] %>%
      .[, n_grade := max(colSums(.SD, na.rm = TRUE)),
        by = country,
        .SDcols = -(1:2)]
    
