
Combining tidyr::spread and dplyr::summarise


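I often want to perform tidyr::spread and dplyr::summarise in a "single step" to aggregate data by group. The result I want is shown in expected below. I can get expected by running summarise and spread separately and combining the results with dplyr::full_join, but I am looking for alternative approaches that avoid full_join. A true single-step approach is not required.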

df <- data.frame(
        id = rep(letters[1], 2),
        val1 = c(10, 20),
        val2 = c(100, 200),
        key = c("A", "B"),
        value = c(1, 2))

library(tidyverse)
result1 <- df %>%
              group_by(id) %>%
              summarise(
                val1 = min(val1),
                val2 = max(val2)
              )
# A tibble: 1 x 3
  # id      val1  val2
  # <fctr> <dbl> <dbl>
# 1 a       10.0   200

result2 <- df %>%
              select(id, key, value) %>%
              group_by(id) %>%
              spread(key, value)
# A tibble: 1 x 3
# Groups: id [1]
  # id         A     B
# * <fctr> <dbl> <dbl>
# 1 a       1.00  2.00

expected <- full_join(result1, result2, by="id")
# A tibble: 1 x 5
  # id      val1  val2     A     B
  # <fctr> <dbl> <dbl> <dbl> <dbl>
# 1 a       10.0   200  1.00  2.00

3 Answers

  • 0

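    I suspect your real data has more edge cases that would need some tinkering, but why not simply spread and then summarise? You can specify the summary function for each variable separately, so for A and B, where (I'm assuming) you don't actually need to calculate anything, you can just drop all the NAs.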

    df %>%
      spread("key", "value") %>%
      group_by(id) %>%
      summarise(
        val1 = min(val1),
        val2 = max(val2),
        A = mean(A, na.rm = TRUE),
        B = mean(B, na.rm = TRUE)
        )
    # A tibble: 1 x 5
    #   id     val1  val2     A     B
    #   <fct> <dbl> <dbl> <dbl> <dbl>
    # 1 a      10.0   200  1.00  2.00
    
  • 5

    Self-answer: here is an approach that works with tidyr::nest, but it doesn't seem much less "messy".

    df %>%
      group_by(id) %>%
      nest() %>%
      mutate(
        min_vals = map(data, ~.x %>% summarise(min_val = min(val1), max_val = max(val2))),
        data = map(data, ~select(.x, key, value) %>% spread(key, value))
      ) %>%
      unnest(cols = c(data, min_vals))  # tidyr >= 1.0 expects the list-columns to be named
    
    # A tibble: 1 x 5
      # id         A     B min_val max_val
      # <fctr> <dbl> <dbl>   <dbl>   <dbl>
    # 1 a       1.00  2.00    10.0     200
    
  • 0

    Another approach, using do:

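    res <- df %>%
      group_by(id) %>%
      summarise(
        val1 = min(val1),
        val2 = max(val2),
        key = list(key),
        value = list(value)
      ) %>%
      group_by(id, val1, val2) %>%
      # Build a one-row data frame per group: values become cells, keys become column names.
      # `grp` pins the group data, because `.` is rebound inside a nested magrittr pipe.
      do({
        grp <- .
        setNames(as.data.frame(matrix(grp$value[[1]], nrow = 1)), as.character(grp$key[[1]]))
      })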
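
In the same spirit as the spread-then-summarise idea in the first answer, here is a minimal sketch of one more join-free variant. It computes the group summaries with mutate, so every row in a group carries the same val1 and val2, and then spreads. It assumes each id/key pair occurs exactly once (otherwise spread stops with a duplicate-identifier error). The name `result` is illustrative and not from the original post.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  id = rep(letters[1], 2),
  val1 = c(10, 20),
  val2 = c(100, 200),
  key = c("A", "B"),
  value = c(1, 2)
)

result <- df %>%
  group_by(id) %>%
  # Overwrite val1/val2 with their per-group summaries; rows are kept
  mutate(val1 = min(val1), val2 = max(val2)) %>%
  ungroup() %>%
  # id, val1 and val2 are now constant within a group, so the rows collapse into one
  spread(key, value)
```

Because val1 and val2 are constant within each id after the mutate step, spread treats them as identifying columns and collapses each group to a single row, so no NA handling is needed afterwards.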
    
