
dplyr error: strange issue when combining group_by, mutate and ifelse. Is it a bug?


I have run into a strange issue with dplyr when combining group_by, mutate and ifelse. Consider the following data.frame:

> df1
  crawl.id group.id hits.diff
1        1        1        NA
2        1        2        NA
3        2        2         0
4        1        3        NA
5        1        3        NA
6        1        3        NA

When I run the following code on it:

library(dplyr)
df1 %>%
  group_by(group.id) %>% 
  mutate( hits.consumed = ifelse(hits.diff<=0,-hits.diff,0) )

for some reason I get

Error: incompatible types, expecting a logical vector

However, if I remove either the group_by() or the ifelse, everything works as expected:

df1 %>%
  mutate( hits.consumed = ifelse(hits.diff<=0,-hits.diff,0) )

  crawl.id group.id hits.diff hits.consumed
1        1        1        NA            NA
2        1        2        NA            NA
3        2        2         0             0
4        1        3        NA            NA
5        1        3        NA            NA
6        1        3        NA            NA

df1 %>%
  group_by( group.id ) %>%
  mutate( hits.consumed = -hits.diff )

  crawl.id group.id hits.diff hits.consumed
1        1        1        NA            NA
2        1        2        NA            NA
3        2        2         0             0
4        1        3        NA            NA
5        1        3        NA            NA
6        1        3        NA            NA

Is this a bug or a feature? Can anyone reproduce it? What is special about this particular combination of group_by, mutate and ifelse that makes it fail?

My own research led me here: https://github.com/hadley/dplyr/issues/464 which suggests that it should be fixed by now.

Here is dput(df1):

structure(list(crawl.id = c(1, 1, 2, 1, 1, 1), group.id = structure(c(1L, 
2L, 2L, 3L, 3L, 3L), .Label = c("1", "2", "3"), class = "factor"), 
    hits.diff = c(NA, NA, 0, NA, NA, NA)), .Names = c("crawl.id", 
"group.id", "hits.diff"), row.names = c(NA, -6L), class = "data.frame")

1 Answer

  • 33

    Wrap it all in as.numeric to force the output type, so that the NAs (which are logical by default) do not override the class of the output variable:

    df1 %>%
      group_by(group.id) %>% 
      mutate( hits.consumed = as.numeric(ifelse(hits.diff<=0,-hits.diff,0)) )
    
    #  crawl.id group.id hits.diff hits.consumed
    #1        1        1        NA            NA
    #2        1        2        NA            NA
    #3        2        2         0             0
    #4        1        3        NA            NA
    #5        1        3        NA            NA
    #6        1        3        NA            NA
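
    As an alternative sketch (not part of the original answer), the same values can also be computed with a base R function that always returns a double, so every slice produces the same type. This assumes the intent is "negate non-positive values, otherwise 0", which for non-missing x matches pmax(-x, 0); the names via_ifelse and via_pmax are only illustrative.

    ```r
    # hits.diff values copied from dput(df1) in the question
    hits.diff <- c(NA, NA, 0, NA, NA, NA)

    # The answer's approach: coerce the ifelse() result to numeric
    via_ifelse <- as.numeric(ifelse(hits.diff <= 0, -hits.diff, 0))

    # Assumed-equivalent form: pmax() keeps NA as NA and returns a double
    via_pmax <- pmax(-hits.diff, 0)

    class(via_pmax)
    identical(via_ifelse, via_pmax)

    # A small check with non-missing values
    x <- c(-3, 2, NA)
    ifelse(x <= 0, -x, 0)
    pmax(-x, 0)
    ```

    Either form should be usable inside the grouped mutate() call, since both return a numeric vector even when a group contains only NA.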
    

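    Pretty sure this is the same issue as here: Custom sum function in dplyr returns inconsistent results. When every hits.diff value in a slice is NA, ifelse returns a logical vector, but as soon as a non-NA value is present it returns a numeric one, as this result suggests: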

    out <- df1[1:2,] %>%  mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
    class(out$hits.consumed)
    #[1] "logical"
    out <- df1[1:3,] %>%  mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
    class(out$hits.consumed)
    #[1] "numeric"
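
    To see how this plays out per group, here is a minimal base R sketch (an illustration only, not dplyr's internals) that applies the same ifelse call to each group.id slice of hits.diff and reports the class of the result. It assumes the values from dput(df1) in the question; types is just an illustrative name.

    ```r
    # Values copied from dput(df1) in the question
    hits.diff <- c(NA, NA, 0, NA, NA, NA)
    group.id  <- factor(c(1, 2, 2, 3, 3, 3))

    # Apply the same ifelse() call to each group and record the result's class
    types <- sapply(split(hits.diff, group.id),
                    function(x) class(ifelse(x <= 0, -x, 0)))
    print(types)
    #         1         2         3
    # "logical" "numeric" "logical"
    ```

    The first group yields a logical result and the second a numeric one, which would be consistent with the "expecting a logical vector" error appearing once the per-group results are combined.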
    
