dplyr: unable to redefine column type with group_by()

I have the following problem:

When I use dplyr to mutate a numeric column after group_by(), the mutate command fails if a group contains only a single value and that value is NaN.

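So as long as a group's values are numbers, the result is correctly typed as dbl. As soon as a group contains nothing but NaN, however, it fails, because dplyr types that group's result as lgl while all the other groups are dbl.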
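The per-group behaviour can be reproduced in base R without dplyr. The sketch below is an illustration under stated assumptions, not dplyr's internals: it reuses the data from the MWE below, uses `ave()` to mimic the grouped `sd()` step and `split()`/`lapply()` to mimic evaluating the `ifelse()` separately in each group, then reports the type of each group's result. Note that `sd()` of a single value returns `NA` rather than `NaN`; the effect on the type is the same.

```r
# Data from the MWE
a <- c(rep(LETTERS[1:2], 4), "C")
x <- c(7, 8, 3, 5, 9, 2, 4, 7, 8)

# Grouped sd(), broadcast back to every row (mimics the mutate_each() step).
# Group "C" has a single row, and sd() of a single value is NA_real_.
x_sd <- ave(x, a, FUN = function(v) sd(v, na.rm = TRUE))

# Evaluate the winsorising ifelse() separately in each group,
# the way a grouped mutate() evaluates its expression per group
per_group <- lapply(split(x_sd, a), function(v) ifelse(v > 2, 2, v))

# Result type per group: A and B are "double", C is "logical"
print(vapply(per_group, typeof, character(1)))
```

Combining per-group results of different types into one column is exactly what the grouped `mutate()` refuses to do in the MWE.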

My first (and more general) question is: is there a way to tell dplyr to always give a column a fixed type when using group_by()?

Second, can someone help me solve the problem shown in the MWE below:

library(dplyr)

# ERROR: this reproduces the column-type error described above:

df <- data_frame(a = c(rep(LETTERS[1:2],4),"C"),g = c(rep(LETTERS[5:7],3)), x = c(7, 8,3, 5, 9, 2, 4, 7,8)) %>% tbl_df()
df <- df %>% group_by(a) %>% mutate_each(funs(sd(., na.rm=TRUE)),x)

df <- df %>% mutate(Winsorise = ifelse(x>2,2,x))

# NO ERROR (as no groups have single entry with NaN):
df2 <- data_frame(a = c(rep(LETTERS[1:2],4),"C"),g = c(rep(LETTERS[5:7],3)), x = c(7, 8,3, 5, 9, 2, 4, 7,8)) %>% tbl_df()
df2 <- df2 %>% group_by(a) %>% mutate_each(funs(sd(., na.rm=TRUE)),x)

# Update the Group for the row with an NA - Works
df2[9,1] <- "A"
df2 <- df2 %>% mutate(Winsorise = ifelse(x>3,3,x))


# REASON FOR ERROR: what happens for a group whose only member is NaN, although we want the Winsorise column to be dbl, not lgl:
df3 <- data_frame(g = "A",x = NaN)
df3 <- df3 %>% mutate(Winsorise = ifelse(x>3,3,x))
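The `lgl` type comes from base `ifelse()` itself: its result starts out as a copy of `test` (a logical vector) and is only promoted to double when a `yes` or `no` value is actually written into it. If every element of `test` is `NA`, nothing is written, so the result stays logical and the `NaN` is replaced by a logical `NA`. A minimal base-R check of the df3 case:

```r
# Single-row case from df3: NaN > 3 is NA, so ifelse() writes neither branch
one_row <- ifelse(NaN > 3, 3, NaN)
print(typeof(one_row))   # "logical"

# As soon as one test element is TRUE or FALSE, the result is promoted to double
mixed <- ifelse(c(NaN, 5) > 3, 3, c(NaN, 5))
print(typeof(mixed))     # "double"
```

In other words, the output type of `ifelse()` depends on the values in the data, not only on the types of its arguments.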

1 Answer


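    The reason is, as you correctly pointed out with df3, that when the source value is NaN/NA, the ifelse() inside mutate() returns a logical NA rather than a number, so that group's result is typed as logical.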

    To avoid this, convert the result to numeric:

    df <- df %>% mutate(Winsorise = as.numeric(ifelse(x>2,2,x)))
    

    Perhaps @hadley can explain why the mutate result is converted to lgl?
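A base-R sketch of why the `as.numeric()` fix works, again simulating grouped evaluation with `split()`/`lapply()` on the MWE data rather than calling dplyr. Wrapping `ifelse()` in `as.numeric()` makes every group return a double, so the groups can be recombined into one numeric column. `pmin(v, 2)` is shown as an alternative cap that is not from the original answer: for double input it always returns double, which also addresses the general question of keeping a column's type fixed. Within dplyr, `dplyr::if_else()` is the stricter, type-checked counterpart of `ifelse()`.

```r
a <- c(rep(LETTERS[1:2], 4), "C")
x <- c(7, 8, 3, 5, 9, 2, 4, 7, 8)
x_sd <- ave(x, a, FUN = function(v) sd(v, na.rm = TRUE))  # group "C" gets NA

# Fix from the answer: force the ifelse() result to double in every group
fixed <- lapply(split(x_sd, a), function(v) as.numeric(ifelse(v > 2, 2, v)))

# Alternative (not from the answer): pmin() returns double for double input,
# so the all-NA group "C" still yields a double NA
capped <- lapply(split(x_sd, a), function(v) pmin(v, 2))

print(vapply(fixed, typeof, character(1)))    # all "double"
print(vapply(capped, typeof, character(1)))   # all "double"

# Recombining the groups now gives a single numeric column
print(unsplit(fixed, a))
```

Choosing functions whose output type depends only on the input types, not on the values, avoids the problem regardless of how the data are grouped.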
