Example setup:

> library(dplyr)
> first <- function(value) {
  if(length(value)==0) {
    return(data.frame(out=NA,othercolumns=c(0,1)))
  } else {
    return(data.frame(out=mean(value),othercolumns=c(1,1)))
  }
}
> set.seed(1)
> df <- data.frame(column1=runif(10),column2=runif(10),
                 category=sample(c("a","b"),10,replace=TRUE))

The dplyr chain returns an error:

> df %>% group_by(category) %>% filter(column2 > 1) %>% do(first(.$column1))
Error: incompatible number of rows (2, expecting 0)

Is there a way to force dplyr to pass the empty data frame to do() instead of throwing an error?

Update

Following @Henrik's link, it seems the data frame needs to be converted to a tbl_df() object. The conversion has to happen after the group_by() call:

> df %>% group_by(category) %>% tbl_df() %>%
+   filter(column2 > 1) %>% do(first(.$column1))
  out othercolumns
1  NA            0
2  NA            1

The syntax is odd and unintuitive, but it works...
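
A possible reading of why this works: the result has two rows and no category column, which suggests that the tbl_df() step dropped the grouping, so do() ran once on the whole (empty) filtered data frame rather than once per group. Below is a minimal sketch of that idea that drops the grouping explicitly with ungroup(). Using ungroup() as a stand-in for tbl_df() is my assumption, not something confirmed by the linked discussion, and behavior may differ between dplyr versions.

```r
library(dplyr)

# Same helper as in the question. Defined after library(dplyr),
# so it masks dplyr::first() in this session.
first <- function(value) {
  if (length(value) == 0) {
    # Placeholder rows for an empty input
    data.frame(out = NA, othercolumns = c(0, 1))
  } else {
    data.frame(out = mean(value), othercolumns = c(1, 1))
  }
}

set.seed(1)
df <- data.frame(column1 = runif(10), column2 = runif(10),
                 category = sample(c("a", "b"), 10, replace = TRUE))

# Assumption: ungroup() plays the role of the tbl_df() step here.
# After it, do() sees one ungrouped (and, after filter(), empty) data frame.
res <- df %>%
  group_by(category) %>%
  ungroup() %>%
  filter(column2 > 1) %>%
  do(first(.$column1))

print(res)
```

Because the grouping is gone before do() runs, the result cannot carry a category column.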

Ideally, though, I would like the output to keep the category column, like this:

category  out othercolumns
1  "a"       NA            0
2  "a"       NA            1
3  "b"       NA            0
4  "b"       NA            1

Update 2

Combining it with plyr::ddply seems to work well:

> ddply(df,.(category),function(.) filter(.,column2 > 1) %>%
+         do(first(.$column1)))
  category out othercolumns
1        a  NA            0
2        a  NA            1
3        b  NA            0
4        b  NA            1
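
For comparison, the same split, filter, apply idea can be sketched in base R without dplyr or plyr. This is only an illustration of the approach, not a description of what ddply does internally; the names pieces, kept, and result are made up for the example.

```r
# Same helper as in the question
first <- function(value) {
  if (length(value) == 0) {
    # Placeholder rows for an empty input
    data.frame(out = NA, othercolumns = c(0, 1))
  } else {
    data.frame(out = mean(value), othercolumns = c(1, 1))
  }
}

set.seed(1)
df <- data.frame(column1 = runif(10), column2 = runif(10),
                 category = sample(c("a", "b"), 10, replace = TRUE))

# Split first, so every category is visited even if filtering empties it
pieces <- split(df, df$category)

result <- do.call(rbind, lapply(names(pieces), function(key) {
  piece <- pieces[[key]]
  # Filter inside the piece; drop = FALSE keeps it a data frame
  kept <- piece[piece$column2 > 1, , drop = FALSE]
  # Label the rows returned by first() with their category
  data.frame(category = key, first(kept$column1), stringsAsFactors = FALSE)
}))
rownames(result) <- NULL

print(result)
```

Since column2 comes from runif(), no row passes column2 > 1, so every category present in df contributes the two-row NA placeholder from first().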