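I'm trying to write a function that works inside dplyr::mutate().
Because rowwise() %>% sum() is very slow on large datasets, the suggested alternative is to fall back to base R. I'd like to simplify that approach as shown below, but I'm having trouble getting the data passed along inside mutate().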
require(tidyverse)
#> Loading required package: tidyverse
#I'd like to write a function that works inside mutate and replaces the rowSums(select()).
cars <- as_tibble(cars)
cars %>%
  mutate(sum = rowSums(select(., speed, dist), na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
#Here is my first attempt.
rowwise_sum <- function(data, ..., na.rm = FALSE) {
  columns <- rlang::enquos(...)
  data %>%
    select(!!!columns) %>%
    rowSums(na.rm = na.rm)
}
#Doesn't work as expected:
cars %>% mutate(sum = rowwise_sum(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".
#But alone it is creating a vector.
cars %>% rowwise_sum(speed, dist, na.rm = T)
#> [1] 6 14 11 29 24 19 28 36 44 28 39 26 32 36 40 39 47
#> [18] 47 59 40 50 74 94 35 41 69 48 56 49 57 67 60 74 94
#> [35] 102 55 65 87 52 68 72 76 84 88 77 94 116 117 144 110
#Appears to not be getting the data passed. Specifying with a dot works.
cars %>% mutate(sum = rowwise_sum(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
So the question becomes: how can I pass the data inside the function so that I don't have to include the dot every time?
rowwise_sum2 <- function(data, ..., na.rm = FALSE) {
  columns <- rlang::enquos(...)
  data %>%
    select(!!!columns) %>%
    rowSums(., na.rm = na.rm)
}
#Same error
cars %>% mutate(sum = rowwise_sum2(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".
#Same result
cars %>% rowwise_sum2(speed, dist, na.rm = T)
#> [1] 6 14 11 29 24 19 28 36 44 28 39 26 32 36 40 39 47
#> [18] 47 59 40 50 74 94 35 41 69 48 56 49 57 67 60 74 94
#> [35] 102 55 65 87 52 68 72 76 84 88 77 94 116 117 144 110
#Same result
cars %>% mutate(sum = rowwise_sum2(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
Created on 2018-05-22 by the reprex package (v0.2.0).
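From akrun's answer (please upvote):
In other words: drop mutate() and do everything inside the new function.
Here is my final function, an update on his that also lets you name the sum column if needed.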
rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
  columns <- rlang::enquos(...)
  data %>%
    select(!!!columns) %>%
    transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}
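A quick usage check of the function above, as a minimal sketch. The definition is repeated only so the snippet runs on its own, and the column name "total" and the variable names cars_tbl, with_default and with_custom are illustrative choices, not part of the original post.

```r
library(dplyr)

# Definition repeated from the post so this snippet is self-contained.
rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
  columns <- rlang::enquos(...)
  data %>%
    select(!!!columns) %>%
    transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}

cars_tbl <- as_tibble(cars)

# Default column name ("sum"); no dot and no mutate() needed at the call site.
with_default <- cars_tbl %>% rowwise_sum(speed, dist, na.rm = TRUE)

# Custom column name via sum_col ("total" is an arbitrary example).
with_custom <- cars_tbl %>% rowwise_sum(speed, dist, sum_col = "total", na.rm = TRUE)

names(with_default)
names(with_custom)
```

Because the function returns the whole data frame with the new column appended, it is called directly on the piped data rather than inside mutate().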
1 Answer
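We can move ... to the end of the argument list. It also works without changing the position of ... (although putting it last is generally recommended). The main issue here is that the data (.) is not specified in the argument list of the call inside mutate(), so the first column, speed, is matched to data and select() ends up being called on a numeric vector. It is easier to build the whole pipeline inside the function than to do only part of it there.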
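The answer's suggestion could look roughly like the sketch below. This is not the answerer's original code: the name rowwise_sum_last, the sum_col argument and the column name "total" are assumptions made for illustration. It places ... last and does the select, the row sum, and the column creation all inside the function.

```r
library(dplyr)

# Hypothetical helper (name and signature are illustrative assumptions):
# `...` comes last, and the whole pipeline lives inside the function.
rowwise_sum_last <- function(data, sum_col = "sum", na.rm = FALSE, ...) {
  columns <- rlang::enquos(...)
  # Select the requested columns from the full data frame, not from a column vector.
  selected <- dplyr::select(data, !!!columns)
  # Append the row sums under the requested column name.
  dplyr::mutate(data, !!sum_col := rowSums(selected, na.rm = na.rm))
}

cars_tbl <- as_tibble(cars)

# With `...` last, sum_col and na.rm must be passed by name; otherwise the
# first bare column name would be matched positionally to sum_col.
result <- cars_tbl %>%
  rowwise_sum_last(sum_col = "total", na.rm = TRUE, speed, dist)

names(result)
```

One trade-off of this layout is that the named arguments can no longer be left to their defaults when columns are passed positionally, which is why the final function in the question keeps ... right after data.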