dplyr: using a pipe inside summarise after group_by

I have this data.frame:

df_test = structure(list(`MAE %` = c(-0.0647202646339709, -0.126867775585001, 
-1.81159420289855, -1.03092783505155, -2.0375491194877, -0.160783192796913, 
-0.585827216261999, -0.052988554472234, -0.703351261894911, -0.902996305924203, 
-0.767676767676768, -0.0101091791346543, -0.0134480903711673, 
-0.229357798165138, -0.176407935028625, -0.627062706270627, -1.75706139769261, 
-1.23024009524439, -0.257391763463569, -0.878347259688137, -0.123613523987705, 
-1.65711947626841, -2.11718534838887, -0.256285931980328, -1.87152777777778, 
-0.0552333609500138, -0.943983402489627, -0.541095890410959, 
-0.118607409474639, -0.840453845076341), Profit = c(7260, 2160, 
-7080, 3600, -8700, 6300, -540, 10680, -1880, -3560, -720, 5400, 
5280, 1800, 11040, -240, -2320, 2520, 10300, -2520, 8400, -9240, 
-5190, 7350, -6790, 3600, -3240, 8640, 7150, -2400)), .Names = c("MAE %", 
"Profit"), row.names = c(NA, 30L), class = "data.frame")

Now I want some summary statistics, like this:

df_test %>% 
    group_by(win.g = Profit > 0) %>%
    summarise(GroupCnt  = n(),
              TopMAE    = filter(`MAE %` > -1) %>% sum(Profit),
              BottomMAE = filter(`MAE %` <= -1) %>% sum(Profit))

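So the data is grouped by whether Profit > 0 or Profit <= 0. Within each group, I then want the sum() of Profit for the rows where `MAE %` > -1 (TopMAE) and for the rows where `MAE %` <= -1 (BottomMAE). The grouping must be applied to both the TopMAE and BottomMAE calculations.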

The expected result is:

#  win.g GroupCnt TopMAE BottomMAE
#1 FALSE       14 -15100    -39320
#2  TRUE       16  95360      6120

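But my R code does not work. I get an error:

Error: no applicable method for 'filter_' applied to an object of class "logical"

Based on that error, I changed my code: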

df_test %>% 
    group_by(win.g = Profit > 0) %>%
    summarise(UnderStop = n(),
              TopMAE    = filter(., `MAE %` > -1) %>% sum(Profit),
              BottomMAE = filter(., `MAE %` <= -1) %>% sum(Profit))

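But that did not produce a result either. I got another error:

Error: incorrect length (14), expecting: 16

I tried to understand how grouping behaves and how to use a pipe inside summarise after grouping, but I did not succeed. I spent a whole day on it.

How can I get the expected result table? Please help me understand the dplyr logic for grouping and then computing functions over those groups.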

2 Answers

Is this what you want? (Just asking, because my output is different.)

df_test %>% 
    group_by(win.g = Profit > 0) %>% 
    summarise(GroupCnt  = n(),
              TopMAE    = sum(Profit[`MAE %` > -1]),
              BottomMAE = sum(Profit[`MAE %` <= -1]))

#Source: local data frame [2 x 4]

#  win.g GroupCnt TopMAE BottomMAE
#  (lgl)    (int)  (dbl)     (dbl)
#1 FALSE       14 -15100    -39320
#2  TRUE       16  95360      6120
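
As a side note on why this form works where the piped filter() did not: inside summarise(), a column name refers to that column's values for the current group only, so `MAE %` > -1 is a plain logical vector. filter() expects a data frame as its first argument, which is why piping a logical vector into it fails, while indexing Profit with that vector is ordinary vector subsetting. A minimal sketch on a small made-up data frame (the name toy, the column name mae, and all values are hypothetical, not taken from the question):

```r
library(dplyr)

# Hypothetical toy data, NOT the df_test from the question
toy <- data.frame(mae    = c(-0.5, -1.5, -0.2, -2.0),
                  Profit = c(100, -50, -30, 40))

res <- toy %>%
  group_by(win.g = Profit > 0) %>%
  summarise(GroupCnt  = n(),
            # Profit and mae are per-group vectors here,
            # so the logical index selects rows within the group
            TopMAE    = sum(Profit[mae > -1]),
            BottomMAE = sum(Profit[mae <= -1]))

print(res)
```

Because both the index and the indexed vector come from the same group, their lengths always match, which avoids the "incorrect length" error that appears when the whole ungrouped data frame is passed in through the dot.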

Answered 2024-04-27T06:21:13+08:00

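Personally, I prefer to approach this kind of problem by recognizing that you are performing a grouped operation over two dimensions, whereas your code uses only one. Here is an example that does the same job over two dimensions. It takes a bit more code than what @Sotos provided, but it gives the same result.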

library(dplyr)
library(tidyr)

df_test %>%
  #* Group on two dimensions
  group_by(win.g = Profit > 0,
           top = ifelse(`MAE %` > -1, "TopMAE", "BottomMAE")) %>%
  summarise(GroupCnt = n(),
            SumProfit = sum(Profit)) %>%
  ungroup() %>%

  #* Collapse the GroupCnt
  group_by(win.g) %>%
  mutate(GroupCnt = sum(GroupCnt)) %>%
  ungroup() %>%

  #* From long to wide
  spread(top, SumProfit)
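
For readers on a newer tidyr (1.0.0 or later), where spread() is superseded by pivot_wider(), the same two-dimension idea might be sketched as follows. This is an assumption-laden variant rather than the answer's own code: it uses a small made-up data frame (toy, mae, and the values are hypothetical) and relies on the .groups argument of summarise(), which requires dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, NOT the df_test from the question
toy <- data.frame(mae    = c(-0.5, -1.5, -0.2, -2.0),
                  Profit = c(100, -50, -30, 40))

wide <- toy %>%
  # Group on two dimensions: win/loss and MAE bucket
  group_by(win.g = Profit > 0,
           top   = ifelse(mae > -1, "TopMAE", "BottomMAE")) %>%
  summarise(GroupCnt  = n(),
            SumProfit = sum(Profit),
            .groups   = "drop") %>%
  # Collapse the count back to one value per win.g
  group_by(win.g) %>%
  mutate(GroupCnt = sum(GroupCnt)) %>%
  ungroup() %>%
  # From long to wide: one column per MAE bucket
  pivot_wider(names_from = top, values_from = SumProfit)

print(wide)
```

One difference from the single-summarise approach: if a win.g group has no rows in one of the buckets, the wide table holds NA there rather than 0, unless values_fill is supplied to pivot_wider().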

Answered 2024-04-27T06:21:13+08:00
