
Comparing values within a column by group in dplyr


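I want to use dplyr to compare values within a grouped data.frame and create a dummy variable (or something similar) indicating which one is larger. I can't figure it out!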

Here is some reproducible code:

table <- structure(list(species = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Adelophryne adiastola", 
"Adelophryne gutturosa"), class = "factor"), scenario = structure(c(3L, 
1L, 2L, 3L, 1L, 2L), .Label = c("future1", "future2", "present"
), class = "factor"), amount = c(5L, 3L, 2L, 50L, 60L, 40L)), .Names = c("species", 
"scenario", "amount"), class = "data.frame", row.names = c(NA, 
-6L))
> table
                species scenario amount
1 Adelophryne adiastola  present      5
2 Adelophryne adiastola  future1      3
3 Adelophryne adiastola  future2      2
4 Adelophryne gutturosa  present     50
5 Adelophryne gutturosa  future1     60
6 Adelophryne gutturosa  future2     40

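I would group the df by species. I want to create a new column, say increase_amount, in which the amount for each "future" scenario is compared against the "present" one. I would get 1 when the value increases and 0 when it decreases.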

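I have been trying to use a for loop to go through each species, but the df contains more than 50,000 species, and it takes too long given how many times I need to redo the operation...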

Does anyone have an idea? Thanks a lot!

2 Answers

  • 0

    You could do it like this:

    library(dplyr)

    table %>% 
      group_by(species) %>% 
      mutate(tmp = amount[scenario == "present"]) %>% 
      mutate(increase_amount = ifelse(amount > tmp, 1, 0))
    # Source: local data frame [6 x 5]
    # Groups: species [2]
    # 
    #                 species scenario amount   tmp increase_amount
    #                  <fctr>   <fctr>   <int> <int>           <dbl>
    # 1 Adelophryne adiastola  present      5     5               0
    # 2 Adelophryne adiastola  future1      3     5               0
    # 3 Adelophryne adiastola  future2      2     5               0
    # 4 Adelophryne gutturosa  present     50    50               0
    # 5 Adelophryne gutturosa  future1     60    50               1
    # 6 Adelophryne gutturosa  future2     40    50               0
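
    A possible variation, as a sketch only: if the helper column tmp is not needed, the comparison can go into a single mutate(). This assumes every species has exactly one "present" row; with zero or several such rows per group, mutate() would be expected to fail or need different handling. The name df is just an illustrative stand-in for the question's table, rebuilt here so the snippet runs on its own.

    ```r
    library(dplyr)

    # Same data as in the question, rebuilt with data.frame() for brevity
    # (df is a hypothetical name used only in this sketch)
    df <- data.frame(
      species  = rep(c("Adelophryne adiastola", "Adelophryne gutturosa"), each = 3),
      scenario = rep(c("present", "future1", "future2"), times = 2),
      amount   = c(5L, 3L, 2L, 50L, 60L, 40L)
    )

    result <- df %>%
      group_by(species) %>%
      # Assumption: exactly one "present" row per species, so the baseline
      # has length 1 within each group and is recycled across the group's rows
      mutate(increase_amount = as.integer(amount > amount[scenario == "present"])) %>%
      ungroup()

    result$increase_amount
    ```

    As in the answer above, the "present" row itself gets 0, because the comparison is strictly greater than.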
    
  • 1

    We can do this with ave from base R:

    table$increase_amount <-  with(table, as.integer(amount > ave(amount * 
             (scenario == "present"), species, FUN = function(x) x[x!=0])))
    table$increase_amount
    #[1] 0 0 0 0 1 0
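
    Note that x[x != 0] appears to rely on the "present" amount being non-zero. A hedged alternative sketch that avoids this looks up each species' baseline with match(). It assumes each species has exactly one "present" row; df, present and baseline are illustrative names, not part of the original code.

    ```r
    # Same data as in the question, rebuilt with data.frame() for brevity
    # (df is a hypothetical name used only in this sketch)
    df <- data.frame(
      species  = rep(c("Adelophryne adiastola", "Adelophryne gutturosa"), each = 3),
      scenario = rep(c("present", "future1", "future2"), times = 2),
      amount   = c(5L, 3L, 2L, 50L, 60L, 40L)
    )

    # Keep only the baseline rows (assumption: one "present" row per species)
    present <- df[df$scenario == "present", ]

    # For every row, find the "present" amount of the same species
    baseline <- present$amount[match(df$species, present$species)]

    # 1 where the amount exceeds the baseline, 0 otherwise
    df$increase_amount <- as.integer(df$amount > baseline)
    df$increase_amount
    ```

    Both steps are vectorised, so no per-species loop is involved.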
    
