Calculating a three-year moving average shift for grouped rows in R

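I have a large dataset for which I am computing a number of summary statistics, grouped by species and year. Here is some toy code to set up the data frame: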

species <- rep(c("Farfantepenaeus duorarum", "Menticirrhus littoralis",  "Ovalipes stephensoni", "Lolliguncula brevis", "Larimus fasciatus"), 4)
years <- rep(c(2007, 2013, 2001, 2013, 1994), 4)
lat <- c(33.9085, 34.6205, 33.7895, 33.8015, 29.9625, 35.1655, 34.7950, 29.5620, 32.8960, 32.2590, 33.1320, 32.9850, 34.6605, 34.0425, 32.8360, 32.6270, 32.0680, 31.7900, 34.1960, 30.7830)
testdf <- data.frame(species, years, lat)

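The first statistic I want is the mean of the three highest latitudes at which each species was found in each year. I brute-forced it with the following code, whose result I later joined back to the main df: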

library(dplyr)

testtop3lat <- testdf %>%
  group_by(species, years) %>%
  top_n(3, lat) %>%
  mutate(top3lat = mean(lat))

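The next task is to compute a moving three-year average shift in latitude. For each species-years combination, I want to calculate [(latitude at year + 1) - (latitude at year - 1)] / 3 and add it back as a column. In the end, I want every observation in the main df to have top3lat and top3slope columns, with the same entry for each species-years combination.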

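I have been experimenting with mutate and with writing a custom function to map over the original dataset, but so far nothing has worked. Suggestions would be greatly appreciated!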

EDIT: Sorry, that was not the most useful toy dataset. Take the following observations for a single species:

obsyears <- c(1980, 1980, 1980, 1981, 1981, 1981, 1982, 1982, 1982)
obslats <- c(38.5, 37, 39.2, 41.7, 40, 38.6, 41.2, 39.8, 38.7)

For 1981, the desired output is top3lat = 40.1 (the mean of 41.7, 40, and 38.6), as a column for all row entries for that species in 1981.

The second desired output for 1981 is top3slope = [(top3lat[1982]-top3lat[1980])/3] (yes, I know this is not valid R), which here would be (39.9-38.23)/3 = 0.56, also as a column for all row entries for that species in 1981.
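
The arithmetic in this example can be checked directly. Below is a minimal base R sketch that uses only the obsyears and obslats vectors from the EDIT; the names top3_mean, top3lat, and top3slope_1981 are illustrative and not part of the original post.

```r
# Observations for a single species, copied from the EDIT above
obsyears <- c(1980, 1980, 1980, 1981, 1981, 1981, 1982, 1982, 1982)
obslats  <- c(38.5, 37, 39.2, 41.7, 40, 38.6, 41.2, 39.8, 38.7)

# Hypothetical helper: mean of the (up to) three highest values in x
top3_mean <- function(x) mean(head(sort(x, decreasing = TRUE), 3))

# One top3lat value per year, named by year ("1980", "1981", "1982")
top3lat <- tapply(obslats, obsyears, top3_mean)

# Slope for 1981: (top3lat in 1982 - top3lat in 1980) / 3
top3slope_1981 <- as.numeric(top3lat["1982"] - top3lat["1980"]) / 3

round(top3lat, 2)         # about 38.23, 40.10, 39.90 for 1980, 1981, 1982
round(top3slope_1981, 2)  # about 0.56
```

Each year here has exactly three observations, so the top-3 mean equals the plain yearly mean; with more rows per year only the three highest would be averaged.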

1 Answer

  • 0

    If I understand your request correctly, you should be able to achieve this with lead and lag, both from dplyr.

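    library(dplyr)

    newdf <- testdf %>%
      left_join(
        testdf %>%
          group_by(species, years) %>%
          top_n(3, lat) %>%  # keep the 3 highest latitudes per species-year
          mutate(
            top3lat = mean(lat),
            # lead()/lag() act on neighbouring rows inside each group
            top3slope = (lead(lat) - lag(lat)) / 3
          ) %>%
          na.omit %>%        # drop rows where lead() or lag() returned NA
          select(-lat),
        by = c("species", "years")
      ) %>%
      arrange(species, years, desc(lat))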
    
    newdf    
    #                     species years     lat  top3lat  top3slope
    # 1  Farfantepenaeus duorarum  2007 35.1655 34.06867 -0.2588333
    # 2  Farfantepenaeus duorarum  2007 33.9085 34.06867 -0.2588333
    # 3  Farfantepenaeus duorarum  2007 33.1320 34.06867 -0.2588333
    # 4  Farfantepenaeus duorarum  2007 32.6270 34.06867 -0.2588333
    # 5         Larimus fasciatus  1994 32.8360 31.95933 -0.4920000
    # 6         Larimus fasciatus  1994 32.2590 31.95933 -0.4920000
    # 7         Larimus fasciatus  1994 30.7830 31.95933 -0.4920000
    # 8         Larimus fasciatus  1994 29.9625 31.95933 -0.4920000
    # 9       Lolliguncula brevis  2013 34.1960 34.01333  0.1315000
    # 10      Lolliguncula brevis  2013 34.0425 34.01333  0.1315000
    # 11      Lolliguncula brevis  2013 33.8015 34.01333  0.1315000
    # 12      Lolliguncula brevis  2013 32.8960 34.01333  0.1315000
    # 13  Menticirrhus littoralis  2013 34.7950 34.13350 -0.5451667
    # 14  Menticirrhus littoralis  2013 34.6205 34.13350 -0.5451667
    # 15  Menticirrhus littoralis  2013 32.9850 34.13350 -0.5451667
    # 16  Menticirrhus littoralis  2013 32.0680 34.13350 -0.5451667
    # 17     Ovalipes stephensoni  2001 34.6605 33.41333 -0.6665000
    # 18     Ovalipes stephensoni  2001 33.7895 33.41333 -0.6665000
    # 19     Ovalipes stephensoni  2001 31.7900 33.41333 -0.6665000
    # 20     Ovalipes stephensoni  2001 29.5620 33.41333 -0.6665000
    

    If you drop lat, this can be summarised further:

    newdf %>%
      select(-lat) %>%
      distinct
    
    #                    species years  top3lat  top3slope
    # 1 Farfantepenaeus duorarum  2007 34.06867 -0.2588333
    # 2        Larimus fasciatus  1994 31.95933 -0.4920000
    # 3      Lolliguncula brevis  2013 34.01333  0.1315000
    # 4  Menticirrhus littoralis  2013 34.13350 -0.5451667
    # 5     Ovalipes stephensoni  2001 33.41333 -0.6665000
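
    In the code above, lead() and lag() operate on rows within a single species-years group. If the slope should instead difference the yearly top3lat values, as in the question's EDIT, one possible variation is sketched below. It assumes dplyr 1.0.0 or later (for the .groups argument); the names obs, yearly, slopes, and result, and the "Species A" label, are hypothetical and only serve the illustration.

    ```r
    library(dplyr)

    # One species' observations from the question's EDIT; the species label
    # is a hypothetical placeholder so the grouping code has a key to use.
    obs <- data.frame(
      species = "Species A",
      years   = c(1980, 1980, 1980, 1981, 1981, 1981, 1982, 1982, 1982),
      lat     = c(38.5, 37, 39.2, 41.7, 40, 38.6, 41.2, 39.8, 38.7)
    )

    # Step 1: one row per species-year with the mean of the top 3 latitudes
    yearly <- obs %>%
      group_by(species, years) %>%
      summarise(top3lat = mean(head(sort(lat, decreasing = TRUE), 3)),
                .groups = "drop")

    # Step 2: within each species, look up top3lat for year + 1 and year - 1
    # by matching on the year value, so missing neighbouring years give NA
    slopes <- yearly %>%
      group_by(species) %>%
      mutate(top3slope = (top3lat[match(years + 1, years)] -
                          top3lat[match(years - 1, years)]) / 3) %>%
      ungroup()

    # Step 3: attach both columns to every original observation
    result <- left_join(obs, slopes, by = c("species", "years"))
    result
    ```

    Matching on the year value rather than on row position means a species with gaps in its record gets NA for the affected years instead of a slope spanning more than three years. On the original toy testdf, where each species appears in only one year, every top3slope would therefore be NA.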
    
