Using gather on an already-aggregated data.frame in R [duplicate]

This question already has answers here:

Replicate each row of data.frame and specify the number of replications for each row (5 answers)

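I have a data.frame in R that contains age, length, and the total number of individuals in each length group. I want the mean and standard deviation of length for each age group, and I think this is easiest to do with dplyr. However, I can't figure out how to gather() this particular dataset. Here is the data: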

dat <- data.frame(age = sort(rep(1:5, 5)),
                  length = c(6:10, 8:12, 10:14, 12:16, 14:18),
                  total = sample(25:50, 25, replace = TRUE))  # one count per row (25 rows)

It looks like this:

age length total
   1      6    38
   1      7    42
   1      8    49
   1      9    28
   1     10    26
   2      8    37

I would like it to look like the following, so that I can simply run group_by(age) %>% summarize(mean = mean(length), sd = sd(length)).

age  length
1     6
1     6
1     6
1     6
1     6

and so on (at age 1 there should be 38 rows with length 6, 42 rows with length 7, etc.).

How can I achieve this with tidyr's gather() function? I can't seem to get it to work. I'm happy to hear other suggestions.

1 Answer

How about computing a weighted mean instead?

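dat <- data.frame(age = sort(rep(1:5, 5)),
                  length = c(6:10, 8:12, 10:14, 12:16, 14:18),
                  total = sample(25:50, 25, replace = TRUE))  # one count per row (25 rows)
library(magrittr)  # optional: dplyr already re-exports the %>% pipe
library(dplyr)

# Both columns give the same result: the mean length per age,
# weighted by the number of individuals in each length group.
dat %>%
  group_by(age) %>%
  summarise(mean_length = sum(length * total) / sum(total),
            wtd_mean = weighted.mean(length, total))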

Edit: I have since found that R has a weighted.mean function, which makes this simpler.
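The question also asks for the standard deviation, which the weighted mean alone does not give. A minimal sketch of one way to get it from the counts directly, assuming `total` is a plain frequency count and that the usual n - 1 sample variance is wanted (so the result matches `sd()` on the fully expanded rows). The names `n_obs` and `wtd_sd` and the fixed seed are illustrative assumptions, not part of the original answer:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(age = rep(1:5, each = 5),
                  length = c(6:10, 8:12, 10:14, 12:16, 14:18),
                  total = sample(25:50, 25, replace = TRUE))

res <- dat %>%
  group_by(age) %>%
  summarise(
    n_obs    = sum(total),                     # individuals in this age group
    wtd_mean = weighted.mean(length, total),   # frequency-weighted mean
    # frequency-weighted sample SD with an n - 1 denominator
    wtd_sd   = sqrt(sum(total * (length - wtd_mean)^2) / (n_obs - 1))
  )

print(res)
```

This avoids building a data frame with one row per individual, which may matter when the counts are large.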

Answered 2024-05-11T04:26:39+08:00
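For completeness, the layout the question asks for (one row per individual) can also be produced directly. Note that gather() reshapes columns into key-value rows and does not replicate rows, so it is probably not the right tool here. The sketch below uses base R `rep()` on the row indices, in the spirit of the linked duplicate; the name `expanded` and the fixed seed are assumptions made for illustration:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(age = rep(1:5, each = 5),
                  length = c(6:10, 8:12, 10:14, 12:16, 14:18),
                  total = sample(25:50, 25, replace = TRUE))

# Repeat each row index `total` times, then keep only age and length
expanded <- dat[rep(seq_len(nrow(dat)), times = dat$total), c("age", "length")]
rownames(expanded) <- NULL

# The summary from the question now works as written
stats <- expanded %>%
  group_by(age) %>%
  summarize(mean = mean(length), sd = sd(length))

print(stats)
```

If tidyr is already loaded, `tidyr::uncount(dat, total)` should give the same expanded rows in a single call.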
