
Relative frequencies of group totals using dplyr


I have the following toy data:

data <- structure(list(value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L), class = structure(c(1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("A", 
"B"), class = "factor")), .Names = c("value", "class"), class = "data.frame", row.names = c(NA, 
-16L))

Using these commands:

data <- table(data$class, data$value)
data <- as.data.frame(data)
data$rel_freq <- data$Freq / aggregate(Freq ~ Var1, FUN = sum, data = data)$Freq

I get the correct relative frequency of each value within each class:

> data
  Var1 Var2 Freq  rel_freq
1    A    1    3 0.2727273
2    B    1    3 0.6000000
3    A    2    4 0.3636364
4    B    2    2 0.4000000
5    A    3    4 0.3636364
6    B    3    0 0.0000000
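
As a cross-check on these numbers, the same proportions can be sketched in base R with `prop.table()`. The data frame name `toy` and the way it is rebuilt here are illustrative assumptions, not part of the original code:

```r
# Hypothetical rebuild of the toy data (`toy` is an illustrative name)
toy <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B",
                   "A", "A", "A", "A", "B", "B",
                   "A", "A", "A", "A"))
)

# Rows are classes, columns are values; margin = 1 divides each row by its total
props <- prop.table(table(toy$class, toy$value), margin = 1)
round(props, 4)
```

Each row of `props` sums to 1, which is the "relative to the class total" behaviour I am after.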

I would like to know how to build the equivalent dplyr pipeline. My attempt is pasted below:

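library(dplyr)
library(tidyr)  # complete() comes from tidyr
# `data` here is the original toy data frame, not the frequency table built above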
data %>%
  group_by(value, class) %>%
  summarise(n = n()) %>%
  complete(class, fill = list(n = 0)) %>%
  mutate(freq = n / sum(n))

This computes relative frequencies within each value, but unfortunately not relative to each class total (the group total):

Source: local data frame [6 x 4]
Groups: value [3]

  value  class     n      freq
  <int> <fctr> <dbl>     <dbl>
1     1      A     3 0.5000000
2     1      B     3 0.5000000
3     2      A     4 0.6666667
4     2      B     2 0.3333333
5     3      A     4 1.0000000
6     3      B     0 0.0000000

1 Answer

  • 3

    You only need to group by class when computing the frequencies, so drop the value grouping by regrouping on class before the mutate():

    data %>%
        group_by(value, class) %>%
        summarise(n = n()) %>%
        complete(class, fill = list(n = 0)) %>%
        group_by(class) %>%
        mutate(freq = n / sum(n))
    # A tibble: 6 x 4
      value  class     n      freq
      <int> <fctr> <dbl>     <dbl>
    1     1      A     3 0.2727273
    2     1      B     3 0.6000000
    3     2      A     4 0.3636364
    4     2      B     2 0.4000000
    5     3      A     4 0.3636364
    6     3      B     0 0.0000000
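
The original attempt divides by per-value totals because `summarise()` by default removes only the last grouping variable, so the result is still grouped by `value` when `mutate()` runs. A minimal alternative sketch, assuming the toy data is rebuilt under the hypothetical name `toy`, uses `count()` (which returns ungrouped output for ungrouped input) so that the only grouping in play is the explicit `group_by(class)`:

```r
library(dplyr)
library(tidyr)

# Hypothetical rebuild of the toy data (`toy` is an illustrative name)
toy <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B",
                   "A", "A", "A", "A", "B", "B",
                   "A", "A", "A", "A"))
)

freqs <- toy %>%
  count(value, class) %>%                        # one row per observed combination
  complete(value, class, fill = list(n = 0)) %>% # add the missing value 3 / class B row
  group_by(class) %>%                            # totals are taken per class
  mutate(freq = n / sum(n)) %>%
  ungroup()                                      # avoid carrying the grouping forward

freqs
```

Ending with `ungroup()` is a design choice rather than a requirement: it keeps later steps from silently operating per class.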
    
