Cumulative sum across groups using dplyr

I have a tibble structured as follows:

day  theta
1   1    2.1
2   1    2.1
3   2    3.2
4   2    3.2
5   5    9.5
6   5    9.5
7   5    9.5

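Note that each day contains multiple rows, and for each day the same value of theta is repeated an arbitrary number of times. (The tibble contains other arbitrary columns that necessitate this repeating structure.)

I'd like to use dplyr to cumulatively sum the values of theta across days so that, in the example above, 2.1 is added only once to 3.2, and so on. The tibble would be mutated to append the new cumulative sum (c.theta) as follows: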

day  theta  c.theta
1   1    2.1     2.1
2   1    2.1     2.1
3   2    3.2     5.3
4   2    3.2     5.3
5   5    9.5     14.8
6   5    9.5     14.8
7   5    9.5     14.8 
...

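My initial efforts, grouping by day and then applying cumsum over theta, only produced a cumulative sum over the full set of data (e.g., 2.1 + 2.1 + 3.2 ...), which is undesirable. In my Stack Overflow searches I can find many examples of cumulative summing within groups, but never between groups, as described above. Nudges in the right direction would be much appreciated.

3 Answers

Doing this in dplyr, I came up with a solution very similar to PoGibas's: use distinct to get just one row per day, compute the cumulative sum, and merge it back: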

library(dplyr)

df = read.table(text="day  theta
1   1    2.1
2   1    2.1
3   2    3.2
4   2    3.2
5   5    9.5
6   5    9.5
7   5    9.5", header = TRUE)

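# One row per day, then the running total of theta across days
cumsums = df %>%
    distinct(day, theta) %>%
    mutate(ctheta = cumsum(theta))

# Join the per-day totals back onto every original row
df %>%
    left_join(cumsums %>% select(day, ctheta), by = 'day')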
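
As a possible variant of the approach above, the join can be avoided by counting theta only on the first row of each day. This is a minimal sketch, not the answerer's method, and it assumes the rows are ordered so that each day's rows are contiguous; the name result is introduced here only for illustration.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  day   = c(1, 1, 2, 2, 5, 5, 5),
  theta = c(2.1, 2.1, 3.2, 3.2, 9.5, 9.5, 9.5)
)

# Assumption: each day's rows are contiguous and days appear in order.
# duplicated(day) is FALSE only on the first row of each day, so theta is
# counted once per day and contributes 0 on the repeated rows.
result <- df %>%
  mutate(c.theta = cumsum(ifelse(duplicated(day), 0, theta)))

print(result)
```

If the rows for a given day are scattered through the table, sorting by day first (or using the distinct-and-join approach) would be the safer choice.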

Answered 2024-04-27T21:28:39+08:00

This is not dplyr, just an alternative data.table solution:

library(data.table)
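# The original table is called d; convert it to a data.table by reference
setDT(d)
# Cumulative sum over the unique rows, merged back onto d by day
merge(d, unique(d)[, .(c.theta = cumsum(theta), day)])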

   day theta c.theta
1:   1   2.1     2.1
2:   1   2.1     2.1
3:   2   3.2     5.3
4:   2   3.2     5.3
5:   5   9.5    14.8
6:   5   9.5    14.8
7:   5   9.5    14.8

PS: If you want to keep other columns, you have to use unique(d[, .(day, theta)]) instead of unique(d).
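
The PS can be sketched as follows. With an extra column whose values differ within a day, unique(d) would keep every row, so theta would be counted more than once; restricting to day and theta first avoids that. The column x and the names per_day and result are hypothetical, added only for illustration.

```r
library(data.table)

# Hypothetical table with an extra column `x` that must be preserved
d <- data.table(
  day   = c(1, 1, 2, 2, 5, 5, 5),
  theta = c(2.1, 2.1, 3.2, 3.2, 9.5, 9.5, 9.5),
  x     = letters[1:7]
)

# One row per (day, theta) pair, then the running total across days
per_day <- unique(d[, .(day, theta)])[, .(day, c.theta = cumsum(theta))]

# Join the per-day totals back onto every original row, keeping `x`
result <- merge(d, per_day, by = "day")

print(result)
```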

Answered 2024-04-27T21:28:39+08:00

In base R, you can use split<- and tapply to return the desired result.

# construct 0 vector to fill in
dat$temp <- 0
# fill in with cumulative sum for each day
split(dat$temp, dat$day) <- cumsum(tapply(dat$theta, dat$day, head, 1))

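Here, tapply returns the first element of theta for each day, which is fed to cumsum. split<- then assigns the elements of the cumulative sum back to the rows of each day.

This returns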

dat
  day theta temp
1   1   2.1  2.1
2   1   2.1  2.1
3   2   3.2  5.3
4   2   3.2  5.3
5   5   9.5 14.8
6   5   9.5 14.8
7   5   9.5 14.8
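
To make the intermediate steps of the base R approach easier to follow, here is a sketch that computes the same column in separate stages. It replaces split<- with a lookup via match, which is a swapped-in technique rather than the answer's own; the names first_theta, running, and idx are illustrative.

```r
dat <- data.frame(
  day   = c(1, 1, 2, 2, 5, 5, 5),
  theta = c(2.1, 2.1, 3.2, 3.2, 9.5, 9.5, 9.5)
)

# First theta value for each day; the result is named by day ("1", "2", "5")
first_theta <- tapply(dat$theta, dat$day, head, 1)

# Running total across days, as a plain numeric vector
running <- cumsum(as.vector(first_theta))

# Position of each row's day among the per-day results
idx <- match(as.character(dat$day), names(first_theta))

# Give every row the running total of its day
dat$temp <- running[idx]

print(dat)
```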

Answered 2024-04-27T21:28:39+08:00
