Aggregating if each observation can belong to multiple groups, with multiple grouping variables

This question is a follow-up to: Aggregating if each observation can belong to multiple groups.

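As in the linked question, my observations belong to several groups. But now I have two grouping variables, which makes the problem harder (at least for me). In the example below, an observation can belong to one or more of the groups A, B, C. I also want to differentiate by another factor, namely x < 1, x < .5 or x < 0. Since every x smaller than 0 is also smaller than 1, each observation can again belong to multiple groups. I want to aggregate by both groupings (A, B, C and x < 1, x < .5, x < 0) and get the sums for all combinations ((A and x < 1), (A and x < .5), ..., (C and x < 0)). Let me know if the question is unclear, and feel free to edit the title, since I could not come up with a suitable one.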

```
# The data
library(data.table)
n <- 500
set.seed(1)
TF <- c(TRUE, FALSE)
time <- rep(1:4, each = n/4)

df <- data.table(time = time, x = rnorm(n), groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))

df[ ,c("smaller1", "smaller.5", "smaller0") := .(x <= 1, x <= 0.5, x <= 0)]

# The result should look like this (a solution for wide format would be nice as well) but less repetitive
rbind(
df[smaller1 == TRUE , .(lapply(.SD*x, sum), c("A_smaller1", "B_smaller1", "C_smaller1")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")],
df[smaller.5 == TRUE , .(lapply(.SD*x, sum), c("A_smaller.5", "B_smaller.5", "C_smaller.5")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")],
df[smaller0 == TRUE , .(lapply(.SD*x, sum), c("A_smaller0", "B_smaller0", "C_smaller0")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")]
)
```

1 Answer

1
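First, you can melt the data to long format and subset to the rows where the group indicator is TRUE. Next, use CJ (i.e. a cross join) to create the list of all combinations. Then perform a non-equi join against the melted data set and compute the sums as follows: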
```
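# Melt only the group indicators to long format and keep rows where membership is TRUE
mDT <- melt(df, id.vars = c("time", "x"),
            measure.vars = c("groupA", "groupB", "groupC"))[(value)]
# Non-equi join against every (time, group, threshold) combination, summing x per combination
mDT[CJ(time = time, variable = variable, Level = seq(0, 1, 0.5), unique = TRUE),
    sum(x.x),
    by = .EACHI,
    on = .(time, variable, x < Level)]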
```
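To make the melt / CJ / non-equi join steps easier to check by hand, here is a minimal sketch on a tiny made-up data set. The table `toy`, its values, and the names `long`, `res`, `wide` and `total` are assumptions for illustration only and are not part of the question. It also reshapes the result to wide format with `dcast`, since the question asked about that.

```r
library(data.table)

# Hypothetical toy data (not from the question), small enough to verify by hand
toy <- data.table(
  time   = c(1, 1, 1, 2, 2),
  x      = c(-0.4, 0.3, 0.8, -1.0, 0.6),
  groupA = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  groupB = c(FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Long format: one row per (observation, group) membership that is TRUE
long <- melt(toy, id.vars = c("time", "x"),
             measure.vars = c("groupA", "groupB"))[(value)]

# Every (time, group, threshold) combination present in the data
combos <- CJ(time = long$time, variable = long$variable,
             Level = c(0, 0.5, 1), unique = TRUE)

# Non-equi join: for each combination, sum the x values below the threshold
res <- long[combos,
            .(total = sum(x.x)),
            by = .EACHI,
            on = .(time, variable, x < Level)]

# In the join result the column named x holds the threshold, so rename it
setnames(res, "x", "Level")

# Wide layout: one row per time, one column per group/threshold combination
wide <- dcast(res, time ~ variable + Level, value.var = "total")
```

For example, at time 1 group A contains x = -0.4 and x = 0.3, so the sum below 0.5 is -0.1. Combinations with no matching observation (here group B at time 1 with threshold 0) should come back as NA rather than 0, so you may want to replace those if zeros are preferred.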
Answered on 2024-05-07T11:05:15+08:00
