This question is a follow-up to: Aggregating if each observation can belong to multiple groups.
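As in the linked question, my observations belong to several groups. Now, however, I have two grouping variables, which makes the problem harder (at least for me). In the example below, an observation can belong to one or more of the groups A, B and C. I also want to distinguish by a second factor, namely x <= 1, x <= 0.5 or x <= 0. Since every x that is at most 0 is also at most 1, each observation can again belong to several of these groups. I want to aggregate by both groupings (A, B, C and x <= 1, x <= 0.5, x <= 0) and get the sum of x for every combination: (A and x <= 1), (A and x <= 0.5), ..., (C and x <= 0). Let me know if the question is not clear enough, and feel free to edit the title, since I couldn't come up with a good one.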
```r
# The data
library(data.table)
n <- 500
set.seed(1)
TF <- c(TRUE, FALSE)
time <- rep(1:4, each = n/4)
df <- data.table(time = time, x = rnorm(n),
                 groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))
df[, c("smaller1", "smaller.5", "smaller0") := .(x <= 1, x <= 0.5, x <= 0)]
# The result should look like this (a solution for wide format would be nice as well) but less repetitive
rbind(
  df[smaller1 == TRUE, .(lapply(.SD*x, sum), c("A_smaller1", "B_smaller1", "C_smaller1")), by = .(time), .SDcols = c("groupA", "groupB", "groupC")],
  df[smaller.5 == TRUE, .(lapply(.SD*x, sum), c("A_smaller.5", "B_smaller.5", "C_smaller.5")), by = .(time), .SDcols = c("groupA", "groupB", "groupC")],
  df[smaller0 == TRUE, .(lapply(.SD*x, sum), c("A_smaller0", "B_smaller0", "C_smaller0")), by = .(time), .SDcols = c("groupA", "groupB", "groupC")]
)
```
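One less repetitive way to get the wide-format result the question mentions is to loop over both sets of indicator columns inside a single `j` expression. This is a minimal sketch written for illustration, not code from the question or the answer. `grp_cols`, `cond_cols` and `res_wide` are hypothetical names, and the sketch assumes the indicator columns are built exactly as in the question.

```r
library(data.table)

# Rebuild the question's example data so the block runs on its own
n <- 500
set.seed(1)
TF <- c(TRUE, FALSE)
df <- data.table(time = rep(1:4, each = n/4), x = rnorm(n),
                 groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))
df[, c("smaller1", "smaller.5", "smaller0") := .(x <= 1, x <= 0.5, x <= 0)]

grp_cols  <- c("groupA", "groupB", "groupC")
cond_cols <- c("smaller1", "smaller.5", "smaller0")

# For each time value, sum x over the rows that satisfy both a group indicator
# and a threshold indicator; every (group, threshold) pair becomes one column,
# named like the labels in the rbind version, e.g. "A_smaller1"
res_wide <- df[, {
  out <- list()
  for (g in grp_cols) {
    for (s in cond_cols) {
      out[[paste(sub("group", "", g), s, sep = "_")]] <- sum(x[.SD[[g]] & .SD[[s]]])
    }
  }
  out
}, by = time, .SDcols = c(grp_cols, cond_cols)]

res_wide
```

The result has one row per `time` value and nine sum columns. Adding a group or a threshold only means extending `grp_cols` or `cond_cols`, instead of writing another `rbind` branch.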
Answer
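First, melt the data and subset to the rows where group == TRUE. Next, use CJ (i.e. a cross join) to create a table of all combinations. Then perform a non-equi join against the original dataset and compute the sums as follows: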
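The three steps can be sketched as below. This is an illustrative reconstruction, not the answerer's original code. In this sketch the non-equi join is done against the melted table. `long`, `combos`, `xval`, `thr`, `sums` and `res_long` are hypothetical names. The thresholds 1, 0.5 and 0, compared with `<=`, are taken from the question's indicator columns.

```r
library(data.table)

# Rebuild the question's example data so the block runs on its own
n <- 500
set.seed(1)
TF <- c(TRUE, FALSE)
df <- data.table(time = rep(1:4, each = n/4), x = rnorm(n),
                 groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))

# Step 1: melt to long format (one row per observation and group) and keep
# only the memberships that are TRUE
long <- melt(df, id.vars = c("time", "x"),
             measure.vars = c("groupA", "groupB", "groupC"),
             variable.name = "group", value.name = "member",
             variable.factor = FALSE)
long <- long[member == TRUE]
# Copy of x, so the sum below does not depend on how the non-equi join
# labels its join column x
long[, xval := x]

# Step 2: cross join every time value, group and threshold
combos <- CJ(time = unique(df$time),
             group = c("groupA", "groupB", "groupC"),
             thr = c(1, 0.5, 0))

# Step 3: non-equi join (long$x <= combos$thr) and sum per combination;
# by = .EACHI gives one row per row of combos, in the same order, and
# na.rm = TRUE turns a combination without matches into 0
sums <- long[combos, on = .(time, group, x <= thr),
             .(total = sum(xval, na.rm = TRUE)), by = .EACHI]
res_long <- cbind(combos, total = sums$total)

res_long
```

Because the thresholds live in a column of `combos`, another cutoff only requires another value in `thr`, not another indicator column.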