
Aggregating if each observation can belong to multiple groups with multiple grouping variables

This question is a follow-up to: Aggregating if each observation can belong to multiple groups.

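As in the linked question, my observations belong to several groups. But now I have 2 grouping variables, which makes the problem harder (at least for me). In the example below, an observation can belong to one or more of the groups A, B, C. But I also want to differentiate by another factor, i.e. x < 1, x < .5 or x < 0. Because every x smaller than 0 is also smaller than 1, each observation can again belong to multiple groups. I want to aggregate by both groupings (A, B, C and x < 1, x < .5, x < 0) and get the sum for all combinations ((A and x < 1), (A and x < .5), ..., (C and x < 0)). Let me know if the question is not clear enough, and feel free to edit the title, since I couldn't come up with a suitable one.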

# The data
library(data.table)
n <- 500
set.seed(1)
TF <- c(TRUE, FALSE)
time <- rep(1:4, each = n/4)


df <- data.table(time = time, x = rnorm(n), groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))

df[ ,c("smaller1", "smaller.5", "smaller0") := .(x <= 1, x <= 0.5, x <= 0)]

# The result should look like this (a solution for wide format would be nice as well) but less repetitive
rbind(
df[smaller1 == TRUE , .(lapply(.SD*x, sum), c("A_smaller1", "B_smaller1", "C_smaller1")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")],
df[smaller.5 == TRUE , .(lapply(.SD*x, sum), c("A_smaller.5", "B_smaller.5", "C_smaller.5")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")],
df[smaller0 == TRUE , .(lapply(.SD*x, sum), c("A_smaller0", "B_smaller0", "C_smaller0")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")]
)

1 Answer

  • 1

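    First, you can melt the data and subset to the rows where group == TRUE. Next, use CJ (i.e. cross join) to create a list of all combinations. Then perform a non-equi join against the melted dataset and do the summation as follows: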

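    # Melt only the group columns (df also holds smaller1/smaller.5/smaller0 by now)
    # and keep the rows where the observation belongs to the group
    mDT <- melt(df, id.vars=c("time", "x"),
                measure.vars=c("groupA", "groupB", "groupC"))[(value)]
    # Non-equi join on every (time, group, threshold) combination;
    # in the result, column x holds the threshold Level
    mDT[CJ(time=time, variable=variable, Level=seq(0,1,0.5), unique=TRUE),
        sum(x.x),
        by=.EACHI,
        on=.(time, variable, x < Level)]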
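The question also mentions that a wide-format result would be welcome. The following is a minimal sketch of one way to get there, not part of the original answer: it rebuilds the question's data, repeats the melt and non-equi join, and reshapes the result with dcast. The names combos, res, total and wide are introduced here for illustration only, and the strict comparison x < Level is assumed to be acceptable in place of the question's <= (for continuous x the two should rarely differ).

```r
library(data.table)

# Same simulated data as in the question
set.seed(1)
n  <- 500
TF <- c(TRUE, FALSE)
df <- data.table(time = rep(1:4, each = n/4), x = rnorm(n),
                 groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))

# Long format: one row per (observation, group it belongs to)
mDT <- melt(df, id.vars = c("time", "x"),
            measure.vars = c("groupA", "groupB", "groupC"))[(value)]

# All (time, group, threshold) combinations (illustrative name: combos)
combos <- CJ(time = mDT$time, variable = mDT$variable,
             Level = seq(0, 1, 0.5), unique = TRUE)

# Non-equi join: for each combination, sum x over rows with x < Level
res <- mDT[combos, .(total = sum(x.x)), by = .EACHI,
           on = .(time, variable, x < Level)]
setnames(res, "x", "Level")  # after the join, column x holds the threshold

# Wide format: one row per time, one column per group/threshold pair
wide <- dcast(res, time ~ variable + Level, value.var = "total")
```

Each column of wide is named after the group and the threshold (for example groupA_0.5), so a single cell can be checked against a direct subset such as df[time == 1 & groupA & x < 0.5, sum(x)].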
    
