我正在分析一个纵向数据集,其中时间t(级别1)的观测值嵌套在单位i(级别2)内 . 具有随机截距和斜率的基线模型看起来像这样:

y_ti = beta_0i + beta_1i * x1_ti + e_ti

beta_0i = gamma_00 + gamma_10 * a_i + u_0i
beta_1i = gamma_10 + gamma_11 * a_i + u_1i

和二阶误差项的二元正规结构 . 在原始模型中,x1对应于特定于单位的趋势 . 在观察期的后半段,一些单位接受了治疗 . 在相同的时间点观察所有单元,然而,对于不同的单元,在不同的时间点发生处理(例如,在单元1的时段15和单元2的时段25中) . 此外,一些单位根本没有接触过治疗 . 此外,还有另一个变量(x2),我很想知道在治疗发生之前和之后相关的随机效应是否不同 . 这意味着我不仅对beta_2的单位之间的差异感兴趣,而且对接触治疗的那些单位在治疗之前的单位内变异感兴趣 . 在下面的等式中,我添加了固定处理效果和附加的随机斜率:

y_ti = beta_0i + beta_1i * x1_ti + beta_2i * x2_ti + beta_3 * treat_ti + e_ti

所以基本上,我对 interaction between the treatment variable (fixed effect) and the x2 variable (random effect) 很感兴趣 . 为了解决这个问题,我目前的方法是模拟嵌套在单位i(第2级)和单位i(1825959_)中的时间t(第1级)的结果,即治疗和非治疗(第3级) . 交叉分类方法似乎是一种可能的解决方案,因为单元不嵌套在高阶治疗组和非治疗组中,但可能在不同时间点属于一个或另一个 . 因此,我目前的做法是:

y_tij = beta_0i + beta_1i * x1_ti + beta_2ij * x2_tij + beta_3 * treat_ti + e_tij

这给出了每个处理单元(前后)的β_2的两个系数和每个未处理单元的一个系数 . 该信息可用于探索治疗对与变量x2相关的随机斜率的影响 .

My question is: is this approach correct, or are there better ways to model this interaction. I attached a reproducible example using simulated data below. Any help is much appreciated!

library(mvtnorm)
library(lme4)
library(dplyr)

set.seed(3333)
N <- 30 #number of units
J <- 30 #number of obs per unit
M <- J * N  #total number of observations

#generate data
#random effects
#unit-level df with predictor a
unit_df <- data.frame(unit = c(1:N), a = rnorm(N)) 
#random intercept (beta0) and slope (beta1), linearly related to a
unit_df <-  within(unit_df, {
  beta0_a <-  1 - 0.2 * a
  beta1_a <-  3 + 0.4 * a
})
#adding some noise for correlated intercepts and slopes
sd_beta0 <- 0.2
sd_beta1 <- 0.5
r <- 0.7
cov_matrix <-  matrix(c(sd_beta0^2, r * sd_beta1 * sd_beta0, r * sd_beta1 * 
                      sd_beta0, sd_beta1^2), nrow = 2, byrow = TRUE)
random_effects <- rmvnorm(N, mean = c(0, 0), sigma = cov_matrix)
#unit-level intercept & slopes
unit_df$beta0 <- unit_df$beta0_a + random_effects[, 1]
unit_df$beta1 <- unit_df$beta1_a + random_effects[, 2]

#fixed effects
#fixed component of x1, shared among units 
within_unit_df <-  data.frame(unit = sort(rep(c(1:N), J)), j = rep(c(1:J),N), x1 =rep(rnorm(N,0,1)))
#treatment variable, indicating in which period the treatment occured
#treatments may occur in any period in the second half (i.e., periods 15:29), and the treatment start varries by unit
treat <- sample(15:29, 30, replace=T)
#not all units obtain the treatment, setting the treatment for 10 units to 0   
for(i in sample(1:30, 10, replace=F)) treat[i] <- 30
#add treatment dummy to data frame
within_unit_df$treat_start <- rep(treat,each = 30)
within_unit_df <- within_unit_df %>% group_by(unit) %>% mutate(treat = ifelse(seq(n())>treat_start,1,0)) %>% select(-treat_start) %>% as.data.frame()

#interaction between the treatment variable (fixed effect) and variable x2 (random effect) 
#create another random component for each unit-treatment combination (i.e., effects will not only vary over units, but also over treatment)
unit_treat_df <- within_unit_df %>% group_by(unit,treat) %>% summarise(nobs=n()) %>% as.data.frame()
#number of unit-treatment combinations
N1 <- nrow(unit_treat_df) 
#unit-treatment level slope (random effect beta2)
sd_beta2 <- 0.4
unit_treat_df$beta2 <- rnorm(N1, 2, sd_beta2)

#merge everything together
df <- merge(unit_df, within_unit_df)
df <- plyr::join(df,unit_treat_df[,c("unit","treat","beta2")],by=c('unit','treat'), type='left', match='all') 
#add fixed component of x2, shared among units
df$x2 <- rep(rnorm(30,0,1),30) 

#generate data from the individual linear models incl. individual-level noise
df <-  within(df, y <-  beta0 + x1 * beta1 + 0.2*treat + x2 * beta2 + 0.75 * rnorm(n = M))
#create estimation data file and drop the rest
est_df <-  df[, c("unit", "a", "x1", "x2", "y","treat")]
rm(list = ls()[!ls() %in% c("est_df")])

#run model 
out <-  lmer(y ~ x1 + a + x1 * a + treat + x2 + (1 + x1 | unit) + (0 + x2 | unit:treat), data = est_df)
ranef(out)
summary(out)