
Creating dummy variables for a two-way ANOVA

d = data.frame(
    Temperature = c(rep("Cool", 6), rep("Warm", 6)),
    Bact = c(rep("Bact 1", 2), rep("Bact 2", 2), rep("Bact 3", 2), rep("Bact 1", 2), rep("Bact 2", 2), rep("Bact 3", 2)),
    Time = c(15.23,14.32,14.77,15.12,14.05,15.48,14.13,16.13,16.44,14.82,17.96,16.65)
)

I made up a small data frame for a two-way ANOVA myself. I fit the two-way ANOVA model with

summary(aov(Time~Bact*Temperature, data=d))

Time is the dependent variable, and Bact and Temperature are the two categorical independent variables.

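I want to learn, and show, that an ANOVA can also be done as a linear regression rather than through the ANOVA routine. My plan is to convert my data into dummy variables and run a linear regression on them, hoping to recover the same results. The dummy variables should also cover the interaction between Bact and Temperature.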
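Written out, this is the regression being described. A sketch assuming R's default treatment coding, with Cool and Bact 1 as the reference levels; warm, bact2 and bact3 are 0/1 indicators, and the β labels are introduced here only for illustration:

```latex
\begin{aligned}
\text{Time}_i &= \beta_0 + \beta_1\,\text{warm}_i + \beta_2\,\text{bact2}_i + \beta_3\,\text{bact3}_i
  + \beta_4\,\text{warm}_i\,\text{bact2}_i + \beta_5\,\text{warm}_i\,\text{bact3}_i + \varepsilon_i
\end{aligned}

\begin{array}{l|ccc}
 & \text{Bact 1} & \text{Bact 2} & \text{Bact 3} \\ \hline
\text{Cool} & \beta_0 & \beta_0 + \beta_2 & \beta_0 + \beta_3 \\
\text{Warm} & \beta_0 + \beta_1 & \beta_0 + \beta_1 + \beta_2 + \beta_4 & \beta_0 + \beta_1 + \beta_3 + \beta_5
\end{array}
```

Six coefficients describe the six cell means. Apart from the intercept, the five columns split into 2 for Bact, 1 for Temperature and 2 for the interaction, which are exactly the degrees of freedom of the two-way ANOVA terms.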

The problem is that I don't know how to turn my data frame into dummy variables that can be used in the lm() function.
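The equivalence the question wants to demonstrate can be checked directly. Below is a minimal sketch in R that fits the same model once with aov() and once with lm(), then compares the two ANOVA tables. The names aov_tab, lm_tab and fit are only illustrative. It relies on lm() turning the character columns into factors and building treatment-coded dummies internally, as R does by default.

```r
# Question's data (character columns; lm() and aov() treat them as factors)
d <- data.frame(
  Temperature = rep(c("Cool", "Warm"), each = 6),
  Bact = rep(rep(c("Bact 1", "Bact 2", "Bact 3"), each = 2), times = 2),
  Time = c(15.23, 14.32, 14.77, 15.12, 14.05, 15.48,
           14.13, 16.13, 16.44, 14.82, 17.96, 16.65)
)

# Two-way ANOVA fitted with aov()
aov_tab <- summary(aov(Time ~ Bact * Temperature, data = d))[[1]]

# Same model fitted as a linear regression; lm() builds the dummy columns itself
fit <- lm(Time ~ Bact * Temperature, data = d)
lm_tab <- anova(fit)

# Both tables list Bact, Temperature, Bact:Temperature and Residuals in that order
print(aov_tab)
print(lm_tab)

# Columns of the design matrix: intercept, 2 Bact dummies, 1 Temperature dummy,
# 2 interaction columns (the reference levels Cool and Bact 1 get no column)
colnames(model.matrix(fit))
```

The Df, Sum Sq and F columns of the two tables should agree, because aov() is itself a wrapper around the same least-squares fit.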

2 Answers

  • 1

    lm() creates the dummy variables for you, so there is no need to build them yourself:

    m <- lm(Time ~ Bact*Temperature, data = d)
    anova(m)
    

    Edit

    If you want to peer under the hood of lm(), you can inspect the design matrix with model.matrix(m).

  • 0

    I do the same thing as you. I like to have control, so whenever I have time I build the dummies myself with the following:

    d = data.frame(
      Temperature = c(rep("Cool", 6), rep("Warm", 6)),
      Bact = c(rep("Bact 1", 2), rep("Bact 2", 2), rep("Bact 3", 2), rep("Bact 1", 2), rep("Bact 2", 2), rep("Bact 3", 2)),
      Time = c(15.23,14.32,14.77,15.12,14.05,15.48,14.13,16.13,16.44,14.82,17.96,16.65),
      stringsAsFactors = TRUE # required since R 4.0.0 (default is now FALSE); Filter(is.factor, d) below needs factors
    )
    

    which gives:

    > d
       Temperature   Bact  Time
    1         Cool Bact 1 15.23
    2         Cool Bact 1 14.32
    3         Cool Bact 2 14.77
    4         Cool Bact 2 15.12
    5         Cool Bact 3 14.05
    6         Cool Bact 3 15.48
    7         Warm Bact 1 14.13
    8         Warm Bact 1 16.13
    9         Warm Bact 2 16.44
    10        Warm Bact 2 14.82
    11        Warm Bact 3 17.96
    12        Warm Bact 3 16.65
    

    So you only need to dummify the factors (Temperature and Bact), which the following does:

    xfactors <- Filter(is.factor, d) # keep only the factor columns to dummify
    b <- data.frame(matrix(NA, nrow = nrow(xfactors), ncol = 1)) # placeholder column so cbind() has something to start from
    for (i in seq_len(ncol(xfactors))) { # one factor at a time
      a <- data.frame(model.matrix(~xfactors[, i])) # treatment-coded dummies; model.matrix leaves out the reference level
      b <- cbind(b, a[-1]) # drop the intercept column and append the dummies
    }
    b <- data.frame(b[-1]) # drop the NA placeholder column
    colnames(b) <- c('warm', 'bact2', 'bact3') # the names model.matrix generates are long, so shorten them
    
    > b
       warm bact2 bact3
    1     0     0     0
    2     0     0     0
    3     0     1     0
    4     0     1     0
    5     0     0     1
    6     0     0     1
    7     1     0     0
    8     1     0     0
    9     1     1     0
    10    1     1     0
    11    1     0     1
    12    1     0     1
    

    Then run the model:

    new_data <- cbind(b, Time = d$Time) # add the response to the dummy columns
    mymod <- lm(Time ~ warm*bact2 + warm*bact3, data = new_data) # main effects plus warm:bact2 and warm:bact3
    # don't add a bact2:bact3 term: both dummies come from the same factor, so their product is always 0
    

    Output:

    > summary(mymod)
    
    Call:
    lm(formula = Time ~ warm * bact2 + warm * bact3, data = new_data)
    
    Residuals:
       Min     1Q Median     3Q    Max 
     -1.00  -0.67   0.00   0.67   1.00 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  14.7750     0.6873  21.498 6.61e-07 ***
    warm          0.3550     0.9719   0.365    0.727    
    bact2         0.1700     0.9719   0.175    0.867    
    bact3        -0.0100     0.9719  -0.010    0.992    
    warm:bact2    0.3300     1.3745   0.240    0.818    
    warm:bact3    2.1850     1.3745   1.590    0.163    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.9719 on 6 degrees of freedom
    Multiple R-squared:  0.6264,    Adjusted R-squared:  0.3151 
    F-statistic: 2.012 on 5 and 6 DF,  p-value: 0.2097
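Going from hand-made dummies back to the two-way ANOVA table takes one more step, because the per-dummy t tests in the summary above are not the factor-level F tests that summary(aov()) reports. A sketch, assuming R's default treatment coding: build the three dummies in a single model.matrix() call instead of the loop, fit a sequence of nested models that each add a whole factor (or the whole interaction) in the same order as the aov() formula, and let anova() compare them. The model names null_mod, bact_only, additive and full are made up for this example.

```r
# Question's data; stringsAsFactors = TRUE so model.matrix() sees factors
d <- data.frame(
  Temperature = rep(c("Cool", "Warm"), each = 6),
  Bact = rep(rep(c("Bact 1", "Bact 2", "Bact 3"), each = 2), times = 2),
  Time = c(15.23, 14.32, 14.77, 15.12, 14.05, 15.48,
           14.13, 16.13, 16.44, 14.82, 17.96, 16.65),
  stringsAsFactors = TRUE
)

# All treatment-coded dummies in one call; [, -1] drops the intercept column
X <- model.matrix(~ Temperature + Bact, data = d)[, -1]
colnames(X) <- c("warm", "bact2", "bact3")
new_data <- data.frame(X, Time = d$Time)

# Nested models built only from the 0/1 columns (names are illustrative)
null_mod  <- lm(Time ~ 1, data = new_data)
bact_only <- lm(Time ~ bact2 + bact3, data = new_data)
additive  <- lm(Time ~ bact2 + bact3 + warm, data = new_data)
full      <- lm(Time ~ warm * bact2 + warm * bact3, data = new_data)

# Rows 2-4: Bact (2 df), Temperature (1 df), interaction (2 df);
# each F uses the residual mean square of the largest model, as aov() does
tab <- anova(null_mod, bact_only, additive, full)
print(tab)

# Reference table from the formula interface
aov_tab <- summary(aov(Time ~ Bact * Temperature, data = d))[[1]]
print(aov_tab)

# Coefficients add up to a cell mean, e.g. Warm / Bact 3
sum(coef(full)[c("(Intercept)", "warm", "bact3", "warm:bact3")])
mean(d$Time[d$Temperature == "Warm" & d$Bact == "Bact 3"])
```

The Sum of Sq and F columns in rows 2-4 of tab should match the Bact, Temperature and Bact:Temperature rows of the aov() table. The last two lines illustrate how the coefficients map to cell means: 14.775 + 0.355 - 0.010 + 2.185 = 17.305 = (17.96 + 16.65)/2.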
    
