首页 文章

手动计算多元线性回归(OLS)的置信区间

提问于
浏览
0

我试图了解如何手动计算多元线性回归(OLS)的置信区间 . 我的问题是我不知道如何计算所有单个系数的标准误差 .

对于只有一个自变量的回归,我遵循以下教程:http://stattrek.com/regression/slope-confidence-interval.aspx . 本教程提供以下公式:

formula

事实证明,该公式有效 . 但是,我并没有完全理解这个公式 . 例如,为什么(-2)位于公式的顶部 . 为了验证正确性,我编写了以下已经显示标准错误的代码:

x<-1:50
y<-c(x[1:48]+rnorm(48,0,5),rnorm(2,150,5))

QR <- rq(y~x, tau=0.5)
summary(QR, se='boot')

LM<-lm(y~x)

alligator = data.frame(
      lnLength = c(3.78, 3.71, 3.73, 3.78),
      lnWeight = c(4.43, 4.38, 4.42, 4.25)
)

alli.mod1 = lm(lnWeight ~ ., data = alligator)

newdata = data.frame(
      lnLength = c(3.78, 3.71, 3.73, 3.78)
)

y_predicted = predict(alli.mod1, newdata, interval="predict")[,1]
length_mean = mean(alligator$lnLength)
> summary(alli.mod1)

 Call:
 lm(formula = lnWeight ~ ., data = alligator)

 Residuals:
        1        2        3        4 
  0.08526 -0.02368  0.03316 -0.09474 

 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept)   7.5279     5.7561   1.308    0.321
 lnLength     -0.8421     1.5349  -0.549    0.638

 Residual standard error: 0.09462 on 2 degrees of freedom
 Multiple R-squared:  0.1308,   Adjusted R-squared:  -0.3038 
 F-statistic: 0.301 on 1 and 2 DF,  p-value: 0.6383

然后我使用以下r代码手动计算SE(根据上面的公式):

rss = (alligator$lnWeight[1] - y_predicted[1])^2 + 
      (alligator$lnWeight[2] - y_predicted[2])^2 +
      (alligator$lnWeight[3] - y_predicted[3])^2 + 
      (alligator$lnWeight[4] - y_predicted[4])^2

a = sqrt(rss/(length(y_predicted)-2))

b = sqrt((alligator$lnLength[1] - length_mean)^2 + 
         (alligator$lnLength[2] - length_mean)^2 +
         (alligator$lnLength[3] - length_mean)^2 + 
         (alligator$lnLength[4] - length_mean)^2)

a/b
1.534912

这导致与摘要的SE(alli.mod1)相同的值 . 所以,我想,当我尝试使用2个变量时,它可能会起作用 . 不幸的是,这导致了错误的答案 . 如下面的代码所示:

alligator = data.frame(
    lnLength = c(3.78, 3.71, 3.73, 3.78),
    lnColor = c(2.43, 2.59, 2.46, 2.22),
    lnWeight = c(4.43, 4.38, 4.42, 4.25)
  )

alli.mod1 = lm(lnWeight ~ ., data = alligator)

newdata = data.frame(
  lnLength = c(3.78, 3.71, 3.73, 3.78),
  lnColor = c(2.43, 2.59, 2.46, 2.22)
)


y_predicted = predict(alli.mod1, newdata, interval="predict")[,1]
length_mean = mean(alligator$lnLength)
color_mean = mean(alligator$lnColor)


rss = (alligator$lnWeight[1] - y_predicted[1])^2 + 
      (alligator$lnWeight[2] - y_predicted[2])^2 +
      (alligator$lnWeight[3] - y_predicted[3])^2 + 
      (alligator$lnWeight[4] - y_predicted[4])^2

a = sqrt(rss/(length(y_predicted)-2))

b = sqrt((alligator$lnColor[1] - color_mean)^2 + 
         (alligator$lnColor[2] - color_mean)^2 +
         (alligator$lnColor[3] - color_mean)^2 + 
         (alligator$lnColor[4] - color_mean)^2)

b1 = sqrt((alligator$lnLength[1] - length_mean)^2 + 
          (alligator$lnLength[2] - length_mean)^2 +
          (alligator$lnLength[3] - length_mean)^2 + 
          (alligator$lnLength[4] - length_mean)^2)

> summary(alli.mod1)
Call:
lm(formula = lnWeight ~ ., data = alligator)

Residuals:
        1         2         3         4 
 0.006725 -0.041534  0.058147 -0.023338 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.5746     8.8650  -0.403    0.756
lnLength      1.6569     2.1006   0.789    0.575
lnColor       0.7140     0.4877   1.464    0.381

Residual standard error: 0.07547 on 1 degrees of freedom
Multiple R-squared:  0.7235,    Adjusted R-squared:  0.1705 
F-statistic: 1.308 on 2 and 1 DF,  p-value: 0.5258

> a/b
             1 
     0.2009918 
> a/b1
             1 
     0.8657274

我可以遵循一般方法来计算标准误差吗?

1 回答

  • 2

    我会建议一些关于OLS的一般性阅读,包括多元回归 . 有几个免费提供的信息来源;一个起点可能是Penn State's STAT 501 website . 您可以在麻省理工学院开放课件的these slides幻灯片8和9中找到OLSβ标准误差公式的推导 .

    基本上,标准误差是β中方差的平方根 . 正如您在我链接的幻灯片中所看到的,系数方差 - 协方差矩阵的公式是σ2(X'X)-1,其中σ2是误差项的方差 . 然后每个βj的方差是该矩阵的第j个对角线 . 由于我们不知道真正的σ2,我们如上所述估计它 - 我们取平方误差之和的平方根除以n - p,其中p是解释变量的数量(包括/截距) ) - 在简单的回归中p = 2.然而,在简单回归的情况下,(X'X)-1的对角线可以通过你的公式的分母找到,这得胜了't be the case in multiple regression; you' ll需要做的矩阵代数 . 幸运的是,这在R中很容易:

    # First we make the example data
    alligator = data.frame(
        lnLength = c(3.78, 3.71, 3.73, 3.78),
        lnColor = c(2.43, 2.59, 2.46, 2.22),
        lnWeight = c(4.43, 4.38, 4.42, 4.25)
    )
    # Then we use lm() for a check on our answers later
    alli.mod1 = lm(lnWeight ~ ., data = alligator)
    # Find the sum of the squared residuals
    rss <- sum(alli.mod1$residuals^2)
    # And use that to find the estimate of sigma^2, commonly called S
    S <- sqrt(rss / (length(alli.mod1$residuals) - length(alli.mod1$coefficients)))
    # Make the X matrix; a column of 1s for the intercept and one for each variable
    X <- cbind(rep(1, nrow(alligator)), alligator$lnLength, alligator$lnColor)
    # We can multiply matrices using %*%, transpose them with t(),
    # and invert them with solve(); so we directly apply the formula above with:
    std.errors <- S * sqrt(diag(solve(t(X) %*% X)))
    # Now we check our answers:
    summary(alli.mod1)$coefficients[ , 2] # the second column is the std. errors
    # (Intercept)    lnLength     lnColor 
    #   8.8650459   2.1005738   0.4876803 
    std.errors
    # [1] 8.8650459 2.1005738 0.4876803
    

相关问题