
Rolling stepwise regression in R


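I have a data frame of 12 predictor variables and a list of numbers called BEI (which I want to predict). I want to run stepwise selection on every 12 rows of data, e.g. rows 1:12, 2:13, and so on. For each rolling window, I want to return the coefficients and use them to predict BEI. Here is my code: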

k = length(BEI)
coef.list <- numeric()
predicted.list <- numeric()
for(i in 1:(k-11)){
  BEI.subset <- BEI[i:(i+11)]
  predictors.subset <- predictors[c(i:(i+11)),]
  fit.stepwise <- regsubsets(BEI.subset~., data = predictors.subset, nvmax = 10, method = "forward")
  fit.summary <- summary(fit.stepwise)
  id <- which.min(fit.summary$cp)
  coefficients <- coef(fit.stepwise,id)
  coef.list <- append(coef.list, coefficients)
  form <- as.formula(fit.stepwise$call[[2]])
  mat <- model.matrix(form,predictors.subset)
  predicted.stepwise <- mat[,names(coefficients)]%*%coefficients
  predicted.list <- append(predicted.list, predicted.stepwise)
}

I get an error like this: Reordering variables and trying again: There were 50 or more warnings (use warnings() to see the first 50)

The warnings are:

1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
2: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
3: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
... and so on.

How can I fix this? Or is there a better way to write this code?

1 Answer

  • 0

    The reason you get the error is the missing values (NA) in the rolling data subsets.

    Take data(swiss) as an example:

    dim(swiss) 
    # [1] 47  6
    
    split_swiss <- lapply(1:nrow(swiss), function(x) swiss[x:(x+11),])
    length(split_swiss)
    # [1] 47  ## the rolling split produces 47 data.frames
    
    lapply(tail(split_swiss), head) # show the first 6 rows of the last 6 data.frames 
    [[1]]
                 Fertility Agriculture Examination Education Catholic Infant.Mortality
    Neuchatel         64.4        17.6          35        32    16.92             23.0
    Val de Ruz        77.6        37.6          15         7     4.97             20.0
    ValdeTravers      67.6        18.7          25         7     8.65             19.5
    V. De Geneve      35.0         1.2          37        53    42.34             18.0
    Rive Droite       44.7        46.6          16        29    50.43             18.2
    Rive Gauche       42.8        27.7          22        29    58.33             19.3
    
    [[2]]
                 Fertility Agriculture Examination Education Catholic Infant.Mortality
    Val de Ruz        77.6        37.6          15         7     4.97             20.0
    ValdeTravers      67.6        18.7          25         7     8.65             19.5
    V. De Geneve      35.0         1.2          37        53    42.34             18.0
    Rive Droite       44.7        46.6          16        29    50.43             18.2
    Rive Gauche       42.8        27.7          22        29    58.33             19.3
    NA                  NA          NA          NA        NA       NA               NA
    
    [[3]]
                 Fertility Agriculture Examination Education Catholic Infant.Mortality
    ValdeTravers      67.6        18.7          25         7     8.65             19.5
    V. De Geneve      35.0         1.2          37        53    42.34             18.0
    Rive Droite       44.7        46.6          16        29    50.43             18.2
    Rive Gauche       42.8        27.7          22        29    58.33             19.3
    NA                  NA          NA          NA        NA       NA               NA
    NA.1                NA          NA          NA        NA       NA               NA
    
    [[4]]
                 Fertility Agriculture Examination Education Catholic Infant.Mortality
    V. De Geneve      35.0         1.2          37        53    42.34             18.0
    Rive Droite       44.7        46.6          16        29    50.43             18.2
    Rive Gauche       42.8        27.7          22        29    58.33             19.3
    NA                  NA          NA          NA        NA       NA               NA
    NA.1                NA          NA          NA        NA       NA               NA
    NA.2                NA          NA          NA        NA       NA               NA
    
    [[5]]
                 Fertility Agriculture Examination Education Catholic Infant.Mortality
    Rive Droite      44.7        46.6          16        29    50.43             18.2
    Rive Gauche      42.8        27.7          22        29    58.33             19.3
    NA                 NA          NA          NA        NA       NA               NA
    NA.1               NA          NA          NA        NA       NA               NA
    NA.2               NA          NA          NA        NA       NA               NA
    NA.3               NA          NA          NA        NA       NA               NA
    
    [[6]]
                Fertility Agriculture Examination Education Catholic Infant.Mortality
    Rive Gauche      42.8        27.7          22        29    58.33             19.3
    NA                 NA          NA          NA        NA       NA               NA
    NA.1               NA          NA          NA        NA       NA               NA
    NA.2               NA          NA          NA        NA       NA               NA
    NA.3               NA          NA          NA        NA       NA               NA
    NA.4               NA          NA          NA        NA       NA               NA
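
    The NA rows above come from indexing past the last row of the data frame. As a minimal sketch in base R (the names `width`, `complete_rows`, `rows_all`, `starts` and `rows_ok` are illustrative and not part of the original answer), you can count the complete rows in each window and compare the two ways of choosing the start indices:

    ```r
    # Base R only; swiss ships with R and has 47 rows.
    n <- nrow(swiss)
    width <- 12

    # Number of rows without any NA in the window that starts at row i.
    complete_rows <- function(i) {
      sum(complete.cases(swiss[i:(i + width - 1), ]))
    }

    # Starting a window at every row runs past the end of the data,
    # so the last windows are padded with all-NA rows.
    rows_all <- sapply(seq_len(n), complete_rows)
    table(rows_all == width)

    # Stopping at n - width + 1 keeps every window complete.
    starts <- seq_len(n - width + 1)
    rows_ok <- sapply(starts, complete_rows)
    all(rows_ok == width)
    ```

    Only the windows that start at or before row `n - width + 1` contain 12 real rows; every later window is partly or mostly NA.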
    

    If you run regsubsets on these data.frames, you get an error, because some of them have more predictors than cases.

    library(leaps)  # provides regsubsets()
    lapply(split_swiss, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
    
     Error in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in,  : 
      y and x different lengths
    In addition: Warning messages:
    1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in,  :
      1  linear dependencies found
     ......
    

    Instead, you can keep only the subsets that have 12 complete rows and proceed with the regression as follows:

    split_swiss_2 <- split_swiss[sapply(lapply(split_swiss, na.omit), nrow) == 12]
    lapply(split_swiss_2, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
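
    The question also asks for the coefficients and the predictions from each window. The following is a rough sketch of one way to collect them, assuming the leaps package is installed and using swiss in place of the asker's data; `width`, `starts`, `win` and `results` are illustrative names, and `nvmax = 5` is chosen only because swiss has five predictors. It builds the windows from valid start indices, so no NA rows are created in the first place, and it returns in-sample fitted values, as the code in the question does.

    ```r
    library(leaps)  # provides regsubsets()

    width <- 12
    starts <- seq_len(nrow(swiss) - width + 1)  # complete windows only

    results <- lapply(starts, function(i) {
      win <- swiss[i:(i + width - 1), ]
      fit <- regsubsets(Fertility ~ ., data = win, nvmax = 5, method = "forward")
      id  <- which.min(summary(fit)$cp)       # model size with the lowest Mallows' Cp
      cf  <- coef(fit, id)                    # named vector, includes "(Intercept)"
      X   <- model.matrix(Fertility ~ ., win) # design matrix for the same window
      # Keep only the selected columns and multiply by their coefficients.
      list(coef = cf,
           fitted = drop(X[, names(cf), drop = FALSE] %*% cf))
    })

    length(results)    # one entry per window
    results[[1]]$coef  # coefficients selected for rows 1:12
    ```

    Storing each window's result as a list element keeps coefficient vectors of different lengths apart, which a single flat vector built with append() would not. Note also that with 12 predictors plus an intercept and only 12 rows per window, as in the question, the design matrix cannot have full column rank, so a wider window or fewer candidate predictors may be needed to avoid the linear dependency warnings.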
    
