
Rolling regression with data.table - any updates?


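I am trying to run a rolling regression within a data.table. There are a number of questions that address what I am trying to do, but they are generally 3+ years old and offer inelegant answers. (See here, for example.)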

I am wondering whether there has been any update to the data.table package that makes this more intuitive or faster?

Here is what I am trying to do. My code looks like this:

DT<-data.table(
  Date = seq(as.Date("2000/1/1"), by = "day", length.out = 1000),
  x1=rnorm(1000),
  x2=rnorm(1000),
  x3=rnorm(1000),
  y=rnorm(1000),
  country=rep(c("a","b","c","d"), each=25))

I would like to regress y on x1, x2 and x3 by country over a rolling 180-day window, and store the coefficients by date.

Ideally, the syntax would look something like this:

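# pseudocode only: ROLLING WINDOW is a placeholder, not real data.table syntax
DT[, .(coef.x1 = coef(lm(y ~ x1 + x2 + x3))[2],
       coef.x2 = coef(lm(y ~ x1 + x2 + x3))[3],
       coef.x3 = coef(lm(y ~ x1 + x2 + x3))[4]),
   by = c("country", ROLLING WINDOW)]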

...but more elegant, and avoiding the repetition if at all possible! :)

For some reason, I have not been able to get the rollapply syntax to work for me.

Thanks!


Edit:

Thanks, @michaelchirico.

Your suggestion comes close to what I am aiming for. Perhaps the code can be modified to get there, but once again I am stuck.

Here is a more careful statement of what I need. Some code:

DT<-data.table(
  Date = rep(seq(as.Date("2000/1/1"), by = "day", length.out = 10), times=3), #same dates per country
  x1=rep(rnorm(10), times=3), #x1's repeat - same per country
  x2=rep(rnorm(10), times=3), #x2's repeat - same per country
  x3=rep(rnorm(10), times=3), #x3's repeat - same per country
  y=rnorm(30), #y's do not repeat and are unique per country per day
  country=rep(c("a","b","c"), each=10))

#to calculate the coefficients by individual  country: 
a<-subset(DT,country=="a")
b<-subset(DT,country=="b")

window<-5 #declare window
coefs.a<-coef(lm(y~x1+x2+x3, data=a[1:window]))#initialize my coef variable
coefs.b<-coef(lm(y~x1+x2+x3, data=b[1:window]))#initialize my coef variable

##calculate coefficients per window

for(i in 1:(length(a$Date)-window)){
  #rows (i+1) to (i+window): a full window of `window` rows, shifted by one row each step
  coefs.a<-rbind(coefs.a, coef(lm(y~x1+x2+x3, data=a[(i+1):(i+window)])))
  coefs.b<-rbind(coefs.b, coef(lm(y~x1+x2+x3, data=b[(i+1):(i+window)])))
}

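The difference between this data set and the previous one is that the dates and x1, x2, x3 all repeat across countries. Only the y values are unique to each country.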

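In my actual data set I have 120 countries. I can calculate this for each country separately, but it is very slow, and I then have to re-join all the coefficients into a single data set to analyze the results.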

Is there a way, similar to what you proposed, to end up with a single data.table containing all observations?

Thanks again!!

1 Answer

  • 0

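    It is still not entirely clear what exactly you are after, but here is a shot that should be close (it will need fine-tuning to your exact specifics):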

    I really can't speak to the speed.

    TT <- DT[ , uniqueN(Date), by = country][ , max(V1)]
    window <- 5
    #pre-declare a matrix of windows; each column represents
    #one of the possible windows of days
    windows <- matrix(1:TT, nrow = TT + 1, ncol = max(TT - window + 1, 1))[1:window, ]
    
    DT[ , {
      #not all possible windows necessarily apply to each
      #  country; subset to find only the relevant windows
      #  (drop = FALSE keeps a matrix even when only one window applies)
      windowsj <- windows[ , 1:(uniqueN(Date) - window + 1), drop = FALSE]
      #lapply returns a list (which can be readily assigned with :=)
      lapply(1:ncol(windowsj),
             function(ii){
               #subset to relevant rows
               .SD[windowsj[ , ii],
                   #regress, extract
                   lm(y ~ x1 + x2 + x3)$coefficients]})},
      by = country]
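
    The `windows` matrix above leans on `matrix()` recycling `1:TT` into a matrix with `TT + 1` rows, so that each column starts one row later than the previous one; R normally warns about the mismatched lengths when doing this. As a minimal sketch of the same index matrix built explicitly, assuming `TT = 10` and `window = 5` as in the example data:

    ```r
    # Sketch: build the window-index matrix without relying on recycling.
    TT <- 10L      # number of dates per country (matches the example data)
    window <- 5L   # window length, in rows

    # Entry [r, c] is the r-th row index of the c-th window,
    # so column c holds c, c + 1, ..., c + window - 1.
    windows <- outer(seq_len(window), seq_len(TT - window + 1L),
                     function(r, c) r + c - 1L)

    windows[, 1]              # row indices used by the first window
    windows[, ncol(windows)]  # row indices used by the last window
    ```

    Each column can then be used to subset `.SD` exactly as in the code above.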
    

    Compare the results with coefs.a and coefs.b:

        country         V1          V2         V3          V4          V5          V6
     1:       a -0.8764867  0.46169717  2.6712128  2.66304537  1.18928600  0.53553900
     2:       a -1.0135961  0.03985467  0.6015446  0.61316724  0.24177034  0.86369780
     3:       a -0.1807617 -0.25767309 -2.9492897 -3.05092528 -0.04310375  0.62317993
     4:       a -0.6664342 -0.30732907 -0.3362091 -0.25776715  1.04419854  1.02294125
     5:       b  0.9548685  0.77461810 -0.5100818 -0.57726788 -0.73285223 -1.64196684
     6:       b  0.7179429  0.46107110  0.1732915  0.23262455  0.23258149  3.63679221
     7:       b  0.1639778 -0.22249382  1.4539881  0.58725270  0.54879762 -0.27115275
     8:       b  0.6192641  0.12706750  0.2671673  0.79569434  0.69031761  2.27769679
     9:       c  0.2722200  0.07279085 -0.7709578 -0.74590575 -0.15773196  0.03178821
    10:       c  0.8890314  0.74213624  0.4440650  0.34939003  0.50531166  0.16550026
    11:       c  0.1589915  0.20531447  0.9931054  1.25495206 -0.01543296 -0.09887655
    12:       c  0.7198967  0.70536869  0.4508445  0.02028332 -0.54705588 -0.64246579
    
    > coefs.a
            (Intercept)          x1          x2         x3
    coefs.a  -0.8764867 -1.01359605 -0.18076171 -0.6664342
              0.4616972  0.03985467 -0.25767309 -0.3073291
              2.6712128  0.60154458 -2.94928969 -0.3362091
              2.6630454  0.61316724 -3.05092528 -0.2577671
              1.1892860  0.24177034 -0.04310375  1.0441985
              0.5355390  0.86369780  0.62317993  1.0229412
    

    (i.e. it is the same, just transposed)
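
    If the goal is a single data.table with one row per country and window end date, and one named column per coefficient, the following is a minimal sketch of one way to get there. It assumes one row per country per day with no gaps, so that a window of `window` rows is also a window of `window` days. `roll_coefs()` is a hypothetical helper written for this illustration (it is not part of data.table), and the seed is arbitrary.

    ```r
    library(data.table)

    set.seed(1)  # arbitrary seed, only so the sketch is reproducible
    DT <- data.table(
      Date    = rep(seq(as.Date("2000-01-01"), by = "day", length.out = 10), times = 3),
      x1      = rep(rnorm(10), times = 3),
      x2      = rep(rnorm(10), times = 3),
      x3      = rep(rnorm(10), times = 3),
      y       = rnorm(30),
      country = rep(c("a", "b", "c"), each = 10)
    )
    setkey(DT, country, Date)  # rows ordered by date within each country
    window <- 5L

    # Hypothetical helper: one regression per window of `window` consecutive rows,
    # returned as one row per window, labelled with the window's last date.
    roll_coefs <- function(sd, window) {
      if (nrow(sd) < window) return(NULL)  # not enough rows for a single window
      ends <- window:nrow(sd)              # index of the last row of each window
      rbindlist(lapply(ends, function(e) {
        fit <- lm(y ~ x1 + x2 + x3, data = sd[(e - window + 1L):e])
        c(list(Date = sd$Date[e]), as.list(coef(fit)))
      }))
    }

    res <- DT[, roll_coefs(.SD, window), by = country]
    res
    ```

    The result has the columns `country`, `Date`, `(Intercept)`, `x1`, `x2` and `x3`, so no transposing or re-joining is needed afterwards. This still calls `lm()` once per window, so it is unlikely to be fast for 120 countries and 180-day windows; fitting with `stats::.lm.fit()` on a pre-built model matrix may reduce the overhead, though I have not benchmarked it.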
