
R: Aggregating groups of irregular-length time series


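I think this is a split-apply-combine problem, but with a time-series twist. My data contains irregular runs of counts, and I need to compute some summary statistics for each run of counts. Here is a snapshot of the data:

[image: snapshot of the data]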

Here it is for your console:

library(xts)

date <- as.Date(c("2010-11-18", "2010-11-19", "2010-11-26", "2010-12-03", "2010-12-10",
              "2010-12-17", "2010-12-24", "2010-12-31", "2011-01-07", "2011-01-14",
              "2011-01-21", "2011-01-28", "2011-02-04", "2011-02-11", "2011-02-18",
              "2011-02-25", "2011-03-04", "2011-03-11", "2011-03-18", "2011-03-25",
              "2011-03-26", "2011-03-27"))

returns <- c(0.002,0.000,-0.009,0.030, 0.013,0.003,0.010,0.001,0.011,0.017,
         -0.008,-0.005,0.027,0.014,0.010,-0.017,0.001,-0.013,0.027,-0.019,
         0.000,0.001)
count <- c(NA,NA,1,1,2,2,3,4,5,6,7,7,7,7,7,NA,NA,NA,1,2,NA,NA)
maxCount <- c(NA,NA,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,
          0.030,0.030,0.030,0.030,NA,NA,NA,0.027,0.027,NA,NA)
sumCount <- c(NA,NA,0.000,0.030,0.042,0.045,0.056,0.056,0.067,0.084,0.077,
          0.071,0.098,0.112,0.123,NA,NA,NA,0.000,-0.019,NA,NA)

xtsData <- xts(cbind(returns,count,maxCount,sumCount),date)

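I don't know how to construct the max and cumSum columns, especially since each count series has an irregular length. Because I won't always know where a count series starts and ends, I got lost trying to work out the indices of these groups. Thanks for your help!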

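Update: here is my for loop attempting to compute cumSum. It is not a cumulative sum yet, it only picks out the returns that are needed, and I am still not sure how to apply a function to these ranges!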

xtsData <- cbind(xtsData,mySumCount=NA)
# find groups of returns
for(i in 1:nrow(xtsData)){
  if(is.na(xtsData[i,"count"]) == FALSE){
    xtsData[i,"mySumCount"] <- xtsData[i,"returns"]
  }
  else{
   xtsData[i,"mySumCount"] <- NA
  }
}
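For reference, the row-by-row loop above can also be written without a loop. This is only a minimal sketch on plain vectors (the first five values from the question), not the xts object itself:

```r
# Plain vectors standing in for the xts columns (first five values from the question).
returns <- c(0.002, 0.000, -0.009, 0.030, 0.013)
count   <- c(NA, NA, 1, 1, 2)

# Keep the return where a count is present and NA elsewhere; no loop needed.
mySumCount <- ifelse(is.na(count), NA, returns)
print(mySumCount)
```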

Update 2: thanks to the commenters!

# report returns when not NA count
x1 <- xtsData[!is.na(xtsData$count),"returns"]

# cum sum is close, but still need to exclude the first element
# -0.009 in the first series of counts and .027 in the second series of counts
x2 <- cumsum(xtsData[!is.na(xtsData$count),"returns"]) 

# this output is not accurate because 0.03 is displayed down the entire column, not just during periods when counts != NA. is this just a rounding error?
x3 <- max(xtsData[!is.na(xtsData$count),"returns"])


Solution:

# function to pad a vector with a 0
lagpad <- function(x, k) {
  c(rep(0, k), x)[1 : length(x)] 
}

# group the counts
x1 <- na.omit(transform(xtsData, g =  cumsum(c(0, diff(!is.na(count)) == 1))))

# cumulative sum of the count series
z1 <- transform(x1, cumsumRet = ave(returns, g, FUN =function(x) cumsum(replace(x, 1, 0))))
# max of the count series
z2 <- transform(x1, maxRet = ave(returns, g, FUN =function(x) max(lagpad(x,1))))



 merge(xtsData,z1$cumsumRet,z2$maxRet)


2 Answers

  • 1

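    The code shown is inconsistent with the output in the images, and no explanation was given, so it is unclear what operation is wanted. However, the question does say that the main difficulty is distinguishing the groups, so that is what we address.

    To do that, we compute a new column g whose rows contain 1 for the first group, 2 for the second, and so on. We also remove the NA rows, since the g column is sufficient to distinguish the groups.

    The code below first computes a vector of the same length as count that is FALSE at each NA position and TRUE at each non-NA position. It then differences that vector, subtracting from each position the one before it; to do so it implicitly converts FALSE to 0 and TRUE to 1. Comparing the differences with 1 then gives a logical vector that is TRUE wherever a group starts and FALSE otherwise. Because the first component has no predecessor, the difference vector is one element short, so we prepend a 0. Prepending implicitly converts the TRUE and FALSE values just produced to 1 and 0. cumsum then fills in 1 for the first group, 2 for the second, and so on. Finally the NA rows are omitted. (Here x is assumed to hold the question's data, xtsData.)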

    x <- na.omit(transform(x, g =  cumsum(c(0, diff(!is.na(count)) == 1))))
    

    giving:

    > x
               returns count maxCount sumCount g
    2010-11-26  -0.009     1    0.030    0.000 1
    2010-12-03   0.030     1    0.030    0.030 1
    2010-12-10   0.013     2    0.030    0.042 1
    2010-12-17   0.003     2    0.030    0.045 1
    2010-12-24   0.010     3    0.030    0.056 1
    2010-12-31   0.001     4    0.030    0.056 1
    2011-01-07   0.011     5    0.030    0.067 1
    2011-01-14   0.017     6    0.030    0.084 1
    2011-01-21  -0.008     7    0.030    0.077 1
    2011-01-28  -0.005     7    0.030    0.071 1
    2011-02-04   0.027     7    0.030    0.098 1
    2011-02-11   0.014     7    0.030    0.112 1
    2011-02-18   0.010     7    0.030    0.123 1
    2011-03-18   0.027     1    0.027    0.000 2
    2011-03-25  -0.019     2    0.027   -0.019 2
    attr(,"na.action")
    2010-11-18 2010-11-19 2011-02-25 2011-03-04 2011-03-11 2011-03-26 2011-03-27 
             1          2         16         17         18         21         22 
    attr(,"class")
    [1] "omit"
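The grouping step described above can be traced one stage at a time on a plain vector. This is a minimal sketch; the toy `count` vector and the intermediate names (`present`, `change`, `starts`) are made up for illustration and are not part of the answer's code:

```r
# Toy vector with two runs of non-NA values separated by NAs.
count <- c(NA, NA, 1, 1, 2, NA, NA, 1, 2, NA)

present <- !is.na(count)       # TRUE where a count exists
change  <- diff(present)       # +1 where a run starts, -1 where it ends, 0 otherwise
starts  <- c(0, change == 1)   # prepend 0 for the first position; logicals become 0/1
g       <- cumsum(starts)      # running number of run starts = group id

# Show every stage side by side (change is padded with NA to line up).
print(rbind(count, present, change = c(NA, change), starts, g))

# Group ids of the rows that survive na.omit().
print(g[present])
```

Note that if the vector begins with a non-NA value, that first run is labelled 0 rather than 1, because no start is detected at position 1; the labels still tell the groups apart.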
    

    You can now use ave to perform whatever calculation you like. For example, to take the cumulative sum of returns by group:

    transform(x, cumsumRet = ave(returns, g, FUN = cumsum))
    

    Replace cumsum with any other function that is suitable for use with ave.
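As an illustration of swapping in other functions, here is a small sketch on a toy data frame. The data frame `df` and the column names `maxRet` and `cumsumRet` are assumptions for the example; the second call borrows the `replace(x, 1, 0)` idea from the question's own solution so that the first return of each group counts as 0:

```r
# Toy data frame: returns plus a group id g like the one computed above.
df <- data.frame(
  returns = c(-0.009, 0.030, 0.013, 0.027, -0.019),
  g       = c(1, 1, 1, 2, 2)
)

# Per-group maximum, repeated on every row of its group.
df$maxRet <- ave(df$returns, df$g, FUN = max)

# Per-group cumulative sum that treats the first return of each group as 0.
df$cumsumRet <- ave(df$returns, df$g,
                    FUN = function(x) cumsum(replace(x, 1, 0)))

print(df)
```

Because ave returns a vector as long as its input, a summary such as max is recycled across the group, while a running function such as cumsum yields one value per row.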

  • 3

    Ah, so "count" defines the groups, and you want the cumsum per group and the max per group. I think in data.table, so this is how I would do it.

    library(xts)
    library(data.table)
    
    date <- as.Date(c("2010-11-18", "2010-11-19", "2010-11-26", "2010-12-03", "2010-12-10",
                      "2010-12-17", "2010-12-24", "2010-12-31", "2011-01-07", "2011-01-14",
                      "2011-01-21", "2011-01-28", "2011-02-04", "2011-02-11", "2011-02-18",
                      "2011-02-25", "2011-03-04", "2011-03-11", "2011-03-18", "2011-03-25",
                      "2011-03-26", "2011-03-27"))
    
    returns <- c(0.002,0.000,-0.009,0.030, 0.013,0.003,0.010,0.001,0.011,0.017,
                 -0.008,-0.005,0.027,0.014,0.010,-0.017,0.001,-0.013,0.027,-0.019,
                 0.000,0.001)
    count <- c(NA,NA,1,1,2,2,3,4,5,6,7,7,7,7,7,NA,NA,NA,1,2,NA,NA)
    maxCount <- c(NA,NA,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,
                  0.030,0.030,0.030,0.030,NA,NA,NA,0.027,0.027,NA,NA)
    sumCount <- c(NA,NA,0.000,0.030,0.042,0.045,0.056,0.056,0.067,0.084,0.077,
                  0.071,0.098,0.112,0.123,NA,NA,NA,0.000,-0.019,NA,NA)
    
    DT <- data.table(date, returns, count)
    DT[!is.na(count),max:=max(returns),by=count]
    DT[!is.na(count),cumSum:= cumsum(returns),by=count]
    
    #if you need an xts object at the end, then.
    
    xtsData <- xts(cbind(DT$returns,DT$count, DT$max,DT$cumSum),DT$date)
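Grouping with by = count puts every row that shares a count value in the same group, even when the rows belong to different runs. If the intent is instead one group per uninterrupted run of non-NA counts, one possible variation is to build a run id with data.table's rleid. The following is only a sketch on toy data; the column names `run`, `runMax` and `runCumSum` are assumptions, and the cumulative sum here includes the first return of each run:

```r
library(data.table)

# Toy data: two runs of non-NA counts; the count values 1 and 2 occur in both runs.
DT <- data.table(
  returns = c(0.002, -0.009, 0.030, 0.013, -0.017, 0.027, -0.019, 0.001),
  count   = c(NA, 1, 1, 2, NA, 1, 2, NA)
)

# rleid() starts a new id each time is.na(count) flips, so every run gets its own id.
DT[, run := rleid(is.na(count))]

# Per-run statistics, computed only on rows that have a count.
DT[!is.na(count),
   `:=`(runMax = max(returns), runCumSum = cumsum(returns)),
   by = run]

print(DT)
```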
    
