
How do I convert a daily time series to weekly averages?


I want to convert my daily time series to a weekly one by taking the arithmetic mean of the daily data.

Following this thread: How does one compute the mean of weekly data by column using R?, I am using the xts library.

# Averages daily time series into weekly time series
# where my source is a zoo object
source.w <- apply.weekly(source, colMeans)

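The problem I have is that each weekly value averages the data from Tuesday through the following Monday.

I am looking for a way to average the daily data from Monday to Friday.

Any tips?

Here is a sample: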

# here is part of my data, from a "blé colza.txt" file


    24/07/2012  250.5   499
    23/07/2012  264.75  518.25
    20/07/2012  269.25  525.25
    19/07/2012  267 522.5
    18/07/2012  261.25  517
    17/07/2012  265.75  522.25
    16/07/2012  264.25  523.25
    13/07/2012  258.25  517
    12/07/2012  253.75  513
    11/07/2012  246.25  512.75
    10/07/2012  248 515
    09/07/2012  247 519.25
    06/07/2012  243.25  508.25
    05/07/2012  245 508.5
    04/07/2012  236 500.5
    03/07/2012  234 497.75
    02/07/2012  234.25  489.75
    29/06/2012  229 490.25
    28/06/2012  229.75  487.25
    27/06/2012  229.75  493
    26/06/2012  226.5   486
    25/06/2012  220 482.25
    22/06/2012  214.25  472.5
    21/06/2012  212 469.5
    20/06/2012  210.25  473.75
    19/06/2012  208 472.75
    18/06/2012  206.75  462.5
    15/06/2012  203 456.5
    14/06/2012  205.25  460.5
    13/06/2012  205.25  465.25
    12/06/2012  205.25  469
    11/06/2012  208 471.5
    08/06/2012  208 468.5
    07/06/2012  208 471.25
    06/06/2012  208 467
    05/06/2012  208 458.75
    04/06/2012  208 457.5
    01/06/2012  208 463.5
    31/05/2012  208 466.75
    30/05/2012  208 468
    29/05/2012  212.75  469.75
    28/05/2012  212.75  469.75
    25/05/2012  212.75  465.5



# Loads external libraries
library("zoo") # or require("zoo")
library("xts") # or require("xts")

# Loads data as a zoo object
source <- read.zoo("blé colza.txt", sep=",", dec=".", header=TRUE, na.strings="NA", format="%d/%m/%Y")

# Averages daily time series into weekly time series
# https://stackoverflow.com/questions/11129562/how-does-one-compute-the-mean-of-weekly-data-by-column-using-r
source.w <- apply.weekly(source, colMeans)

5 Answers

  • 6

    mrdwab's answer only happens to work because they share a timezone (or its characteristics) with the OP. To illustrate:

    Lines <- 
        "24/07/2012  250.5   499
        23/07/2012  264.75  518.25
        20/07/2012  269.25  525.25
        19/07/2012  267 522.5
        18/07/2012  261.25  517
        17/07/2012  265.75  522.25
        16/07/2012  264.25  523.25
        13/07/2012  258.25  517
        12/07/2012  253.75  513
        11/07/2012  246.25  512.75
        10/07/2012  248 515
        09/07/2012  247 519.25
        06/07/2012  243.25  508.25
        05/07/2012  245 508.5
        04/07/2012  236 500.5
        03/07/2012  234 497.75
        02/07/2012  234.25  489.75
        29/06/2012  229 490.25
        28/06/2012  229.75  487.25
        27/06/2012  229.75  493
        26/06/2012  226.5   486
        25/06/2012  220 482.25
        22/06/2012  214.25  472.5
        21/06/2012  212 469.5
        20/06/2012  210.25  473.75
        19/06/2012  208 472.75
        18/06/2012  206.75  462.5
        15/06/2012  203 456.5
        14/06/2012  205.25  460.5
        13/06/2012  205.25  465.25
        12/06/2012  205.25  469
        11/06/2012  208 471.5
        08/06/2012  208 468.5
        07/06/2012  208 471.25
        06/06/2012  208 467
        05/06/2012  208 458.75
        04/06/2012  208 457.5
        01/06/2012  208 463.5
        31/05/2012  208 466.75
        30/05/2012  208 468
        29/05/2012  212.75  469.75
        28/05/2012  212.75  469.75
        25/05/2012  212.75  465.5"
    
    # Get R's timezone information (from ?Sys.timezone)
    tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")
    tzones <- read.delim(tzfile, row.names = NULL, header = FALSE,
      col.names = c("country", "coords", "name", "comments"),
      as.is = TRUE, fill = TRUE, comment.char = "#")
    
    # Run the analysis on each timezone
    out <- list()
    library(xts)
    for(i in seq_along(tzones$name)) {
      tzn <- tzones$name[i]
      Sys.setenv(TZ=tzn)
      con <- textConnection(Lines)
      Source <- read.zoo(con, format="%d/%m/%Y")
      close(con)  # release the connection before the next iteration
      out[[tzn]] <- apply.weekly(Source, colMeans)
    }
    

    Now you can run head(out,5) and see that some of the output differs depending on the timezone used:

    head(out,5)
    $`Europe/Andorra`
                   V2      V3
    2012-05-27 212.75 467.625
    2012-06-03 208.95 465.100
    2012-06-10 208.00 467.400
    2012-06-17 205.10 462.750
    2012-06-24 212.90 474.150
    2012-07-01 229.85 489.250
    2012-07-08 241.05 506.850
    2012-07-15 254.10 516.200
    2012-07-22 265.60 521.050
    2012-07-23 250.50 499.000
    
    $`Asia/Dubai`
                   V2      V3
    2012-05-27 212.75 467.625
    2012-06-03 208.95 465.100
    2012-06-10 208.00 467.400
    2012-06-17 205.10 462.750
    2012-06-24 212.90 474.150
    2012-07-01 229.85 489.250
    2012-07-08 241.05 506.850
    2012-07-15 254.10 516.200
    2012-07-22 265.60 521.050
    2012-07-23 250.50 499.000
    
    $`Asia/Kabul`
                   V2      V3
    2012-05-27 212.75 467.625
    2012-06-03 208.95 465.100
    2012-06-10 208.00 467.400
    2012-06-17 205.10 462.750
    2012-06-24 212.90 474.150
    2012-07-01 229.85 489.250
    2012-07-08 241.05 506.850
    2012-07-15 254.10 516.200
    2012-07-22 265.60 521.050
    2012-07-23 250.50 499.000
    
    $`America/Antigua`
                    V2      V3
    2012-05-25 212.750 465.500
    2012-06-01 209.900 467.550
    2012-06-08 208.000 464.600
    2012-06-15 205.350 464.550
    2012-06-22 210.250 470.200
    2012-06-29 227.000 487.750
    2012-07-06 238.500 500.950
    2012-07-13 250.650 515.400
    2012-07-20 265.500 522.050
    2012-07-24 257.625 508.625
    
    $`America/Anguilla`
                    V2      V3
    2012-05-25 212.750 465.500
    2012-06-01 209.900 467.550
    2012-06-08 208.000 464.600
    2012-06-15 205.350 464.550
    2012-06-22 210.250 470.200
    2012-06-29 227.000 487.750
    2012-07-06 238.500 500.950
    2012-07-13 250.650 515.400
    2012-07-20 265.500 522.050
    2012-07-24 257.625 508.625
    

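    A more robust solution is to make sure your timezone is represented correctly, either by setting it globally with Sys.setenv(TZ="<yourTZ>") or by setting it for each individual object with indexTZ(Source) <- "<yourTZ>".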
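
    As a rough illustration of why the timezone matters, the base R sketch below formats one and the same instant in two zones. It does not reproduce what xts does internally; it only shows that a timestamp at midnight can land on a different calendar date, and therefore a different weekday, once another timezone is applied. The variable names are made up for this example.

```r
# Midnight UTC at the start of Monday 23 July 2012.
t <- as.POSIXct("2012-07-23 00:00:00", tz = "UTC")

# The same instant rendered as a calendar date in two timezones.
date_utc     <- format(t, "%Y-%m-%d", tz = "UTC")
date_antigua <- format(t, "%Y-%m-%d", tz = "America/Antigua")

# Weekday numbers (0 = Sunday, 1 = Monday) for the same instant.
wday_utc     <- as.POSIXlt(t, tz = "UTC")$wday
wday_antigua <- as.POSIXlt(t, tz = "America/Antigua")$wday

print(c(date_utc, date_antigua))
print(c(wday_utc, wday_antigua))
```

    Because America/Antigua is behind UTC, the instant falls on the previous day there, so a week boundary computed from it can shift by one day.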

  • 0

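    I ran your example and, if I understand the problem correctly, the apply.weekly function aggregates the first Friday together with the first Monday of your data. I don't use the xts package, so someone else will have to provide more insight. I would convert the dates into a vector of dates in which a single day of each week (the Monday, in the code below) represents every observation of that week. ?strptime summarizes the format codes I used for the conversion.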

    # Get the year of the first observation
    start_year <- format(time(source)[1],"%Y")
    # Convert this into a date for the 1st of Jan in that year.
    start_date <- as.Date(strptime(paste(start_year, "1 1"), "%Y %d %m"))
    
    # Using the difftime function determine the distance (days) since the first day of the first year.
    jul_day <- as.numeric(difftime(time(source),start_date),units="days")
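    # Get the date of the Monday on or before each observation and add it to the start of the year.
    # Note: this arithmetic assumes that 1 January falls on a Sunday, which is true for 2012.
    mondays <- start_date + (jul_day - (jul_day-1)%%7)
    # the %% operator calculates the remainder.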
    # to check that it has worked convert the mondays vector into day names.
    format(mondays, "%A")
    
    # And now you can aggregate the observations using the mondays vector.
    source.w <- aggregate(source[,1:2], mondays, "mean")
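
    The arithmetic above depends on the weekday of 1 January. A minimal base R sketch that does not rely on that is shown below, assuming the dates are already Date objects. week_monday is a hypothetical helper written for this example, not a zoo or xts function, and the five values are taken from the V2 column of the question's data.

```r
# Hypothetical helper: map each date to the Monday of its week.
# as.POSIXlt()$wday is 0 for Sunday, 1 for Monday, ..., 6 for Saturday.
week_monday <- function(d) {
  d <- as.Date(d)
  d - (as.POSIXlt(d)$wday + 6) %% 7
}

# Five observations from the question (first price column).
dates  <- as.Date(c("2012-07-16", "2012-07-17", "2012-07-20",
                    "2012-07-23", "2012-07-24"))
values <- c(264.25, 265.75, 269.25, 264.75, 250.5)

# Average the values that share the same Monday.
weekly <- tapply(values, format(week_monday(dates)), mean)
print(weekly)
```

    The result is labelled by the Monday of each week, so the grouping stays Monday to Friday regardless of the year or the session timezone.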
    
  • 3

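    I came across this question while dealing with the same problem myself.

    With the xts library this is straightforward.

    # say you have an xts object named 'dat'
    ep <- endpoints(dat, on = 'weeks')
    # mean() suits a single-column object; use colMeans for several columns
    period.apply(x = dat, INDEX = ep, FUN = mean)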
    
  • 1

    Following up on Joshua Ulrich's answer.

    On my system (Kubuntu 12), the following did not retrieve the zone.tab file:

    tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")
    

    However, I was able to find zone.tab with:

    locate zone.tab
    

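    For some reason (I suspected file permissions, though note that the path below lacks a leading slash and is therefore resolved relative to the working directory), I could not point directly at that zone.tab file. That is, writing: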

    tzfile <- "usr/share/zoneinfo/zone.tab"
    

    returned:

    Error in file(file, "rt") : cannot open the connection
    In addition: Warning message:
    In file(file, "rt") :
      cannot open file 'usr/share/zoneinfo/zone.tab': No such file or directory
    

    The problem went away after I made a local copy of zone.tab and pointed to that copy:

    tzfile <- "~/R/zone.tab"
    

    If you search Google for zone.tab, you will also find copies of it online, in case your system has none or its copy is corrupted. Here is one such place:

    http://www.ietf.org/timezones/data/zone.tab
    

    P.S. I have fewer than 15 reputation points, so I cannot post this as a comment, which is what I would otherwise have done.

  • 2

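    I was able to reproduce your problem, and you can solve it with period.apply() and custom endpoints.

    First, here is the data you provided, in a format that others can easily read in.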

    temp = structure(list(V1 = structure(c(33L, 32L, 29L, 27L, 25L, 23L, 
    22L, 19L, 17L, 15L, 13L, 12L, 9L, 7L, 5L, 3L, 2L, 41L, 39L, 37L, 
    36L, 35L, 31L, 30L, 28L, 26L, 24L, 21L, 20L, 18L, 16L, 14L, 11L, 
    10L, 8L, 6L, 4L, 1L, 43L, 42L, 40L, 38L, 34L), .Label = c("01/06/2012", 
    "02/07/2012", "03/07/2012", "04/06/2012", "04/07/2012", "05/06/2012", 
    "05/07/2012", "06/06/2012", "06/07/2012", "07/06/2012", "08/06/2012", 
    "09/07/2012", "10/07/2012", "11/06/2012", "11/07/2012", "12/06/2012", 
    "12/07/2012", "13/06/2012", "13/07/2012", "14/06/2012", "15/06/2012", 
    "16/07/2012", "17/07/2012", "18/06/2012", "18/07/2012", "19/06/2012", 
    "19/07/2012", "20/06/2012", "20/07/2012", "21/06/2012", "22/06/2012", 
    "23/07/2012", "24/07/2012", "25/05/2012", "25/06/2012", "26/06/2012", 
    "27/06/2012", "28/05/2012", "28/06/2012", "29/05/2012", "29/06/2012", 
    "30/05/2012", "31/05/2012"), class = "factor"), V2 = c(250.5, 
    264.75, 269.25, 267, 261.25, 265.75, 264.25, 258.25, 253.75, 
    246.25, 248, 247, 243.25, 245, 236, 234, 234.25, 229, 229.75, 
    229.75, 226.5, 220, 214.25, 212, 210.25, 208, 206.75, 203, 205.25, 
    205.25, 205.25, 208, 208, 208, 208, 208, 208, 208, 208, 208, 
    212.75, 212.75, 212.75), V3 = c(499, 518.25, 525.25, 522.5, 517, 
    522.25, 523.25, 517, 513, 512.75, 515, 519.25, 508.25, 508.5, 
    500.5, 497.75, 489.75, 490.25, 487.25, 493, 486, 482.25, 472.5, 
    469.5, 473.75, 472.75, 462.5, 456.5, 460.5, 465.25, 469, 471.5, 
    468.5, 471.25, 467, 458.75, 457.5, 463.5, 466.75, 468, 469.75, 
    469.75, 465.5)), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 
    -43L))
    

    We clean it up and convert it to an xts object.

    temp$V1 = as.Date(temp$V1, format="%d/%m/%Y")
    library(xts)
    temp.x = xts(temp[-1], order.by=temp$V1)
    

    Now we try the apply.weekly() function, but it does not give us what you want.

    apply.weekly(temp.x, colMeans)
    #                V2      V3
    # 2012-05-28 212.75 467.625
    # 2012-06-04 208.95 465.100
    # 2012-06-11 208.00 467.400
    # 2012-06-18 205.10 462.750
    # 2012-06-25 212.90 474.150
    # 2012-07-02 229.85 489.250
    # 2012-07-09 241.05 506.850
    # 2012-07-16 254.10 516.200
    # 2012-07-23 265.60 521.050
    # 2012-07-24 250.50 499.000
    

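    To use period.apply(), you need to specify the endpoints of the periods (which can be irregular). Here, our first period is just the first date, and from there on each period spans five rows. A few days are left over at the end, so we add nrow(temp.x) as the endpoint of the last period.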

    ep = c(0, seq(1, nrow(temp.x), by = 5), nrow(temp.x))
    period.apply(temp.x, INDEX = ep, FUN = colMeans)
    #                 V2      V3
    # 2012-05-25 212.750 465.500
    # 2012-06-01 209.900 467.550
    # 2012-06-08 208.000 464.600
    # 2012-06-15 205.350 464.550
    # 2012-06-22 210.250 470.200
    # 2012-06-29 227.000 487.750
    # 2012-07-06 238.500 500.950
    # 2012-07-13 250.650 515.400
    # 2012-07-20 265.500 522.050
    # 2012-07-24 257.625 508.625
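
    Stepping through the rows five at a time works for this sample because every week has exactly five rows. As a hedged alternative for data with gaps such as holidays, the base R sketch below derives the endpoints from the calendar week of each date instead. The vector v2 and the helper names are assumptions made for this example, and the dates must be sorted in ascending order.

```r
# Seven trading days from the question (first price column), ascending.
dates <- as.Date(c("2012-07-16", "2012-07-17", "2012-07-18", "2012-07-19",
                   "2012-07-20", "2012-07-23", "2012-07-24"))
v2 <- c(264.25, 265.75, 261.25, 267, 269.25, 264.75, 250.5)

# Monday that starts each observation's week (wday: 0 = Sunday, 1 = Monday).
week_start <- dates - (as.POSIXlt(dates)$wday + 6) %% 7

# An endpoint is the last row of each week; the vector starts at 0 and
# ends at the number of rows, like the ep vector built above.
ep <- c(0, which(diff(as.integer(week_start)) != 0), length(dates))

# Base R stand-in for averaging the rows between consecutive endpoints.
weekly_mean <- sapply(seq_len(length(ep) - 1),
                      function(i) mean(v2[(ep[i] + 1):ep[i + 1]]))
print(ep)
print(weekly_mean)
```

    An ep vector built this way has the same shape as the one passed to INDEX above, so it could be handed to period.apply() in place of the fixed five-row steps.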
    
