
Apportioning values over a list grouped by ID


I have a data frame that contains an id, a start date, an end date, and income and cost values.

table <- data.frame(id = c(1, 2, 3),
               start = c("2018-01-01", "2018-02-05", "2018-05-30"),
               end = c("2018-01-31", "2018-03-26", "2018-08-31"),
               income = c(100, 225, 399),
               costs = c(37, 98, 113))

table$start <- as.Date(table$start)
table$end <- as.Date(table$end)

This gives:

 id      start        end income costs
  1 2018-01-01 2018-01-31    100    37
  2 2018-02-05 2018-03-26    225    98
  3 2018-05-30 2018-08-31    399   113

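As in this question, some of these periods span n months, and I would like to aggregate income and costs by month. For amounts whose period spans two, three, or more months, I want to allocate them linearly across those months, in proportion to the number of days that fall in each month.

The catch is that I also want to keep the id and apply the operation to two variables (unlike in the question mentioned above), which complicates the whole thing.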

What I expect to get is the following table:

id   date      income      costs
1    2018-01   100         37
2    2018-02   108         47.04
2    2018-03   117         50.96
3    2018-05   8.489362    2.404255
3    2018-06   127.340426  36.063830
3    2018-07   131.585106  37.265957
3    2018-08   131.585106  37.265957
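
As a worked check of the allocation rule, the sketch below reproduces the two expected rows for id 2. It assumes that "linearly" means every day of the period carries an equal share of the total, which is what the expected table implies; the variable names (`days`, `runs`, `alloc`) are illustrative only.

```r
# Worked check for id 2 (2018-02-05 to 2018-03-26, income 225, costs 98).
# Assumption: each day of the period carries an equal share of the total.
days <- seq(as.Date("2018-02-05"), as.Date("2018-03-26"), by = "day")
month <- format(days, "%Y-%m")

# The days are consecutive, so rle() yields one run per calendar month:
# runs$values holds the month labels, runs$lengths the day counts.
runs <- rle(month)

# Each month's amount is total * (days in that month) / (days in the period)
alloc <- data.frame(
  date = runs$values,
  days = runs$lengths,
  income = 225 * runs$lengths / length(days),
  costs = 98 * runs$lengths / length(days)
)
alloc
```

The period covers 24 days in February and 26 in March, so income splits into 225 × 24/50 = 108 and 225 × 26/50 = 117, and costs into 47.04 and 50.96.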

I tried using rbindlist on a list of data frames split by id, together with the following function:

explode <- function(start, end, income) {
  dates <- seq(start, end, "day")
  n <- length(dates)
  rowsum(rep(income, n) / n, format(dates, "%Y-%m"))
}

Map(explode, table$start, table$end, table$income)

But of course it only returns the rowsum values, in an unnamed list, without the id.

Any help would be much appreciated. Thanks!

2 Answers

  • 1

    I would go with data.table:

    library(data.table)
    
    table_aggregated <- setDT(table)[
      , .(id = id, income = income, costs = costs, day_var = seq(start, end, "day")), by = 1:nrow(table)][
        , `:=` (income_day = income / .N, 
                costs_day = costs / .N,
                date = format(day_var, "%Y-%m")), by = id][
                  , .(income = sum(income_day),
                      costs = sum(costs_day)), by = .(id, date)]
    

    Output:

       id    date     income     costs
    1:  1 2018-01 100.000000 37.000000
    2:  2 2018-02 108.000000 47.040000
    3:  2 2018-03 117.000000 50.960000
    4:  3 2018-05   8.489362  2.404255
    5:  3 2018-06 127.340426 36.063830
    6:  3 2018-07 131.585106 37.265957
    7:  3 2018-08 131.585106 37.265957
    
  • 1

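    Your solution works. Just pass the extra arguments to Map, extend your function with cbind so that it combines income and costs, and then rbind the list that Map produces: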

    explode <- function(id, start, end, income, costs) {
      dates <- seq(start, end, "day")
      n <- length(dates)
      # Month label ("YYYY-MM") for every day in the period
      month <- format(dates, "%Y-%m")
      cbind.data.frame(
        id = id,
        # dates are ascending, so unique(month) follows the order of rowsum()'s groups
        date = unique(month),
        income = as.vector(rowsum(rep(income, n) / n, month)),
        costs = as.vector(rowsum(rep(costs, n) / n, month))
      )
    }
    
    data_list <- Map(explode, table$id, table$start, table$end, table$income, table$costs)
    final_df <- do.call(rbind, data_list)
    
    final_df
    #   id    date     income     costs
    # 1  1 2018-01 100.000000 37.000000
    # 2  2 2018-02 108.000000 47.040000
    # 3  2 2018-03 117.000000 50.960000
    # 4  3 2018-05   8.489362  2.404255
    # 5  3 2018-06 127.340426 36.063830
    # 6  3 2018-07 131.585106 37.265957
    # 7  3 2018-08 131.585106 37.265957
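
One way to sanity-check any of these approaches is to confirm that the monthly pieces add back up to the original totals for each id. The sketch below is self-contained and uses a hypothetical helper, `allocate_by_month`, and a hypothetical data frame, `periods`; neither comes from the thread, and the helper assumes the same equal-share-per-day rule as above.

```r
# Hypothetical helper (not from the thread): spread `amount` evenly over the
# days from `start` to `end`, then sum the daily shares per calendar month.
allocate_by_month <- function(start, end, amount) {
  dates <- seq(start, end, by = "day")
  month <- format(dates, "%Y-%m")
  tapply(rep(amount / length(dates), length(dates)), month, sum)
}

# Same periods and income values as in the question
periods <- data.frame(
  id = c(1, 2, 3),
  start = as.Date(c("2018-01-01", "2018-02-05", "2018-05-30")),
  end = as.Date(c("2018-01-31", "2018-03-26", "2018-08-31")),
  income = c(100, 225, 399)
)

# One vector of monthly amounts per id
monthly <- Map(allocate_by_month, periods$start, periods$end, periods$income)

# Re-aggregate per id and compare with the original totals
totals <- vapply(monthly, sum, numeric(1))
stopifnot(isTRUE(all.equal(totals, periods$income)))
```

The comparison uses `all.equal` rather than `==` because the daily shares are floating-point values and their sum can differ from the original total by a rounding error.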
    
