首页 文章

从R Dataframe中的JSON中提取特定日期的值

提问于
浏览
2

我有一个下面提到的数据帧:

ID      Rank    Name       Json_Data
IR-122  RE      AFG        {as below sample}
IR-122  UI      SSw        {as below sample}
IR-123  RF      HEr        {as below sample}
IR-123  RO      djf        {as below sample}
IR-124  RE      der        {as below sample}
IR-125  RF      fet        {as below sample}

示例Json数据:

{"Jan-2018":{"10":50000.0,"11":50000.0,"12":15202.0,"13":10089.0,"14":10089.0,"15":9589.0,"16":9589.0,"17":18941.0,"18":15246.75,"19":5053.75,"20":44092.75,"21":36630.75,"22":9334.75,"23":5254.75,"24":4357.25,"25":3357.25,"26":44626.25,"27":49292.25,"28":48292.25,"29":43371.8,"30":38675.8,"31":37988.12},"Mar-2018":{"1":30799.02,"2":20775.42,"3":20657.42,"4":20657.42,"5":12657.42,"6":11110.22,"7":11110.22,"8":11110.22,"9":11111.22,"10":30272.22,"11":30272.22,"12":25316.22,"13":25316.22,"14":25316.22,"15":25316.22,"16":25316.22,"17":25316.22,"18":25316.22,"19":25316.22,"20":25316.22,"21":15316.22,"22":15316.22,"23":15316.22,"24":15316.22,"25":15204.12,"26":14791.12,"27":14791.12,"28":14791.12,"29":14791.12,"30":14791.12,"31":14791.12},"Feb-2018":{"1":36749.12,"2":36483.37,"3":35254.87,"4":27254.87,"5":15880.87,"6":14173.87,"7":7934.87,"8":7091.87,"9":5797.87,"10":5797.87,"11":5797.87,"12":283841.87,"13":283418.87,"14":283418.87,"15":253426.37,"16":242226.37,"17":227226.37,"18":197226.37,"19":147226.37,"20":111799.02,"21":111799.02,"22":66799.02,"23":64799.02,"24":64799.02,"25":63799.02,"26":53799.02,"27":36799.02,"28":36799.02},"Apr-2018":{"1":14791.12,"2":14791.12,"3":14791.12,"4":14791.12,"5":10791.12,"6":10791.12,"7":10791.12,"8":10791.12,"9":10755.72,"10":5799.72,"11":5799.72,"12":5799.72,"13":5799.72,"14":5799.72,"15":5799.72,"16":5799.72,"17":5799.72,"18":5799.72,"19":5728.92,"26":728.92,"27":728.92,"28":728.92,"29":728.92,"30":728.92}}

我的模式json通过使用我想要提取的上述日期帧来运行月份并研究该月的每个日期:

  • 一列中的最大月份(MMM-YY)格式

  • 第1,第7,第14,第21和第28日期的值

下面提到的示例输出中的 A_1 值将是 Apr-2018 的值: 14791.12

A_2 的值将是2018年4月7日: 10791.12 依此类推 .

我需要从最长月份开始的4个月(不包括最长月份) .

A_1 以下是大多数月份的实际上个月,而 A_2A_1 的前一个月精确,依此类推,而我仅从 A_1A_28 供应相同的列将重复 B_2C_3C_4 月 .

在A_1中,分析将是A_1个月的第一个日期,A_7读数为A_1个月的第7天,其他三个月也是如此 . 值必须是group_by IDRank .

在我的样本Json数据中,只有4个月,最大月份是2018年4月,因此在这种情况下A_1将是2018年3月,B_2将是2018年2月,C_3将是2018年1月,D_4将是12月 - 2017年(其中D_1,D_7_D_14,D_21和D_28将为 NA ) .

在下面的模式中,我预计最大月份是5月18日 .

样本输出:

ID     Rank     Name    Max_Month  A_1      B_2      C_3      D_4      A_1       A_7       A_14       A_21      A_28     B_1      B_7     B_14      B_21   B_28        C_1   C_7   C_14   C_21     C_28      D_1   D_7   D_14   D_21   D_28
    IR-122 RE       AFG     Apr-2018   Mar-2018 Feb-2018 Jan-2018 Dec-2017 30799.02  11110.22  25316.22   16316.22  14791.12 36749.12 7934.87 283418.87 111799.02 36799.02  NA   NA    10089  36630.75 48292.25   NA   NA     NA      NA   NA

样本输入数据帧:

structure(list(ID = "IR-122", Rank = "RE", Name = "AFG", Json_Data = "{\"Jan-2018\":{\"10\":50000.0,\"11\":50000.0,\"12\":15202.0,\"13\":10089.0,\"14\":10089.0,\"15\":9589.0,\"16\":9589.0,\"17\":18941.0,\"18\":15246.75,\"19\":5053.75,\"20\":44092.75,\"21\":36630.75,\"22\":9334.75,\"23\":5254.75,\"24\":4357.25,\"25\":3357.25,\"26\":44626.25,\"27\":49292.25,\"28\":48292.25,\"29\":43371.8,\"30\":38675.8,\"31\":37988.12},\"Mar-2018\":{\"1\":30799.02,\"2\":20775.42,\"3\":20657.42,\"4\":20657.42,\"5\":12657.42,\"6\":11110.22,\"7\":11110.22,\"8\":11110.22,\"9\":11111.22,\"10\":30272.22,\"11\":30272.22,\"12\":25316.22,\"13\":25316.22,\"14\":25316.22,\"15\":25316.22,\"16\":25316.22,\"17\":25316.22,\"18\":25316.22,\"19\":25316.22,\"20\":25316.22,\"21\":15316.22,\"22\":15316.22,\"23\":15316.22,\"24\":15316.22,\"25\":15204.12,\"26\":14791.12,\"27\":14791.12,\"28\":14791.12,\"29\":14791.12,\"30\":14791.12,\"31\":14791.12},\"Feb-2018\":{\"1\":36749.12,\"2\":36483.37,\"3\":35254.87,\"4\":27254.87,\"5\":15880.87,\"6\":14173.87,\"7\":7934.87,\"8\":7091.87,\"9\":5797.87,\"10\":5797.87,\"11\":5797.87,\"12\":283841.87,\"13\":283418.87,\"14\":283418.87,\"15\":253426.37,\"16\":242226.37,\"17\":227226.37,\"18\":197226.37,\"19\":147226.37,\"20\":111799.02,\"21\":111799.02,\"22\":66799.02,\"23\":64799.02,\"24\":64799.02,\"25\":63799.02,\"26\":53799.02,\"27\":36799.02,\"28\":36799.02},\"Apr-2018\":{\"1\":14791.12,\"2\":14791.12,\"3\":14791.12,\"4\":14791.12,\"5\":10791.12,\"6\":10791.12,\"7\":10791.12,\"8\":10791.12,\"9\":10755.72,\"10\":5799.72,\"11\":5799.72,\"12\":5799.72,\"13\":5799.72,\"14\":5799.72,\"15\":5799.72,\"16\":5799.72,\"17\":5799.72,\"18\":5799.72,\"19\":5728.92,\"26\":728.92,\"27\":728.92,\"28\":728.92,\"29\":728.92,\"30\":728.92}}"), class = "data.frame", row.names = c(NA, 
-1L))

1 回答

  • 3
    json_to_df <- function(data){
      json_as_list <- jsonlite::fromJSON(data)
      months       <- names(json_as_list)
      last4months  <- tail(months[order(lubridate::myd(paste0(months,"-01")))],4)
      max_month    <- tail(last4months,1)
      other_months <- head(last4months,-1)
      other_months_suffixes <- paste0(LETTERS[seq_along(other_months)],"_")
      last_month   <- tail(other_months,1)
      days         <- c('1','7','14','21','28')
      get_month_list <- function(x) json_as_list[[x]][days]
      list_subset <- Map(function(x,y) setNames(get_month_list(x),paste0(y,days)),
                         rev(other_months), other_months_suffixes)
      list_subset <- unlist(list_subset, recursive = FALSE)
      names(list_subset) <- gsub("^.*?\\.","",names(list_subset))
      list_subset <- map_if(list_subset, is.null,~NA)
    
      only_nas <- setNames(replicate(20,NA,F),paste(sep="_",rep(LETTERS[1:4],each=5),rep(days,4)))
      missing <- names(only_nas)[! names(only_nas) %in% names(list_subset)]
      list_subset <- c(list_subset, only_nas[missing])
    
    
      list_months <- setNames(as.list(other_months),paste0(other_months_suffixes,0))
      only_nas2 <- setNames(replicate(4,NA,F),paste(sep="_",LETTERS[1:4],0))
      missing2 <- names(only_nas2)[! names(only_nas2) %in% names(list_months)]
      list_months <- c(list_months, only_nas2[missing2])
    
      output_list <- c(
        Max_Month = max_month,
        list_months,
        list_subset)
      data.frame(output_list)
    }
    
    library(jsonlite)
    library(lubridate)
    library(tidyverse)
    df %>%
      mutate(Json_Data = map(Json_Data,json_to_df)) %>%
      unnest
    #       ID Rank Name Max_Month      A_0      B_0      C_0 D_0      A_1      A_7     A_14     A_21     A_28      B_1
    # 1 IR-122   RE  AFG  Apr-2018 Jan-2018 Feb-2018 Mar-2018  NA 30799.02 11110.22 25316.22 15316.22 14791.12 36749.12
    #       B_7     B_14   B_21     B_28 C_1 C_7  C_14     C_21     C_28 D_1 D_7 D_14 D_21 D_28
    # 1 7934.87 283418.9 111799 36799.02  NA  NA 10089 36630.75 48292.25  NA  NA   NA   NA   NA
    

相关问题