
Sum the columns that match dates


I have a data frame df1 with columns ID and date (the date stored as a string of the form "XYYYYMMDD"):

library(data.table)
ID <- c(101,101,101,102,102)
date <- c("X20170101","X20170103","X20170106","X20170102","X20170104")
df1 <- data.table(ID,date)

ID      date
101 X20170101
101 X20170103
101 X20170106
102 X20170102
102 X20170104

and a data frame df2 with a column ID and many dates as column headers:

ID <- c(100,101,102,103,104)
X20170101 <- c(1,NA,NA,2,1)
X20170102 <- c(NA,1,1,1,NA)
X20170103 <- c(NA,1,NA,2,1)
X20170104 <- c(2,3,NA,2,1)
X20170105 <- c(1,1,NA,1,NA)
X20170106 <- c(NA,1,NA,2,1)
df2 <- data.table(ID,X20170101,X20170102,X20170103,X20170104,X20170105,X20170106)

ID X20170101 X20170102 X20170103 X20170104 X20170105 X20170106
100         1        NA        NA         2         1        NA
101        NA         1         1         3         1         1
102        NA         1        NA        NA        NA        NA
103         2         1         2         2         1         2
104         1        NA         1         1        NA         1
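
Since both the df1 dates and the df2 column names use the "XYYYYMMDD" form, they can be parsed into real Date objects, which makes "one day earlier" simple arithmetic. A small sketch; parse_xdate() is a hypothetical helper, not part of the question:

```r
# Hypothetical helper: parse "XYYYYMMDD" strings into Date objects.
# The literal "X" in the format string matches the leading "X" of each value.
parse_xdate <- function(x) as.Date(x, format = "X%Y%m%d")

d <- parse_xdate(c("X20170101", "X20170103", "X20170106"))
prev <- d - 1                 # one day earlier, still a Date
format(prev, "X%Y%m%d")       # back to the column-name style
# "X20161231" "X20170102" "X20170105"
```
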

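I want to add a count column to df1 containing the following: for each ID (e.g. 101) and date (e.g. X20170101) in df1, the sum of the corresponding cells of df2 from that date (e.g. X20170101) up to the day before the next df1 date for the same ID (e.g. up to X20170102, not X20170103). In other words, the new df1 should look like this: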

ID      date  count
101 X20170101     1
101 X20170103     5
101 X20170106     1
102 X20170102     1
102 X20170104    NA
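
A minimal base-R sketch of this rule, assuming plain data.frames (with the question's data.table objects, convert df2 with as.data.frame() first) and that df2's date columns are named "XYYYYMMDD". Two points are assumptions inferred from the expected output rather than stated in the question: a window whose cells are all NA yields NA, and the last date of each ID sums up to df2's last date column. to_day() is a hypothetical helper.

```r
df1 <- data.frame(
  ID   = c(101, 101, 101, 102, 102),
  date = c("X20170101", "X20170103", "X20170106", "X20170102", "X20170104"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID        = c(100, 101, 102, 103, 104),
  X20170101 = c(1, NA, NA, 2, 1),
  X20170102 = c(NA, 1, 1, 1, NA),
  X20170103 = c(NA, 1, NA, 2, 1),
  X20170104 = c(2, 3, NA, 2, 1),
  X20170105 = c(1, 1, NA, 1, NA),
  X20170106 = c(NA, 1, NA, 2, 1)
)

# Hypothetical helper: "XYYYYMMDD" -> day number (numeric avoids Date-class loss in ifelse)
to_day <- function(x) as.numeric(as.Date(x, format = "X%Y%m%d"))

date_cols <- setdiff(names(df2), "ID")
col_day   <- to_day(date_cols)

# "XYYYYMMDD" strings sort chronologically, so ordering by string is safe
df1 <- df1[order(df1$ID, df1$date), ]
start_day <- to_day(df1$date)

# Next start date within the same ID (NA for the last date of each ID)
next_day <- ave(start_day, df1$ID, FUN = function(x) c(x[-1], NA))
# Window ends the day before the next start, or at df2's last date column (assumption)
end_day <- ifelse(is.na(next_day), max(col_day), next_day - 1)

df1$count <- vapply(seq_len(nrow(df1)), function(i) {
  in_window <- col_day >= start_day[i] & col_day <= end_day[i]
  vals <- unlist(df2[df2$ID == df1$ID[i], date_cols[in_window], drop = FALSE])
  # All-NA (or empty) windows stay NA; otherwise ignore NAs when summing
  if (length(vals) == 0 || all(is.na(vals))) NA_real_ else sum(vals, na.rm = TRUE)
}, numeric(1))

df1$count
# 1 5 1 1 NA
```

The explicit loop over df1 rows keeps each window visible; for large tables a reshape-and-join approach (as in the first answer) will scale better.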

Thanks for your help.

2 Answers

  • 1

    Here you go!

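    library(data.table)
    library(dplyr)
    library(tidyr)
    
    df2 %>% gather(date,val,-ID) %>%                     # wide -> long: one row per ID/date
      full_join(df1 %>% mutate(tag=1), by = c("ID","date")) %>%  # tag = 1 marks df1 dates
      arrange(ID, date) %>%                               # "XYYYYMMDD" strings sort chronologically
      replace_na(list(val=0,tag=0)) %>%
      group_by(ID) %>%
      mutate(grp=cumsum(tag)) %>%                         # a new group starts at every df1 date
      group_by(ID,grp) %>%
      summarize(count = sum(val),date=head(date,1)) %>%  # first date of each group is the df1 date
      ungroup %>%
      mutate(count=ifelse(count== 0,NA,count)) %>%        # all-NA windows (summed as 0) back to NA
      select(ID,date,count) %>%
      right_join(df1, by = c("ID","date"))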
    
    # # A tibble: 5 x 3
    #      ID      date count
    #   <dbl>     <chr> <dbl>
    # 1   101 X20170101     1
    # 2   101 X20170103     5
    # 3   101 X20170106     1
    # 4   102 X20170102     1
    # 5   102 X20170104    NA
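
    The key step in this answer is that cumsum(tag) starts a new group at every df1 date, so each group runs from one df1 date to the day before the next. A base-R illustration of just that idea on ID 101's row of df2 (values copied from the question); unlike the pipeline, it handles NA directly instead of via replace_na(), and the variable names are illustrative:

    ```r
    # ID 101's row of df2 in long form (date columns assumed already sorted)
    dates  <- c("X20170101", "X20170102", "X20170103", "X20170104", "X20170105", "X20170106")
    val    <- c(NA, 1, 1, 3, 1, 1)
    starts <- c("X20170101", "X20170103", "X20170106")  # df1 dates for ID 101

    tag <- as.numeric(dates %in% starts)  # 1 at each df1 start date
    grp <- cumsum(tag)                    # 1 1 2 2 2 3
    # Sum per group; a group whose values are all NA stays NA
    counts <- tapply(val, grp, function(v) if (all(is.na(v))) NA_real_ else sum(v, na.rm = TRUE))
    as.vector(counts)                     # 1 5 1
    ```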
    
  • 1

    Using base R (aggregate and merge):

    # Count the rows of df1 that share each date (result columns: date, x)
    count <- aggregate(df1$date, by = list(date = df1$date), FUN = length)
    df1 <- merge(df1, count, by = "date", all.x = TRUE)
    

    Let me know if this works!
