
Add and populate new columns based on matches between two existing columns in two data frames [duplicate]



I have two data frames (df1 and df2); examples are shown below:

df1 <- data.frame(StationID = c(1,1,1,2,2,3,3,3,3,3),
                  Cameras   = c("Cam1","Cam2","Cam2","Cam1","Cam1","Cam2","Cam1","Cam2","Cam1","Cam1"),
                  Start     = c("2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23"),
                  End       = c("2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25"))


df2 <- data.frame(StationID = c(1,1,2,2,3,3),
                  Cameras   = c("Cam1","Cam2","Cam1","Cam2","Cam1","Cam2"))

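I want to generate a new data frame (df3) that looks for matches on two columns (StationID and Cameras) and then appends the "Start" and "End" date columns for each matching row. The code needs to add new columns dynamically based on the data, because some instances have no match while others have many.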

Example output:

StationID Cameras     Start1       End1     Start2       End2     Start3       End3
1         1    Cam1 2013-04-23 2013-04-25       <NA>       <NA>       <NA>       <NA>
2         1    Cam2 2013-04-23 2013-04-25 2013-04-23 2013-04-25       <NA>       <NA>
3         2    Cam1 2013-04-23 2013-04-25 2013-04-23 2013-04-25       <NA>       <NA>
4         2    Cam2       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
5         3    Cam1 2013-04-23 2013-04-25 2013-04-23 2013-04-25 2013-04-23 2013-04-25
6         3    Cam2 2013-04-23 2013-04-25 2013-04-23 2013-04-25       <NA>       <NA>

I would appreciate any help with this task.

Thanks in advance!

2 Answers

  • 0

    We join the two datasets on 'StationID' and 'Cameras' and use dcast from data.table, which can reshape multiple value.var columns into 'wide' format.

    library(data.table)  # rowid() needs 1.9.7+
    # Join df1 to df2 on both keys, then spread Start/End by match number
    dcast(setDT(df1)[df2, on = c("StationID", "Cameras")],
          StationID + Cameras ~ rowid(StationID, Cameras),
          value.var = c("Start", "End"))
     # StationID Cameras    Start_1    Start_2    Start_3      End_1      End_2      End_3
     #1:         1    Cam1 2013-04-23         NA         NA 2013-04-25         NA         NA
     #2:         1    Cam2 2013-04-23 2013-04-23         NA 2013-04-25 2013-04-25         NA
     #3:         2    Cam1 2013-04-23 2013-04-23         NA 2013-04-25 2013-04-25         NA
     #4:         2    Cam2         NA         NA         NA         NA         NA         NA
     #5:         3    Cam1 2013-04-23 2013-04-23 2013-04-23 2013-04-25 2013-04-25 2013-04-25
     #6:         3    Cam2 2013-04-23 2013-04-23         NA 2013-04-25 2013-04-25         NA
    

    Note: rowid is from data.table 1.9.7, which can be installed from here. With 1.9.6 or an earlier version, create the row ID manually:

    dN <- setDT(df1)[df2, on = c("StationID", "Cameras")
                         ][, rid := 1:.N, .(StationID, Cameras)]
    

    Then run dcast:

    dcast(dN, StationID + Cameras ~rid, value.var = c("Start", "End"))
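
    For readers who are not using data.table, the same idea (join on both keys, number the matches within each key pair, then spread to wide format) can be sketched in base R. This is a minimal sketch and not part of the original answer; the names m, rid and df3 are placeholders chosen for illustration.

    ```r
    df1 <- data.frame(
      StationID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
      Cameras   = c("Cam1", "Cam2", "Cam2", "Cam1", "Cam1",
                    "Cam2", "Cam1", "Cam2", "Cam1", "Cam1"),
      Start     = rep("2013-04-23", 10),
      End       = rep("2013-04-25", 10)
    )
    df2 <- data.frame(
      StationID = c(1, 1, 2, 2, 3, 3),
      Cameras   = c("Cam1", "Cam2", "Cam1", "Cam2", "Cam1", "Cam2")
    )

    # Keep every row of df2; pairs without a match get NA for Start and End
    m <- merge(df2, df1, by = c("StationID", "Cameras"), all.x = TRUE)
    m <- m[order(m$StationID, m$Cameras), ]

    # Number the matches within each (StationID, Cameras) pair (hypothetical column rid)
    m$rid <- ave(seq_len(nrow(m)), m$StationID, m$Cameras, FUN = seq_along)

    # Spread to wide format: Start1, End1, Start2, End2, ...
    df3 <- reshape(m, idvar = c("StationID", "Cameras"), timevar = "rid",
                   direction = "wide", sep = "")
    rownames(df3) <- NULL
    ```

    Because the number of Start/End pairs comes from the largest rid in the data, the columns grow with the data, and the pair (2, Cam2), which has no match in df1, ends up with NA in every Start/End column.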
    
  • 2

    Maybe this helps:

    library(dplyr)
    library(tidyr)
    full_join(df1, df2) %>%
      group_by(StationID, Cameras) %>%
      summarise_each(funs(toString)) %>%
      separate(col = Start, into = paste("Start", 1:3, sep = ""), sep = ", ", extra = "merge") %>%
      separate(col = End, into = paste("End", 1:3, sep = ""), sep = ", ", extra = "merge")
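
    The pipeline above hard-codes three Start/End pairs (1:3). A variant that lets the data decide the number of columns might look like the sketch below. It assumes tidyr 1.0.0 or later (for pivot_wider) and is not the original answerer's code; rid and df3 are placeholder names.

    ```r
    library(dplyr)
    library(tidyr)

    df1 <- data.frame(
      StationID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
      Cameras   = c("Cam1", "Cam2", "Cam2", "Cam1", "Cam1",
                    "Cam2", "Cam1", "Cam2", "Cam1", "Cam1"),
      Start     = rep("2013-04-23", 10),
      End       = rep("2013-04-25", 10)
    )
    df2 <- data.frame(
      StationID = c(1, 1, 2, 2, 3, 3),
      Cameras   = c("Cam1", "Cam2", "Cam1", "Cam2", "Cam1", "Cam2")
    )

    df3 <- df2 %>%
      # Keep every row of df2; pairs without a match get NA
      left_join(df1, by = c("StationID", "Cameras")) %>%
      group_by(StationID, Cameras) %>%
      # Number the matches within each key pair (hypothetical column rid)
      mutate(rid = row_number()) %>%
      ungroup() %>%
      # One Start_n / End_n column per match number found in the data
      pivot_wider(names_from = rid, values_from = c(Start, End))
    ```

    Unlike the toString approach, unmatched pairs keep real NA values here instead of the string "NA".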
    
