首页 文章

如果x的时间戳在y的时间间隔内,则合并两个数据帧

提问于
浏览
0

我无法解决这个问题 . 我有两个数据帧 . DF1和DF2 . 如果DF1中的时间戳在DF2中指定的时间间隔内,我想将DF2的列合并到DF1 . 以下是两个数据帧的示例:

DF1 <- structure(list(Airspeed = c(582L, 478L, 524L), Outbound.Track = c(119L, 78L,134L), Rem.Ground.Dist = c(369L, 119L, 196L), Timestamp=structure(c(1451636817.52577, 1451638203.76569, 1451637753.43511),class = c("POSIXct", "POSIXt"), tzone = "")), .Names =c("Airspeed", "Outbound.Track","Rem.Ground.Dist", "Timestamp"), row.names =c(1L, 12L, 7L), class = c("data.table", "data.frame"))

DF2 <- structure(list(Temperature = c(-18.5, -60, -35), Wind_Direction = c("324", "335", "313"), Wind_Speed = c("032", "041", "056"), onebef =structure(c(1451629620, 1451634660, 1451637000), class = c("POSIXct", "POSIXt"), tzone = ""), oneaft = structure(c(1451636820, 1451641860, 1451644200), class =c("POSIXct", "POSIXt"))), .Names = c("Temperature", "Wind_Direction", "Wind_Speed","onebef", "oneaft"), row.names = c(1358L, 1654L, 2068L), class = "data.frame")

head(DF1)
head(DF2)

我想将DF1与DF2合并 . 因此,如果存在匹配(DF1的时间戳在任何DF2的时间间隔内),则应将DF2(Wind_Speed,Wind_Direction,Temperature)的值添加到DF1 .

我面临的两个问题:

  • 如何进行匹配/合并?我的数据帧非常大(DF1和DF2中有7000行)

  • 如果有多个匹配项,如何确保DF1行重复?

我期待着你的帮助!谢谢

3 回答

  • 0

    你可以使用sqldf:

    library(sqldf)
    df<-sqldf('select d1.*,d2.*
               from DF1 d1
               left join DF2 d2
                 on d1.Timestamp >= d2.onebef
                   AND d1.Timestamp <= d2.oneaft
              ')
    df
    
  • 2

    您可以使用 merge()all = TRUE 选项将 DF1 的所有行与 DF2 中的所有行组合在一起 . 然后你可以检查你的情况:

    x <- merge(DF1, DF2, all = TRUE)
    
    x[x$Timestamp >= x$onebef & x$Timestamp <= x$oneaft,]
    
      Airspeed Outbound.Track Rem.Ground.Dist           Timestamp Temperature Wind_Direction Wind_Speed              onebef
    1      582            119             369 2016-01-01 09:26:57       -18.5            324        032 2016-01-01 07:27:00
    4      582            119             369 2016-01-01 09:26:57       -60.0            335        041 2016-01-01 08:51:00
    5      478             78             119 2016-01-01 09:50:03       -60.0            335        041 2016-01-01 08:51:00
    6      524            134             196 2016-01-01 09:42:33       -60.0            335        041 2016-01-01 08:51:00
    8      478             78             119 2016-01-01 09:50:03       -35.0            313        056 2016-01-01 09:30:00
    9      524            134             196 2016-01-01 09:42:33       -35.0            313        056 2016-01-01 09:30:00
               oneaft
    1 2016-01-01 09:27:00
    4 2016-01-01 10:51:00
    5 2016-01-01 10:51:00
    6 2016-01-01 10:51:00
    8 2016-01-01 11:30:00
    9 2016-01-01 11:30:00
    
  • 1

    这将适用于您的示例,但您可能会对真实数据集感到困难,因为它会在保留相关行之前创建一个非常大的数据集(将DF1的每一行与DF2相结合) .

    试一试,看看它是如何工作的 .

    library(dplyr)
    
    DF1 <- structure(list(Airspeed = c(582L, 478L, 524L), Outbound.Track = c(119L, 78L,134L), Rem.Ground.Dist = c(369L, 119L, 196L), Timestamp=structure(c(1451636817.52577, 1451638203.76569, 1451637753.43511),class = c("POSIXct", "POSIXt"), tzone = "")), .Names =c("Airspeed", "Outbound.Track","Rem.Ground.Dist", "Timestamp"), row.names =c(1L, 12L, 7L), class = c("data.table", "data.frame"))
    
    DF2 <- structure(list(Temperature = c(-18.5, -60, -35), Wind_Direction = c("324", "335", "313"), Wind_Speed = c("032", "041", "056"), onebef =structure(c(1451629620, 1451634660, 1451637000), class = c("POSIXct", "POSIXt"), tzone = ""), oneaft = structure(c(1451636820, 1451641860, 1451644200), class =c("POSIXct", "POSIXt"))), .Names = c("Temperature", "Wind_Direction", "Wind_Speed","onebef", "oneaft"), row.names = c(1358L, 1654L, 2068L), class = "data.frame")
    
    
    merge(DF1, DF2) %>%                                  # combine every row of DF1 with DF2
      filter(onebef <= Timestamp & Timestamp <= oneaft)  # keep rows where Timestampe is between the interval
    
    
    #   Airspeed Outbound.Track Rem.Ground.Dist           Timestamp Temperature Wind_Direction Wind_Speed              onebef              oneaft
    # 1      582            119             369 2016-01-01 08:26:57       -18.5            324        032 2016-01-01 06:27:00 2016-01-01 08:27:00
    # 2      582            119             369 2016-01-01 08:26:57       -60.0            335        041 2016-01-01 07:51:00 2016-01-01 09:51:00
    # 3      478             78             119 2016-01-01 08:50:03       -60.0            335        041 2016-01-01 07:51:00 2016-01-01 09:51:00
    # 4      524            134             196 2016-01-01 08:42:33       -60.0            335        041 2016-01-01 07:51:00 2016-01-01 09:51:00
    # 5      478             78             119 2016-01-01 08:50:03       -35.0            313        056 2016-01-01 08:30:00 2016-01-01 10:30:00
    # 6      524            134             196 2016-01-01 08:42:33       -35.0            313        056 2016-01-01 08:30:00 2016-01-01 10:30:00
    

相关问题