首页 文章

使用dplyr过滤R中时间间隔内的事件日志

提问于
浏览
0

我有一个事件日志,格式如下 .

Original format
我使用dplyr通过DATE和ID创建了组,因此日期或ID的更改将被视为不同的组 .

我希望只有> = 5秒的时间间隔的事件,并删除其余的事件 . Desired output

我已经使用dplyr和时间延迟来实现这一点,因为我无法为此动态分配滞后间隔 . 但是我当前的代码会检查一个延迟间隔,最后我删除的行数超过了预期 . Current output - all rows in yellow are removed . 理想情况下,我希望保留组2中的"13:10:22","13:10:24",因为从"13:10:17"到这些时间的时滞是5秒甚至更长 .

我正在使用“chron”来处理时代 . 我理解时滞逻辑在mycase中不起作用 . 除了使用昂贵的for / if循环之外,还有更好的选择 .

我用的代码

data$Date <- as.Date(data$Date,format = "%m/%d/%Y")  
data$Time <- chron(times = data$Time)  

data <- data  %>% arrange(Date,Time,ID)    
data$Group <- data %>%  group_by(Date,ID) %>% group_indices    
data <- data %>%     
        group_by(Group)  %>%       
        mutate(time.difference = Time - lag(Time)) %>%    
        filter(time.difference >= 0.00005787 | is.na(time.difference))

Dput of the data

结构(列表(日期=结构(c)(17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17470,17470,17470,17470), class = “日期”),时间=结构(C(0.936400462962963,0.9425,0.9425,0.942511574074074,0.942523148148148,0.9703125,0.548518518518519,0.548530092592593,0.54880787037037,0.54880787037037,0.548819444444444,0.548842592592593,0.548865740740741,0.548888888888889,0.557337962962963,0.6140625,0.618761574074074,0.618958333333333,0.622303240740741) ,format =“h:m:s”,class =“times”),ID = c(“P1”,“P1”,“P1”,“P1”,“P1”,“P1”,“P5”, “P5”,“P5”,“P5”,“P5”,“P5”,“P5”,“P5”,“P5”,“P9”,“P9”,“P9”,“P9”)), .Names = c(“日期”,“时间”,“ID”),row.names = c(NA,-19L),class =“data.frame”)

2 回答

  • 1
    library(dplyr)
    data %>%
      group_by(Group) %>%
      arrange(Group, Date, Time) %>% 
      filter((Time - lag(Time)) >= 5.787037e-05 | row_number() == 1L)
    
  • 1
    data$datetime <- as.POSIXct(paste(data$Date, data$Time), format="%m/%d/%Y %H:%M:%S")  
    data$group <-  data %>% group_by(ID,by5sec=cut(datetime, breaks="5 sec")) %>%  group_indices
    data_filter <- data %>% group_by(group) %>% filter(row_number()==1)
    

    我这样做了两步,因为我希望将组索引的中间结果写入CSV .

相关问题