我有一个事件日志,格式如下 .
Original format
我使用dplyr通过DATE和ID创建了组,因此日期或ID的更改将被视为不同的组 .
我希望只有> = 5秒的时间间隔的事件,并删除其余的事件 . Desired output
我已经使用dplyr和时间延迟来实现这一点,因为我无法为此动态分配滞后间隔 . 但是我当前的代码会检查一个延迟间隔,最后我删除的行数超过了预期 . Current output - all rows in yellow are removed . 理想情况下,我希望保留组2中的"13:10:22","13:10:24",因为从"13:10:17"到这些时间的时滞是5秒甚至更长 .
我正在使用“chron”来处理时代 . 我理解时滞逻辑在mycase中不起作用 . 除了使用昂贵的for / if循环之外,还有更好的选择 .
我用的代码
data$Date <- as.Date(data$Date,format = "%m/%d/%Y")
data$Time <- chron(times = data$Time)
data <- data %>% arrange(Date,Time,ID)
data$Group <- data %>% group_by(Date,ID) %>% group_indices
data <- data %>%
group_by(Group) %>%
mutate(time.difference = Time - lag(Time)) %>%
filter(time.difference >= 0.00005787 | is.na(time.difference))
Dput of the data
结构(列表(日期=结构(c)(17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17469,17470,17470,17470,17470), class = “日期”),时间=结构(C(0.936400462962963,0.9425,0.9425,0.942511574074074,0.942523148148148,0.9703125,0.548518518518519,0.548530092592593,0.54880787037037,0.54880787037037,0.548819444444444,0.548842592592593,0.548865740740741,0.548888888888889,0.557337962962963,0.6140625,0.618761574074074,0.618958333333333,0.622303240740741) ,format =“h:m:s”,class =“times”),ID = c(“P1”,“P1”,“P1”,“P1”,“P1”,“P1”,“P5”, “P5”,“P5”,“P5”,“P5”,“P5”,“P5”,“P5”,“P5”,“P9”,“P9”,“P9”,“P9”)), .Names = c(“日期”,“时间”,“ID”),row.names = c(NA,-19L),class =“data.frame”)
2 回答
我这样做了两步,因为我希望将组索引的中间结果写入CSV .