I have a data frame:
```r
df <- data.frame(
  Group    = c('A','A','A','A','B','B','B','B'),
  Activity = c('EOSP','NOR','EOSP','COSP','NOR','EOSP','WL','NOR'),
  TimeLine = c(1,2,3,4,1,2,3,4)
)
```
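For each group, I want to keep two activities, and only when they occur in a given order. For example, I am looking only for the activities `EOSP` and `NOR`, and only in that order (`EOSP` first, then `NOR`). This code: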
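```r
library(dplyr)
# keep groups that contain both activities, then only rows with those two activities
df %>% group_by(Group) %>%
  filter(all(c('EOSP','NOR') %in% Activity) & Activity %in% c('EOSP','NOR'))
```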
results in:
```
# A tibble: 6 x 3
# Groups:   Group [2]
  Group Activity TimeLine
  <fct> <fct>       <dbl>
1 A     EOSP            1
2 A     NOR             2
3 A     EOSP            3
4 B     NOR             1
5 B     EOSP            2
6 B     NOR             4
```
I don't want row 3 (`EOSP`), because it occurs after `NOR`. Similarly, for group B I don't want row 4, because that `NOR` occurs before `EOSP`. How can I achieve this?
3 Answers
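You can use `match` to get the first instance of `Activity == 'EOSP'` and use `slice` to remove everything before it. Once you do that, you can remove duplicates and filter for `EOSP` and `NOR`. This gives: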
```
# A tibble: 4 x 4
# Groups:   Group [10]
  Group Activity TimeLine   new
  <fct> <fct>       <dbl> <int>
1 A     EOSP            1     1
2 A     NOR             2     1
3 B     EOSP            2     2
4 B     NOR             4     2
```
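A minimal sketch of the steps described above, assuming dplyr 1.0 or later and R 4.0 or later (so `data.frame()` keeps strings as character). It reproduces the `Group`, `Activity` and `TimeLine` columns of that output; the `new` column comes from code not shown here and is not reconstructed. It also assumes every group contains at least one `EOSP`: otherwise `match()` returns `NA` and the `slice()` range fails.

```r
library(dplyr)

df <- data.frame(
  Group    = c('A','A','A','A','B','B','B','B'),
  Activity = c('EOSP','NOR','EOSP','COSP','NOR','EOSP','WL','NOR'),
  TimeLine = c(1,2,3,4,1,2,3,4)
)

res <- df %>%
  group_by(Group) %>%
  # drop every row before the first EOSP of each group
  slice(match('EOSP', Activity):n()) %>%
  # keep the first occurrence of each activity (Group is included automatically)
  distinct(Activity, .keep_all = TRUE) %>%
  # keep only the two activities of interest
  filter(Activity %in% c('EOSP', 'NOR')) %>%
  ungroup()

print(res)
```

`distinct()` keeps the first row of each `Group`/`Activity` pair, so this relies on the rows already being sorted by `TimeLine` within each group, as they are in the example.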
Here is an option with the `data.table` package: join `df` with a copy of itself that is subset to the `EOSP` rows, with the minimum `TimeLine` computed per group. Then keep only the rows whose `TimeLine` is greater than or equal to that minimum, so that `NOR` is kept only if an `EOSP` comes before it. Finally, drop duplicated `Group`/`Activity` pairs if you want to keep only two activities per group:
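A minimal sketch of that self-join, assuming the `data.table` package. The names `dt`, `first_eosp` and `start` are illustrative, not taken from the answer, and the `Activity %in% c('EOSP', 'NOR')` condition is an assumption added so that unrelated activities such as `COSP` are dropped. With this version, groups that have no `EOSP` disappear in the join, and an `EOSP` with no later `NOR` is still kept.

```r
library(data.table)

dt <- data.table(
  Group    = c('A','A','A','A','B','B','B','B'),
  Activity = c('EOSP','NOR','EOSP','COSP','NOR','EOSP','WL','NOR'),
  TimeLine = c(1,2,3,4,1,2,3,4)
)

# earliest EOSP time per group (hypothetical helper table)
first_eosp <- dt[Activity == 'EOSP', .(start = min(TimeLine)), by = Group]

# join every row of dt to its group's first EOSP time, then keep
# EOSP/NOR rows at or after that time (assumed activity filter)
res <- dt[first_eosp, on = 'Group'][TimeLine >= start & Activity %in% c('EOSP', 'NOR')]

# keep the first row of each Group/Activity pair and drop the helper column
res <- unique(res, by = c('Group', 'Activity'))[, .(Group, Activity, TimeLine)]

print(res)
```

As with the dplyr version, `unique(..., by = ...)` keeps the first row of each pair in the current row order, so the data should be sorted by `TimeLine` within each group.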
Here is a `dplyr` idea: