
Using tidyverse to extract grouped values from a data frame based on grouped value ranges from another data frame

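I am trying to extract grouped index values from a data frame (df1) whose rows define time ranges (start to end) within each group, using the grouped times given in another data frame (df2). For each row of df2, I want the index of the df1 row in the same group whose range contains that row's time. The output I need is df3.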

df1 <- data.frame(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),
                  index = c(1,2,3,4,5,6,7,8,9,10,11,12),
                  start = c(5,10,15,20,5,10,15,20,5,10,15,20),
                  end   = c(10,15,20,25,10,15,20,25,10,15,20,25))
df2 <- data.frame(group = c("A","B","B","C","A","C"), time = c(11,17,24,5,5,22))
df3 <- data.frame(time = c(11,17,24,5,5,22), index = c(2,7,8,9,1,12))
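The rule that df3 encodes can be checked with a brute-force base R sketch. It is not part of the original question and is not tidyverse. For each df2 row, it looks for the df1 row in the same group with start <= time <= end. The helper name `lookup_index` is hypothetical. It stops with an error if a time matches zero ranges or several; the latter would happen at a shared boundary such as 10 or 15.

```r
# Data from the question (stringsAsFactors = FALSE so groups compare as plain strings)
df1 <- data.frame(group = rep(c("A", "B", "C"), each = 4),
                  index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                  start = rep(c(5, 10, 15, 20), times = 3),
                  end   = rep(c(10, 15, 20, 25), times = 3),
                  stringsAsFactors = FALSE)
df2 <- data.frame(group = c("A", "B", "B", "C", "A", "C"),
                  time  = c(11, 17, 24, 5, 5, 22),
                  stringsAsFactors = FALSE)

# Hypothetical helper: index of the df1 range in group g that contains time t
lookup_index <- function(g, t) {
  hit <- which(df1$group == g & df1$start <= t & t <= df1$end)
  if (length(hit) != 1) stop("time ", t, " in group ", g, " matches ", length(hit), " ranges")
  df1$index[hit]
}

# Apply the lookup row by row over df2, keeping df2's row order
df3 <- data.frame(time  = df2$time,
                  index = mapply(lookup_index, df2$group, df2$time, USE.NAMES = FALSE))
df3
```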

A previous related question of mine got this tidy pipe solution for ungrouped data:

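library(tidyverse)
df1 %>% 
    select(from = start, to = end) %>%   # keep the range bounds, named as seq() arguments
    pmap(seq) %>%                        # one sequence per row of df1
    do.call(cbind, .) %>%                # bind them into a matrix, one column per range
    list(.) %>%
    mutate(df2, new = ., 
                # column number(s) of the matrix that contain each time
                ind = map2(time, new, ~ which(.x == .y, arr.ind = TRUE)[,2])) %>%
    select(-new)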
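The ungrouped pipeline turns each [start, end] range into one column of a matrix and reports the column(s) in which a time occurs. A small base R sketch of that idea for a single group's ranges (an illustration, not the poster's code; the names `m` and `range_col` are hypothetical) shows two things. First, it relies on every range having the same length, since `cbind()` recycles shorter columns. Second, a time on a shared boundary is found in two columns. With all 12 rows of df1, the columns for A, B and C are identical, so a time such as 11 matches three columns, one per group. That is why the lookup has to be restricted to the row's own group.

```r
# Ranges for one group (they are the same for A, B and C in the question)
start <- c(5, 10, 15, 20)
end   <- c(10, 15, 20, 25)

# One column per range; each range here has 6 values, so cbind() needs no recycling
m <- do.call(cbind, Map(seq, start, end))

# Column number(s) of the range(s) that contain time t
range_col <- function(t) unname(which(m == t, arr.ind = TRUE)[, 2])

range_col(11)  # 2
range_col(10)  # 1 2  (10 ends range 1 and starts range 2)
```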

Can this be done per group, using the 'group' column in df1 and df2, to produce df3?

2 Answers

  • 2

    We can nest df1 by group, build each group's range matrix, and then join the result to df2:

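    library(tidyverse)
    df1 %>% 
      # one row per group; that group's rows of df1 go into the `data` list-column
      nest(data = -group) %>%
      # per group, a matrix with one column per [start, end] range
      mutate(new = map(data, ~ .x %>% 
                         select(from = start, to = end) %>%
                         pmap(seq) %>% 
                         do.call(cbind, .) %>% 
                         list(.))) %>%
      # attach each group's data to its df2 rows, keeping df2's row order
      left_join(as_tibble(df2), ., by = "group") %>%
      # matrix column that contains `time`, then the df1 index of that range
      mutate(ind = map2_int(time, new, ~ which(.x == .y[[1]], arr.ind = TRUE)[,2]),
             ind = map2_dbl(ind, data, ~ .y$index[.x])) %>%
      select(time, ind)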
    # A tibble: 6 x 2
    #   time   ind
    #  <dbl> <dbl>
    #1 11.0   2.00
    #2 17.0   7.00
    #3 24.0   8.00
    #4  5.00  9.00
    #5  5.00  1.00
    #6 22.0  12.0
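    On dplyr 1.1.0 or later, the same lookup can also be written as an inequality join with `join_by()` and `between()`, without building matrices. This is a minimal sketch assuming that dplyr version, not the answerer's code. Each df2 row is matched to the df1 row of the same group with start <= time <= end, and `left_join()` keeps df2's row order.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which added inequality conditions in join_by()

df1 <- data.frame(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),
                  index = c(1,2,3,4,5,6,7,8,9,10,11,12),
                  start = c(5,10,15,20,5,10,15,20,5,10,15,20),
                  end   = c(10,15,20,25,10,15,20,25,10,15,20,25))
df2 <- data.frame(group = c("A","B","B","C","A","C"), time = c(11,17,24,5,5,22))

# Keep every df2 row and attach the df1 range (same group) whose bounds contain time
df3 <- df2 %>%
  left_join(df1, by = join_by(group, between(time, start, end))) %>%
  select(time, index)

df3
```

    A time on a shared boundary such as 10 would match two ranges and produce two rows. The `multiple` argument of `left_join()` (for example `multiple = "first"`) keeps only one of them.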
    
  • 1

    Here is a data.table solution using a non-equi join:

    library(data.table)

    df1<-data.table(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),index=c(1,2,3,4,5,6,7,8,9,10,11,12),start=c(5,10,15,20,5,10,15,20,5,10,15,20),end=c(10,15,20,25,10,15,20,25,10,15,20,25))
    df2<-data.table(group = c("A","B","B","C","A","C"),time=c(11,17,24,5,5,22))

    # non-equi join: same group, start <= time and end >= time
    df1[df2, on = .(group, start <= time, end >= time)][, c("start", "index")]
    
    
       start index
    1:    11     2
    2:    17     7
    3:    24     8
    4:     5     9
    5:     5     1
    6:    22    12
    

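    In a non-equi join, the start and end columns of the result carry df2's time values, which is why start already holds the times. Assign the result and rename start to time (for example with setnames(res, "start", "time")), and you have df3.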
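    Without data.table or dplyr, the same per-group range lookup can be sketched in base R with `findInterval()`. This is a different technique from the non-equi join above. The sketch assumes that within each group the ranges do not overlap once sorted by start. `findInterval()` then returns the last range whose start is <= time, and a check against end leaves times that fall in a gap or past the last range as NA. At a shared boundary such as 10, it picks the later range.

```r
df1 <- data.frame(group = rep(c("A", "B", "C"), each = 4),
                  index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                  start = rep(c(5, 10, 15, 20), times = 3),
                  end   = rep(c(10, 15, 20, 25), times = 3),
                  stringsAsFactors = FALSE)
df2 <- data.frame(group = c("A", "B", "B", "C", "A", "C"),
                  time  = c(11, 17, 24, 5, 5, 22),
                  stringsAsFactors = FALSE)

df3 <- data.frame(time = df2$time, index = NA_real_)
for (g in unique(df2$group)) {
  ranges <- df1[df1$group == g, ]
  ranges <- ranges[order(ranges$start), ]               # findInterval() needs sorted breakpoints
  rows   <- which(df2$group == g)                       # df2 rows belonging to this group
  k      <- findInterval(df2$time[rows], ranges$start)  # last range with start <= time, 0 if none
  inside <- k > 0 & df2$time[rows] <= ranges$end[pmax(k, 1)]
  df3$index[rows[inside]] <- ranges$index[k[inside]]
}
df3
```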
