
R dplyr: filtering a time series

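I have some data on a group of people and the fruit they eat over time. I want to use dplyr to look at each person up to the point where they eat a banana, and summarise all the fruit they ate until their first banana.

Data: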

data <-  structure(list(user = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 
    1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 9584L, 9584L, 9584L, 
    9584L, 9584L, 9584L, 9584L, 9584L, 9584L, 4758L, 4758L, 4758L, 
    4758L, 4758L, 4758L), site = structure(c(1L, 6L, 1L, 1L, 6L, 
    5L, 5L, 3L, 4L, 1L, 2L, 6L, 1L, 6L, 5L, 5L, 3L, 2L, 6L, 6L, 6L, 
    4L, 2L, 5L, 5L, 4L, 2L), .Label = c("apple", "banana", "lemon", 
    "lime", "orange", "pear"), class = "factor"), time = c(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L, 5L, 6L, 7L, 8L, 9L, 10L), int = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("banana", 
    "other"), class = "factor")), .Names = c("user", "site", "time", 
    "int"), row.names = c(NA, -27L), class = "data.frame")

My initial idea was to group the data to find the first instance of each user eating a banana:

data <- data %>% transform(var = ifelse(site=="banana", 'banana','other'))

data_ban <- data %>% 
    filter(var=='banana') %>% 
    group_by(user, var, time) %>%
    group_by(user) %>%
    summarise(first_banana = min(time))

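But now I'm stuck on how to actually apply this to the original "data" data frame and set up a filter that says: for each user, only include data up to the time given in "data_ban". Any ideas?

2 Answers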

  • 4

    Something like this: group by user and keep the rows whose time is at or before their first banana.

    > # Compare against the time of the first banana, not its row position in the group
    > data %>% group_by(user) %>% filter( time <= time[which(site=="banana")[1]] )
    Source: local data frame [19 x 4]
    Groups: user
    
       user   site time    int
    1  1234  apple    1  other
    2  1234   pear    2  other
    3  1234  apple    3  other
    4  1234  apple    4  other
    5  1234   pear    5  other
    6  1234 orange    6  other
    7  1234 orange    7  other
    8  1234  lemon    8  other
    9  1234   lime    9  other
    10 1234  apple   10  other
    11 1234 banana   11 banana
    12 9584  apple    1  other
    13 9584   pear    2  other
    14 9584 orange    3  other
    15 9584 orange    4  other
    16 9584  lemon    5  other
    17 9584 banana    6 banana
    18 4758   lime    5  other
    19 4758 banana    6 banana
    

    Otherwise, maybe an anti_join.
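
A join-based version of the same idea could look like the sketch below. It reuses the "data_ban" summary from the question and attaches each user's first banana time to every row before filtering. The compact `data.frame()` rebuild (without the `int` column) and the name `until_banana` are assumptions made only to keep the example self-contained, and `inner_join` is used here rather than `anti_join`.

```r
library(dplyr)

# Compact rebuild of the question's data (same users, fruits, and times);
# the int column is omitted because it is not needed here.
data <- data.frame(
  user = rep(c(1234L, 9584L, 4758L), times = c(12, 9, 6)),
  site = c("apple", "pear", "apple", "apple", "pear", "orange", "orange",
           "lemon", "lime", "apple", "banana", "pear",
           "apple", "pear", "orange", "orange", "lemon", "banana",
           "pear", "pear", "pear",
           "lime", "banana", "orange", "orange", "lime", "banana"),
  time = c(1:12, 1:9, 5:10),
  stringsAsFactors = FALSE
)

# First banana time per user, as in the question's data_ban.
data_ban <- data %>%
  filter(site == "banana") %>%
  group_by(user) %>%
  summarise(first_banana = min(time))

# Attach each user's cutoff, then keep rows up to and including it.
# inner_join drops users who never ate a banana.
until_banana <- data %>%
  inner_join(data_ban, by = "user") %>%
  filter(time <= first_banana) %>%
  select(-first_banana)
```

Because the comparison is on `time` itself, this does not depend on each user's times starting at 1 or on the rows being sorted.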

  • 2

    You could try slice:

    data %>%
         group_by(user) %>% 
         slice(1:(which(int=='banana')[1L]))
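
Note that `slice` works on row positions, so it relies on the rows already being in time order within each user. A variant that sorts first and then counts the fruit each user ate up to their first banana might look like this sketch. The compact data rebuild and the names `until_banana` and `fruit_counts` are illustrative assumptions, not part of the original answers.

```r
library(dplyr)

# Compact rebuild of the question's data (same users, fruits, and times).
data <- data.frame(
  user = rep(c(1234L, 9584L, 4758L), times = c(12, 9, 6)),
  site = c("apple", "pear", "apple", "apple", "pear", "orange", "orange",
           "lemon", "lime", "apple", "banana", "pear",
           "apple", "pear", "orange", "orange", "lemon", "banana",
           "pear", "pear", "pear",
           "lime", "banana", "orange", "orange", "lime", "banana"),
  time = c(1:12, 1:9, 5:10),
  stringsAsFactors = FALSE
)

until_banana <- data %>%
  arrange(user, time) %>%
  group_by(user) %>%
  # The lagged flag is TRUE only on rows after a banana row, so the running
  # sum stays at 0 up to and including the first banana.
  filter(cumsum(dplyr::lag(site == "banana", default = FALSE)) == 0) %>%
  ungroup()

# Summarise: how many of each fruit every user ate up to the first banana.
fruit_counts <- until_banana %>%
  count(user, site)
```

One design difference from the `which(...)[1]` approaches: a user who never ate a banana keeps all of their rows here instead of being dropped.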
    
