首页 文章

根据每个案例的开始和结束日期,计算每周存在多少案例[关闭]

提问于
浏览
2

我是新来的,所以如果我错过任何约定,我会道歉 .

我有一个~2000行数据集,其中包含三年内发生的独特案例数据 . 每个案例都有一个开始日期和结束日期 . 我希望能够获得一个新的数据框,显示在这三年期间每周发生的案例数量 .

我拥有的数据集的结构如下:

ID  Start_Date  End_Date      
1   2015-01-04  2017-11-02    
2   2015-01-05  2015-10-26    
3   2015-01-07  2015-03-04     
4   2015-01-12  2016-05-17  
5   2015-01-15  2015-04-08
6   2015-01-21  2016-07-31 
7   2015-01-21  2015-07-16
8   2015-01-22  2015-03-03

`

2 回答

  • 0

    使用 sqldf 包可以更轻松地解决这个问题,但我认为坚持 dplyr 包 .

    该方法:

    library(dplyr)
    library(lubridate)
    
    # First create a data frame having all weeks from chosen start date to end date. 
    # 2015-01-01 to 2017-12-31
    df_week <- data.frame(weekStart = seq(floor_date(as.Date("2015-01-01"), "week"),
                                         as.Date("2017-12-31"), by = 7))
    df_week <- df_week %>% 
      mutate(weekEnd = weekStart + 7,
                       weekNum = as.character(weekStart, "%V-%Y"), 
                       dummy = TRUE)
    # The dummy column is only for joining purpose. 
    # Header looks like
    #> head(df_week)
    #   weekStart    weekEnd weekNum dummy
    #1 2014-12-28 2015-01-04 52-2014  TRUE
    #2 2015-01-04 2015-01-11 01-2015  TRUE
    #3 2015-01-11 2015-01-18 02-2015  TRUE
    #4 2015-01-18 2015-01-25 03-2015  TRUE
    #5 2015-01-25 2015-02-01 04-2015  TRUE
    #6 2015-02-01 2015-02-08 05-2015  TRUE
    
    # Prepare the data as mentioned in OP
    df <- read.table(text = "ID  Start_Date  End_Date      
    1   2015-01-04  2017-11-02    
    2   2015-01-05  2015-10-26    
    3   2015-01-07  2015-03-04     
    4   2015-01-12  2016-05-17  
    5   2015-01-15  2015-04-08
    6   2015-01-21  2016-07-31 
    7   2015-01-21  2015-07-16
    8   2015-01-22  2015-03-03", header = TRUE, stringsAsFactors = FALSE)
    
    df$Start_Date <- as.Date(df$Start_Date)
    df$End_Date <- as.Date(df$End_Date)
    df <- df %>% mutate(dummy = TRUE)     # just for joining
    
    # Use dplyr to join, filter and then group on week to find number of cases
    # in each week
    df_week %>%
      left_join(df, by = "dummy") %>%
      select(-dummy) %>%
      filter((weekStart >= Start_Date & weekStart <= End_Date) | 
               (weekEnd >= Start_Date & weekEnd <= End_Date)) %>%
      group_by(weekStart, weekEnd, weekNum) %>%
      summarise(cases = n())
    
    # Result
    #   weekStart  weekEnd    weekNum cases
    #   <date>     <date>     <chr>   <int>
    # 1 2014-12-28 2015-01-04 52-2014     1
    # 2 2015-01-04 2015-01-11 01-2015     3
    # 3 2015-01-11 2015-01-18 02-2015     5
    # 4 2015-01-18 2015-01-25 03-2015     8
    # 5 2015-01-25 2015-02-01 04-2015     8
    # 6 2015-02-01 2015-02-08 05-2015     8
    # 7 2015-02-08 2015-02-15 06-2015     8
    # 8 2015-02-15 2015-02-22 07-2015     8
    # 9 2015-02-22 2015-03-01 08-2015     8
    #10 2015-03-01 2015-03-08 09-2015     8
    # ... with 139 more rows
    
  • 0

    欢迎来到SO!

    在解决问题之前一定要安装一些软件包并运行

    install.packages(c("tidyr","dplyr","lubridate"))
    

    如果你还没有安装这些包 .

    接下来我将向您展示一个现代R解决方案,这些包是神奇的 .

    这是一种解决方法:

    library(readr)
    library(dplyr)
    library(lubridate)
    
    raw_data <- 'id start_date  end_date
    1   2015-01-04  2017-11-02
    2   2015-01-05  2015-10-26
    3   2015-01-07  2015-03-04
    4   2015-01-12  2016-05-17
    5   2015-01-15  2015-04-08
    6   2015-01-21  2016-07-31
    7   2015-01-21  2015-07-16
    8   2015-01-22  2015-03-03'
    
    curated_data <- read_delim(raw_data, delim = "\t") %>% 
      mutate(start_date = as.Date(start_date)) %>% # convert column 2 to date format assuming the date is yyyy-mm-dd
      mutate(weeks_lapse = as.integer((start_date - min(start_date))/dweeks(1))) # count how many weeks passed since the lowest date in the data
    
    curated_data %>% 
      group_by(weeks_lapse) %>%  # I group to count by week
      summarise(cases_per_week = n()) # now count by group by week
    

    解决方案是:

    # A tibble: 3 x 2
      weeks_lapse cases_per_week
            <int>          <int>
    1           0              3
    2           1              2
    3           2              3
    

相关问题