
Counting rows in a related data frame

Consider the following problem: I have two data frames, cases and events. Each case can have several events (1:n). events$caseId (foreign key) refers to cases$id (primary key):

cases       events
------      ------
id          id
date        caseId
var1        date
var2        var1
...         var2
            ...

Now I want to create a new column in cases that gives the number of events for each case (i.e. the events where events$caseId equals cases$id).

# Sample data            
cases<-data.frame(id=1:5, date=c("2017-01-02","2016-02-03","2015-02-12","2016-01-03","2016-08-09"), var1=sample(c("A", "B"), 5, replace=TRUE))
events<-data.frame(id=1:10, date=c("2017-01-01","2016-12-12","2016-07-04","2017-04-03","2015-02-13","2015-01-01","2013-05-07","2015-12-25","2016-05-04","2016-10-11"), caseId=c(1,1,1,1,3,3,3,4,5,5))

# Calculate the number of events for every caseId
library(tidyverse)
events %>%  
  count(caseId) %>%
  right_join(cases, by = c("caseId" = "id"))

The problem I am running into is that I want to count only those events that happened after the date specified in the cases data frame (i.e. events$date > cases$date).

I would appreciate any help.

1 Answer

  • 1

    You can use a non-equi join from the data.table package (the date columns first need to be converted to Date class; see the Data heading below):

    library(data.table)
    setDT(cases)
    setDT(events)
    
    cases[events, on = .(id = caseId, date < date), events := .N, by = .EACHI][]
    

    Result:

       id       date var1 events
    1:  1 2017-01-02    B      1
    2:  2 2016-02-03    A     NA
    3:  3 2015-02-12    A      1
    4:  4 2016-01-03    B     NA
    5:  5 2016-08-09    B      1
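
The update join above leaves NA for cases that have no later event. If an explicit zero is preferred, the NAs can be filled afterwards. The sketch below uses a hypothetical stand-in table, `cases_demo`, holding the values printed above; it is not part of the original answer. A further caveat, as far as I can tell: with `by = .EACHI` in this direction, `.N` is evaluated once per row of `events`, so the column marks cases that have at least one later event rather than counting them. In this sample every true count is 0 or 1, so the difference does not show.

```r
library(data.table)

# Hypothetical stand-in for `cases` after the update join (values copied from the printed result)
cases_demo <- data.table(id = 1:5, events = c(1L, NA, 1L, NA, 1L))

# Replace the NA left for unmatched cases with an explicit zero, by reference
cases_demo[is.na(events), events := 0L]

print(cases_demo)
```

When a case can have several later events, the variant that joins in the opposite direction is probably the safer choice.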
    

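    A variant that joins in the opposite direction, so .N is the number of matching events per case and cases without a match get 0 instead of NA: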

    cases[, events := events[cases, on = .(caseId = id, date > date), .N, by = .EACHI]$N][]
    

    This gives:

       id       date var1 events
    1:  1 2017-01-02    A      1
    2:  2 2016-02-03    A      0
    3:  3 2015-02-12    A      1
    4:  4 2016-01-03    A      0
    5:  5 2016-08-09    A      1
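
As a cross-check on the counts above, the same condition (events$caseId equals cases$id and events$date > cases$date) can be evaluated row by row in base R. This is only a minimal sketch for verifying small data, not a replacement for the join. The random `var1` column is left out because it plays no role in the count.

```r
# Sample data from the question, with the date columns already converted to Date
cases <- data.frame(
  id = 1:5,
  date = as.Date(c("2017-01-02", "2016-02-03", "2015-02-12", "2016-01-03", "2016-08-09"))
)
events <- data.frame(
  id = 1:10,
  date = as.Date(c("2017-01-01", "2016-12-12", "2016-07-04", "2017-04-03", "2015-02-13",
                   "2015-01-01", "2013-05-07", "2015-12-25", "2016-05-04", "2016-10-11")),
  caseId = c(1, 1, 1, 1, 3, 3, 3, 4, 5, 5)
)

# For each case, count the events that belong to it and are dated strictly after it
cases$events <- vapply(seq_len(nrow(cases)), function(i) {
  sum(events$caseId == cases$id[i] & events$date > cases$date[i])
}, integer(1))

print(cases)
```

The loop compares every case against every event, so it scales poorly; on larger tables the non-equi join is the better fit.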
    

    Data:

    cases <- data.frame(id=1:5, date=c("2017-01-02","2016-02-03","2015-02-12","2016-01-03","2016-08-09"),
                        var1=sample(c("A", "B"), 5, replace=TRUE))
    events <- data.frame(id=1:10,
                         date=c("2017-01-01","2016-12-12","2016-07-04","2017-04-03","2015-02-13","2015-01-01","2013-05-07","2015-12-25","2016-05-04","2016-10-11"),
                         caseId=c(1,1,1,1,3,3,3,4,5,5))
    cases$date <- as.Date(cases$date)
    events$date <- as.Date(events$date)
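
Since the question starts from tidyverse code, the same count can also be sketched with dplyr by joining first and comparing dates afterwards. This is an alternative technique (an ordinary equi-join followed by a conditional sum), not the non-equi join used in the answer. The names `event_date`, `counts` and `result` are introduced here for illustration only, and `var1` is again omitted.

```r
library(dplyr)

cases <- data.frame(
  id = 1:5,
  date = as.Date(c("2017-01-02", "2016-02-03", "2015-02-12", "2016-01-03", "2016-08-09"))
)
events <- data.frame(
  id = 1:10,
  date = as.Date(c("2017-01-01", "2016-12-12", "2016-07-04", "2017-04-03", "2015-02-13",
                   "2015-01-01", "2013-05-07", "2015-12-25", "2016-05-04", "2016-10-11")),
  caseId = c(1, 1, 1, 1, 3, 3, 3, 4, 5, 5)
)

# Keep only the columns needed from events and rename date to avoid a name clash
counts <- cases %>%
  left_join(select(events, caseId, event_date = date), by = c("id" = "caseId")) %>%
  group_by(id) %>%
  # Cases without any event get an NA event_date, which na.rm = TRUE turns into a count of 0
  summarise(events = sum(event_date > date, na.rm = TRUE))

# Attach the counts to the original cases table
result <- left_join(cases, counts, by = "id")
print(result)
```

The join materialises every case-event pair before filtering, so it uses more memory than the data.table approach when cases have many events.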
    
