Consider the following problem: I have two data frames, cases and events. For every case there can be several events (1:n). events$caseId (foreign key) refers to cases$id (primary key):
cases     events
------    ------
id        id
date      caseId
var1      date
var2      var1
...       var2
          ...
Now I want to create a new column in cases that gives the number of events for each case (i.e. the events where events$caseId equals cases$id).
# Sample data
cases<-data.frame(id=1:5, date=c("2017-01-02","2016-02-03","2015-02-12","2016-01-03","2016-08-09"), var1=sample(c("A", "B"), 5, replace=T))
events<-data.frame(id=1:10, date=c("2017-01-01","2016-12-12","2016-07-04","2017-04-03","2015-02-13","2015-01-01","2013-05-07","2015-12-25","2016-05-04","2016-10-11"), caseId=c(1,1,1,1,3,3,3,4,5,5))
# Calculate the number of events for every caseId
library(tidyverse)
events %>%
count(caseId) %>%
right_join(cases, by = c("caseId" = "id"))
The problem I have is that I want to count only those events that happened after the date specified in the cases data frame (i.e. events$date > cases$date).
I would appreciate any help.
1 Answer
You can do a non-equi join with the data.table package (the date columns first need to be converted to Date format; see the Data heading below):
Result:
A variant:
The result is:
Data:
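A minimal sketch of the non-equi join approach described in this answer, assuming the sample data from the question with both date columns converted via as.Date. The names counts and n_events are illustrative and do not come from the original answer:

```r
library(data.table)

# Sample data from the question, with the date columns converted to Date
cases <- data.table(
  id   = 1:5,
  date = as.Date(c("2017-01-02", "2016-02-03", "2015-02-12",
                   "2016-01-03", "2016-08-09"))
)
events <- data.table(
  id     = 1:10,
  date   = as.Date(c("2017-01-01", "2016-12-12", "2016-07-04", "2017-04-03",
                     "2015-02-13", "2015-01-01", "2013-05-07", "2015-12-25",
                     "2016-05-04", "2016-10-11")),
  caseId = c(1, 1, 1, 1, 3, 3, 3, 4, 5, 5)
)

# Non-equi join: for every row of cases, match the events that have the same
# case id and a strictly later date (events$date > cases$date).
# by = .EACHI evaluates .N once per row of cases, so it counts the matches.
counts <- events[cases, on = .(caseId = id, date > date), .N, by = .EACHI]

# by = .EACHI keeps the row order of cases, so the counts can be attached directly
cases[, n_events := counts$N]
print(cases)
```

Cases without any matching event, including cases that have no events at all, should get a count of 0 here rather than NA, which is the main difference from the count() plus right_join() attempt in the question.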