首先,让我们从DataFrame 1(DF1)开始:
DF1 <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
"06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
"06/22/2016", "06/23/2016"),
c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
c("MTL", "MTL", "MTL", "MTL", "MTL", "NY", "NY",
"NY", "NY", "NY"))
colnames(DF1) <- c("date", "id", "sales", "cost", "city")
我也有DataFrame 2(DF2):
DF2 <- data.frame(c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
c(1, 1, 2, 2),
c(9999, 8888, 777, 555),
c("LON", "LON", "QC", "QC"))
colnames(DF2) <- c("date", "id", "sales", "city")
对于DF1中的每一行,我必须查看DF2中是否有一行具有相同的日期和ID . 如果是,我必须用DF2中的值替换DF1中的值 .
DF2的列总是比DF1少 . 如果列不在DF2中,我必须保留该特定列的DF1中的原始值 .
最终的输出是这样的:
results <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
"06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
"06/22/2016", "06/23/2016"),
c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
c(9999, 150, 151, 152, 155, 84, 83, 80, 777, 555),
c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
c("LON", "MTL", "MTL", "MTL", "MTL", "NY", "NY",
"NY", "QC", "QC"))
colnames(results) <- c("date", "id", "sales", "cost", "city")
你有什么建议吗?
3 回答
您可以使用
data.table
包的连接功能:这使:
如果两个数据集中都有多列,则更容易使用
mget
而不是键入所有列名称 . 对于问题中使用的数据,它看起来像:用您的变量替换
column
.