
Replace specific values based on another data frame


First, let's start with data frame 1 (DF1):

DF1 <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016", 
                    "06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
                    "06/22/2016", "06/23/2016"),
                  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                  c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
                  c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                  c("MTL", "MTL", "MTL", "MTL", "MTL", "NY", "NY", 
                    "NY", "NY", "NY"))
colnames(DF1) <- c("date", "id", "sales", "cost", "city")

I also have data frame 2 (DF2):

DF2 <- data.frame(c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
                  c(1, 1, 2, 2),
                  c(9999, 8888, 777, 555),
                  c("LON", "LON", "QC", "QC"))
colnames(DF2) <- c("date", "id", "sales", "city")

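For each row in DF1, I need to check whether DF2 has a row with the same date and id. If it does, I need to replace the values in DF1 with the values from DF2.

DF2 always has fewer columns than DF1. If a column is not in DF2, I need to keep the original DF1 value for that column.

The final output should look like this: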

results <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
                        "06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
                        "06/22/2016", "06/23/2016"),
                      c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                      c(9999, 150, 151, 152, 155, 84, 83, 80, 777, 555),
                      c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                      c("LON", "MTL", "MTL", "MTL", "MTL", "NY", "NY", 
                        "NY", "QC", "QC"))
colnames(results) <- c("date", "id", "sales", "cost", "city")

Do you have any suggestions?

3 Answers

  • 9

    You can use the join functionality of the data.table package to update DF1 by reference:

    library(data.table)
    setDT(DF1)
    setDT(DF2)
    
    DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]
    

    This gives:

    > DF1
              date id sales cost city
     1: 06/19/2016  1  9999  101  LON
     2: 06/20/2016  1   150  102  MTL
     3: 06/21/2016  1   151  104  MTL
     4: 06/22/2016  1   152  107  MTL
     5: 06/23/2016  1   155   99  MTL
     6: 06/19/2016  2    84   55   NY
     7: 06/20/2016  2    83   55   NY
     8: 06/21/2016  2    80   56   NY
     9: 06/22/2016  2   777   57   QC
    10: 06/23/2016  2   555   58   QC
    

    If both datasets have many columns, it is easier to use mget than to type out every column name. For the data used in the question, that looks like:

    DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]
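
    The positional index 3:4 depends on DF2's column order. As a minimal sketch, assuming date and id are the only join columns, the columns to update can be derived by name instead. The helper variables keys and cols are illustrative names, not part of the answer above:

    ```r
    library(data.table)

    # Rebuild the question's data as data.tables (same values as DF1 and DF2).
    dates <- c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016", "06/23/2016")
    DF1 <- data.table(date  = rep(dates, 2),
                      id    = rep(c(1, 2), each = 5),
                      sales = c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
                      cost  = c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                      city  = rep(c("MTL", "NY"), each = 5))
    DF2 <- data.table(date  = c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
                      id    = c(1, 1, 2, 2),
                      sales = c(9999, 8888, 777, 555),
                      city  = c("LON", "LON", "QC", "QC"))

    keys <- c("date", "id")            # join columns (assumption: exactly these two)
    cols <- setdiff(names(DF2), keys)  # every non-key column of DF2

    # Update join: overwrite `cols` in DF1, by reference, for rows that match DF2.
    # DF2 rows without a match in DF1 (here 06/27/2016) are ignored.
    DF1[DF2, on = keys, (cols) := mget(paste0("i.", cols))]
    ```

    Wrapping cols in parentheses on the left of := tells data.table to use the values stored in the variable as column names rather than creating a column called cols.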
    
  • 2
    df <- merge(DF1, DF2, by = c("date", "id"), all.x=TRUE)
    
    tmp1 <- df[is.na(df$sales.y) & is.na(df$city.y),]
    tmp1$sales.y <- NULL
    tmp1$city.y <- NULL
    names(tmp1)[names(tmp1) == "sales.x"] <- "sales"
    names(tmp1)[names(tmp1) == "city.x"] <- "city"
    
    tmp2 <- df[!is.na(df$sales.y) & !is.na(df$city.y),]
    tmp2$sales.x <- NULL
    tmp2$city.x <- NULL
    names(tmp2)[names(tmp2) == "sales.y"] <- "sales"
    names(tmp2)[names(tmp2) == "city.y"] <- "city"
    
    # rbindlist() comes from the data.table package, so call it through its namespace
    results <- data.table::rbindlist(list(tmp1, tmp2), use.names = TRUE, fill = TRUE)
    

    See the result
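
    A different technique from the merge above, shown only as a dependency-free sketch: base R's match() can locate the DF2 row for each DF1 row and overwrite the shared columns in place, which also keeps DF1's original row order. It assumes each (date, id) pair occurs at most once in DF2; the names out, key1, key2, idx, hit and shared are illustrative:

    ```r
    # Rebuild the question's data (same values as DF1 and DF2).
    dates <- c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016", "06/23/2016")
    DF1 <- data.frame(date  = rep(dates, 2),
                      id    = rep(c(1, 2), each = 5),
                      sales = c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
                      cost  = c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                      city  = rep(c("MTL", "NY"), each = 5),
                      stringsAsFactors = FALSE)
    DF2 <- data.frame(date  = c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
                      id    = c(1, 1, 2, 2),
                      sales = c(9999, 8888, 777, 555),
                      city  = c("LON", "LON", "QC", "QC"),
                      stringsAsFactors = FALSE)

    # Build one lookup key per row from the two join columns.
    key1 <- paste(DF1$date, DF1$id, sep = "_")
    key2 <- paste(DF2$date, DF2$id, sep = "_")

    idx <- match(key1, key2)  # DF2 row for each DF1 row, NA when there is no match
    hit <- !is.na(idx)        # DF1 rows that have a counterpart in DF2

    # Columns present in both frames, excluding the join columns.
    shared <- setdiff(intersect(names(DF1), names(DF2)), c("date", "id"))

    out <- DF1
    for (col in shared) {
      out[[col]][hit] <- DF2[[col]][idx[hit]]
    }
    ```

    Columns that exist only in DF1, such as cost, are never touched by the loop.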

  • 1
    # all.x = TRUE belongs to merge(): it keeps every DF1 row, with NA where DF2 has no match
    df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)
    df$newcolumn <- ifelse(is.na(df$column.y), df$column.x, df$column.y)
    

    Replace column with the name of your variable.
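
    Applied to the question's data, the pattern might look like the sketch below. The row_id helper column and the loop over sales and city are assumptions added for illustration: merge() sorts its result by the key columns, so row_id is used to restore DF1's original order.

    ```r
    # Rebuild the question's data (same values as DF1 and DF2).
    dates <- c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016", "06/23/2016")
    DF1 <- data.frame(date  = rep(dates, 2),
                      id    = rep(c(1, 2), each = 5),
                      sales = c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
                      cost  = c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                      city  = rep(c("MTL", "NY"), each = 5),
                      stringsAsFactors = FALSE)
    DF2 <- data.frame(date  = c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
                      id    = c(1, 1, 2, 2),
                      sales = c(9999, 8888, 777, 555),
                      city  = c("LON", "LON", "QC", "QC"),
                      stringsAsFactors = FALSE)

    DF1$row_id <- seq_len(nrow(DF1))  # hypothetical helper: remembers DF1's row order

    # Left join: shared non-key columns come back as <name>.x (DF1) and <name>.y (DF2).
    df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)

    # Take the DF2 value where one exists, otherwise keep the DF1 value.
    for (col in c("sales", "city")) {
      x <- df[[paste0(col, ".x")]]
      y <- df[[paste0(col, ".y")]]
      df[[col]] <- ifelse(is.na(y), x, y)
    }

    # Restore DF1's row order and keep only the original columns.
    results <- df[order(df$row_id), c("date", "id", "sales", "cost", "city")]
    rownames(results) <- NULL
    ```

    One caveat of this pattern: if a matching DF2 row holds NA in a column, the DF1 value is kept, because a missing match and a missing value look the same after the merge.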
