
Merging multiple data frames in a list into another data frame by their respective columns


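In my code, I use several left_join calls to merge separate data frames into the data frame I am working on in a dplyr chain. I import the data frames that I want to merge into a list, do some operations directly on that list, and use lapply to prepare them for merging.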

So far, I have used list2env(list, envir = .GlobalEnv) to create separate data frames from the list, and then merged each one individually with left_join, each by its own unique column, as follows:

Test data:

List:

testlist <- structure(list(df2 = structure(list(x = structure(c(2L, 1L, 3L
), .Label = c("A", "B", "C"), class = "factor"), a = c(-0.331543943439452, 
0.0588350184156617, 1.03657229544754)), .Names = c("x", "a"), row.names = c(NA, 
-3L), class = "data.frame"), df3 = structure(list(z = structure(c(3L, 
2L, 1L), .Label = c("K", "L", "M"), class = "factor"), b = c(-0.897094152848114, 
0.97612075490695, 0.650264147064918)), .Names = c("z", "b"), row.names = c(NA, 
-3L), class = "data.frame")), .Names = c("df2", "df3"))

To create the separate data frames:

list2env(testlist, envir = .GlobalEnv)

Data frame:

test_df <- structure(list(x = structure(1:3, .Label = c("A", "B", "C"), class = "factor"), 
    y = 1:3, z = structure(1:3, .Label = c("K", "L", "M"), class = "factor")), .Names = c("x", 
"y", "z"), row.names = c(NA, -3L), class = "data.frame")

Join:

library(dplyr)

test_df %>%
    left_join(., df2, by = "x") %>%
    left_join(., df3, by = "z")

(Note that my list has around eight data frames with 2-3 columns each. I included only two data frames in this list for simplicity.)

Each data frame has its own "by" column. What I want to know is whether there is a simpler way to do this, for example by merging with the whole list directly and automatically detecting which columns each data frame shares with the main one, instead of calling left_join eight separate times?

EDIT

Following @akrun's suggestion, I tried running the following code:

out <- test
for(i in seq_along(table_list)) {
  nm1 <- intersect(names(out), names(table_list[[i]]))
  out <- merge(out, table_list[[i]], by = nm1)
}
out

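Here test is the data frame to merge into, and table_list is the list of data frames. This works for these small test data frames, but it seems to introduce duplicates of rows in the data frame, producing more rows.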

A more complex example data frame:

structure(list(x = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L
), .Label = c("A", "B", "C", "D"), class = "factor"), y = c(1, 
2, 3, 4, 1, 2, 3, 4), z = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L), .Label = c("K", "L", "M"), class = "factor")), .Names = c("x", 
"y", "z"), row.names = c(NA, -8L), class = "data.frame")

2 Answers

  • 1

    With the complex test_df, why not use reduce from purrr together with left_join from dplyr? I have included the messages and the warning in the code below.

    library(dplyr)
    library(purrr)
    
    # testlist is the list of data frames from the question
    all_dfs <- reduce(testlist, left_join, .init = test_df)
    
    # (warning) messages from using left_join
    # Joining, by = "x"
    # Joining, by = "z"
    # Warning message:
    # Column `x` joining factors with different levels, coercing to character vector 
    
    all_dfs
    
      x y z           a          b
    1 A 1 K  0.05883502  0.6502641
    2 B 2 L -0.33154394  0.9761208
    3 C 3 M  1.03657230 -0.8970942
    4 D 4 K          NA  0.6502641
    5 A 1 L  0.05883502  0.9761208
    6 B 2 M -0.33154394 -0.8970942
    7 C 3 K  1.03657230  0.6502641
    8 D 4 L          NA  0.9761208
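
The same fold can also be written with an explicit key for each step, which avoids the "Joining, by" messages and makes the choice of join columns visible. The following is only a minimal sketch in base R, assuming character key columns and sample data modelled on the question (values rounded); join_by_shared is a hypothetical helper name, not part of any package. The guard matters because merge() returns the Cartesian product of its inputs when by has length zero.

```r
# Sample data modelled on the question (values rounded for brevity)
test_df <- data.frame(
  x = c("A", "B", "C", "D", "A", "B", "C", "D"),
  y = c(1, 2, 3, 4, 1, 2, 3, 4),
  z = c("K", "L", "M", "K", "L", "M", "K", "L"),
  stringsAsFactors = FALSE
)
testlist <- list(
  df2 = data.frame(x = c("B", "A", "C"), a = c(-0.33, 0.06, 1.04),
                   stringsAsFactors = FALSE),
  df3 = data.frame(z = c("M", "L", "K"), b = c(-0.90, 0.98, 0.65),
                   stringsAsFactors = FALSE)
)

# Hypothetical helper: left-join `right` onto `left` by the columns they share
join_by_shared <- function(left, right) {
  keys <- intersect(names(left), names(right))
  # Stop early instead of letting merge() build a Cartesian product
  if (length(keys) == 0) stop("no shared columns to join on")
  merge(left, right, by = keys, all.x = TRUE)
}

# Reduce() folds the list into test_df one data frame at a time
all_dfs <- Reduce(join_by_shared, testlist, test_df)
all_dfs
```

Note that merge() sorts the result by the key columns and moves them to the front, so the row and column order can differ from the left_join output above.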
    
  • 1

    Based on the description, it seems we need to check the intersecting column names before each merge.

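    out <- test_df
    for(i in seq_along(testlist)) {
      # columns that test_df shares with the i-th data frame become the merge key
      nm1 <- intersect(names(test_df), names(testlist[[i]]))
      # all.x = TRUE keeps every row of out, as left_join does
      out <- merge(out, testlist[[i]], by = nm1, all.x = TRUE)
    }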
    
    out
    #  z x y           a          b
    #1 K A 1  0.05883502  0.6502641
    #2 L B 2 -0.33154394  0.9761208
    #3 M C 3  1.03657230 -0.8970942
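
One plausible cause of the extra rows mentioned in the question's EDIT is a key value that occurs more than once in one of the data frames being merged in: merge() (like left_join) then returns one row per match. The sketch below shows a way to check for that before merging. The data and the helper name has_duplicate_keys are hypothetical and chosen only for illustration.

```r
# Hypothetical lookup table in which the key "A" appears twice
lookup <- data.frame(x = c("A", "A", "B"), a = c(1, 2, 3),
                     stringsAsFactors = FALSE)
main <- data.frame(x = c("A", "B", "C"), y = 1:3,
                   stringsAsFactors = FALSE)

# Hypothetical helper: TRUE when some key combination in `df` occurs more than once
has_duplicate_keys <- function(df, keys) {
  anyDuplicated(df[, keys, drop = FALSE]) > 0
}

# Check the right-hand table before merging
has_duplicate_keys(lookup, "x")

# The duplicated key multiplies the matching row of `main`
joined <- merge(main, lookup, by = "x", all.x = TRUE)
nrow(joined) > nrow(main)
```

If the check reports duplicates, deduplicating or aggregating the lookup table first keeps the row count of the main data frame unchanged.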
    
