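In my code, I use multiple left_join calls to merge separate data frames into the data frame I am processing in a dplyr chain. I import the data frames I want to merge into a list, then operate directly on that list with lapply to prepare them for merging.
So far, I have used list2env(list, envir = .GlobalEnv) to create separate data frames from the list, and then used left_join to merge each data frame separately, each by its own unique column, as follows: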
Test data:
List:
testlist <- structure(list(df2 = structure(list(x = structure(c(2L, 1L, 3L
), .Label = c("A", "B", "C"), class = "factor"), a = c(-0.331543943439452,
0.0588350184156617, 1.03657229544754)), .Names = c("x", "a"), row.names = c(NA,
-3L), class = "data.frame"), df3 = structure(list(z = structure(c(3L,
2L, 1L), .Label = c("K", "L", "M"), class = "factor"), b = c(-0.897094152848114,
0.97612075490695, 0.650264147064918)), .Names = c("z", "b"), row.names = c(NA,
-3L), class = "data.frame")), .Names = c("df2", "df3"))
To create the separate data frames:
list2env(testlist, envir = .GlobalEnv)
Data frame:
test_df <- structure(list(x = structure(1:3, .Label = c("A", "B", "C"), class = "factor"),
y = 1:3, z = structure(1:3, .Label = c("K", "L", "M"), class = "factor")), .Names = c("x",
"y", "z"), row.names = c(NA, -3L), class = "data.frame")
Join:
library(dplyr)
test_df %>%
  left_join(., df2, by = "x") %>%
  left_join(., df3, by = "z")
(Note that my list has around eight data frames with 2-3 columns each. I included only two data frames in this list for simplicity.)
Each data frame has its own "by" column. What I want to know is whether there is a simpler way to do this, for example by merging with the whole list directly and automatically detecting which columns each data frame shares with the main one and merging by those, instead of calling left_join eight separate times?
EDIT
Following @akrun's suggestion, I tried running the following code:
out <- test
for(i in seq_along(table_list)) {
  nm1 <- intersect(names(out), names(table_list[[i]]))
  out <- merge(out, table_list[[i]], by = nm1)
}
out
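where test is the data frame to merge into and table_list is the list of data frames. This works for these small test data frames, but on my real data it seems to duplicate some rows, so the result has more rows than the original data frame.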
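One possible cause of the extra rows, offered only as an assumption since the post does not confirm it, is that one of the lookup tables contains repeated values in its key column. The sketch below uses two small hypothetical data frames (base_df and lookup are illustrative names, not part of the original data) to show how a repeated key multiplies rows in merge, and how anyDuplicated can flag it beforehand.

```r
# Hypothetical data: the key "A" appears twice in the lookup table.
base_df <- data.frame(x = c("A", "B"), y = 1:2)
lookup  <- data.frame(x = c("A", "A", "B"), a = c(1, 2, 3))

# all.x = TRUE keeps every row of base_df (left-join semantics);
# the default all = FALSE would behave like an inner join.
merged <- merge(base_df, lookup, by = "x", all.x = TRUE)

# Each base_df row is repeated once per matching lookup row.
n_before <- nrow(base_df)
n_after  <- nrow(merged)

# anyDuplicated returns 0 when all keys are unique, otherwise the
# index of the first duplicate, so this flags a non-unique key.
has_dup_keys <- anyDuplicated(lookup$x) > 0
```

Running a check like this on each element of the list before merging would show whether repeated keys are the source of the additional rows.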
A more complex example data frame:
structure(list(x = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L
), .Label = c("A", "B", "C", "D"), class = "factor"), y = c(1,
2, 3, 4, 1, 2, 3, 4), z = structure(c(1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L), .Label = c("K", "L", "M"), class = "factor")), .Names = c("x",
"y", "z"), row.names = c(NA, -8L), class = "data.frame")
2 Answers
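With the complex test_df, why not use reduce from purrr together with left_join from dplyr? I have included the messages and warnings in the code below. Based on the description, it seems we need to check the intersecting column names before each merge.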
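As a rough illustration of that idea (a minimal sketch, not the answerer's original code), the list can be folded into the main data frame with purrr::reduce, computing the shared column names with intersect before each left_join. The data here is a simplified stand-in for the question's test data: the keys are plain character columns rather than factors and the numeric values are rounded, both of which are assumptions made for brevity.

```r
library(dplyr)
library(purrr)

# Simplified version of the more complex test_df (character keys, not factors).
test_df <- data.frame(
  x = c("A", "B", "C", "D", "A", "B", "C", "D"),
  y = c(1, 2, 3, 4, 1, 2, 3, 4),
  z = c("K", "L", "M", "K", "L", "M", "K", "L"),
  stringsAsFactors = FALSE
)

# Simplified version of the list of lookup tables (values rounded).
testlist <- list(
  df2 = data.frame(x = c("B", "A", "C"), a = c(-0.33, 0.06, 1.04),
                   stringsAsFactors = FALSE),
  df3 = data.frame(z = c("M", "L", "K"), b = c(-0.90, 0.98, 0.65),
                   stringsAsFactors = FALSE)
)

# Fold the list into test_df: at each step, join by whatever columns
# the accumulated result and the next lookup table have in common.
out <- reduce(
  testlist,
  function(acc, df) left_join(acc, df, by = intersect(names(acc), names(df))),
  .init = test_df
)
```

Because left_join keeps every row of the left table, the row with x equal to "D", which has no match in df2, ends up with NA in column a rather than being dropped. If a lookup table shared no column with the accumulated result, intersect would return an empty vector and the join would need separate handling.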