How to compute a column in a data frame or data table from the table's own values in R [duplicate]

This question already has answers here:

Find all sequences with the same column value (9 answers)

I am trying to create a new column in an R data frame or data table using values already in the table. For example, here is a table:

RowID| Col1   | Col2 |
----------------------
1    | apple  | cow  |
2    | orange | dog  |
3    | apple  | cat  |
4    | cherry | fish |
5    | cherry | ant  |
6    | apple  | rat  |

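I want to add an extra column to this table. For each row, the column looks for other rows with the same Col1 value and holds the comma-separated string of those rows' Col2 values. That is: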

RowID| Col1   | Col2 | newCol
------------------------------
1    | apple  | cow  | cat,rat   (Row 3 & 6 match col1 values)
2    | orange | dog  | na        (No rows match this col1 value)
3    | apple  | cat  | cow,rat   (Row 1 & 6 match col1 values)
4    | cherry | fish | ant       (Row 5 matches col1 values)
5    | cherry | ant  | fish      (Row 4 matches col1 values)
6    | apple  | rat  | cow,cat   (Row 1 & 3 match col1 values)

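To restate: for each row, we check whether any other rows have the same value in the first column. Once those rows are found, we take their Col2 values, concatenate them, and use the result as the new column's value for the row being compared.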

I have been trying to figure this out for the past few days and simply cannot.

2 Answers

dat$newCol <- Map(function(x, y) dat[dat$Col1 == y & dat$RowID != x, "Col2"],
                  dat$RowID, dat$Col1)

dat

#   RowID   Col1 Col2   newCol
# 1     1  apple  cow cat, rat
# 2     2 orange  dog         
# 3     3  apple  cat cow, rat
# 4     4 cherry fish      ant
# 5     5 cherry  ant     fish
# 6     6  apple  rat cow, cat

where dat is:

dat <- read.table(text =
"RowID| Col1   | Col2 |
1    | apple  | cow  |
2    | orange | dog  |
3    | apple  | cat  |
4    | cherry | fish |
5    | cherry | ant  |
6    | apple  | rat  |
", header = TRUE, stringsAsFactors = FALSE,
sep = "|", strip.white = TRUE)[, -4]
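
The Map() call above stores its result as a list column, and rows without a match hold an empty vector. If a plain character column with NA for unmatched rows is preferred, as in the question's desired output, one possible sketch in base R follows. The helper name join_others is hypothetical, and dat is rebuilt inline only so the block runs on its own.

```r
# Same data as in the question, built directly for a self-contained example.
dat <- data.frame(
  RowID = 1:6,
  Col1 = c("apple", "orange", "apple", "cherry", "cherry", "apple"),
  Col2 = c("cow", "dog", "cat", "fish", "ant", "rat"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for row i, join the Col2 values of all *other* rows
# that share its Col1 value; return NA when no other row matches.
join_others <- function(i) {
  same_group <- dat$Col1 == dat$Col1[i]
  not_self <- seq_len(nrow(dat)) != i
  others <- dat$Col2[same_group & not_self]
  if (length(others) == 0) NA_character_ else paste(others, collapse = ",")
}

# vapply guarantees a character vector with one string per row.
dat$newCol <- vapply(seq_len(nrow(dat)), join_others, character(1))
print(dat)
```

Because it compares row positions rather than the RowID column, this sketch does not depend on the IDs being unique or sequential.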

Answered 2024-05-19T17:55:21+08:00

This code works without a row ID column. It uses .I, data.table's built-in row numbering.
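
As a quick illustration of what .I provides, here is a tiny sketch on a throwaway table. The name tiny and its column v are made up for this example and are not part of the answer's data.

```r
library(data.table)

# Throwaway three-row table, used only to show how .I behaves.
tiny <- data.table(v = c("a", "b", "c"))

# With no grouping, .I in j is the vector of row numbers.
row_numbers <- tiny[, .I]

# Comparing .I with a row number gives a logical mask for that single row.
second_row_mask <- tiny[, .I == 2]

print(row_numbers)
print(second_row_mask)
```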

F <- c("x", "y", "x", "z", "x", "y")
V <- c("dog", "cat", "monkey", "cow", "cat", "lion")

library(data.table)

dt <- data.table(F, V)
print(dt)

   F      V
1: x    dog
2: y    cat
3: x monkey
4: z    cow
5: x    cat
6: y   lion

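# Collect the V values of every other row that shares row `id`'s F value.
grasp <- function(id, data) {
    is_self <- data[, .I == id]          # logical mask selecting only row `id`
    level <- data[is_self, F]            # grouping value of that row
    x <- data[!is_self][F == level, V]   # V values of the other rows in the group
    paste(x, collapse = ", ")
}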

dt[ ,newCol:=lapply(.I, grasp, data=dt)]
print(dt)

   F      V      newCol
1: x    dog monkey, cat
2: y    cat        lion
3: x monkey    dog, cat
4: z    cow            
5: x    cat dog, monkey
6: y   lion         cat
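
The function above rescans the whole table once per row. A grouped variant is sketched below as an alternative, not as the answerer's method: within each F group it drops the current row's own position and joins what remains. It assumes the same F and V columns and passes the grouping column as the string "F" so it cannot be confused with R's F shorthand for FALSE.

```r
library(data.table)

dt <- data.table(
  F = c("x", "y", "x", "z", "x", "y"),
  V = c("dog", "cat", "monkey", "cow", "cat", "lion")
)

# Inside each group, V holds only that group's values and .N is its size.
# For position i, V[-i] is every value except the row's own.
dt[, newCol := vapply(
  seq_len(.N),
  function(i) paste(V[-i], collapse = ", "),
  character(1)
), by = "F"]

print(dt)
```

Rows that are alone in their group get an empty string here, matching the output shown above; swap in NA_character_ for those cases if a missing value is preferred.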

Answered 2024-05-19T17:55:21+08:00
