R - Finding matching columns in two data frames for t-test statistics (R beginner)

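I want to run a two-sample t-test on my data in R. Given two high-dimensional data frames, I need to somehow go through the matching columns (matched by the strings in the header, i.e. colnames()), across all rows, and run the test for each column pair, one column from df1 and one from df2. The problem is that the columns in the data frames are not in the same order, i.e. col1 of df1 does not match col1 of df2, and df2 has additional columns that do not exist in df1. I have never used R for this kind of task, and I wonder whether there is a quick and convenient way to find the matching column pairs in the data frames for the t-test.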

I thought about for loops, but I think that would be very inefficient for large data frames.

Thanks in advance for any help.

EDITED: Two small example data frames, df1 and df2

DF1

"Row\Column"    "A2"    "A1"    "A4"    "A3"
"id_1"           10      20      0       40
"id_2"           5       15      25      35
"id_3"           8       0       12      16
"id_4"           17      25      0       40

DF2

"Row\Column"    "A3"    "A8"    "A5"    "A6"    "A1"    "A7"    "A4"    "A2"
"id_1"           0       2       0       4       0       1       2       3
"id_2"           1       5       8       3       4       5       6       7
"id_3"           2       10      6       9       8       9       10      11
"id_4"           7       2       10      2       55      0       0       0
"id_5"           0       1       0       0       9       1       3       4
"id_6"           8       0       1       2       7       2       3       0

Matching columns are simply columns whose name in df1 matches a column name in df2. For example, two matching column pairs in df1 and df2 are "A1" and "A1", "A2" and "A2", and so on.
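For reproducibility, the two tables above could be written as R data frames roughly as follows. This is only a sketch: the values are copied from the tables, the name `common` is introduced here for illustration, and the "Row\Column" labels are assumed to be row names.

```r
# Rebuild the two example tables as data frames (values copied from the tables).
df1 <- data.frame(
  A2 = c(10, 5, 8, 17),
  A1 = c(20, 15, 0, 25),
  A4 = c(0, 25, 12, 0),
  A3 = c(40, 35, 16, 40),
  row.names = paste0("id_", 1:4)
)
df2 <- data.frame(
  A3 = c(0, 1, 2, 7, 0, 8),
  A8 = c(2, 5, 10, 2, 1, 0),
  A5 = c(0, 8, 6, 10, 0, 1),
  A6 = c(4, 3, 9, 2, 0, 2),
  A1 = c(0, 4, 8, 55, 9, 7),
  A7 = c(1, 5, 9, 0, 1, 2),
  A4 = c(2, 6, 10, 0, 3, 3),
  A2 = c(3, 7, 11, 0, 4, 0),
  row.names = paste0("id_", 1:6)
)

# Column names present in both data frames, in the order they appear in df1.
common <- intersect(colnames(df1), colnames(df2))
common
```

The columns A5 to A8 exist only in df2, so they do not appear in `common`.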

2 Answers

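Without a reproducible example, it is hard to give you a good answer. You also need to define what you mean by matching columns.

Here is an example of two tables (built as matrices here; indexing by column name works the same way for data.frames) that have some column names in common.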

df1 <- matrix(sample(1:100, 5*5, replace=TRUE), ncol=5, nrow=5)
df2 <- matrix(sample(1:100, 5*8, replace=TRUE), ncol=8, nrow=5)
colnames(df1) <- letters[6:10]
colnames(df2) <- rev(letters[1:8])

Then I define a wrapper around t.test that limits the output to, for example, the degrees of freedom and the p-value.

f <- function(x, y) {
  test <- t.test(x, y)
  data.frame(df   = test$parameter,
             pval = test$p.value)
}

Then I use sapply to iterate over the common columns, which I find with intersect.

sapply(intersect(colnames(df1),colnames(df2)), 
                 function(x) f(df1[,x], df2[,x]))

     f         g         h        
df   7.85416   6.800044  7.508915 
pval 0.5792354 0.2225824 0.4392895
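If a plain numeric matrix is preferred over a matrix of list cells, the same idea could be written with vapply. The following is a sketch under assumptions: the data frames `a` and `b`, the seed, and the name `res` are made up for illustration and are not part of the example above.

```r
set.seed(1)  # assumed seed, only so the sketch is repeatable

# Two hypothetical data frames sharing the columns "f" and "g".
a <- data.frame(f = rnorm(5), g = rnorm(5), z = rnorm(5))
b <- data.frame(g = rnorm(8), y = rnorm(8), f = rnorm(8))

common <- intersect(colnames(a), colnames(b))

# vapply enforces that every result is a numeric vector of length 2.
res <- vapply(common, function(nm) {
  tt <- t.test(a[[nm]], b[[nm]])  # Welch two-sample t-test by default
  c(df = unname(tt$parameter), pval = tt$p.value)
}, FUN.VALUE = c(df = 0, pval = 0))

res  # rows "df" and "pval", one column per common name
```

Because t.test uses the Welch approximation by default, the degrees of freedom are usually not whole numbers, which is also why the output above shows fractional values in the df row.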

Answered 2024-04-20T12:47:03+08:00

mapply is the function you are looking for.
If the columns of your data.frames already matched, you could simply use
```
mapply(t.test, df1, df2)
```
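However, since they don't, you need some way to determine which column of df1 corresponds to which column of df2. Fortunately, indexing in R is smart: if you supply a vector of column names, you get those columns back in the order given. This makes life easy.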
```
# find the matching names
## this will give you those names in df1 that are also in df2
## and *only* such names (ie, strict intersect)
matchingNames <- names(df1)[names(df1) %in% names(df2)]
```
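Note that matchingNames has a particular order. Now look at what happens when you use the matchingNames vector as the column index for both df1 and df2 (note the column order as well).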
```
df1[, matchingNames]
df2[, matchingNames]
matchingNames
```
So we now have two data.frames with properly matched columns, which we can use with mapply.
```
## mapply will apply a function to each data.frame, one pair of columns at a time

## The first argument to `mapply` is your function, in this example, `t.test`
## The second and third arguments are the data.frames (or lists) to simultaneously iterate over
mapply(t.test, df1[, matchingNames], df2[, matchingNames])
```
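To pull a single number such as the p-value out of each test, one option is to keep the full test objects and extract from them afterwards. This is a sketch with made-up data: the seed, the column values, and the names `tests` and `pvals` are assumptions for illustration.

```r
set.seed(42)  # assumed seed, only so the sketch is repeatable

# Hypothetical data frames with columns in different orders; A9 exists only in df2.
df1 <- data.frame(A2 = rnorm(4), A1 = rnorm(4), A4 = rnorm(4))
df2 <- data.frame(A4 = rnorm(6), A9 = rnorm(6), A1 = rnorm(6), A2 = rnorm(6))

matchingNames <- names(df1)[names(df1) %in% names(df2)]

# df[names] (list-style indexing) always returns a data.frame,
# even when only one name matches.
# SIMPLIFY = FALSE keeps each result as a complete "htest" object.
tests <- mapply(t.test, df1[matchingNames], df2[matchingNames],
                SIMPLIFY = FALSE)

# Extract one p-value per matched column pair.
pvals <- vapply(tests, function(tt) tt$p.value, numeric(1))
pvals
```

The result is a named numeric vector, so `pvals["A1"]` gives the p-value for the A1 pair. Both inputs need to be data.frames (or lists) for this to work column by column; if a matrix is passed, mapply iterates over its individual elements instead.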
Answered 2024-04-20T12:47:03+08:00
