How do I apply wilcox.test to an entire data frame in R?

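I have a data frame in which a grouping factor (the first column) has more than two levels, followed by several columns of data. I want to apply wilcox.test to the whole data frame so that every variable is compared between each pair of groups. How can I do this?

Update: I know that wilcox.test only tests the difference between two groups, and my data frame contains three. But I am more interested in how to do this than in which test to use. Most likely one group will be dropped, but I have not decided which yet, so I want to test all variants.

Here is an example: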

structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), var1 = c(9.3, 
9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84, 6.98, 
6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68), var2 = c(11L, 
11L, 10L, 1L, 3L, 7L, 11L, 11L, 11L, 11L, 4L, 1L, 1L, 1L, 2L, 
2L, 1L, 4L, 8L, 8L, 1L), var3 = c(7L, 11L, 3L, 7L, 11L, 2L, 11L, 
5L, 11L, 11L, 5L, 11L, 11L, 2L, 9L, 9L, 3L, 8L, 11L, 11L, 2L), 
    var4 = c(11L, 11L, 11L, 11L, 6L, 11L, 11L, 11L, 10L, 7L, 
    11L, 2L, 11L, 3L, 11L, 11L, 6L, 11L, 1L, 11L, 11L), var5 = c(11L, 
    1L, 2L, 2L, 11L, 11L, 1L, 10L, 2L, 11L, 1L, 3L, 11L, 11L, 
    8L, 8L, 11L, 11L, 11L, 2L, 9L)), .Names = c("group", "var1", 
"var2", "var3", "var4", "var5"), class = "data.frame", row.names = c(NA, 
-21L))

UPDATE

Thanks, everyone, for all the answers!

3 Answers

  • 5

    I updated my answer to work across columns.

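    test.fun <- function(dat, col) {
      # All unordered pairs of group levels, one pair per column of c1
      c1 <- combn(unique(dat$group), 2)
      sigs <- list()
      for (i in seq_len(ncol(c1))) {
        # Two-sample Wilcoxon test of `col` between the two groups of pair i
        sigs[[i]] <- wilcox.test(
          dat[dat$group == c1[1, i], col],
          dat[dat$group == c1[2, i], col]
        )
      }
      names(sigs) <- paste("Group", c1[1, ], "by Group", c1[2, ])
    
      # Collect the W statistic and p-value of each test into one data frame
      tests <- data.frame(Test = names(sigs),
                          W = unlist(lapply(sigs, function(x) x$statistic)),
                          p = unlist(lapply(sigs, function(x) x$p.value)),
                          row.names = NULL)
    
      return(tests)
    }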
    
    
    tests <- lapply(colnames(dat)[-1],function(x) test.fun(dat,x))
    names(tests) <- colnames(dat)[-1]
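    # do.call(rbind, tests) would stack the per-column results into one data frame
    
    # This solution is not slow. Note that rep() evaluates its first argument
    # only once and then repeats the resulting list, so the timing below covers
    # a single pass over the columns, not 10000 runs:
    system.time(
      rep(
       tests <- lapply(colnames(dat)[-1],function(x) test.fun(dat,x)),10000
      )
    )
    
    #   user  system elapsed 
    #  0.056   0.000   0.053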
    

    The result looks like this:

    tests
    
    $var1
                    Test  W          p
    1 Group 1 by Group 2 28 0.36596737
    2 Group 1 by Group 3 39 0.05927406
    3 Group 2 by Group 3 38 0.27073136
    
    $var2
                    Test    W         p
    1 Group 1 by Group 2 19.0 0.8205958
    2 Group 1 by Group 3 36.5 0.1159945
    3 Group 2 by Group 3 40.5 0.1522726
    
    $var3
                    Test    W         p
    1 Group 1 by Group 2 13.0 0.2425786
    2 Group 1 by Group 3 23.5 1.0000000
    3 Group 2 by Group 3 41.0 0.1261647
    
    $var4
                    Test  W         p
    1 Group 1 by Group 2 26 0.4323470
    2 Group 1 by Group 3 30 0.3729664
    3 Group 2 by Group 3 29 0.9479518
    
    $var5
                    Test    W         p
    1 Group 1 by Group 2 24.0 0.7100968
    2 Group 1 by Group 3 19.0 0.5324295
    3 Group 2 by Group 3 17.5 0.2306609
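
    If a single table is easier to work with than a list, the per-column results can be stacked and the p-values adjusted for multiple comparisons. The following is only a sketch under assumptions: the data are a small simulated stand-in (not the question's data), and `pair_tests`, `long` and `p.holm` are hypothetical names introduced here. It condenses the same pairwise idea as `test.fun`.

    ```r
    # Simulated stand-in data (assumption): 3 groups, 2 numeric columns
    set.seed(1)
    dat <- data.frame(group = rep(1:3, each = 7),
                      var1 = rnorm(21), var2 = rnorm(21))

    # Hypothetical helper: pairwise Wilcoxon p-values for one column
    pair_tests <- function(dat, col) {
      pairs <- combn(unique(dat$group), 2)   # one group pair per column
      p <- apply(pairs, 2, function(g)
        wilcox.test(dat[dat$group == g[1], col],
                    dat[dat$group == g[2], col])$p.value)
      data.frame(variable = col,
                 test = paste("Group", pairs[1, ], "by Group", pairs[2, ]),
                 p = p)
    }

    # Stack the results of all data columns into one long data frame
    long <- do.call(rbind, lapply(colnames(dat)[-1],
                                  function(x) pair_tests(dat, x)))

    # Holm adjustment across every comparison in the table
    long$p.holm <- p.adjust(long$p, method = "holm")
    long
    ```

    Adjusting across the whole table is one possible choice; adjusting within each variable would be an equally defensible design, depending on the question being asked.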
    
  • 3

    The pairwise.wilcox.test function seems useful here; maybe something like this?

    out <- lapply(2:6, function(x) pairwise.wilcox.test(d[[x]], d$group))
    names(out) <- names(d)[2:6]
    out
    

    If you only want the p-values, you can go through the results, extract them, and build a matrix.

    sapply(out, function(x) {
        p <- x$p.value
        n <- outer(rownames(p), colnames(p), paste, sep='v')
        p <- as.vector(p)
        names(p) <- n
        p
    })
    ##         var1      var2      var3 var4      var5
    ## 2v1 0.5414627 0.8205958 0.4851572    1 1.0000000
    ## 3v1 0.1778222 0.3479835 1.0000000    1 1.0000000
    ## 2v2        NA        NA        NA   NA        NA
    ## 3v2 0.5414627 0.3479835 0.3784941    1 0.6919826
    

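    Also note that pairwise.wilcox.test adjusts for multiple comparisons using Holm's method by default; if you want something different, look at the p.adjust.method argument (see ?p.adjust for the available methods).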
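
    As an illustration of that argument, the sketch below runs the same comparison with the default Holm adjustment, with no adjustment, and with the Benjamini-Hochberg method. The data are a simulated stand-in (an assumption, not the question's data), and the object names `holm`, `raw` and `bh` are introduced here only for the example.

    ```r
    # Simulated stand-in data (assumption): 3 groups, 2 numeric columns
    set.seed(1)
    d <- data.frame(group = rep(1:3, each = 7),
                    var1 = rnorm(21), var2 = rnorm(21))

    # Default adjustment is Holm
    holm <- pairwise.wilcox.test(d$var1, d$group)
    # Unadjusted p-values
    raw  <- pairwise.wilcox.test(d$var1, d$group, p.adjust.method = "none")
    # Benjamini-Hochberg (false discovery rate) adjustment
    bh   <- pairwise.wilcox.test(d$var1, d$group, p.adjust.method = "BH")

    holm$p.adjust.method   # which method was applied
    raw$p.value            # lower-triangular matrix of pairwise p-values
    ```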

  • 9

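    You can use apply to loop over the columns and pass each column, via an anonymous function, to whatever test you want to use, like this (assuming the data frame is named df):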

    apply(df[-1],2,function(x) kruskal.test(x,df$group))
    

    Note: I used the Kruskal-Wallis test because it works with more than two groups. If there are only two groups, the Wilcoxon test works fine too.
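
    If only the p-values are of interest, they can be pulled out of each test object directly. This is a minimal sketch on simulated stand-in data (an assumption, not the question's data); it uses sapply over the data columns instead of apply, and `kw_p` is a name introduced here for illustration.

    ```r
    # Simulated stand-in data (assumption): 3 groups, 2 numeric columns
    set.seed(1)
    df <- data.frame(group = rep(1:3, each = 7),
                     var1 = rnorm(21), var2 = rnorm(21))

    # One Kruskal-Wallis test per data column; keep only the p-value
    kw_p <- sapply(df[-1], function(x) kruskal.test(x, df$group)$p.value)
    kw_p   # named numeric vector, one entry per variable
    ```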

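    If you want pairwise Wilcoxon tests for all variables, here is a two-liner that iterates over all columns and all group pairs and returns the results as a list: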

    group.pairs <- combn(unique(df$group),2,simplify=FALSE)
    # this loops over the 2nd margin - the columns - of df and makes each column
    # available as x
    apply(df[-1], 2, function(x)
                 # this loops over the list of group pairs and makes each such pair
                 # available as an integer vector y
                 lapply(group.pairs, function(y)
                        # compare the values of x in group y[1] with those in group y[2]
                        wilcox.test(x[df$group == y[1]], x[df$group == y[2]])))
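
    The nested list can be awkward to read, so one option is to reduce it to a matrix of p-values with one row per group pair and one column per variable. The sketch below assumes each test compares one variable's values between the two groups of a pair; it runs on simulated stand-in data (not the question's data), and `p_mat` is a name introduced here.

    ```r
    # Simulated stand-in data (assumption): 3 groups, 2 numeric columns
    set.seed(1)
    df <- data.frame(group = rep(1:3, each = 7),
                     var1 = rnorm(21), var2 = rnorm(21))

    group.pairs <- combn(unique(df$group), 2, simplify = FALSE)

    # Outer sapply: data columns; inner sapply: group pairs
    p_mat <- sapply(df[-1], function(x)
      sapply(group.pairs, function(y)
        wilcox.test(x[df$group == y[1]], x[df$group == y[2]])$p.value))

    # Label rows by the pair they compare, e.g. "1v2"
    rownames(p_mat) <- sapply(group.pairs, paste, collapse = "v")
    p_mat
    ```

    These p-values are unadjusted; p.adjust can be applied to them if a correction for multiple comparisons is wanted.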
    
