
R: How do I color boxplots of samples by their group?


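I currently have gene expression data in a matrix, arranged with samples in columns and genes in rows. I have about 300 samples and 30,000 genes.

The first three rows of the data look like this: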

         Sample1  Sample2  Sample3  Sample4  Sample5
Gene1    6.53845  6.38723  6.41613  6.07901  6.45148
Gene2    6.34303  6.52751  6.48025  6.79185  6.94955
Gene3    6.17286  6.31772  6.44266  6.61777  7.05509
...      ...

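And so on, for up to 30,000 rows and 300 samples.

I have been able to draw boxplots of the data in R, but I now want to color the boxplots according to each sample's batch/group.

I have a table of batch information.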

Sample   Batch
Sample1  A
Sample2  A
Sample3  B
Sample4  A
Sample5  C
...      ...

And so on, for 8 batches. Using R, how can I color the boxplots according to which batch each sample belongs to? Thanks!

1 Answer

  • 1

    One approach might be:

    library(dplyr)
    library(tidyr)
    library(tibble)
    library(ggplot2)
    
    df %>%
      rownames_to_column("Genes") %>%                          #add rownames as column
      gather(Sample, Sample_value, -Genes) %>%                 #convert data to long format from wide format for plotting
      left_join(batch_lookup, by = "Sample") %>%               #join it with lookup table to add 'Batch' column
      ggplot(aes(x=Sample, y=Sample_value, color=Batch)) +     #plot data
        geom_boxplot()
    

    This plots one boxplot per sample, with the boxes colored by batch.
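
The question describes a matrix, whereas `rownames_to_column()` in the pipeline above expects a data frame. The following is a minimal sketch of a variant for matrix input, not a definitive implementation. It assumes a numeric matrix named `expr_mat` (a hypothetical name, filled here with toy values copied from the question) and uses `pivot_longer()`, which tidyr documents as the successor to `gather()`. It also maps `fill` rather than `color`, so the box interiors are shaded, and orders the samples by batch so each batch sits together on the x axis.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Hypothetical toy inputs mirroring the question: a numeric matrix
# (genes in rows, samples in columns) and a batch lookup table.
expr_mat <- matrix(
  c(6.53845, 6.34303, 6.17286,
    6.38723, 6.52751, 6.31772,
    6.41613, 6.48025, 6.44266,
    6.07901, 6.79185, 6.61777,
    6.45148, 6.94955, 7.05509),
  nrow = 3,
  dimnames = list(paste0("Gene", 1:3), paste0("Sample", 1:5))
)
batch_lookup <- data.frame(
  Sample = paste0("Sample", 1:5),
  Batch  = c("A", "A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

long_df <- expr_mat %>%
  as.data.frame() %>%                               # matrix -> data frame, row names kept
  rownames_to_column("Genes") %>%                   # move gene names into a column
  pivot_longer(-Genes, names_to = "Sample",
               values_to = "Sample_value") %>%      # wide -> long
  left_join(batch_lookup, by = "Sample") %>%        # attach the Batch column
  arrange(Batch, Sample) %>%
  mutate(Sample = factor(Sample, levels = unique(Sample)))  # batch-ordered x axis

p <- ggplot(long_df, aes(x = Sample, y = Sample_value, fill = Batch)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  # readable sample labels
print(p)
```

With roughly 300 samples the x axis gets crowded, which is why the sketch rotates the labels; hiding them entirely is another option.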

    Sample data:

    df <- structure(list(Sample1 = c(6.53845, 6.34303, 6.17286), Sample2 = c(6.38723, 
    6.52751, 6.31772), Sample3 = c(6.41613, 6.48025, 6.44266), Sample4 = c(6.07901, 
    6.79185, 6.61777), Sample5 = c(6.45148, 6.94955, 7.05509)), .Names = c("Sample1", 
    "Sample2", "Sample3", "Sample4", "Sample5"), class = "data.frame", row.names = c("Gene1", 
    "Gene2", "Gene3"))
    
    batch_lookup <- structure(list(Sample = c("Sample1", "Sample2", "Sample3", "Sample4", 
    "Sample5"), Batch = c("A", "A", "B", "A", "C")), .Names = c("Sample", 
    "Batch"), class = "data.frame", row.names = c(NA, -5L))
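
Since the question mentions that boxplots were already being drawn in R, a base-graphics alternative may also be useful. This is only a sketch under stated assumptions: `expr_mat`, `batch_cols`, and `box_cols` are hypothetical names, the data are the toy values from the question, and `rainbow()` is just one possible palette. The idea is to look up each column's batch with `match()` and pass one color per box through the `col` argument of `boxplot()`.

```r
# Hypothetical toy inputs mirroring the question.
expr_mat <- matrix(
  c(6.53845, 6.34303, 6.17286,
    6.38723, 6.52751, 6.31772,
    6.41613, 6.48025, 6.44266,
    6.07901, 6.79185, 6.61777,
    6.45148, 6.94955, 7.05509),
  nrow = 3,
  dimnames = list(paste0("Gene", 1:3), paste0("Sample", 1:5))
)
batch_lookup <- data.frame(
  Sample = paste0("Sample", 1:5),
  Batch  = c("A", "A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# Look up the batch of each matrix column by sample name, so the result
# follows the column order of the matrix rather than the lookup table.
batch <- factor(batch_lookup$Batch[match(colnames(expr_mat), batch_lookup$Sample)])

# One color per batch level, then one color per column (i.e. per box).
batch_cols <- setNames(rainbow(nlevels(batch)), levels(batch))
box_cols   <- batch_cols[as.character(batch)]

# boxplot() on a matrix draws one box per column.
boxplot(expr_mat, col = box_cols, las = 2, ylab = "Expression")
legend("topright", legend = names(batch_cols), fill = batch_cols, title = "Batch")
```

Using `match()` on the column names, rather than relying on row order in the batch table, keeps the colors correct even if the two tables list the samples in different orders.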
    
