R的可变长度不同（lme4的线性建模）-Java 学习之路

我的输入文件：

Treat1  Treat2  Batch   gene1    gene2
High    Low     1       92.73    4.00
Low     Low     1       101.85   6.00
High    High    1       136.00   4.00
Low     High    1       104.00   3.00
High    Low     2       308.32   10.00
Low     Low     2       118.93   3.00
High    High    2       144.47   3.00
Low     High    2       189.66   4.00
High    Low     3       95.12    2.00
Low     Low     3       72.08    6.00
High    High    3       108.65   2.00
Low     High    3       75.00    3.00
High    Low     4       111.39   5.00
Low     Low     4       119.80   4.00
High    High    4       466.55   11.00
Low     High    4       125.00   3.00

有数万个额外的列，每个列都有一个 Headers 和一个数字列表，长度与“gene1”列相同 .

我的代码：

library(lme4)
library(lmerTest)

# Import the data.
mydata <- read.table("input_file", header=TRUE, sep="\t")

# Make batch into a factor
mydata$Batch <- as.factor(mydata$Batch)

# Check structure
str(mydata)

# Get file without the factors, so that names(df) gives gene names.
genefile <- mydata[c(4:2524)]

# Loop through all gene names and run the model once per gene and print to file.
for (i in names(genefile)){
    lmer_results <- lmer(i ~ Treat1*Treat2 + (1|Batch), data=mydata)
    lmer_summary <- summary(lmer_results)
    write(lmer_summary,file="results_file",append=TRUE, sep="\t", quote=FALSE)
}

结构体：

'data.frame':     16 obs. of  2524 variables:
$ Treat1          : Factor w/ 2 levels "High","Low": 1 2 1 2 1 2 1 2 1 2 ...
$ Treat2          : Factor w/ 2 levels "High","Low": 2 2 1 1 2 2 1 1 2 2 ...
$ Batch           : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
$ gene1           : num  92.7 101.8 136 104 308.3 ...
$ gene2           : num  4 6 4 3 10 3 3 4 2 6 ...

我的错误信息：

model.frame.default中的错误（data = mydata，drop.unused.levels = TRUE，formula = i~：变量长度不同（找到'Treat1'）调用：lmer ... - > eval - > eval - > - > model.frame.default执行暂停

我试图检查所涉及的所有对象，并且看不到变量长度的任何差异，我也确保没有丢失的数据 . 使用na.exclude运行它不会改变任何东西 .

怎么知道发生了什么？

1 回答

@Roland的诊断（ lmer 正在寻找一个名为i的变量，而不是名为 i 的变量：强制Lewis Carroll reference）是正确的，我想 . 处理此问题的最直接方法是 reformulate() ，类似于：

for (i in names(genefile)){
    form <- reformulate(c("Treat1*Treat2","(1|Batch)"),response=i)
    lmer_results <- lmer(form, data=mydata)
    lmer_summary <- summary(lmer_results)
    write(lmer_summary,file="results_file",
           append=TRUE, sep="\t", quote=FALSE)
}

第二个想法，你应该能够使用内置的 refit() 方法显着加快你的计算速度，该方法为一个新的响应变量重新设计模型：为简单起见，假设第一个基因被称为 geneAAA ：

wfun <- function(x)  write(summary(x), 
       file="results_file", append=TRUE, sep="\t",quote=FALSE)
mod0 <- lmer(geneAAA ~ Treat1*Treat2 + (1|Batch), data=mydata)
wfun(mod0)
for (i in names(genefile)[-1]) {
    mod1 <- refit(mod0,mydata[[i]])
    wfun(mod1)
}

（顺便说一下，我不确定你的 write() 命令做了什么明智......）

回复于 2024-04-27T19:45:41+08:00

R的可变长度不同（lme4的线性建模）

1 回答

相关问题