Reading a text file in R and converting it to a character object

I am reading a text file like this in R 2.10.0:

248585_at   250887_at   245638_s_at AFFX-BioC-5_at
248585_at   250887_at   264488_s_at 245638_s_at AFFX-BioC-5_at  AFFX-BioC-3_at  AFFX-BioDn-5_at
248585_at   250887_at

using the command clusters <- read.delim("test", sep="\t", fill=TRUE, header=FALSE)

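Now I have to pass each line of this file to a BioConductor function that accepts only a character vector as input. My problem is that calling as.character on this clusters object turns everything into strings of numbers.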

> clusters[1,]
         V1        V2          V3             V4 V5 V6 V7
1 248585_at 250887_at 245638_s_at AFFX-BioC-5_at

but

> as.character(clusters[1,])
[1] "1" "1" "2" "3" "1" "1" "1"

Is there a way to keep the original names and put them into a character vector?

In case it helps: the clusters object returned by read.delim is of type "list".

Thanks a lot :-)

Federico

2 Answers

6
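By default (in R versions before 4.0.0, which includes the 2.10.0 used here), character columns are converted to factors. You can avoid this by setting the as.is=TRUE argument: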
```
clusters <- read.delim("test", sep="\t", fill=TRUE, header=FALSE, as.is=TRUE)
```
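As a rough sketch of what that looks like end to end: the snippet below recreates the three-line example in a temporary file (assuming the fields are really tab-separated, as the sep="\t" call implies), reads it with as.is=TRUE, and flattens the first row into a character vector. Dropping the empty strings is my own addition, since fill=TRUE pads short rows with blank fields; the names path and row1 are just illustrative.

```r
# Recreate the example file with real tab separators (assumed layout).
path <- tempfile()
writeLines(c(
  paste(c("248585_at", "250887_at", "245638_s_at", "AFFX-BioC-5_at"),
        collapse = "\t"),
  paste(c("248585_at", "250887_at", "264488_s_at", "245638_s_at",
          "AFFX-BioC-5_at", "AFFX-BioC-3_at", "AFFX-BioDn-5_at"),
        collapse = "\t"),
  paste(c("248585_at", "250887_at"), collapse = "\t")
), path)

# as.is = TRUE keeps the columns as character instead of factor.
clusters <- read.delim(path, sep = "\t", fill = TRUE, header = FALSE,
                       as.is = TRUE)

# Flatten row 1 into a plain character vector, then drop the blank
# padding that fill = TRUE added to the short row.
row1 <- unlist(clusters[1, ], use.names = FALSE)
row1 <- row1[!is.na(row1) & nzchar(row1)]
print(row1)
```

The same two lines can be applied to any other row index.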
If you only need to pass the fields from the text file along as character vectors, you can do the following:
```
x <- readLines("test")
xx <- strsplit(x,split="\t")
xx[[1]] # xx is a list
# [1] "248585_at"      "250887_at"      "245638_s_at"    "AFFX-BioC-5_at"
```
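To go from that list to "one call per line", something along these lines may work. Note that describe_cluster is a hypothetical stand-in for the BioConductor function (which the question does not name); here it only checks that its input is a character vector and returns the number of IDs. The temporary file is a shortened two-line version of the example, again assuming tab-separated fields.

```r
# Shortened two-line version of the example file (assumed tab-separated).
path <- tempfile()
writeLines(c("248585_at\t250887_at\t245638_s_at\tAFFX-BioC-5_at",
             "248585_at\t250887_at"), path)

# Hypothetical placeholder for the real BioConductor function:
# it insists on a character vector and returns how many IDs it got.
describe_cluster <- function(ids) {
  stopifnot(is.character(ids))
  length(ids)
}

raw_lines <- readLines(path)
# fixed = TRUE treats the separator as a literal tab, not a regex.
id_list <- strsplit(raw_lines, split = "\t", fixed = TRUE)

# Call the function once per line; each element is a character vector.
sizes <- vapply(id_list, describe_cluster, integer(1))
print(sizes)
```

Because strsplit never pads short lines, there are no blank fields to filter out with this approach.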
Answered 2024-04-29T23:15:22+08:00
1
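I would never have expected this to happen, but a small test case reproduces the result you describe.

Since the result of df[1,] is itself a data.frame, I thought one fix to try would be unlist, and it seems to work: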
```
> df <- data.frame(a=LETTERS[1:10], b=LETTERS[11:20], c=LETTERS[5:14])
> df[1,]
  a b c
1 A K E
> as.character(df[1,])
[1] "1" "1" "1"
> as.character(unlist(df[2,]))
[1] "B" "L" "F"
```
I think converting the data.frame to a matrix first would also solve the problem:
```
> m <- as.matrix(df)
> as.character(m[2,])
[1] "B" "L" "F"
```
To avoid problems with factors in the data.frame, you may want to set stringsAsFactors=FALSE when reading the data from the text file, for example:
```
clusters <- read.delim("test", sep="\t", fill=TRUE, header=FALSE,
                       stringsAsFactors=FALSE)
```
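And, after all that, the unexpected behavior seems to come from the fact that the original probe IDs in your data frame are being treated as factors. So setting stringsAsFactors=FALSE sidesteps the whole problem: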
```
df <- data.frame(a=LETTERS[1:10], b=LETTERS[11:20],
                 c=LETTERS[5:14], stringsAsFactors=FALSE)
> as.character(df[1,])
[1] "A" "K" "E"
```
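For anyone wondering where the numbers come from, here is a small sketch of the factor mechanics, using the same toy data frame with stringsAsFactors=TRUE set explicitly so it behaves the same on newer R versions. Each factor stores integer codes plus a table of labels; the names codes and labels are just illustrative.

```r
# Same toy data as above, with factor columns forced explicitly.
df <- data.frame(a = LETTERS[1:10], b = LETTERS[11:20], c = LETTERS[5:14],
                 stringsAsFactors = TRUE)

# The internal integer codes of row 2, one per column. These codes are
# what leaks out when the whole row is coerced in one go.
codes <- vapply(df[2, ], as.integer, integer(1))

# Converting each column separately returns the labels instead.
labels <- vapply(df[2, ], as.character, character(1))

print(codes)
print(labels)
```

Every column has its own level table, which is why the same code can stand for a different string in each column.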
Answered 2024-04-29T23:15:22+08:00
