R 3.4.1: Read data from multiple .csv files


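I'm trying to build a function that imports/reads several data tables stored in .csv files and then computes statistics on the selected files. Each of the 332 .csv files contains a table with the same column names: Date, Pollutant and id. There are many missing values.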

Here is the function I have written so far to compute the mean of a pollutant:

```r
pollutantmean <- function(directory, pollutant, id = 1:332) { 

  library(dplyr)
  setwd(directory)
  good<-c()

  for (i in (id)){
    task1<-read.csv(sprintf("%03d.csv",i))
  }

  p<-select(task1, pollutant)
  good<-c(good,complete.cases(p))
  mean(p[good,]) 
}
```
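The files are addressed by zero-padded, three-digit names, which `sprintf("%03d.csv", i)` builds from the numeric id. A quick check of that pattern for a few example ids (chosen only for illustration):

```r
# Build the zero-padded file names for some monitor ids,
# the same way the loop does with sprintf("%03d.csv", i)
ids <- c(1, 10, 332)
file_names <- sprintf("%03d.csv", ids)
print(file_names)
# [1] "001.csv" "010.csv" "332.csv"
```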

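The problem I'm having is that on each pass through the loop a new file is read, and the data read so far is replaced by the data from the new file (`task1` is overwritten on every iteration). So I end up with a function that works perfectly with a single file, but not when I want to select several files: for example, if I ask for id = 10:20, the mean is computed on file 20 only.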
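The overwriting happens because the loop assigns every file to the same variable. A minimal sketch that keeps the `for` loop but stores each file in its own list slot, then stacks them with base R `do.call(rbind, ...)` before averaging. The demo files, the temp folder and the helper name `pollutantmean_loop` are hypothetical; it uses `file.path()` instead of `setwd()`, and `na.rm = TRUE` in place of `complete.cases()`, which is equivalent for the mean of a single column.

```r
# Hypothetical demo data: three small files 001.csv..003.csv with the column
# names from the question (Date, Pollutant, id), written to a temp folder.
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  demo <- data.frame(Date = c("2003-01-01", "2003-01-02"),
                     Pollutant = c(i, NA),
                     id = i)
  write.csv(demo, file.path(demo_dir, sprintf("%03d.csv", i)), row.names = FALSE)
}

# Hypothetical helper: same idea as the question's loop, but each file is
# stored in its own list slot instead of overwriting a single variable.
pollutantmean_loop <- function(directory, pollutant, id = 1:332) {
  tables <- vector("list", length(id))
  for (k in seq_along(id)) {
    tables[[k]] <- read.csv(file.path(directory, sprintf("%03d.csv", id[k])))
  }
  all_data <- do.call(rbind, tables)          # stack every file that was read
  mean(all_data[[pollutant]], na.rm = TRUE)   # pooled mean, NAs dropped
}

result <- pollutantmean_loop(demo_dir, "Pollutant", id = 1:3)
print(result)
# [1] 2
```

Pre-allocating the list with `vector("list", length(id))` avoids growing an object inside the loop.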

How can I change the code so that I can select multiple files?

Thanks!

2 Answers

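My answer offers a way to do what you want without using a loop (if I understood everything correctly). It rests on two assumptions: (1) you have 332 *.csv files with the same headers (column names), so all files have the same structure, and (2) you can combine your tables into one big data frame.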

If both assumptions hold, I would use the list of your files to import them as data frames (so there is no explicit `for` loop in this answer!).

```r
library(data.table)  # rbindlist() comes from the data.table package

# Character vector with the full paths of your .csv files.
# You have to provide the path to this folder.
file_list <- list.files(path = "[your path where your *.csv files are saved in]",
                        pattern = "\\.csv$", full.names = TRUE)

# Read each file; the result is a list of data frames.
mylist <- lapply(file_list, read.csv)

# Row-bind the data frames in the list into one big table (a data.table).
mydata <- rbindlist(mylist)

# Now you can perform your calculation on this big table, using your column
# information to filter or subset it when you only need part of the data.
```
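A small sketch of the last step, under stated assumptions: `mylist` is replaced by three hypothetical in-memory data frames with the question's column names, assumption (1) is checked by comparing column names, and base R `do.call(rbind, ...)` stands in for `data.table::rbindlist()` so the example needs no extra package. It then subsets to some ids and averages the non-missing values.

```r
# Hypothetical stand-in for `mylist`: three small data frames with the
# column names described in the question (Date, Pollutant, id).
mylist <- lapply(1:3, function(i) {
  data.frame(Date = c("2003-01-01", "2003-01-02"),
             Pollutant = c(i * 10, NA),
             id = i)
})

# Assumption (1): every table has exactly the same column names.
same_headers <- all(vapply(mylist,
                           function(x) identical(names(x), names(mylist[[1]])),
                           logical(1)))
stopifnot(same_headers)

# Base-R alternative to data.table::rbindlist(): stack the tables.
mydata <- do.call(rbind, mylist)

# Subset to the monitors of interest and average, skipping missing values.
selected <- mydata[mydata$id %in% c(2, 3), ]
sub_mean <- mean(selected$Pollutant, na.rm = TRUE)
print(sub_mean)
# [1] 25
```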

I hope this helps.

Answered 2024-04-19T11:57:54+08:00

Maybe something like this?
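```r
library(dplyr)

pollutantmean <- function(directory, pollutant, id = 1:332) { 
    # Change to `directory`, remembering the old one and restoring it on exit
    od <- setwd(directory)
    on.exit(setwd(od))

    # Read every requested file into a list of data frames
    task_list <- lapply(sprintf("%03d.csv", id), read.csv)
    # Extract the pollutant column from each file (all_of() takes a string)
    p_list <- lapply(task_list, function(x) select(x, all_of(pollutant))[[1]])
    # Pool the values from all files and average them, ignoring missing values
    mean(unlist(p_list), na.rm = TRUE)
}
```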
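Notes:
- Put all `library` calls at the beginning of the script, where they are easier to see, not inside functions.
- Setting the working directory inside a function is also a bad idea: the change stays in effect after the function returns, and you can lose track of where you are. It is better to set the working directory outside functions, but since you set it inside yours, I added `on.exit(setwd(od))` so the original directory is restored when the function exits.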
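Another way to follow the second note is to avoid `setwd()` altogether and build full paths with `file.path()`. A minimal sketch, assuming hypothetical demo files in a temp folder and a hypothetical function name `pollutantmean_paths`; it is a variant, not the answer's code:

```r
# Hypothetical demo files written to a temporary folder.
demo_dir <- file.path(tempdir(), "paths_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(Date = "2003-01-01", Pollutant = c(i, NA), id = i),
            file.path(demo_dir, sprintf("%03d.csv", i)), row.names = FALSE)
}

# Hypothetical variant: build full file paths instead of calling setwd().
pollutantmean_paths <- function(directory, pollutant, id = 1:332) {
  paths <- file.path(directory, sprintf("%03d.csv", id))
  values <- unlist(lapply(paths, function(p) read.csv(p)[[pollutant]]))
  mean(values, na.rm = TRUE)
}

wd_before <- getwd()
result <- pollutantmean_paths(demo_dir, "Pollutant", id = 2:3)
stopifnot(identical(getwd(), wd_before))  # the working directory was never changed
print(result)
# [1] 2.5
```

Because the function never changes the working directory, there is nothing to restore with `on.exit()`.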
Answered 2024-04-19T11:57:54+08:00
