Correlation in R between two columns on complete cases: only the first element of the vector is returned?

I am new to R and trying to complete the following prompt:

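Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0. A prototype of this function follows: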

corr <- function(directory, threshold = 0) {
## 'directory' is a character vector of length 1 indicating the location of
## the CSV files

## 'threshold' is a numeric vector of length 1 indicating the number of
        ## completely observed observations (on all variables) required to compute
        ## the correlation between nitrate and sulfate; the default is 0

## Return a numeric vector of correlations

spectdata<- list.files(pattern= ".csv") #creates vector with list of filenames
corr<-function(directory,threshold =0, id = 1:332){
  info<-list()
  for(i in id){
    info<-read.csv(directory[i], header=TRUE)
    NOBS<-sum(complete.cases(info))
    if (NOBS>threshold){
      return(cor(info$nitrate,info$sulfate,use="complete.obs"))
    }

  }
  corr<-sapply(spectdata,corr)
  corr<-unlist(corr[!sapply(corr,is.null)])
  return(corr)
}
cr<-corr(spectdata,threshold =150)     
head(cr)

The program appears to run, but it returns only the first element of what should be a longer vector:

> cr<-corr(spectdata,threshold =150)     
> head(cr)
[1] -0.01895754

What it should return:

cr <- corr("specdata", 150)
head(cr)
## [1] -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 -0.07588814

Does anyone have any ideas? I am very new to R and I am stumped. If I try to define the vector as longer, I get the same answer (e.g. 0.01895754 NA NA NA NA).

1 Answer

You are not going to get a complete answer for free, but assuming you want to learn, your function has a number of problems.

First, you define the function twice:

### corr defined here
corr <- function(directory, threshold = 0) {
             spectdata<- list.files(pattern= ".csv") 
             ### corr defined here again!!
             corr<-function(directory,threshold =0, id = 1:332) {

Second, you do not tell list.files where to look:

spectdata<- list.files(pattern= ".csv") # you should add directory
                                        # I'll leave it to you to add it
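
As a hedged illustration of this point (the temporary directory and file names below are made up for the demo), passing the directory as the first argument and setting full.names = TRUE makes list.files return paths that read.csv can open from any working directory:

```r
# Hypothetical demo directory with two CSV files and one unrelated file.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = 1:3, nitrate = 3:1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = 1:3, nitrate = 1:3),
          file.path(demo_dir, "002.csv"), row.names = FALSE)
writeLines("not data", file.path(demo_dir, "notes.txt"))

# Look inside demo_dir rather than the working directory; the anchored
# pattern matches only names that end in ".csv".
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
basename(csv_files)
```

Without full.names = TRUE the result holds bare file names, which read.csv can only find if the working directory happens to be the data directory.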

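Third, your function has two return statements. The first return() sits inside the for loop, so the function exits at the first monitor that passes the threshold, which is why you only ever get one value back: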

return(cor(info$nitrate,info$sulfate,use="complete.obs")) # 1st return
return(corr)                                              # 2nd return
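
A toy illustration of that behavior (first_even is a hypothetical function, not part of the assignment): a return() inside a for loop ends the whole function call, not just the current iteration.

```r
# Hypothetical toy function showing an early exit from a loop.
first_even <- function(xs) {
  for (x in xs) {
    if (x %% 2 == 0) {
      return(x)  # the function stops here on the first even number
    }
  }
  numeric(0)     # reached only if no element is even
}

first_even(1:10)     # yields 2; the later even numbers are never visited
first_even(c(1, 3))  # yields a length-0 numeric vector
```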

Fourth, you reuse a name. Your function is called corr, and you also define a local variable called corr:

corr <- sapply(spectdata, corr)      ## local variable

MY SUGGESTIONS, given the code you provided:

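First, stick to one function definition. The first one is fine.
Second, specify the directory that list.files should look in.
Third, return only one value, after the loop has finished.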

You can make a vector of elements with a for loop like so:

info <- NULL
for (i in 1:4) {
    info <- c(info, i)
}
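
A small variation on the loop above, offered as a sketch: starting from numeric(0) instead of NULL and appending only the values that pass a test keeps the result a numeric vector even when nothing qualifies. The numbers and the 0.5 cutoff are arbitrary demo values.

```r
# Start from an empty numeric vector so the result is still numeric
# (with length 0) when no value passes the filter.
keep <- numeric(0)
for (x in c(0.2, 0.9, 0.4, 0.7)) {
  if (x > 0.5) {
    keep <- c(keep, x)  # append only the values that qualify
  }
}
keep  # holds 0.9 and 0.7, in the order they were seen
```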

Fourth, you do not need sapply or unlist. Try to write it without them.
Fifth, give local variables names that do not clash with the function name.

MOST IMPORTANTLY, run each command line by line and look at each output.
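
For readers who want something to compare their own attempt against, here is one possible sketch that follows the suggestions above. It is not the assignment's reference solution: corr_sketch, the demo directory, and the tiny CSV files are all hypothetical, and it assumes each file has sulfate and nitrate columns.

```r
# Sketch only: the name corr_sketch is chosen so it does not clash with corr().
corr_sketch <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  result <- numeric(0)  # stays length 0 if no monitor qualifies
  for (f in files) {
    monitor <- read.csv(f, header = TRUE)
    complete <- monitor[complete.cases(monitor), ]  # rows with no NA at all
    if (nrow(complete) > threshold) {
      result <- c(result, cor(complete$sulfate, complete$nitrate))
    }
  }
  result  # single return value, produced after the loop has finished
}

# Made-up demo data: 001.csv has 3 complete rows, 002.csv has 1.
demo_dir <- file.path(tempdir(), "corr_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, 2, 3, NA), nitrate = c(2, 4, 6, 1)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(1, NA, NA), nitrate = c(3, 2, 1)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

corr_sketch(demo_dir, threshold = 2)   # only 001.csv passes the threshold
corr_sketch(demo_dir, threshold = 10)  # nothing passes: length-0 numeric vector
```

Filtering to complete rows before calling cor keeps the count used for the threshold and the rows used for the correlation in agreement.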

Answered on 2024-05-05T12:21:10+08:00
