首页 文章

将R data.frame导出到SPSS

提问于
浏览
2

有一个 foreign 包,其函数 write.foreign() 可以编写SPS和CSV文件 . SPS文件可以将CSV文件读入包含标签的SPSS . 到目前为止很好,但是该功能存在一些问题:

  • 较新的SPSS版本可能会显示错误,您在 DATA LIST 中的格式定义太少

  • 如果通过 attr() 存储的数值变量有"labels",则会丢失这些变量 .

  • 即使SPSS vesion支持最多32767的字符串,如果任何变量中的字符数超过255,函数 write.foreign() 也会停止 .

  • 如果使用任何字符变量,则为星号(*),但较新的SPSS版本无法处理 .

  • CSV文件以逗号分隔,并且(不)不使用引号,因此字符串中不允许使用逗号(字符)

  • 非ASCII字符(例如变音符号)将导致导入崩溃

  • 如果您的角色包含任何NA值,您会看到......

...这样的错误消息:

Error in if (any(lengths > 255L)) stop("Cannot handle character variables longer than 255") : 
    missing value where TRUE/FALSE needed

我花了很多时间,然后发现一个好的帖子(http://r.789695.n4.nabble.com/SPSS-export-in-R-package-foreign-td921491.html)开始并让它变得更好 . 在这里's my result, I' d想与大家分享 .

3 回答

  • 1

    SPSS扩展命令STATS GET R可以将数据帧直接从保存的R工作空间读取到SPSS数据集中 . 如果尚未安装此扩展命令(它将显示在“文件”菜单上),则可以从“实用工具”菜单(统计信息22-23)或“扩展”菜单(统计信息24)进行安装 .

  • 4

    要将R data.frame导出到SPSS,请使用避货包中的 write_sav

    library(haven)
    write_sav(mtcars, "mtcars.sav")
    
  • 1

    此功能可替代 foreign:write.foreign 来处理上述问题 .

    Note: 为避免SPSS找到CSV文件的问题,请至少为 datafile 指定完整路径(!)(如果使用原始 foreign:write.foreign() 也是如此) .

    Note: 此脚本将在没有警告的情况下用空格替换字符串中的制表符(TAB)和其他间距(包括CR LF) . 可以考虑使用 GET DATA 而不是麻烦的 DATA LIST 来解决这个限制 .

    Note: 可能有警告 In FUN(X[[i]], ...) : probable complete loss of accuracy in modulus - 这是指对小数进行计数,可以忽略 .

    Note: POSIXltPOSIXct 变量尚未由脚本正确处理 .

    writeForeignMySPSS = function (df, datafile, codefile, varnames = NULL, len = 32767) {
        adQuote <-  function (x) paste("\"", x, "\"", sep = "")
    
        # Last variable must not be empty for DATA LIST
        if (any(is.na(df[[length(df)]]))) {
            df$END_CASE = 0
        }
    
        # http://stackoverflow.com/questions/5173692/how-to-return-number-of-decimal-places-in-r
        decimalplaces <- function(x) {
            y = x[!is.na(x)]
            if (length(y) == 0) {
                return(0)
            }
            if (any((y %% 1) != 0)) {
                info = strsplit(sub('0+$', '', as.character(y)), ".", fixed=TRUE)
                info = info[sapply(info, FUN=length) == 2]
                if (length(info) >= 2) {
                  dec = nchar(unlist(info))[seq(2, length(info), 2)]
                } else {
                  return(0)
                }
                return(max(dec, na.rm=T))
            } else {
                return(0)
            }
        }
    
        dfn <- lapply(df, function(x) if (is.factor(x))
            as.numeric(x)
            else x)
    
        # Boolean variables (dummy coding)
        bv = sapply(dfn, is.logical)
        for (v in which(bv)) {
            dfn[[v]] = ifelse(dfn[[v]], 1, 0)
        }
    
        varlabels <- names(df)
        # Use comments where applicable
        for (i in 1:length(df)) {
          cm = comment(df[[i]])
          if (is.character(cm) && (length(cm) > 0)) {
            varlabels[i] = comment(df[[i]])
          }
        }
    
        if (is.null(varnames)) {
            varnames <- abbreviate(names(df), 8L)
            if (any(sapply(varnames, nchar) > 8L))
                stop("I cannot abbreviate the variable names to eight or fewer letters")
            if (any(varnames != varlabels))
                warning("some variable names were abbreviated")
        }
        varnames <- gsub("[^[:alnum:]_\\$@#]", "\\.", varnames)
        dl.varnames <- varnames
        chv = sapply(df, is.character)
        if (any(chv)) {
            for (v in which(chv)) {
                dfn[[v]] = gsub("\\s", " ", dfn[[v]])
            }
            lengths <- sapply(df[chv], function(v) max(nchar(v), na.rm=T))
            if (any(lengths > len)) {
                warning(paste("Clipped strings in", names(df[chv]), "to", len, "characters"))
                for (v in which(chv)) {
                    df[[v]] = substr(df[[v]], start=1, stop=len)
                }
            }
            lengths[is.infinite(lengths)] = 0
            lengths[lengths < 1] = 1
            lengths <- paste("(A", lengths, ")", sep = "")
            # star <- ifelse(c(FALSE, diff(which(chv) > 1)), " *",
            dl.varnames[chv] <- paste(dl.varnames[chv], lengths)
        }
    
        # decimals and bools
        nmv = sapply(df, is.numeric)
        dbv = sapply(df, is.numeric)
        nv = (nmv | dbv)
        decimals = sapply(df[nv], FUN=decimalplaces)
        dl.varnames[nv] = paste(dl.varnames[nv], " (F", decimals+8, ".", decimals, ")", sep="")
        if (length(bv) > 0) {
            dl.varnames[bv] = paste(dl.varnames[bv], "(F1.0)")
        }
        rmv = !(chv | nv | bv)
        if (length(rmv) > 0) {
            dl.varnames[rmv] = paste(dl.varnames[rmv], "(F8.0)")
        }
        # Breaks in output
        brv = seq(1, length(dl.varnames), 10)
        dl.varnames[brv] = paste(dl.varnames[brv], "\n", sep=" ")
    
        cat("SET LOCALE = ENGLISH.\n", file = codefile)
        cat("DATA LIST FILE=", adQuote(datafile), " free (TAB)\n", file = codefile, append = TRUE)
        cat("/", dl.varnames, " .\n\n", file = codefile, append = TRUE)
        cat("VARIABLE LABELS\n", file = codefile, append = TRUE)
        cat(paste(varnames, adQuote(varlabels), "\n"), ".\n", file = codefile,
            append = TRUE)
        factors <- sapply(df, is.factor)
        if (any(factors)) {
            cat("\nVALUE LABELS\n", file = codefile, append = TRUE)
            for (v in which(factors)) {
                cat("/\n", file = codefile, append = TRUE)
                cat(varnames[v], " \n", file = codefile, append = TRUE)
                levs <- levels(df[[v]])
                cat(paste(1:length(levs), adQuote(levs), "\n", sep = " "),
                    file = codefile, append = TRUE)
            }
            cat(".\n", file = codefile, append = TRUE)
        }
    
        # Labels stored in attr()
        attribs <- !unlist(lapply(sapply(df, FUN=attr, which="1"), FUN=is.null))
        if (any(attribs)) {
            cat("\nVALUE LABELS\n", file = codefile, append = TRUE)
            for (v in which(attribs)) {
                cat("/\n", file = codefile, append = TRUE)
                cat(varnames[v], " \n", file = codefile, append = TRUE)
                # Check labeled values
                tc = list()
                for (tcv in dimnames(table(df[[v]]))[[1]]) {
                    if (!is.null(tcl <- attr(df[[v]], tcv))) {
                        tc[tcv] = tcl
                    }
                }
                cat(paste(names(tc), tc, "\n", sep = " "),
                    file = codefile, append = TRUE)
            }
            cat(".\n", file = codefile, append = TRUE)
        }
    
        ordinal <- sapply(df, is.ordered)
        if (any(ordinal)) {
            tmp = varnames[ordinal]
            brv = seq(1, length(tmp), 10)
            tmp[brv] = paste(tmp[brv], "\n")
            cat(paste("\nVARIABLE LEVEL", paste(tmp, collapse=" "), "(ORDINAL).\n"),
                file = codefile, append = TRUE)
        }
        num <- sapply(df, is.numeric)
        if (any(num)) {
            tmp = varnames[num]
            brv = seq(1, length(tmp), 10)
            tmp[brv] = paste(tmp[brv], "\n")
            cat(paste("\nVARIABLE LEVEL", paste(tmp, collapse=" "), "(SCALE).\n"),
                file = codefile, append = TRUE)
        }
        cat("\nEXECUTE.\n", file = codefile, append = TRUE)
    
        write.table(dfn, file = datafile, row = FALSE, col = FALSE,
                    sep = "\t", quote = F, na = "", eol = "\n", fileEncoding="UTF-8")
    }
    

    从长远来看,可能会认为更改已合并到 foreign 包中 . 不幸的是,r项目的错误报告系统目前仅限于以前注册的开发人员 .

相关问题