Converting column classes in data.table

95

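I have a problem using data.table: how do I convert column classes? Here is a simple example. With a data.frame I have no trouble converting them, but with a data.table I just don't know how: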

df <- data.frame(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
#One way: http://stackoverflow.com/questions/2851015/r-convert-data-frame-columns-from-factors-to-characters
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
#Another way
df[, "value"] <- as.numeric(df[, "value"])

library(data.table)
dt <- data.table(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
dt <- data.table(lapply(dt, as.character), stringsAsFactors=FALSE) 
#Error in rep("", ncol(xi)) : invalid 'times' argument
#Produces error, does data.table not have the option stringsAsFactors?
dt[, "ID", with=FALSE] <- as.character(dt[, "ID", with=FALSE]) 
#Produces error: Error in `[<-.data.table`(`*tmp*`, , "ID", with = FALSE, value = "c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)") : 
#unused argument(s) (with = FALSE)

Am I missing something obvious?

Update in response to Matthew's post: I was using an older version before, but even after updating to 1.6.6 (the version I use now) I still get an error.

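Update 2: Suppose I want to convert every column of class "factor" to a "character" column, but I don't know in advance which column has which class. With a data.frame, I can do the following: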

classes <- as.character(sapply(df, class))
colClasses <- which(classes=="factor")
df[, colClasses] <- sapply(df[, colClasses], as.character)

Can I do something similar with a data.table?

Update 3:

sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.6.6

loaded via a namespace (and not attached):
[1] tools_2.13.1

7 Answers

  • 79

    If you have a list of column names in the data.table whose class you need to change, do:

    convert_to_character <- c("Quarter", "value")
    
    dt[, convert_to_character] <- dt[, lapply(.SD, as.character), .SDcols = convert_to_character]
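
    The assignment above replaces the columns through `<-`. As a minimal sketch of the same idea using `:=`, which updates the columns by reference, the following assumes a reasonably recent data.table release (the parenthesised left-hand side may not be available in versions as old as the 1.6.6 mentioned in the question). The sample table is only illustrative and mirrors the question's data.

    ```r
    library(data.table)

    # Illustrative data mirroring the table in the question
    dt <- data.table(ID = rep(c("A", "B"), each = 5),
                     Quarter = c(1:5, 1:5),
                     value = rnorm(10))

    convert_to_character <- c("Quarter", "value")

    # The parentheses make data.table read the left-hand side as a vector of
    # column names rather than as a new column named "convert_to_character"
    dt[, (convert_to_character) := lapply(.SD, as.character),
       .SDcols = convert_to_character]

    sapply(dt, class)
    ```

    Because `:=` modifies dt in place, there is no need to assign the result back to dt.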
    
  • 0

    Try this:

    DT <- data.table(X1 = c("a", "b"), X2 = c(1,2), X3 = c("hello", "you"))
    changeCols <- colnames(DT)[which(as.vector(DT[,lapply(.SD, class)]) == "character")]
    
    DT[,(changeCols):= lapply(.SD, as.factor), .SDcols = changeCols]
    
  • 0

    This is a bad way to do it! Still, it is worth documenting this hard way. It is also a nice example of the eval substitute syntax.

    library(data.table)
    dt <- data.table(ID = c(rep("A", 5), rep("B",5)), 
                     fac1 = c(1:5, 1:5), 
                     fac2 = c(1:5, 1:5) * 2, 
                     val1 = rnorm(10),
                     val2 = rnorm(10))
    
    names_factors = c('fac1', 'fac2')
    names_values = c('val1', 'val2')
    
    for (col in names_factors){
      e = substitute(X := as.factor(X), list(X = as.symbol(col)))
      dt[ , eval(e)]
    }
    for (col in names_values){
      e = substitute(X := as.numeric(X), list(X = as.symbol(col)))
      dt[ , eval(e)]
    }
    
    str(dt)
    

    which gives you

    Classes ‘data.table’ and 'data.frame':  10 obs. of  5 variables:
     $ ID  : chr  "A" "A" "A" "A" ...
     $ fac1: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
     $ fac2: Factor w/ 5 levels "2","4","6","8",..: 1 2 3 4 5 1 2 3 4 5
     $ val1: num  0.0459 2.0113 0.5186 -0.8348 -0.2185 ...
     $ val2: num  -0.0688 0.6544 0.267 -0.1322 -0.4893 ...
     - attr(*, ".internal.selfref")=<externalptr>
    
  • 29

    Try:

    dt <- data.table(A = c(1:5), 
                     B= c(11:15))
    
    x <- ncol(dt)
    
    for(i in 1:x) 
    {
         dt[[i]] <- as.character(dt[[i]])
    }
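
    A variant of the loop above, offered only as a sketch: data.table's set() function assigns a column by reference, so it may avoid the copying that `dt[[i]] <- ...` can cause. The table is the same small example; nothing else is assumed.

    ```r
    library(data.table)

    dt <- data.table(A = 1:5, B = 11:15)

    # set() overwrites one column at a time, in place
    for (j in seq_along(dt)) {
      set(dt, j = j, value = as.character(dt[[j]]))
    }

    sapply(dt, class)
    ```

    seq_along(dt) iterates over the column positions, so it also behaves sensibly for a table with no columns, where 1:x would count down from 1 to 0.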
    
  • 2

    I tried several approaches.

    # BY {dplyr}
    library(data.table)
    library(magrittr)  # provides the %<>% operator used below
    data.table(ID      = c(rep("A", 5), rep("B",5)), 
               Quarter = c(1:5, 1:5), 
               value   = rnorm(10)) -> df1
    df1 %<>% dplyr::mutate(ID      = as.factor(ID),
                           Quarter = as.character(Quarter))
    # check classes
    dplyr::glimpse(df1)
    # Observations: 10
    # Variables: 3
    # $ ID      (fctr) A, A, A, A, A, B, B, B, B, B
    # $ Quarter (chr) "1", "2", "3", "4", "5", "1", "2", "3", "4", "5"
    # $ value   (dbl) -0.07676732, 0.25376110, 2.47192852, 0.84929175, -0.13567312,  -0.94224435, 0.80213218, -0.89652819...
    

    Or alternatively:

    # from list to data.table using data.table::setDT
    df2 <- list(ID      = as.factor(c(rep("A", 5), rep("B",5))), 
                Quarter = as.character(c(1:5, 1:5)), 
                value   = rnorm(10))
    setDT(df2)  # converts the list to a data.table by reference
    class(df2)
    # [1] "data.table" "data.frame"
    
  • 0

    I provide a more general and safer way to do this:

    ".." <- function (x) 
    {
      stopifnot(inherits(x, "character"))
      stopifnot(length(x) == 1)
      get(x, parent.frame(4))
    }
    
    
    set_colclass <- function(x, class){
      stopifnot(all(class %in% c("integer", "numeric", "double","factor","character")))
      for(i in intersect(names(class), names(x))){
        f <- get(paste0("as.", class[i]))
        x[, (..("i")):=..("f")(get(..("i")))]
      }
      invisible(x)
    }
    

    The function .. makes sure we get a variable from outside the data.table's scope; set_colclass sets the classes of your columns. You can use it like this:

    dt <- data.table(i=1:3,f=3:1)
    set_colclass(dt, c(i="character"))
    class(dt$i)
    
  • -3

    For a single column:

    dtnew <- dt[, Quarter:=as.character(Quarter)]
    str(dtnew)
    
    Classes ‘data.table’ and 'data.frame':  10 obs. of  3 variables:
     $ ID     : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
     $ Quarter: chr  "1" "2" "3" "4" ...
     $ value  : num  -0.838 0.146 -1.059 -1.197 0.282 ...
    

    Using lapply and as.character:

    dtnew <- dt[, lapply(.SD, as.character), by=ID]
    str(dtnew)
    
    Classes ‘data.table’ and 'data.frame':  10 obs. of  3 variables:
     $ ID     : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
     $ Quarter: chr  "1" "2" "3" "4" ...
     $ value  : chr  "1.487145280568" "-0.827845218358881" "0.028977182770002" "1.35392750102305" ...
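
    Grouping with by=ID leaves ID as a factor in the output above. For the case raised in the question's second update, where the factor columns are not known in advance, one possible sketch is to detect them with is.factor and convert only those. It assumes a data.table release that accepts a parenthesised vector of names on the left of `:=`; the name factor_cols is purely illustrative.

    ```r
    library(data.table)

    # Illustrative data with ID stored as a factor, as in the output above
    dt <- data.table(ID = factor(rep(c("A", "B"), each = 5)),
                     Quarter = c(1:5, 1:5),
                     value = rnorm(10))

    # Find the factor columns without knowing their names beforehand
    factor_cols <- names(dt)[sapply(dt, is.factor)]

    # Convert only those columns, in place; skip the call if there are none
    if (length(factor_cols) > 0) {
      dt[, (factor_cols) := lapply(.SD, as.character), .SDcols = factor_cols]
    }

    sapply(dt, class)
    ```

    The other columns keep their original classes, since .SDcols restricts .SD to the detected factor columns.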
    
