
What the ddply error means: 'names' attribute [9] must be the same length as the vector [1]


I'm working through Machine Learning for Hackers and I'm stuck on this line:

from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

This produces the following error:

Error in attributes(out) <- attributes(col) : 
  'names' attribute [9] must be the same length as the vector [1]

Here is the traceback():

> traceback()
11: FUN(1:5[[1L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: function (i) 
   {
       piece <- pieces[[i]]
       if (.inform) {
           res <- try(.fun(piece, ...))
           if (inherits(res, "try-error")) {
               piece <- paste(capture.output(print(piece)), collapse = "\n")
               stop("with piece ", i, ": \n", piece, call. = FALSE)
           }
       }
       else {
           res <- .fun(piece, ...)
       }
       progress$step()
       res
   }(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

The priority.train object is a data frame; here is more information about it:

> mode(priority.train)
[1] "list"
> names(priority.train)
[1] "Date"       "From.EMail" "Subject"    "Message"    "Path"      
> sapply(priority.train, mode)
       Date  From.EMail     Subject     Message        Path 
     "list" "character" "character" "character" "character" 
> sapply(priority.train, class)
$Date
[1] "POSIXlt" "POSIXt" 

$From.EMail
[1] "character"

$Subject
[1] "character"

$Message
[1] "character"

$Path
[1] "character"

> length(priority.train)
[1] 5
> nrow(priority.train)
[1] 1250
> ncol(priority.train)
[1] 5
> str(priority.train)
'data.frame':   1250 obs. of  5 variables:
 $ Date      : POSIXlt, format: "2002-01-31 22:44:14" "2002-02-01 00:53:41" "2002-02-01 02:01:44" "2002-02-01 10:29:23" ...
 $ From.EMail: chr  "removed@removed.ca" "removed@removed.net" "removed@removed.ca" "removed@removed.net" ...
 $ Subject   : chr  "please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" ...
 $ Message   : chr  "    \n Hello,\n   \n         I just installed redhat 7.2 and I think I have everything \nworking properly.  Anyway I want to in"| __truncated__ "Make sure you rebuild as root and you're in the directory that you\ndownloaded the file.  Also it might complain of a few depen"| __truncated__ "Lance wrote:\n\n>Make sure you rebuild as root and you're in the directory that you\n>downloaded the file.  Also it might compl"| __truncated__ "Once upon a time, rob wrote :\n\n>  I dl'd gcc3 and libgcc3, but I still get the same error message when I \n> try rpm --rebuil"| __truncated__ ...
 $ Path      : chr  "../03-Classification/data/easy_ham/01061.6610124afa2a5844d41951439d1c1068" "../03-Classification/data/easy_ham/01062.ef7955b391f9b161f3f2106c8cda5edb" "../03-Classification/data/easy_ham/01063.ad3449bd2890a29828ac3978ca8c02ab" "../03-Classification/data/easy_ham/01064.9f4fc60b4e27bba3561e322c82d5f7ff" ...
Warning messages:
1: In encodeString(object, quote = "\"", na.encode = FALSE) :
  it is not known that wchar_t is Unicode on this platform
2: In encodeString(object, quote = "\"", na.encode = FALSE) :
  it is not known that wchar_t is Unicode on this platform

I would post a sample, but the content is rather long and I don't think it is relevant here.

The same error also occurs here:

> ddply(priority.train, .(Subject))
Error in attributes(out) <- attributes(col) : 
  'names' attribute [9] must be the same length as the vector [1]

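Does anyone know what is going on here? The error seems to come from some object other than priority.train: the object it complains about has a names attribute with 9 elements, while priority.train has only 5 columns.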

I'd appreciate any help. Thanks!

Problem solved

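Thanks to @user1317221_G's tip about using the dput function, I found the problem. It is the Date field, which at this point is a list containing 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). To work around it, I simply converted the dates to a character vector, ran ddply, and then restored the original dates: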

> tmp <- priority.train$Date
> priority.train$Date <- as.character(priority.train$Date)
> from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
> priority.train$Date <- tmp
> rm(tmp)
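
A minimal base-R sketch of why the Date column causes trouble, assuming nothing beyond base R (the two timestamps are copied from the str() output above; `d`, `fields` and `ct` are illustrative names). A POSIXlt value is stored as a list of per-component vectors, whereas POSIXct is a plain numeric vector with one element per timestamp. Newer R versions may store extra fields such as zone and gmtoff, so the exact field count can differ from the 9 reported here:

```r
# Two timestamps taken from the str() output in the question.
d <- as.POSIXlt(c("2002-01-31 22:44:14", "2002-02-01 00:53:41"), tz = "UTC")

# Underneath, POSIXlt is a list whose elements are the time components.
fields <- names(unclass(d))
print(is.list(unclass(d)))
print(fields)

# POSIXct stores one number per timestamp (seconds since the epoch).
ct <- as.POSIXct(d)
print(is.numeric(unclass(ct)))
print(length(unclass(ct)))
```

Code that copies attributes from one column to another can therefore end up pairing the component names with a vector of a different length, which is consistent with the message shown above.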

7 Answers

  • 1

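    I faced the same problem and solved it by keeping only the data that ddply needs and using as.character to convert the filter variable and all required text variables to character.

    It worked.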

  • 6

    I solved my problem by converting the format from POSIXlt to POSIXct, as Hadley suggested above. It takes one line of code:

    # original conversion from a datetime string; class(mydata$datetime) is now "POSIXlt" "POSIXt"
    mydata$datetime <- strptime(mydata$datetime, "%Y-%m-%d %H:%M:%S")
    # convert to POSIXct to use in data frames / ddply
    mydata$datetime <- as.POSIXct(mydata$datetime)
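
    When it is not obvious which columns are affected, a small helper could convert every POSIXlt column in one pass. This is only a sketch using base R: the function name `posixlt_to_posixct` and the two-row example data frame are hypothetical, not part of plyr or of the original answer:

    ```r
    # Hypothetical helper: replace each POSIXlt column with its POSIXct equivalent.
    posixlt_to_posixct <- function(df) {
      for (nm in names(df)) {
        if (inherits(df[[nm]], "POSIXlt")) {
          df[[nm]] <- as.POSIXct(df[[nm]])
        }
      }
      df
    }

    # Made-up example data; strptime() returns POSIXlt.
    mydata <- data.frame(id = 1:2)
    mydata$datetime <- strptime(c("2002-01-31 22:44:14", "2002-02-01 00:53:41"),
                                "%Y-%m-%d %H:%M:%S", tz = "UTC")

    mydata <- posixlt_to_posixct(mydata)
    print(class(mydata$datetime))
    ```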
    
  • 3

    You have probably already seen this and it didn't help. I suspect there are no answers yet because people can't reproduce your error.

    A dput, or the smaller dput(head()), might help with that. In the meantime, here is an alternative using base:

    x <- data.frame(A=c("a","b","c","a"),B=c("e","d","d","d"))
    
    ddply(x,.(A),summarise, Freq = length(B))
      A Freq
    1 a    2
    2 b    1
    3 c    1
    
     tapply(x$B,x$A,length)
    a b c 
    2 1 1
    

    Does tapply work for you?

    x2 <- data.frame(A=c("removed@removed.ca", "removed@removed.net"),
                     B=c("please help a newbie compile mplayer :-)", 
                         "re: please help a newbie compile mplayer :-)"))
    
    tapply(x2$B,x2$A,length)
    removed@removed.ca removed@removed.net 
                  1                   1 
    
    ddply(x2,.(A),summarise, Freq = length(B))
                        A Freq
    1  removed@removed.ca    1
    2 removed@removed.net    1
    

    You could also try something even simpler:

    table(x2$A)
    
     removed@removed.ca removed@removed.net 
                  1                   1
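
    If a data frame shaped like the ddply result is wanted, the table can be converted back. A minimal base-R sketch, assuming made-up addresses in a hypothetical vector `senders`; it never touches the Date column:

    ```r
    # Made-up sender addresses standing in for priority.train$From.EMail.
    senders <- c("a@example.ca", "b@example.net", "a@example.ca")

    # Count messages per sender and return the counts as a data frame
    # with the columns From.EMail and Freq.
    from.weight <- as.data.frame(table(From.EMail = senders),
                                 responseName = "Freq",
                                 stringsAsFactors = FALSE)
    print(from.weight)
    ```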
    
  • 43

    I had a very similar problem, although I'm not sure it is the same one. I got the following error.

    Error in attributes(out) <- attributes(col) : 
      'names' attribute [20388] must be the same length as the vector [128]
    

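    None of my variables are in list mode, so Mota's solution didn't work in my case. I sorted the problem out by removing plyr 1.8 and manually installing plyr 1.7, after which the error went away. I also tried reinstalling plyr 1.8 and was able to reproduce the problem.

    HTH.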

  • 4

    I also ran into a similar problem with ddply; the code and error are given below:

    test <- ddply(test, "catColumn", function(df) df[1:min(nrow(df), 3),])
        Error: 'names' attribute [11] must be the same length as the vector [2]
    

    The data frame "test" contains quite a few categorical variables.

    Converting the categorical variables to character variables, as shown below, made the ddply command work:

    test <- data.frame(lapply(test, as.character), stringsAsFactors=FALSE)
    
  • 2

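    Once you know that a date column is what is causing the trouble, you can also simply leave that column out when running the command instead of converting it...

    So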

    from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
    

    can become

    from.weight <- ddply(priority.train[,c(1:7,9:10)], .(From.EMail), summarise, Freq = length(Subject))
    

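    for example, if the POSIXlt date happens to be in column 8 of the data frame. The odd thing about the reported error is that it may have nothing to do with what you are grouping by or with the output you are asking for...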
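
    Selecting by position is fragile if the column order changes. A sketch that drops POSIXlt columns by class instead, using base R only; the data frame `mail` and its contents are made up for illustration:

    ```r
    # Made-up data with one character column and one POSIXlt column.
    mail <- data.frame(From.EMail = c("a@example.ca", "b@example.net", "a@example.ca"),
                       stringsAsFactors = FALSE)
    mail$Date <- as.POSIXlt(c("2002-01-31 22:44:14", "2002-02-01 00:53:41",
                              "2002-02-01 02:01:44"), tz = "UTC")

    # TRUE for every column that is not stored as POSIXlt.
    keep <- !vapply(mail, function(col) inherits(col, "POSIXlt"), logical(1))

    # Keep only those columns; the original data frame is left unchanged.
    mail_slim <- mail[keep]
    print(names(mail_slim))
    ```

    The slimmed data frame could then be passed to ddply in place of the positional subset.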

  • 0

    I had the same problem when using ddply and fixed it with doBy:

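    library(doBy)
    # summary function: number of observations in each group
    bylength = function(x){length(x)}
    # summaryBy() applies FUN to X within each From.EMail / To.EMail group
    newdt = summaryBy(X ~From.EMail + To.EMail, data = dt, FUN = bylength)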
    
