
Changing factor levels with dplyr mutate


This is probably simple and I feel stupid asking. I want to change the levels of a factor in a data frame, using mutate. A simple example:

library("dplyr")
dat <- data.frame(x = factor("A"), y = 1)
mutate(dat,levels(x) = "B")

I get:

Error: unexpected '=' in "mutate(dat,levels(x) ="

Why doesn't this work? How can I change factor levels with mutate?

6 Answers

  • 33

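    This is also easy with the forcats package from the tidyverse. In fct_recode() the new level goes on the left and the old level on the right.

    library(forcats)
    mutate(dat, x = fct_recode(x, "B" = "A"))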
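
    As for why the call in the question fails: an argument name inside mutate() has to be a plain name, so R's parser rejects levels(x) = "B" before dplyr ever sees it. A minimal base-R sketch of the same relabelling, assuming you are happy to call the replacement function `levels<-` directly (one possible workaround, not the only one):

    ```r
    library(dplyr)

    dat <- data.frame(x = factor("A"), y = 1)

    # `levels<-` is the function behind `levels(x) <- value`; calling it directly
    # keeps the left-hand side of the mutate() argument a plain column name.
    out <- mutate(dat, x = `levels<-`(x, "B"))

    levels(out$x)
    ```

    The result has the single level "B" and leaves the y column untouched.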
    
  • 13

    I'm not quite sure I understand your question, but if you want to change the factor levels of cyl with mutate(), you could do:

    df <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))
    

    You would get:

    #> str(df$cyl)
    # Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
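
    If the goal is to rename the levels as well as to set their order, factor() accepts levels and labels in the same call. A small sketch, where the label names "four", "six" and "eight" are made up purely for illustration:

    ```r
    library(dplyr)

    # levels = fixes the order; labels = renames each level, position by position
    df2 <- mtcars %>%
      mutate(cyl = factor(cyl,
                          levels = c(4, 6, 8),
                          labels = c("four", "six", "eight")))

    levels(df2$cyl)
    ```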
    
  • 22

    Maybe you are looking for the plyr::revalue function:

    mutate(dat, x = plyr::revalue(x, c("A" = "B")))
    

    You might also want to look at plyr::mapvalues.
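
    A short sketch of the mapvalues variant, assuming both plyr and dplyr are installed. The three-level data frame is invented for the example, and plyr is called through plyr:: rather than attached, so that it does not mask dplyr's verbs:

    ```r
    dat <- data.frame(x = factor(c("A", "B", "C")), y = 1:3)

    # from/to are matched position by position; levels not listed in `from` stay as they are
    out <- dplyr::mutate(dat, x = plyr::mapvalues(x, from = c("A", "B"), to = c("a", "b")))

    levels(out$x)
    ```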

  • 27

    You can use the recode function from dplyr.

    df <- iris %>%
         mutate(Species = recode(Species, setosa = "SETOSA",
             versicolor = "VERSICOLOR",
             virginica = "VIRGINICA"
         )
    )
    
  • 9

    I can't comment because I don't have enough reputation points, but recode only works on a vector, so the code in @Stefano's answer above should be

    df <- iris %>%
      mutate(Species = recode(Species, 
         setosa = "SETOSA",
         versicolor = "VERSICOLOR",
         virginica = "VIRGINICA")
      )
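
    A related sketch: when recode() is applied to a factor, you only need to list the levels you want to rename. This assumes the default behaviour where no .default is supplied, in which case unmatched levels are kept as they are:

    ```r
    library(dplyr)

    # Only setosa is renamed; versicolor and virginica keep their original labels
    df_partial <- iris %>%
      mutate(Species = recode(Species, setosa = "SETOSA"))

    levels(df_partial$Species)
    ```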
    
  • 10

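    From my understanding, the currently accepted answer only changes the order of the factor levels, not the actual labels (i.e., what the levels of the factor are called). To illustrate the difference between levels and labels, consider the following example:

    Turn cyl into a factor (specifying levels would not be necessary here, as they are already coded in alphanumeric order):

    mtcars2 <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))
    mtcars2$cyl[1:5]
    #[1] 6 6 4 6 8
    #Levels: 4 6 8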
    

    Change the order of the levels (but not the labels themselves: cyl still holds the same values):

    mtcars3 <- mtcars2 %>% mutate(cyl = factor(cyl, levels = c(8, 6, 4)))
    mtcars3$cyl[1:5]
    #[1] 6 6 4 6 8
    #Levels: 8 6 4
    all(mtcars3$cyl==mtcars2$cyl)
    #[1] TRUE
    

    Assign new labels to cyl. The order of the levels in mtcars3 is c(8, 6, 4), so we specify the new labels in that same order:

    mtcars4 <- mtcars3 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_8",
                                                               "new_value_for_6",
                                                               "new_value_for_4")))
    mtcars4$cyl[1:5]
    #[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
    #Levels: new_value_for_8 new_value_for_6 new_value_for_4
    

    Note how this column differs from the earlier ones:

    all(as.character(mtcars4$cyl)!=mtcars3$cyl)
    #[1] TRUE
    #Note: TRUE here indicates that all values are unequal because I used != instead of ==
    #as.character() was required as the levels were numeric and thus not comparable to a character vector
    

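    More details:

    If we changed the labels of cyl starting from mtcars2 instead of mtcars3, we would need to specify the labels in a different order to get the same result. The order of the levels in mtcars2 is c(4, 6, 8), so we specify the new labels as follows:

    #change labels of mtcars2 (order used to be: c(4, 6, 8))
    mtcars5 <- mtcars2 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_4",
                                                               "new_value_for_6",
                                                               "new_value_for_8")))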
    

    Unlike mtcars3$cyl and mtcars4$cyl, the values of mtcars4$cyl and mtcars5$cyl therefore carry identical labels, even though their levels are in a different order.

    mtcars4$cyl[1:5]
    #[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
    #Levels: new_value_for_8 new_value_for_6 new_value_for_4

    mtcars5$cyl[1:5]
    #[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
    #Levels: new_value_for_4 new_value_for_6 new_value_for_8

    all(mtcars4$cyl==mtcars5$cyl)
    #[1] TRUE

    levels(mtcars4$cyl) == levels(mtcars5$cyl)
    #[1] FALSE  TRUE FALSE
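
    Because positional labels depend on the current level order, it can be safer to relabel by name. A base-R sketch of that idea, in which the lookup vector and the helper relabel_by_name() are hypothetical names introduced only for this illustration:

    ```r
    # Named lookup: old level -> new label
    lookup <- c("4" = "new_value_for_4",
                "6" = "new_value_for_6",
                "8" = "new_value_for_8")

    # Replace each level with its lookup entry, whatever order the levels are in
    relabel_by_name <- function(f) {
      levels(f) <- unname(lookup[levels(f)])
      f
    }

    f_asc  <- factor(c(6, 6, 4, 6, 8), levels = c(4, 6, 8))
    f_desc <- factor(c(6, 6, 4, 6, 8), levels = c(8, 6, 4))

    a <- relabel_by_name(f_asc)
    b <- relabel_by_name(f_desc)

    # Same values in both, while each factor keeps its own level order
    identical(as.character(a), as.character(b))
    levels(a)
    levels(b)
    ```

    The same function works inside mutate(), for example mutate(mtcars3, cyl = relabel_by_name(cyl)).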
    
