Changing factor levels with dplyr mutate

This is probably simple and I feel silly asking. I want to use mutate to change the levels of a factor in a data frame. A simple example:

```
library("dplyr")
dat <- data.frame(x = factor("A"), y = 1)
mutate(dat,levels(x) = "B")
```

I get:

Error: Unexpected '=' in "mutate(dat,levels(x) ="

Why doesn't this work? How can I change factor levels with mutate?
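
For context on the error: every argument to mutate() is a `name = value` pair, and the left-hand side of `=` in a function call has to be a plain name, so `levels(x) = "B"` is rejected by R's parser before dplyr ever sees it. The following is a minimal sketch of two ways around that, assuming dplyr is installed; `dat_base` and `dat_mut` are names made up for illustration.

```r
library(dplyr)

dat <- data.frame(x = factor("A"), y = 1)

# Base R: the replacement form levels(...) <- value works outside of mutate()
dat_base <- dat
levels(dat_base$x) <- "B"

# Inside mutate(): assign to the column name and call the replacement
# function `levels<-` explicitly on the right-hand side
dat_mut <- mutate(dat, x = `levels<-`(x, "B"))

levels(dat_base$x)
#> [1] "B"
levels(dat_mut$x)
#> [1] "B"
```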

6 Answers

33
This is also easy with the forcats package from the tidyverse.
```
library(forcats)
mutate(dat, x = fct_recode(x, "B" = "A"))
```
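A self-contained sketch of the call above, run on the question's one-row data frame. It assumes both dplyr and forcats are installed; `dat2` is a name made up for illustration.

```r
library(dplyr)
library(forcats)

# The question's data frame: one factor column with the single level "A"
dat <- data.frame(x = factor("A"), y = 1)

# fct_recode() takes new_level = "old_level" pairs
dat2 <- mutate(dat, x = fct_recode(x, "B" = "A"))

levels(dat2$x)
#> [1] "B"
```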
Answered on 2024-04-28T21:44:12+08:00
13
I'm not quite sure I understand your question, but if you want to change the factor levels of cyl with mutate(), you can do this:
```
df <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))
```
This gives:
```
#> str(df$cyl)
# Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
```
Answered on 2024-04-28T21:44:12+08:00
22
Maybe you are looking for the plyr::revalue function:
```
mutate(dat, x = revalue(x, c("A" = "B")))
```
You can also look at plyr::mapvalues.
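
A minimal sketch of both plyr functions on the question's data frame, assuming dplyr and plyr are installed. plyr is called through `::` here rather than attached, because attaching plyr after dplyr masks several dplyr functions, mutate() among them. `dat_revalue` and `dat_mapvalues` are names made up for illustration.

```r
library(dplyr)

dat <- data.frame(x = factor("A"), y = 1)

# revalue() takes a named vector of the form c(old_level = "new_level")
dat_revalue <- mutate(dat, x = plyr::revalue(x, c("A" = "B")))

# mapvalues() takes the old and new values as two parallel vectors
dat_mapvalues <- mutate(dat, x = plyr::mapvalues(x, from = "A", to = "B"))

levels(dat_revalue$x)
#> [1] "B"
levels(dat_mapvalues$x)
#> [1] "B"
```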
Answered on 2024-04-28T21:44:12+08:00

You can use the recode function from dplyr.

```
df <- iris %>%
  mutate(Species = recode(Species,
    setosa = "SETOSA",
    versicolor = "VERSICOLOR",
    virginica = "VIRGINICA"
  ))
```

Answered on 2024-04-28T21:44:12+08:00

9
I can't comment because I don't have enough reputation points, but recode only works on a vector, so the code above in @Stefano's answer should be
```
df <- iris %>%
  mutate(Species = recode(Species, 
     setosa = "SETOSA",
     versicolor = "VERSICOLOR",
     virginica = "VIRGINICA")
  )
```
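For comparison, here is a minimal sketch of the same idea applied to the question's data frame, assuming dplyr is installed. Note that recode() takes `old = "new"` pairs, the reverse of the fct_recode() argument order in the first answer. `dat_recode` is a name made up for illustration.

```r
library(dplyr)

dat <- data.frame(x = factor("A"), y = 1)

# recode() takes old_level = "new_level" pairs; on a factor it renames the levels
dat_recode <- dat %>%
  mutate(x = recode(x, A = "B"))

levels(dat_recode$x)
#> [1] "B"
```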
Answered on 2024-04-28T21:44:12+08:00

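As I understand it, the currently accepted answer only changes the order of the factor levels, not the actual labels (i.e., what the levels of the factor are called). To illustrate the difference between levels and labels, consider the following example: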

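Turn cyl into a factor (specifying the levels is not strictly necessary here, because they would be coded in alphanumeric order anyway):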

```
mtcars2 <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))
mtcars2$cyl[1:5]
#[1] 6 6 4 6 8
#Levels: 4 6 8
```

Change the order of the levels (but not the labels themselves: cyl still holds the same values):

```
mtcars3 <- mtcars2 %>% mutate(cyl = factor(cyl, levels = c(8, 6, 4)))
mtcars3$cyl[1:5]
#[1] 6 6 4 6 8
#Levels: 8 6 4
all(mtcars3$cyl==mtcars2$cyl)
#[1] TRUE
```

Assign new labels to cyl. The order of the levels is now c(8, 6, 4), so we specify the new labels in that order:

```
mtcars4 <- mtcars3 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_8",
                                                           "new_value_for_6",
                                                           "new_value_for_4")))
mtcars4$cyl[1:5]
#[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
#Levels: new_value_for_8 new_value_for_6 new_value_for_4
```

Note how this column differs from the earlier ones:

```
all(as.character(mtcars4$cyl)!=mtcars3$cyl)
#[1] TRUE
#Note: TRUE here indicates that all values are unequal, because != is used instead of ==
#as.character() is required because comparing two factors with different level sets stops with an error
```

More details:

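If we were to change the labels of cyl starting from mtcars2 instead of mtcars3, we would need to specify the labels in a different order to get the same result. The order of the levels in mtcars2 is c(4, 6, 8), so we specify the new labels as follows: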

```
#change labels of mtcars2 (order used to be: c(4, 6, 8))
mtcars5 <- mtcars2 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_4",
                                                           "new_value_for_6",
                                                           "new_value_for_8")))
```

Unlike mtcars3$cyl and mtcars4$cyl, the values of mtcars4$cyl and mtcars5$cyl therefore carry identical labels, even though their levels are stored in a different order.

```
mtcars4$cyl[1:5]
#[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
#Levels: new_value_for_8 new_value_for_6 new_value_for_4

mtcars5$cyl[1:5]
#[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
#Levels: new_value_for_4 new_value_for_6 new_value_for_8

all(mtcars4$cyl==mtcars5$cyl)
#[1] TRUE

levels(mtcars4$cyl) == levels(mtcars5$cyl)
#[1] FALSE  TRUE FALSE
```
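
The reorder and relabel steps above can also be collapsed into a single factor() call, since `levels` and `labels` are matched by position. This is only a sketch under that assumption; `mtcars6` is a name made up for illustration and dplyr is assumed to be installed.

```r
library(dplyr)

# levels sets the order; labels renames each level, position by position
mtcars6 <- mtcars %>%
  mutate(cyl = factor(cyl,
                      levels = c(8, 6, 4),
                      labels = c("new_value_for_8",
                                 "new_value_for_6",
                                 "new_value_for_4")))

levels(mtcars6$cyl)
#> [1] "new_value_for_8" "new_value_for_6" "new_value_for_4"
```

The result should match mtcars4 from the walkthrough: same labels, with the levels stored in the order 8, 6, 4.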

Answered on 2024-04-28T21:44:12+08:00
