R dplyr: using mutate with na.omit causes the error "incompatible size (%d)"

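I am doing data cleaning. I use mutate in dplyr a lot because it builds new columns step by step, so I can easily see how things are going.

Here are two examples where I ran into this error: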

Error: incompatible size (%d), expecting %d (the group size) or 1

Example 1: get the town name from a zip code. The data looks like this:

    Zip
1 02345
2 02201

I noticed that it does not work when the data contains NA.

Without NA it works:

library(dplyr)
library(zipcode)
data(zipcode)

test = data.frame(Zip=c('02345','02201'),stringsAsFactors=FALSE)

test %>%
  rowwise() %>%
  mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )

This gives:

Source: local data frame [2 x 2]
Groups: <by row>

    Zip   Town1
1 02345 Manomet
2 02201  Boston

With NA it does not work:

library(dplyr)
library(zipcode)
data(zipcode)

test = data.frame(Zip=c('02345','02201',NA),stringsAsFactors=FALSE)

test %>%
  rowwise() %>%
  mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )

This gives:

Error: incompatible size (%d), expecting %d (the group size) or 1

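Example 2: I want to get rid of the redundant state names that appear in the Town column of the following data.

         Town State
1   BOSTON MA    MA
2 NORTH AMAMS    MA
3  CHICAGO IL    IL

This is how I do it: (1) split the string in Town into words, e.g. 'BOSTON' and 'MA' for row 1; (2) check whether any of those words matches the State of that row; (3) remove the matching word.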

library(dplyr)
test = data.frame(Town=c('BOSTON MA','NORTH AMAMS','CHICAGO IL'), State=c('MA','MA','IL'), stringsAsFactors=FALSE)

test %>%
  mutate(Town.word = strsplit(Town, split=' ')) %>%
  rowwise() %>% # rowwise ensures every calculation only considers the current row
  mutate(is.state = match(State,Town.word ) ) %>%
  mutate(Town1 = Town.word[-is.state])

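This gives:

         Town State Town.word is.state   Town1
1   BOSTON MA    MA  <chr[2]>        2  BOSTON
2 NORTH AMAMS    MA  <chr[2]>       NA      NA
3  CHICAGO IL    IL  <chr[2]>        2 CHICAGO

Meaning: for example, row 1 shows is.state == 2, so the second word in Town is the state name. After removing that word, Town1 is the correct town name.

Now I want to fix the NA in row 2, but adding na.omit causes the error: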

test %>%
  mutate(Town.word = strsplit(Town, split=' ')) %>%
  rowwise() %>% # rowwise ensures every calculation only considers the current row
  mutate(is.state = match(State,Town.word ) ) %>%
  mutate(Town1 = Town.word[-na.omit(is.state)])

The result is:

Error: incompatible size (%d), expecting %d (the group size) or 1

I checked the data type and size:

test %>%
  mutate(Town.word = strsplit(Town, split=' ')) %>%
  rowwise() %>% # rowwise ensures every calculation only considers the current row
  mutate(is.state = match(State,Town.word ) ) %>%
  mutate(length(is.state) ) %>%       
  mutate(class(na.omit(is.state)))

The result is:

         Town State Town.word is.state length(is.state) class(na.omit(is.state))
1   BOSTON MA    MA  <chr[2]>        2                1                  integer
2 NORTH AMAMS    MA  <chr[2]>       NA                1                  integer
3  CHICAGO IL    IL  <chr[2]>        2                1                  integer

So the length, which I take to be the first %d, is 1. Can anyone tell me what is going wrong? Thanks.

1 Answer

3
Could you just sub it out?
```
test %>%
    rowwise() %>%
    mutate(Town=sub(sprintf('[, ]*%s$', State), '', Town))
## Source: local data frame [3 x 2]
## Groups: <by row>
##
##          Town State
## 1      BOSTON    MA
## 2 NORTH AMAMS    MA
## 3     CHICAGO    IL
```
(This also catches a comma after the town, if one ever occurs.)
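
The same substitution can also be sketched in base R without rowwise. This is only an illustration, not part of the original answer: the helper name strip_state and the extra "SALEM, MA" row are made up here to exercise the comma case.

```r
# Data from the question, plus one hypothetical row containing a comma.
test <- data.frame(
  Town  = c("BOSTON MA", "NORTH AMAMS", "CHICAGO IL", "SALEM, MA"),
  State = c("MA", "MA", "IL", "MA"),
  stringsAsFactors = FALSE
)

# sub() is not vectorised over its pattern, so mapply() pairs each Town
# with the pattern built from the State of the same row.
# strip_state is a hypothetical helper name.
strip_state <- function(town, state) {
  sub(sprintf("[, ]*%s$", state), "", town)
}

test$Town1 <- unname(mapply(strip_state, test$Town, test$State))
test$Town1
```

Because `[, ]*` can also match nothing, a town whose own name ends in the state's letters would be trimmed as well; `[, ]+` would be a stricter choice if that matters for your data.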

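Note: if you are working with a rowwise_df here (as is the case), this also strips the tbl_df class and outputs a plain data.frame, which is fine for your data but will flood your screen on larger data (something I have done countless times). (GitHub references #936 and #553.)
Answered 2024-05-17T10:30:29+08:00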
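
As for the error itself, a likely cause is that na.omit turns a lone NA into a zero-length vector, so the expression inside mutate returns nothing for that row instead of one value. The base R sketch below illustrates this and shows NA-safe alternatives for both examples. The names drop_state and lookup are hypothetical, and lookup is a tiny stand-in for the zipcode dataset, not the real data.

```r
# Row 2 of Example 2: match() found no state, so is.state is NA.
is.state <- NA_integer_
length(is.state)            # 1, which is what the question's check measured
length(na.omit(is.state))   # 0, because the only element was dropped

# A negative index of length zero selects nothing at all, so the row
# yields a zero-length result where a single value is expected.
words <- c("NORTH", "AMAMS")
length(words[-integer(0)])  # 0

# Example 2, NA-safe: only drop a word when a match was actually found.
drop_state <- function(words, state) {
  i <- match(state, words)
  if (!is.na(i)) words <- words[-i]
  paste(words, collapse = " ")
}
towns  <- c("BOSTON MA", "NORTH AMAMS", "CHICAGO IL")
states <- c("MA", "MA", "IL")
town1  <- unname(mapply(drop_state, strsplit(towns, " "), states))

# Example 1, NA-safe: match() returns NA for a missing zip, and indexing
# with NA gives NA, so every row still gets exactly one value.
lookup <- data.frame(zip  = c("02345", "02201"),
                     city = c("Manomet", "Boston"),
                     stringsAsFactors = FALSE)
zips  <- c("02345", "02201", NA)
city1 <- lookup$city[match(zips, lookup$zip)]
```

Both alternatives are vectorised over the whole column, so they do not need rowwise at all.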
