
R: How to remove rows within subsets in a loop

df <- data.frame(id = c(1, 2, 3, 3, 3, 4), gender = c("Male", "Female", "Both", "Male", "Female", "Female"))
ids <- unique(df$id)

> df
  id gender
1  1   Male
2  2 Female
3  3   Both
4  3   Male
5  3 Female
6  4 Female

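For each unique id, if the corresponding gender values include all of Both, Male, and Female, I need to delete the row where gender is Both. In other words, my desired output is: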

> df
  id gender
1  1   Male
2  2 Female
3  3   Male
4  3 Female
5  4 Female

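I tried writing a loop that would:

  • subset df by id and store each subset in a list named sub

  • for each sub, check whether gender contains "Both", "Male", and "Female"

  • if so, remove the rows where gender == "Both"

  • recombine the pieces into a single data.frame

However, the code below doesn't really work and is very clunky. Is there a simpler way, perhaps using group_by in dplyr?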

sub <- list()
for(i in 1:length(ids)){
  sub[[i]] <- subset(df, id %in% ids[i])
  if(all(grepl(sub[[i]]$gender, c("Both", "Male", "Female")))){
    sub[[i]] <- sub[[i]][-which(sub[[i]]$gender == "Both"), ]
  }else sub[[i]] = sub[[i]]
}

2 Answers

  • 0

    Using dplyr:

    library(dplyr)

    df %>%
        group_by(id) %>%
        # A is FALSE only for "Both" rows in groups with at least 3 distinct genders
        mutate(A = !(length(unique(gender)) >= 3 & gender == "Both")) %>%
        filter(A) %>%
        select(-A)
    # A tibble: 5 x 2
    # Groups:   id [4]
         id gender
      <dbl>  <chr>
    1     1   Male
    2     2 Female
    3     3   Male
    4     3 Female
    5     4 Female
    
  • 2

    Besides the tidyverse solution, here is one using lapply:

    result <- lapply(ids,function(x){
        tmp <- df[df$id == x,]
        if(all(c("Both","Male", "Female") %in% tmp$gender)){
            tmp <- tmp[tmp$gender != "Both",]
        }
        return(tmp)
    })
    do.call("rbind",result)
    #   id gender
    # 1  1   Male
    # 2  2 Female
    # 4  3   Male
    # 5  3 Female
    # 6  4 Female
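
For comparison, the loop from the question can be repaired along the same lines. The sketch below is one possible fix, not the only one. It assumes the problems were these: grepl() takes a single pattern as its first argument, so it is not a set-membership test; -which(...) drops every row when nothing matches; and the pieces in sub were never recombined. The names needed, grp, and fixed are introduced here for illustration only.

```r
# Example data from the question
df <- data.frame(
  id = c(1, 2, 3, 3, 3, 4),
  gender = c("Male", "Female", "Both", "Male", "Female", "Female"),
  stringsAsFactors = FALSE
)
ids <- unique(df$id)

# Values that must all be present before "Both" is dropped (illustrative name)
needed <- c("Both", "Male", "Female")

sub <- vector("list", length(ids))
for (i in seq_along(ids)) {
  grp <- df[df$id == ids[i], ]
  # %in% checks each needed value against this group's genders
  if (all(needed %in% grp$gender)) {
    # Logical indexing stays safe even when no row matches, unlike -which()
    grp <- grp[grp$gender != "Both", ]
  }
  sub[[i]] <- grp
}

# Recombine the per-id pieces into one data.frame
fixed <- do.call(rbind, sub)
rownames(fixed) <- NULL  # optional: renumber the rows from 1
```

Here seq_along(ids) replaces 1:length(ids) so the loop body is skipped cleanly if ids is ever empty.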
    
