
Removing duplicates: dplyr removes non-duplicated rows


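I am trying to add rows to a data frame and then check for and remove rows that have duplicated values in a single column. The end goal is essentially to overwrite a row in the data frame when a new value is supplied. I haven't been able to work out how to specify row names dynamically with dplyr (or in R at all), so this is the approach I am taking.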

I start with a test data frame and use dplyr to remove the first set of rows that are duplicated in the Position column, like so:

library(dplyr)

# tibble() replaces the deprecated data_frame()
testData.df <- tibble(Position = c("B1", "B2", "B3", "B1", "B2", "B3"), rep = c("B1", "B2", "B3", "B4", "B5", "B6"), name = rep("wibble", 6), status = rep("unknown", 6))
testData.df <- testData.df %>%
  filter(duplicated(Position))
testData.df
# A tibble: 3 x 4
  Position   rep   name  status
     <chr> <chr>  <chr>   <chr>
1       B1    B4 wibble unknown
2       B2    B5 wibble unknown
3       B3    B6 wibble unknown

This is what I expected. When I run the same filter again, I get this:

testData.df <- testData.df %>%
  filter(duplicated(Position))
testData.df
# A tibble: 0 x 4
# ... with 4 variables: Position <chr>, rep <chr>, name <chr>, status <chr>

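Why does it remove rows that are not duplicated? The first run suggests it works as expected, i.e. it removed the actual duplicates. I can't explain the different behaviour on the second run.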

1 Answer


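    You expect filter(duplicated(...)) to keep the non-duplicated rows, but it actually does the opposite: it keeps only the rows that duplicated() flags as repeats of an earlier row. You can see this by adding a row number to each row: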

    testData.df <- tibble(Position = c("B1", "B2", "B3", "B1", "B2", "B3"), rep = c("B1", "B2", "B3", "B4", "B5", "B6"), name = rep("wibble", 6), status = rep("unknown", 6)) %>%
      mutate(rn = row_number())
    testData.df <- testData.df %>%
      filter(duplicated(Position))
    testData.df
    

    which yields

    # A tibble: 3 x 5
      Position   rep   name  status    rn
         <chr> <chr>  <chr>   <chr> <int>
    1       B1    B4 wibble unknown     4
    2       B2    B5 wibble unknown     5
    3       B3    B6 wibble unknown     6
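
The second-run result follows from what base R's duplicated() returns: a logical vector that is TRUE only for elements that repeat an earlier element. The sketch below uses plain character vectors (the variable names are illustrative, not from the question) to show why the second pass leaves nothing to keep.

```r
# Position values from the question's test data
positions <- c("B1", "B2", "B3", "B1", "B2", "B3")

# TRUE marks an element that has already appeared earlier in the vector
first_pass <- duplicated(positions)
first_pass
# FALSE FALSE FALSE  TRUE  TRUE  TRUE

# filter(duplicated(Position)) keeps only the TRUE rows, i.e. rows 4 to 6
survivors <- positions[first_pass]

# The survivors are now all unique, so nothing is flagged on the second pass
second_pass <- duplicated(survivors)
second_pass
# FALSE FALSE FALSE
```

Because every value in second_pass is FALSE, a second filter(duplicated(Position)) keeps zero rows, which matches the empty tibble in the question.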
    

    You should use filter(!duplicated(...)) instead.
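
As a rough illustration of the negated filter, the sketch below uses a trimmed-down copy of the test data (only the Position and rep columns; test_data and the keep_* names are illustrative). Note that !duplicated() keeps the first row for each Position. Since the stated goal is to overwrite old rows with new ones, base R's fromLast = TRUE argument may be closer to what is wanted, as it keeps the last row instead.

```r
library(dplyr)

# Reduced version of the question's data
test_data <- tibble(
  Position = c("B1", "B2", "B3", "B1", "B2", "B3"),
  rep      = c("B1", "B2", "B3", "B4", "B5", "B6")
)

# Keep the first row seen for each Position
keep_first <- test_data %>%
  filter(!duplicated(Position))

# Keep the last row seen for each Position (scan from the end)
keep_last <- test_data %>%
  filter(!duplicated(Position, fromLast = TRUE))

# Running the same filter again changes nothing, because no duplicates remain
keep_last_again <- keep_last %>%
  filter(!duplicated(Position, fromLast = TRUE))
```

Here keep_first holds the rows with rep B1 to B3, while keep_last and keep_last_again both hold the rows with rep B4 to B6.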


    Edit

    Try this instead. It keeps the last row for each Position on the first run and does not lose it on the second:

    testData.df <- tibble(Position = c("B1", "B2", "B3", "B1", "B2", "B3"), rep = c("B1", "B2", "B3", "B4", "B5", "B6"), name = rep("wibble", 6), status = rep("unknown", 6)) %>%
      mutate(rn = row_number())
    
    run1 <- testData.df %>%
            group_by(Position) %>%
            slice(n()) %>%
                ungroup()
    
    run2 <- run1 %>%
            group_by(Position) %>%
            slice(n()) %>%
                ungroup()
    
    # run1 and run2 both print as:
    # A tibble: 3 x 5
      Position   rep   name  status    rn
         <chr> <chr>  <chr>   <chr> <int>
    1       B1    B4 wibble unknown     4
    2       B2    B5 wibble unknown     5
    3       B3    B6 wibble unknown     6
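
To connect this back to the original goal of overwriting a row when a new value arrives, here is a minimal sketch. The helper upsert_by_position() is hypothetical (it is not part of dplyr), the sample rows are made up for illustration, and slice_tail() assumes dplyr 1.0.0 or later; slice(n()) as used above would do the same job on older versions.

```r
library(dplyr)

# Hypothetical helper: append the new rows, then keep only the most
# recently added row for each Position.
upsert_by_position <- function(existing, new_rows) {
  bind_rows(existing, new_rows) %>%
    group_by(Position) %>%
    slice_tail(n = 1) %>%
    ungroup()
}

# Illustrative data: B2 is overwritten, B4 is new
existing <- tibble(
  Position = c("B1", "B2", "B3"),
  rep      = c("B1", "B2", "B3")
)
new_rows <- tibble(
  Position = c("B2", "B4"),
  rep      = c("B5", "B7")
)

updated <- upsert_by_position(existing, new_rows)

# Applying the same new rows again leaves the result unchanged
updated_again <- upsert_by_position(updated, new_rows)
```

The design relies on bind_rows() placing the new rows after the existing ones, so "last row per group" means "newest value wins".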
    
