
Find duplicate records in MySQL


I want to pull out duplicate records in a MySQL database. This can be done with:

SELECT address, count(id) as cnt FROM list
GROUP BY address HAVING cnt > 1

Which results in:

100 MAIN ST    2

I would like to pull it so that it shows each row that is a duplicate. Something like:

JIM    JONES    100 MAIN ST
JOHN   SMITH    100 MAIN ST

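Any thoughts on how this can be done? I'm trying to avoid running the first query and then looking up the duplicates with a second query in code.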

22 Answers

  • 317

    Powerlord's answer is indeed the best, and I would recommend one more change: use LIMIT to make sure the database does not get overloaded:

    SELECT firstname, lastname, list.address FROM list
    INNER JOIN (SELECT address FROM list
    GROUP BY address HAVING count(id) > 1) dup ON list.address = dup.address
    LIMIT 10
    

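    It is a good habit to use LIMIT when there is no WHERE clause and when making joins. Start with a small value, check how heavy the query is, and then increase the limit.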
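
    One way to check how heavy the query is before raising the limit is MySQL's EXPLAIN statement, which reports the execution plan. The following is a minimal sketch, assuming the list table from the question and that address is a VARCHAR column; the index name idx_list_address is hypothetical and not part of the original answer.

    ```sql
    -- Ask MySQL how it plans to execute the duplicate lookup.
    EXPLAIN
    SELECT firstname, lastname, list.address
    FROM list
    INNER JOIN (SELECT address
                FROM list
                GROUP BY address
                HAVING COUNT(id) > 1) dup ON list.address = dup.address
    LIMIT 10;

    -- Assumption: an index on address may help the GROUP BY and the join.
    -- idx_list_address is an illustrative name only.
    CREATE INDEX idx_list_address ON list (address);
    ```

    Note that LIMIT without ORDER BY returns an arbitrary subset of the matching rows, so the ten rows shown may not contain complete groups of duplicates.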

  • 8
    SELECT firstname, lastname, address FROM list
     WHERE 
     Address in 
     (SELECT address FROM list
     GROUP BY address
     HAVING count(*) > 1)
    
  • -1
    SELECT date FROM logs group by date having count(*) >= 2
    
  • 15

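    I tried the best answer chosen for this question, but it confused me somewhat. I actually needed this on just a single field of my table. The following example from this link worked out very well for me: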

    SELECT COUNT(*) c,title FROM `data` GROUP BY title HAVING c > 1;
    
  • 10

    The fastest duplicate-removal query procedure:

    /* create temp table with one primary column id */
    INSERT INTO temp(id) SELECT MIN(id) FROM list GROUP BY (isbn) HAVING COUNT(*)>1;
    DELETE FROM list WHERE id IN (SELECT id FROM temp);
    DELETE FROM temp;
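
    The first comment above leaves the creation of the helper table implicit. A minimal sketch of the whole pass, assuming list.id is an INT primary key (adjust the type to match your schema), could look like this. Each pass deletes only the lowest id of every duplicated isbn, so it has to be repeated when a value occurs more than twice.

    ```sql
    -- Assumption: list.id is an INT primary key.
    CREATE TEMPORARY TABLE temp (id INT NOT NULL PRIMARY KEY);

    -- One pass: collect the lowest id of every duplicated isbn,
    -- then delete exactly those rows.
    INSERT INTO temp (id)
    SELECT MIN(id) FROM list GROUP BY isbn HAVING COUNT(*) > 1;

    DELETE FROM list WHERE id IN (SELECT id FROM temp);

    -- Empty the helper table before the next pass.
    DELETE FROM temp;

    -- Repeat the INSERT/DELETE pass until the INSERT adds 0 rows,
    -- then drop the helper table.
    DROP TEMPORARY TABLE temp;
    ```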
    
  • 4

    This will select duplicates in one table pass, with no subqueries.

    SELECT  *
    FROM    (
            SELECT  ao.*, (@r := @r + 1) AS rn
            FROM    (
                    SELECT  @_address := 'N'
                    ) vars,
                    (
                    SELECT  *
                    FROM
                            list a
                    ORDER BY
                            address, id
                    ) ao
            WHERE   CASE WHEN @_address <> address THEN @r := 0 ELSE 0 END IS NOT NULL
                    AND (@_address := address ) IS NOT NULL
            ) aoo
    WHERE   rn > 1
    

    This query emulates the ROW_NUMBER() function available in Oracle and SQL Server.

    See the article in my blog for details:
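
    On MySQL 8.0 and later, window functions are available, so the emulation with user variables is no longer needed. The following is a sketch of the same idea, assuming the list table from the question with a unique id column; the alias names l and numbered are illustrative.

    ```sql
    -- Requires MySQL 8.0 or later (window functions).
    SELECT *
    FROM (
        SELECT l.*,
               -- Number the rows within each address, lowest id first.
               ROW_NUMBER() OVER (PARTITION BY address ORDER BY id) AS rn
        FROM list AS l
    ) AS numbered
    WHERE rn > 1;
    ```

    Like the query above, this returns only the second and later rows for each address. To return every row of a duplicated address, including the first, COUNT(*) OVER (PARTITION BY address) can be filtered with > 1 in the same way.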

  • 0

    The key is to rewrite this query so that it can be used as a subquery.

    SELECT firstname, 
       lastname, 
       list.address 
    FROM list
       INNER JOIN (SELECT address
                   FROM   list
                   GROUP  BY address
                   HAVING COUNT(id) > 1) dup
               ON list.address = dup.address;
    
  • 11

    Use this query to find duplicate users by email address...

    SELECT users.name, users.uid, users.mail, from_unixtime(created)
    FROM users
    INNER JOIN (
      SELECT mail
      FROM users
      GROUP BY mail
      HAVING count(mail) > 1
    ) dupes ON users.mail = dupes.mail
    ORDER BY users.mail;
    
  • 4

    This will also show you how many duplicates there are, and it orders the results without any joins:

    SELECT  `Language` , id, COUNT( id ) AS how_many
    FROM  `languages` 
    GROUP BY  `Language` 
    HAVING how_many >=2
    ORDER BY how_many DESC
    
  • 3

    select address from list where address = any (select address from (select address, count(id) cnt from list group by address having cnt > 1 ) as t1) order by address

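    The inner subquery returns the rows with duplicate addresses, and the outer subquery then returns the address column for those duplicated addresses. The outer subquery must return only one column because it is used as the operand of the '= any' operator.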

  • 4

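    Finding duplicate addresses is much more complex than it seems, especially if you require accuracy. A MySQL query is not enough in this case...

    I work at SmartyStreets, where we handle validation, de-duplication, and related problems, and I have seen many different challenges with similar problems.

    There are several third-party services that will flag duplicates in a list for you. Doing this solely with a MySQL subquery will not account for differences in address formats and standards. The USPS (for US addresses) has certain guidelines that set these standards, but only a handful of vendors are certified to perform such operations.

    So I would recommend that the best answer for you is to export the table to a CSV file and submit it to a capable list processor. One such processor is LiveAddress, which will do it for you automatically in a few seconds to a few minutes. It flags duplicate rows with a new field called "Duplicate" and a value of Y in it.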

  • 50

    Why not just INNER JOIN the table with itself?

    SELECT a.firstname, a.lastname, a.address
    FROM list a
    INNER JOIN list b ON a.address = b.address
    WHERE a.id <> b.id
    

    A DISTINCT is needed if the address can exist more than twice.
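
    A sketch of the DISTINCT variant follows. It assumes id is unique per row, as the question's count(id) suggests. Selecting a.id is an addition to the original query: it keeps two different people who share both a name and an address from being collapsed into one row.

    ```sql
    -- Without DISTINCT, an address shared by three rows would list
    -- each of them twice (once per matching partner row).
    SELECT DISTINCT a.id, a.firstname, a.lastname, a.address
    FROM list a
    INNER JOIN list b ON a.address = b.address
    WHERE a.id <> b.id
    ORDER BY a.address, a.id;
    ```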

  • 7

    Personally, this query solved my problem:

    SELECT `SUB_ID`, COUNT(SRV_KW_ID) as subscriptions FROM `SUB_SUBSCR` group by SUB_ID, SRV_KW_ID HAVING subscriptions > 1;
    

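    What this script does is show all the subscriber IDs that exist more than once in the table, along with the number of duplicates found.

    These are the table columns: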

    | SUB_SUBSCR_ID | int(11)     | NO   | PRI | NULL    | auto_increment |
    | MSI_ALIAS     | varchar(64) | YES  | UNI | NULL    |                |
    | SUB_ID        | int(11)     | NO   | MUL | NULL    |                |    
    | SRV_KW_ID     | int(11)     | NO   | MUL | NULL    |                |
    

    Hope it helps you!

  • 40

    Isn't this easier:

    SELECT *
    FROM tc_tariff_groups
    GROUP BY group_id
    HAVING COUNT(group_id) >1
    

  • 0

    It won't be very efficient, but it should work:

    SELECT *
    FROM list AS outer_list
    WHERE (SELECT COUNT(*)
            FROM list AS inner_list
            WHERE inner_list.address = outer_list.address) > 1;
    
  • 35
    select `cityname` from `codcities` group by `cityname` having count(*)>=2
    

    This is a similar query to the one you asked for. It works 200% and it's easy too. Enjoy!!!

  • 12
    SELECT t.*,(select count(*) from city as tt where tt.name=t.name) as count FROM `city` as t where (select count(*) from city as tt where tt.name=t.name) > 1 order by count desc
    

    Replace city with your table. Replace name with your field name.

  • 18

    We can also find duplicates that depend on more than one field. For those cases you can use the format below.

    SELECT COUNT(*), column1, column2 
    FROM tablename
    GROUP BY column1, column2
    HAVING COUNT(*)>1;
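
    The grouped query above lists each duplicated combination once. To see the individual rows, it can be joined back to the table, as several other answers do for a single column. This is a sketch using the same placeholder names (tablename, column1, column2); the aliases t and dup are illustrative.

    ```sql
    SELECT t.*
    FROM tablename AS t
    INNER JOIN (SELECT column1, column2
                FROM tablename
                GROUP BY column1, column2
                HAVING COUNT(*) > 1) AS dup
            ON t.column1 = dup.column1
           AND t.column2 = dup.column2
    ORDER BY t.column1, t.column2;
    ```

    GROUP BY puts NULL values into one group, but = never matches NULL, so rows with NULL in either column are dropped by this join. If they should count as duplicates, MySQL's NULL-safe operator <=> can replace = in the ON clause.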
    
  • 4
    SELECT *
    FROM (SELECT address, COUNT(id) AS cnt
          FROM list
          GROUP BY address
          HAVING COUNT(id) > 1) AS dup;
    
  • 2

    Another solution would be to use table aliases, like so:

    SELECT p1.id, p2.id, p1.address
    FROM list AS p1, list AS p2
    WHERE p1.address = p2.address
    AND p1.id != p2.id
    

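    All you are really doing in this case is taking the original list table, creating two pretend tables from it, p1 and p2, and then performing a join on the address column (line 3). The fourth line makes sure that the same record does not show up multiple times in your result set ("duplicate duplicates").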
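
    With != each pair of duplicates still appears twice, once as (A, B) and once as (B, A). A small variation, sketched here under the assumption that id is unique per row, compares with < so that every pair is listed once; the column aliases id_a and id_b are illustrative.

    ```sql
    SELECT p1.id AS id_a, p2.id AS id_b, p1.address
    FROM list AS p1
    INNER JOIN list AS p2
            ON p1.address = p2.address
           AND p1.id < p2.id   -- keep only one ordering of each pair
    ORDER BY p1.address, p1.id, p2.id;
    ```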

  • 190
    select * from table_name t1 inner join (select distinct <attribute list> from table_name as temp)t2 where t1.attribute_name = t2.attribute_name
    

    For your table it would be something along the lines of

    select * from list l1 inner join (select distinct address from list as list2)l2 where l1.address=l2.address
    

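    This query will give you all the distinct address entries in your list table... I am not sure how this will work if you have any primary key values for name, etc.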

  • 631
    Find duplicate Records:
    
        Suppose we have table : Student 
        student_id int
        student_name varchar
        Records:
        +------------+---------------------+
        | student_id | student_name        |
        +------------+---------------------+
        |        101 | usman               |
        |        101 | usman               |
        |        101 | usman               |
        |        102 | usmanyaqoob         |
        |        103 | muhammadusmanyaqoob |
        |        103 | muhammadusmanyaqoob |
        +------------+---------------------+
    
        Now we want to see duplicate records
        Use this query:
    
    
       select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;
    
    +---------------------+------------+---+
    | student_name        | student_id | c |
    +---------------------+------------+---+
    | usman               |        101 | 3 |
    | muhammadusmanyaqoob |        103 | 2 |
    +---------------------+------------+---+
    
