
Finding duplicate values in MySQL


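I have a table with a varchar column, and I want to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?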

22 Answers

  • 5

    Try this query:

    SELECT name, COUNT(*) value_count FROM company_master GROUP BY name HAVING value_count > 1;
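
To see what the GROUP BY ... HAVING pattern returns, here is a minimal sketch that runs it from Python. It rests on a few assumptions: SQLite (via the standard sqlite3 module) stands in for MySQL, the rows in company_master are made up, and HAVING repeats COUNT(*) instead of using the alias so the statement stays portable.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company_master (id INTEGER PRIMARY KEY, name TEXT)")

# Hypothetical sample rows: "Acme" appears 3 times, "Globex" twice.
conn.executemany(
    "INSERT INTO company_master (name) VALUES (?)",
    [("Acme",), ("Globex",), ("Acme",), ("Initech",), ("Acme",), ("Globex",)],
)

# One row per name that occurs more than once, with its count.
duplicates = conn.execute(
    """
    SELECT name, COUNT(*) AS value_count
    FROM company_master
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()

for name, value_count in duplicates:
    print(name, value_count)
```

Names that occur only once ("Initech" here) are filtered out by the HAVING clause.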
    
  • 3

    I haven't seen any JOIN approaches here, and they have many uses when it comes to duplicates.

    This approach gives you the actual duplicated rows.

    SELECT DISTINCT t1.* FROM `table` AS t1 INNER JOIN `table` AS t2 ON t1.name = t2.name AND t1.id != t2.id ORDER BY t1.name;
    
  • 7
    SELECT 
        t.*,
        (SELECT COUNT(*) FROM city AS tt WHERE tt.name=t.name) AS count 
    FROM `city` AS t 
    WHERE 
        (SELECT count(*) FROM city AS tt WHERE tt.name=t.name) > 1 ORDER BY count DESC
    
  • 3
    SELECT ColumnA, COUNT( * )
    FROM `Table`
    GROUP BY ColumnA
    HAVING COUNT( * ) > 1
    
  • 1

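    Looking at the answers above, those queries work fine if you need to check a single column for duplicate values, for example email.

    But if you need to check several columns and want to test the combination of their values, this query will do it: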

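    -- Lists every (name, email) combination that occurs more than once,
    -- together with its count. Grouping by the columns themselves avoids
    -- CONCAT collisions such as ('ab', 'c') versus ('a', 'bc').
    SELECT COUNT(*) AS tot,
           name,
           email
    FROM users
    GROUP BY name, email
    HAVING tot > 1;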
    
  • 1286

    The following finds every product_id that is used more than once. You get only one row per product_id.

    SELECT product_id FROM oc_product_reward GROUP BY product_id HAVING count( product_id ) >1
    

    Code taken from: http://chandreshrana.blogspot.in/2014/12/find-duplicate-records-based-on-any.html

  • 0
    CREATE TABLE tbl_master
        (`id` int, `email` varchar(15));
    
    INSERT INTO tbl_master
        (`id`, `email`) VALUES
        (1, 'test1@gmail.com'),
        (2, 'test2@gmail.com'),
        (3, 'test1@gmail.com'),
        (4, 'test2@gmail.com'),
        (5, 'test5@gmail.com');
    
    -- Query:
    SELECT id, email FROM tbl_master
    WHERE email IN (SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1);
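
For anyone who wants to check what this returns, here is a small sketch that loads the same five rows and runs the IN (subquery) form. It assumes SQLite (Python's sqlite3 module) as a stand-in for MySQL, and the ORDER BY id is added only to make the output order predictable.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_master (id INTEGER, email TEXT)")

# The same five rows as in the answer above.
conn.executemany(
    "INSERT INTO tbl_master (id, email) VALUES (?, ?)",
    [
        (1, "test1@gmail.com"),
        (2, "test2@gmail.com"),
        (3, "test1@gmail.com"),
        (4, "test2@gmail.com"),
        (5, "test5@gmail.com"),
    ],
)

# The subquery finds the duplicated emails; the outer query returns
# every full row that carries one of them.
rows = conn.execute(
    """
    SELECT id, email
    FROM tbl_master
    WHERE email IN (
        SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1
    )
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Rows 1 to 4 come back; row 5 (test5@gmail.com) is unique and is left out.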
    
  • 12

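    A very late contribution, in case it helps anyone down the line. I had the task of finding matching pairs of transactions in a banking application (actually the two sides of an account-to-account transfer), to identify which was the "from" and which was the "to" for each inter-account transfer. We ended up with: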

    SELECT 
        LEAST(primaryid, secondaryid) AS transactionid1,
        GREATEST(primaryid, secondaryid) AS transactionid2
    FROM (
        SELECT table1.transactionid AS primaryid, 
            table2.transactionid AS secondaryid
        FROM financial_transactions table1
        INNER JOIN financial_transactions table2 
        ON table1.accountid = table2.accountid
        AND table1.transactionid <> table2.transactionid 
        AND table1.transactiondate = table2.transactiondate
        AND table1.sourceref = table2.destinationref
        AND table1.amount = (0 - table2.amount)
    ) AS DuplicateResultsTable
    GROUP BY transactionid1
    ORDER BY transactionid1;
    

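    The inner DuplicateResultsTable provides rows containing the matching (i.e. duplicate) transactions, but it also returns the same transaction IDs in reverse the second time it matches the same pair. The outer SELECT therefore groups by the first transaction ID, using LEAST and GREATEST to make sure the two transaction IDs always appear in the same order in the results. That makes it safe to GROUP by the first one, which eliminates all the duplicate matches. It ran over nearly a million records and identified 12,000 matches in under 2 seconds. Of course, transactionid is the primary index, which really helped.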

  • 1

    Assume your table is named TableABC, the column you are interested in is Col, and the primary key of TableABC is Key.

    SELECT a.Key, b.Key, a.Col 
    FROM TableABC a, TableABC b
    WHERE a.Col = b.Col 
    AND a.Key <> b.Key
    

    The advantage of this approach over the answers above is that it gives you the keys.
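
A self-join like this can be tried out as sketched below. Several things here are assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, the sample rows are invented, and the comparison uses < instead of <> so that each pair is listed once (with <> every pair also comes back mirrored).

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
# "Key" is quoted because it is a keyword in SQL.
conn.execute('CREATE TABLE TableABC ("Key" INTEGER PRIMARY KEY, Col TEXT)')

# Hypothetical sample rows: the value 'x' appears three times.
conn.executemany(
    'INSERT INTO TableABC ("Key", Col) VALUES (?, ?)',
    [(1, "x"), (2, "y"), (3, "x"), (4, "x")],
)

# Join the table to itself on the duplicated column. Using < rather
# than <> reports each pair of keys only once.
pairs = conn.execute(
    """
    SELECT a."Key", b."Key", a.Col
    FROM TableABC AS a
    JOIN TableABC AS b
      ON a.Col = b.Col
     AND a."Key" < b."Key"
    ORDER BY a."Key", b."Key"
    """
).fetchall()

for pair in pairs:
    print(pair)
```

Note that a value occurring n times produces n*(n-1)/2 pairs, so this form gets verbose for heavily duplicated values.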

  • 1

    Building on levik's answer: to get the IDs of the duplicate rows you can use GROUP_CONCAT, if your server supports it (this returns a comma-separated list of IDs).

    SELECT GROUP_CONCAT(id), name, COUNT(*) c FROM documents GROUP BY name HAVING c > 1;
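
As a rough illustration of what the comma-separated list looks like and how a client might split it, here is a sketch under some assumptions: SQLite (Python's sqlite3 module, which also provides GROUP_CONCAT) stands in for MySQL, the documents rows are invented, and HAVING repeats COUNT(*) for portability.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT)")

# Hypothetical sample rows: "report" appears three times.
conn.executemany(
    "INSERT INTO documents (id, name) VALUES (?, ?)",
    [(1, "report"), (2, "invoice"), (3, "report"), (4, "memo"), (5, "report")],
)

rows = conn.execute(
    """
    SELECT GROUP_CONCAT(id), name, COUNT(*) AS c
    FROM documents
    GROUP BY name
    HAVING COUNT(*) > 1
    """
).fetchall()

# Split each comma-separated string back into a sorted list of integer IDs.
# Sorting is done client-side because the concatenation order is not
# guaranteed without an explicit ordering.
duplicate_ids = {
    name: sorted(int(part) for part in ids.split(","))
    for ids, name, c in rows
}

print(duplicate_ids)
```

In MySQL itself, keep in mind that GROUP_CONCAT output is truncated at group_concat_max_len (1024 by default), so very long ID lists may need that setting raised.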
    
  • 4
    SELECT t.*,(select count(*) from city as tt
      where tt.name=t.name) as count
      FROM `city` as t
      where (
         select count(*) from city as tt
         where tt.name=t.name
      ) > 1 order by count desc
    

    Replace city with your table name and name with your field name.

  • 9
    SELECT varchar_col
    FROM `table`
    GROUP BY varchar_col
    HAVING count(*) > 1;
    
  • 4

    I prefer to use window functions (MySQL 8.0+) to find duplicates, because I can see the whole row:

    WITH cte AS (
      SELECT *
        ,COUNT(*) OVER(PARTITION BY col_name) AS num_of_duplicates_group
        ,ROW_NUMBER() OVER(PARTITION BY col_name ORDER BY col_name2) AS pos_in_group
      FROM `table`
    )
    SELECT *
    FROM cte
    WHERE num_of_duplicates_group > 1;
    

    DB Fiddle Demo
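
The window-function approach can be sketched locally as follows. The assumptions: SQLite 3.25 or newer (through Python's sqlite3 module) stands in for MySQL 8.0, and the people table and its rows are hypothetical. The column aliases are the ones used in the answer above.

```python
import sqlite3

# Assumption: an in-memory SQLite database (3.25+ for window functions)
# stands in for MySQL 8.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, email TEXT)")

# Hypothetical sample rows: a@example.com appears three times.
conn.executemany(
    "INSERT INTO people (id, email) VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "b@example.com"),
        (3, "a@example.com"),
        (4, "c@example.com"),
        (5, "a@example.com"),
    ],
)

# COUNT(*) OVER gives the size of each duplicate group on every row;
# ROW_NUMBER() numbers the rows inside each group.
rows = conn.execute(
    """
    WITH cte AS (
        SELECT id,
               email,
               COUNT(*) OVER (PARTITION BY email) AS num_of_duplicates_group,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS pos_in_group
        FROM people
    )
    SELECT id, email, num_of_duplicates_group, pos_in_group
    FROM cte
    WHERE num_of_duplicates_group > 1
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Filtering on pos_in_group > 1 instead would select only the extra copies, which is handy if the goal is to keep the first row of each group and clean up the rest.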

  • 0
    SELECT DISTINCT a.email FROM `users` a LEFT JOIN `users` b ON a.email = b.email WHERE a.id != b.id;
    
  • 3
    Select column_name, column_name1,column_name2, count(1) as temp from table_name group by column_name having temp > 1
    
  • 3

    Taking @maxyfc's answer further, I needed to find all the rows that were returned with duplicate values, so that I could edit them in MySQL Workbench:

    SELECT * FROM `table`
       WHERE field IN (
         SELECT field FROM `table` GROUP BY field HAVING count(*) > 1
       ) ORDER BY field
    
  • 2

    Do a SELECT with a GROUP BY clause. Assuming name is the column you want to find duplicates in:

    SELECT name, COUNT(*) c FROM `table` GROUP BY name HAVING c > 1;
    

    This returns a result with the name value in the first column and the number of times that value appears in the second.

  • 1

    To find which values are duplicated in the name column of the employee table, the query below is helpful:

    Select name from employee group by name having count(*)>1;
    
  • 144

    My final query incorporated a few of the helpful answers here, combining GROUP BY, COUNT and GROUP_CONCAT.

    SELECT GROUP_CONCAT(id), `magento_simple`, COUNT(*) c 
    FROM product_variant 
    GROUP BY `magento_simple` HAVING c > 1;
    

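    This gives the IDs of both examples (comma-separated), the barcode I needed, and the number of duplicates.

    Change the table and column names accordingly.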

  • 111
    SELECT * 
    FROM `dps` 
    WHERE pid IN (SELECT pid FROM `dps` GROUP BY pid HAVING COUNT(pid)>1)
    
  • 186

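    To remove duplicate rows across multiple fields, first concatenate those fields into a new key that identifies each distinct row, then use GROUP BY on that key to drop the rows that share the same key: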

    Create TEMPORARY table tmp select concat(f1,f2) as cfs,t1.* from mytable as t1;
    Create index x_tmp_cfs on tmp(cfs);
    Create table unduptable select f1,f2,... from tmp group by cfs;
    
  • 9
    SELECT  *
    FROM    mytable mto
    WHERE   EXISTS
            (
            SELECT  1
            FROM    mytable mti
            WHERE   mti.varchar_column = mto.varchar_column
            LIMIT 1, 1
            )
    

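    This query returns complete records, not just the distinct varchar_column values.

    It does not use COUNT(*). If there are lots of duplicates, COUNT(*) is expensive, and you don't need the whole count; you only need to know whether there are two rows with the same value.

    Having an index on varchar_column will, of course, speed this query up greatly.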
