
Finding duplicate values in a SQL table


It's easy to find duplicates with one field:

SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have a table

ID   NAME   EMAIL
1    John   asd@asd.com
2    Sam    asd@asd.com
3    Tom    asd@asd.com
4    Bob    bob@asd.com
5    Tom    asd@asd.com

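This query flags John, Sam, Tom, and Tom, because they all have the same email.

However, what I want is to get duplicates that share both the same email and the same name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove/change the duplicates, so I need to find them first.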

25 Answers

  • 8
    SELECT * FROM users u where rowid = (select max(rowid) from users u1 where
    u.email=u1.email);
    
  • 0

    A bit late to the party, but I found a really cool workaround for finding all duplicate IDs:

    SELECT GROUP_CONCAT( id )
    FROM users
    GROUP BY email
    HAVING ( COUNT(email) > 1 )
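
To see what this returns on the question's sample data, here is a small sketch that runs the same query from Python. It assumes an in-memory SQLite database as a stand-in (the answer looks like MySQL; SQLite happens to provide `GROUP_CONCAT` too), and the `email` column is added to the select list only so the output is easier to read.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# One row per email that occurs more than once, with its ids joined by commas.
rows = conn.execute(
    """
    SELECT email, GROUP_CONCAT(id)
    FROM users
    GROUP BY email
    HAVING COUNT(email) > 1
    """
).fetchall()

# SQLite does not guarantee the order of ids inside GROUP_CONCAT,
# so split the string and sort the ids explicitly.
duplicate_ids = {email: sorted(int(i) for i in ids.split(",")) for email, ids in rows}
print(duplicate_ids)
```

Note that this groups by email only, so it reports every id sharing an email, not just rows where the name matches as well.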
    
  • 24

    If you want to see whether your table has any duplicate rows, I use the query below:

    create table my_table(id int, name varchar(100), email varchar(100));
    
    insert into my_table values (1, 'shekh', 'shekh@rms.com');
    insert into my_table values (1, 'shekh', 'shekh@rms.com');
    insert into my_table values (2, 'Aman', 'aman@rms.com');
    insert into my_table values (3, 'Tom', 'tom@rms.com');
    insert into my_table values (4, 'Raj', 'raj@rms.com');
    
    
    Select COUNT(1) As Total_Rows from my_table;
    Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc;
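
The two counts only tell you something when compared: if the total is larger than the distinct count, at least one row is a full duplicate. A minimal sketch of that comparison, assuming an in-memory SQLite database in place of the real one and reusing the sample rows from the answer:

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INT, name VARCHAR(100), email VARCHAR(100))")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "shekh", "shekh@rms.com"),
        (1, "shekh", "shekh@rms.com"),
        (2, "Aman", "aman@rms.com"),
        (3, "Tom", "tom@rms.com"),
        (4, "Raj", "raj@rms.com"),
    ],
)

# Count every row, then count rows after collapsing exact duplicates.
total_rows = conn.execute("SELECT COUNT(1) FROM my_table").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(1) FROM (SELECT DISTINCT * FROM my_table) abc"
).fetchone()[0]

# A difference between the two counts means some rows are repeated in full.
has_duplicates = total_rows != distinct_rows
print(total_rows, distinct_rows, has_duplicates)
```

This only detects rows that are identical in every column; it does not say which rows they are.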
    
  • 17
    SELECT
        name, email, COUNT(*)
    FROM
        users
    GROUP BY
        name, email
    HAVING 
        COUNT(*) > 1
    

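    Simply group on both of the columns.

    Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY, but this has changed with the idea of "functional dependency":

    In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint that describes the relationship between attributes in a relation.

    Support is not consistent across database engines.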
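
As a quick check of the two-column grouping, this sketch runs the query against the question's sample table. It assumes an in-memory SQLite database accessed through Python's `sqlite3` module rather than whatever engine the asker uses.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# Group on both columns so only rows sharing name AND email count as duplicates.
duplicates = conn.execute(
    """
    SELECT name, email, COUNT(*)
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)
```

Only the Tom/asd@asd.com pair comes back; John and Sam share the email but not the name, so they are no longer reported.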

  • 46

    If you use Oracle, this way is preferable:

    create table my_users(id number, name varchar2(100), email varchar2(100));
    
    insert into my_users values (1, 'John', 'asd@asd.com');
    insert into my_users values (2, 'Sam', 'asd@asd.com');
    insert into my_users values (3, 'Tom', 'asd@asd.com');
    insert into my_users values (4, 'Bob', 'bob@asd.com');
    insert into my_users values (5, 'Tom', 'asd@asd.com');
    
    commit;
    
    select *
      from my_users
     where rowid not in (select min(rowid) from my_users group by name, email);
    
  • 6
    SELECT name, email 
        FROM users
        WHERE email in
        (SELECT email FROM users
        GROUP BY email 
        HAVING COUNT(*)>1)
    
  • 8

    We can use HAVING here, which works on aggregate functions, as shown below:

    create table #TableB (id_account int, data int, [date] date)
    insert into #TableB values (1 ,-50, '10/20/2018'),
    (1, 20, '10/09/2018'),
    (2 ,-900, '10/01/2018'),
    (1 ,20, '09/25/2018'),
    (1 ,-100, '08/01/2018')  
    
    SELECT id_account , data, COUNT(*)
    FROM #TableB
    GROUP BY id_account , data
    HAVING COUNT(id_account) > 1
    
    drop table #TableB
    

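    Here the two fields id_account and data are used with COUNT(*). So it returns every combination of values that occurs more than once across both columns.

    Suppose we mistakenly forgot to add any constraints to a SQL Server table, and records duplicated in all columns were inserted through the front-end application. We can then use the query below to delete the duplicates from the table.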

    SELECT DISTINCT * INTO #TemNewTable FROM #OriginalTable
    TRUNCATE TABLE #OriginalTable
    INSERT INTO #OriginalTable SELECT * FROM #TemNewTable
    DROP TABLE #TemNewTable
    

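    Here we took all the distinct records of the original table and deleted the records of the original table. We then inserted all the distinct rows from the new table back into the original table and dropped the new table.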
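
The same copy, clear, and reload sequence can be sketched outside SQL Server. The version below assumes SQLite through Python's `sqlite3` module, where `SELECT ... INTO` and `TRUNCATE TABLE` do not exist, so `CREATE TEMP TABLE ... AS SELECT` and `DELETE FROM` are swapped in; the table names `original_table` and `tmp_distinct` are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (id INT, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO original_table VALUES (?, ?, ?)",
    [
        (1, "shekh", "shekh@rms.com"),
        (1, "shekh", "shekh@rms.com"),
        (2, "Aman", "aman@rms.com"),
        (3, "Tom", "tom@rms.com"),
        (4, "Raj", "raj@rms.com"),
    ],
)
conn.commit()

# "with conn" commits on success and rolls back if any statement raises.
with conn:
    # 1. Copy the distinct rows aside.
    conn.execute("CREATE TEMP TABLE tmp_distinct AS SELECT DISTINCT * FROM original_table")
    # 2. Empty the original table (SQLite's substitute for TRUNCATE TABLE).
    conn.execute("DELETE FROM original_table")
    # 3. Reload the distinct rows and drop the scratch table.
    conn.execute("INSERT INTO original_table SELECT * FROM tmp_distinct")
    conn.execute("DROP TABLE tmp_distinct")

remaining = conn.execute(
    "SELECT id, name, email FROM original_table ORDER BY id"
).fetchall()
print(remaining)
```

This only removes rows that are identical in every column, and on a live table you would probably want a backup before clearing it.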

  • -2

    If you want to find duplicate data (by one or several criteria) and select the actual rows:

    with MYCTE as (
        SELECT DuplicateKey1
            ,DuplicateKey2 --optional
            ,count(*) X
        FROM MyTable
        group by DuplicateKey1, DuplicateKey2
        having count(*) > 1
    ) 
    SELECT E.*
    FROM MyTable E
    JOIN MYCTE cte
    ON E.DuplicateKey1=cte.DuplicateKey1
        AND E.DuplicateKey2=cte.DuplicateKey2
    ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
    

    http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/

  • 15
    SELECT
      FirstName, LastName, MobileNo, COUNT(1) as CNT 
    FROM        
      CUSTOMER
    GROUP BY
      FirstName, LastName, MobileNo 
    HAVING
      COUNT(1) > 1;
    
  • 14

    How to get duplicate records in a table:

    SELECT COUNT(Code), Code FROM Employees WHERE Status = 1 GROUP BY Code HAVING COUNT(Code) > 1

  • 2341
    select emp.ename, emp.empno, dept.loc 
              from emp
     inner join dept 
              on dept.deptno=emp.deptno
     inner join
        (select ename, count(*) from
        emp
        group by ename, deptno
        having count(*) > 1)
     t on emp.ename=t.ename order by emp.ename
    /
    
  • 6

    Try this:

    SELECT name, email
    FROM users
    GROUP BY name, email
    HAVING ( COUNT(*) > 1 )
    
  • 5

    This should also work, maybe give it a try.

    Select * from Users a
                where EXISTS (Select * from Users b 
                    where (     a.name = b.name 
                            OR  a.email = b.email)
                         and a.ID != b.id)
    

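    This is especially useful in your case if you are searching for duplicates that differ by some kind of prefix or general change, e.g. a new domain in the email address. Then you can use replace() on these columns.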
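
A minimal sketch of the replace() idea, assuming SQLite via Python's `sqlite3` module and two made-up domains, `@old.example` and `@new.example`, that should be treated as the same address. Both sides of the comparison are normalized to the old domain before matching.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO Users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "Tom", "tom@old.example"),  # hypothetical old domain
        (2, "Tom", "tom@new.example"),  # same person after a domain change
        (3, "Bob", "bob@old.example"),
    ],
)

# Rewrite the new domain to the old one on both sides, then look for
# another row (different id) with the same normalized address.
duplicate_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT a.id
        FROM Users a
        WHERE EXISTS (
            SELECT 1
            FROM Users b
            WHERE REPLACE(a.email, '@new.example', '@old.example')
                = REPLACE(b.email, '@new.example', '@old.example')
              AND a.id != b.id
        )
        ORDER BY a.id
        """
    )
]
print(duplicate_ids)
```

Applying a function to the column usually prevents index use, so this is better suited to a one-off cleanup than to a query you run often.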

  • 14

    Try this:

    declare @YourTable table (id int, name varchar(10), email varchar(50))
    
    INSERT @YourTable VALUES (1,'John','John-email')
    INSERT @YourTable VALUES (2,'John','John-email')
    INSERT @YourTable VALUES (3,'fred','John-email')
    INSERT @YourTable VALUES (4,'fred','fred-email')
    INSERT @YourTable VALUES (5,'sam','sam-email')
    INSERT @YourTable VALUES (6,'sam','sam-email')
    
    SELECT
        name,email, COUNT(*) AS CountOf
        FROM @YourTable
        GROUP BY name,email
        HAVING COUNT(*)>1
    

    OUTPUT:

    name       email       CountOf
    ---------- ----------- -----------
    John       John-email  2
    sam        sam-email   2
    
    (2 row(s) affected)
    

    If you want the IDs of the duplicates, use this:

    SELECT
        y.id,y.name,y.email
        FROM @YourTable y
            INNER JOIN (SELECT
                            name,email, COUNT(*) AS CountOf
                            FROM @YourTable
                            GROUP BY name,email
                            HAVING COUNT(*)>1
                        ) dt ON y.name=dt.name AND y.email=dt.email
    

    OUTPUT:

    id          name       email
    ----------- ---------- ------------
    1           John       John-email
    2           John       John-email
    5           sam        sam-email
    6           sam        sam-email
    
    (4 row(s) affected)
    

    To delete the duplicates, try:

    DELETE d
        FROM @YourTable d
            INNER JOIN (SELECT
                            y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
                            FROM @YourTable y
                                INNER JOIN (SELECT
                                                name,email, COUNT(*) AS CountOf
                                                FROM @YourTable
                                                GROUP BY name,email
                                                HAVING COUNT(*)>1
                                            ) dt ON y.name=dt.name AND y.email=dt.email
                       ) dt2 ON d.id=dt2.id
            WHERE dt2.RowRank!=1
    SELECT * FROM @YourTable
    

    OUTPUT:

    id          name       email
    ----------- ---------- --------------
    1           John       John-email
    3           fred       John-email
    4           fred       fred-email
    5           sam        sam-email
    
    (4 row(s) affected)
    
  • 6

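    This selects/deletes all duplicate records except one record from each group of duplicates. So the delete leaves all unique records plus one record from each group of duplicates.

    Select duplicates: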

    SELECT *
    FROM table
    WHERE
        id NOT IN (
            SELECT MIN(id)
            FROM table
            GROUP BY column1, column2
    );
    

    Delete duplicates:

    DELETE FROM table
    WHERE
        id NOT IN (
            SELECT MIN(id)
            FROM table
            GROUP BY column1, column2
    );
    

    Be careful with large numbers of records; this can cause performance problems.
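
Here is a small sketch of both statements on the question's sample data, assuming an in-memory SQLite database through Python's `sqlite3` module, with `users`, `name`, and `email` filled in for the placeholder `table`, `column1`, and `column2`.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)
conn.commit()

# Rows that are NOT the lowest id of their (name, email) group are the extras.
duplicate_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM users
        WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY name, email)
        ORDER BY id
        """
    )
]

# Delete the extras, keeping the lowest id of every group.
with conn:
    conn.execute(
        """
        DELETE FROM users
        WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY name, email)
        """
    )

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
print(duplicate_ids, remaining_ids)
```

Running the SELECT first and checking its output before running the DELETE is a cheap safeguard, since both share the same WHERE clause.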

  • 4

    If you want to delete the duplicates, here's a much simpler way than having to find even/odd rows in a triple sub-select:

    SELECT u.id, u.name, u.email
    FROM users u, users u2
    WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
    

    And so to delete:

    DELETE FROM users
    WHERE id IN (
        SELECT u.id/*, u.name, u.email*/
        FROM users u, users u2
        WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
    )
    

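    Much easier to read and understand, IMHO.

    Note: the only issue is that you have to execute the request until no rows are deleted, since each run deletes only one of each duplicate.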

  • 287

    Using a CTE, we can also find duplicate values like this:

    with MyCTE
    as
    (
    select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]
    
    )
    select * from MyCTE where Duplicate>1
    
  • 31

    How can we count the duplicated values, whether a value is repeated twice or more than twice? Just count them, not group-wise.

    Something like:

    select COUNT(distinct col_01) from Table_01
    
  • 5

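    This is the easy thing I came up with. It uses a common table expression (CTE) and a partition window (I think these features are in SQL 2008 and later).

    This example finds all students with a duplicate name and DOB. The fields you want to check for duplication go in the OVER clause. You can include any other fields you want in the projection.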

    with cte (StudentId, Fname, LName, DOB, RowCnt)
    as (
    SELECT StudentId, FirstName, LastName, DateOfBirth as DOB, SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
    FROM tblStudent
    )
    SELECT * from CTE where RowCnt > 1
    ORDER BY DOB, LName
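
To try the windowed count without a SQL Server instance, this sketch runs an equivalent query on SQLite through Python's `sqlite3` module. It assumes SQLite 3.25 or newer (the first version with window functions), uses `COUNT(*)` in place of `SUM(1)`, and the student rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblStudent ("
    "StudentId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, DateOfBirth TEXT)"
)
# Made-up sample rows: students 1 and 3 match on name and date of birth.
conn.executemany(
    "INSERT INTO tblStudent VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "2001-05-01"),
        (2, "Raj", "Patel", "2002-07-09"),
        (3, "Ann", "Lee", "2001-05-01"),
        (4, "Ann", "Lee", "1999-12-31"),
    ],
)

# The window count attaches the group size to every row, so the outer
# query can keep whole rows instead of collapsed groups.
duplicates = conn.execute(
    """
    WITH cte AS (
        SELECT StudentId, FirstName, LastName, DateOfBirth,
               COUNT(*) OVER (PARTITION BY FirstName, LastName, DateOfBirth) AS RowCnt
        FROM tblStudent
    )
    SELECT StudentId, FirstName, LastName, DateOfBirth, RowCnt
    FROM cte
    WHERE RowCnt > 1
    ORDER BY StudentId
    """
).fetchall()
print(duplicates)
```

Unlike the GROUP BY form, every duplicated row comes back with its own StudentId, which is what you need when deciding which copy to keep.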
    
  • 6

    Please try the following:

    SELECT * FROM
    (
        SELECT Id, Name, Age, Comments, Row_Number() OVER(PARTITION BY Name, Age ORDER By Name)
            AS Rank 
            FROM Customers
    ) AS B WHERE Rank>1
    
  • 4

    SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;

    I think this will work properly for finding duplicate values in a particular column.

  • 1

    Try this code:

    WITH CTE AS
    
    ( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER()OVER(PARTITION BY Name,Age ORDER BY ccn)
    FROM ccnmaster )
    select * from CTE where RN > 1
    
  • 95

    SELECT column1, COUNT(*) FROM TABLE_NAME GROUP BY column1 HAVING COUNT(*) > 1;

  • 5
    select name, email
    , case 
    when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes'
    else 'No'
    end "duplicated ?"
    from users
    
  • -1
    select id,name,COUNT(*) from India group by Id,Name having COUNT(*)>1
    
