首页 文章

如何删除sql server中的重复行?

提问于
浏览
283

我怎么能 delete duplicate rows 哪里没有 unique row id

我的 table 是

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

我希望在重复删除后留下以下内容:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

我已经尝试了一些查询,但我认为它们依赖于行ID,因为我没有得到理想的结果 . 例如:

DELETE FROM table WHERE col1 IN (
    SELECT id FROM table GROUP BY id HAVING ( COUNT(col1) > 1 )
)

15 回答

  • 1

    我喜欢CTE和 ROW_NUMBER ,因为这两个组合允许我们查看哪些行被删除(或更新),因此只需将 DELETE FROM CTE... 更改为 SELECT * FROM CTE

    WITH CTE AS(
       SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
           RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
       FROM dbo.Table1
    )
    DELETE FROM CTE WHERE RN > 1
    

    演示(结果不同;我认为这是由于你的错字)

    COL1    COL2    COL3    COL4    COL5    COL6    COL7
    john    1        1       1       1       1       1
    sally   2        2       2       2       2       2
    

    由于 PARTITION BY col1 ,此示例通过单个列 col1 确定重复项 . 如果要包含多个列,只需将它们添加到 PARTITION BY

    ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)
    
  • 5

    我更喜欢CTE从sql server表中删除重复的行

    强烈建议遵循这篇文章:: http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/

    保持原创

    WITH CTE AS
    (
    SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
    FROM MyTable
    )
    
    DELETE FROM CTE WHERE RN<>1
    

    不保留原创

    WITH CTE AS
    (SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
    FROM MyTable)
     
    DELETE CTE
    WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
    
  • -2

    在不使用 CTEROW_NUMBER() 的情况下,你可以只使用group by MAX 函数删除记录这里是和示例

    DELETE
    FROM MyDuplicateTable
    WHERE ID NOT IN
    (
    SELECT MAX(ID)
    FROM MyDuplicateTable
    GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
    
  • 13
    DELETE from search
    where id not in (
       select min(id) from search
       group by url
       having count(*)=1
    
       union
    
       SELECT min(id) FROM search
       group by url
       having count(*) > 1
    )
    
  • 1

    微软有一个关于如何删除重复项的完整指南 . 看看http://support.microsoft.com/kb/139444

    简而言之,当您只删除几行时,这是删除重复项的最简单方法:

    SET rowcount 1;
    DELETE FROM t1 WHERE myprimarykey=1;
    

    myprimarykey是行的标识符 .

    我将rowcount设置为1,因为我只有两行是重复的 . 如果我有3行重复,那么我将rowcount设置为2,以便它删除它看到的前两个,只留下表t1中的一个 .

    希望它能帮助任何人

  • 1

    如果没有引用,例如外键,则可以执行此操作 . 在测试概念证明并且测试数据重复时,我做了很多 .

    SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]

    INTO [newTable]

    ;

    进入对象资源管理器并删除旧表 .

    使用旧表的名称重命名新表 .

  • 604

    请看下面的删除方式 .

    Declare @table table
    (col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
    Insert into @table values 
    ('john',1,1,1,1,1,1),
    ('john',1,1,1,1,1,1),
    ('sally',2,2,2,2,2,2),
    ('sally',2,2,2,2,2,2)
    

    创建了一个名为 @table 的示例表,并使用给定的数据加载它 .

    enter image description here

    Delete  aliasName from (
    Select  *,
            ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
    From    @table) aliasName 
    Where   rowNumber > 1
    
    Select * from @table
    

    enter image description here

    注意:如果要给出 Partition by 部分中的所有列,则 order by 没有太大意义 .

    我知道,这个问题是在三年前提出来的,我的回答是蒂姆发布的另一个版本,但是发布只是对任何人都有帮助 .

  • 30
    -- this query will keep only one instance of a duplicate record.
    ;WITH cte
         AS (SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2, col3-- based on what? --can be multiple columns
                                           ORDER BY ( SELECT 0)) RN
             FROM   Mytable)
    
    
    
    delete  FROM cte
    WHERE  RN > 1
    
  • 3
    with myCTE
    as
    
    (
    select productName,ROW_NUMBER() over(PARTITION BY productName order by slno) as Duplicate from productDetails
    )
    Delete from myCTE where Duplicate>1
    
  • 2

    With reference to https://support.microsoft.com/en-us/help/139444/how-to-remove-duplicate-rows-from-a-table-in-sql-server

    删除重复的想法涉及到

    • a)保护那些不重复的行

    • b)保留合并为重复的许多行中的一行 .

    一步步

    • 1)首先确定那些满足重复定义的行,并将它们插入到临时表中,比如#tableAll .

    • 2)在临时表中选择非重复(单行)或不同的行,如#tableUnique .

    • 3)从加入#tableAll的源表中删除以删除重复项 .

    • 4)从#tableUnique中插入源表中的所有行 .

    • 5)删除#tableAll和#tableUnique

  • 3

    如果您能够临时向表中添加列,这是一个对我有用的解决方案:

    ALTER TABLE dbo.DUPPEDTABLE ADD RowID INT NOT NULL IDENTITY(1,1)
    

    然后使用MIN和GROUP BY的组合执行DELETE

    DELETE b
    FROM dbo.DUPPEDTABLE b
    WHERE b.RowID NOT IN (
                         SELECT MIN(RowID) AS RowID
                         FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
                         GROUP BY a.ITEM_NUMBER,
                                  a.CHARACTERISTIC,
                                  a.INTVALUE,
                                  a.FLOATVALUE,
                                  a.STRINGVALUE
                     );
    

    验证DELETE是否正确执行:

    SELECT a.ITEM_NUMBER,
        a.CHARACTERISTIC,
        a.INTVALUE,
        a.FLOATVALUE,
        a.STRINGVALUE, COUNT(*)--MIN(RowID) AS RowID
    FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
    GROUP BY a.ITEM_NUMBER,
        a.CHARACTERISTIC,
        a.INTVALUE,
        a.FLOATVALUE,
        a.STRINGVALUE
    ORDER BY COUNT(*) DESC
    

    结果应该没有计数大于1的行 . 最后,删除rowid列:

    ALTER TABLE dbo.DUPPEDTABLE DROP COLUMN RowID;
    
  • 101

    尝试上面建议的解决方案后,适用于小型媒体表 . 我可以为非常大的表建议解决方案 . 因为它在迭代中运行 .

    • 删除 LargeSourceTable 的所有依赖关系视图

    • 你可以通过使用sql managment studio找到dependecies,右键单击表并单击"View Dependencies"

    • 重命名表格:

    • sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp'; GO

    • 再次创建 LargeSourceTable ,但现在,添加一个主键,其中包含定义重复项的所有列添加 WITH (IGNORE_DUP_KEY = ON)

    • 例如:

    CREATE TABLE [dbo].[LargeSourceTable] ( ID int IDENTITY(1,1), [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL, [Column1] CHAR (36) NOT NULL, [Column2] NVARCHAR (100) NOT NULL, [Column3] CHAR (36) NOT NULL, PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON) ); GO

    • 再次创建您为新创建的表首先放置的视图

    • 现在,运行以下sql脚本,您将看到每页1,000,000行的结果,您可以更改每页的行数以更频繁地查看结果 .

    • 注意,我设置 IDENTITY_INSERT 开启和关闭,因为其中一列包含自动增量ID,我也在复制

    SET IDENTITY_INSERT LargeSourceTable ON DECLARE @PageNumber AS INT, @RowspPage AS INT DECLARE @TotalRows AS INT declare @dt varchar(19) SET @PageNumber = 0 SET @RowspPage = 1000000 select @TotalRows = count (*) from LargeSourceTable_TEMP

    While ((@PageNumber - 1) * @RowspPage < @TotalRows )
    Begin
        begin transaction tran_inner
            ; with cte as
            (
                SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
                OFFSET ((@PageNumber) * @RowspPage) ROWS
                FETCH NEXT @RowspPage ROWS ONLY
            )
    
            INSERT INTO LargeSourceTable 
            (
                 ID                     
                ,[CreateDate]       
                ,[Column1]   
                ,[Column2] 
                ,[Column3]       
            )       
            select 
                 ID                     
                ,[CreateDate]       
                ,[Column1]   
                ,[Column2] 
                ,[Column3]       
            from cte
    
        commit transaction tran_inner
    
        PRINT 'Page: ' + convert(varchar(10), @PageNumber)
        PRINT 'Transfered: ' + convert(varchar(20), @PageNumber * @RowspPage)
        PRINT 'Of: ' + convert(varchar(20), @TotalRows)
    
        SELECT @dt = convert(varchar(19), getdate(), 121)
        RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
        SET @PageNumber = @PageNumber + 1
    End
    

    SET IDENTITY_INSERT LargeSourceTable OFF

  • -2

    另一种删除方式在一步中没有丢失信息的共同行如下:

    delete from dublicated_table t1 (nolock)
    join (
        select t2.dublicated_field
        , min(len(t2.field_kept)) as min_field_kept
        from dublicated_table t2 (nolock)
        group by t2.dublicated_field having COUNT(*)>1
    ) t3 
    on t1.dublicated_field=t3.dublicated_field 
        and len(t1.field_kept)=t3.min_field_kept
    
  • 0

    如果您可以找到重复行的数量,例如您有n个重复行,则使用此命令

    SET rowcount n-1
    DELETE FROM your_table
    WHERE (spacial condition)
    

    了解更多信息我建议this

  • 1

    哇,通过准备好所有这些答案我觉得很蠢,他们就像专家对所有CTE和临时表等的答案 .

    我所做的就是使用MAX汇总ID列 .

    DELETE FROM table WHERE col1 IN (
        SELECT MAX(id) FROM table GROUP BY id HAVING ( COUNT(col1) > 1 )
    )
    

    注意:您可能需要多次运行它才能删除重复项,因为这样一次只会删除一组重复的行 .

相关问题