首页 文章

Pgsql删除一些列(不是全部)重复的行

提问于
浏览
1

表 - col_pk,col1,col2,col3,col4,col_date_updated

此表包含一些行,其中col2和col3的列值重复 . 我想保留col_date_updated的那些行是最新的(最大) .

例如:

col_pk, col1, col2,  col3, col4,   col_date_updated
1,      A,    hello, now,  200.00, 2017-12-12 15:09:44.437546
2,      B,    hello, now,  490.00, 2017-12-12 15:09:42.437065
3,      C,    hi,    now,  300.00, 2017-12-12 15:09:41.436617
4,      D,    hello, now,  250.00, 2017-12-12 15:09:45.436617
5,      E,    hi,    now,  250.00, 2017-12-12 10:09:41.436617

预期结果:

col_pk, col1, col2,  col3, col4,   col_date_updated
3,      C,    hi,    now,  300.00, 2017-12-12 15:09:41.436617
4,      D,    hello, now,  250.00, 2017-12-12 15:09:45.436617

5 回答

  • 0

    检查一下 .

    SELECT DISTINCT ON (col2, col3) t.*
    FROM table t
    ORDER BY col_date_updated DESC
    

    在col2和col3上应用distinct,因为你希望它们是唯一的,并通过desc保持最新的顺序

  • 0

    如果您只想选择获得预期的输出,那么 ROW_NUMBER 会派上用场:

    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY col2, col3
            ORDER BY col_date_updated DESC) rn
        FROM yourTable
    )
    SELECT col_pk, col1, col2,  col3, col4, col_date_updated
    FROM cte
    WHERE rn = 1;
    

    如果您想要删除其他记录,那么我们也可以重用CTE:

    DELETE FROM yourTable WHERE col_pk IN (SELECT col_pk FROM cte WHERE rn > 1);
    
  • -1

    你可以尝试这样的事情 .

    SELECT t.*
      FROM yourtable t
     WHERE col_date_updated IN (SELECT MAX (col_date_updated)
                        FROM yourtable i
                       WHERE t.col2 = i.col2 AND t.col3 = i.col3);
    

    因此,如果您想删除其他记录,您可以使用它 .

    DELETE 
      FROM yourtable t
     WHERE col_date_updated NOT IN (SELECT MAX (col_date_updated)
                        FROM yourtable i
                       WHERE t.col2 = i.col2 AND t.col3 = i.col3);
    

    DEMO

  • 1

    如果要为任何{col2,col3}抑制除最新行之外的所有行:


    SELECT *
    FROM thetable zt
    WHERE NOT EXISTS (
            -- If a record exists with the same col2,col3,
            -- but a more recent date than zt.col_date_updated
            -- then zt.* cannot be the most recent one
            SELECT *
            FROM thetable nx
            WHERE nx.col2 = zt.col2 -- same value
            AND nx.col3 = zt.col3   -- same value
            AND nx.col_date_updated > zt.col_date_updated -- more recent
            );
    

    如果要物理删除同一{col2,col3}的除最近行之外的所有行:


    DELETE
    FROM thetable zt
    WHERE EXISTS (
            -- If a record exists with the same col2,col3,
            -- but a more recent date than zt.t.col_date_updated
            -- then zt.* cannot be the most recent one
            -- and we can delete zt.
            SELECT *
            FROM thetable nx
            WHERE nx.col2 = zt.col2 -- same value
            AND nx.col3 = zt.col3   -- same value
            AND nx.col_date_updated > zt.col_date_updated -- more recent
            );
    
  • 0

    这是最快的方法:

    SELECT * FROM tablename WHERE col_pk IN 
           (SELECT col_pk FROM 
                   (SELECT col_pk, ROW_NUMBER() OVER (partition BY col2, col3 ORDER BY col_date_updated) AS rnum
            FROM tablename) t
     WHERE t.rnum > 1);
    

    如果你想删除:

    DELETE FROM tablename WHERE col_pk IN 
           (SELECT col_pk FROM 
                   (SELECT col_pk, ROW_NUMBER() OVER (partition BY col2, col3 ORDER BY col_date_updated) AS rnum
            FROM tablename DESC) t
     WHERE t.rnum > 1);
    

相关问题