首页 文章

PostgreSQL将数据从递归CTE传递到函数

提问于
浏览
3

我有以下问题:我试图发现从源节点( node_s )到目标节点( node_t )的所有可能路径 .

具有图形边缘的原始表格的格式很简单: | node_x | node_y | strength | ,其中"node_x" - > "node_y"是边缘强度为"weight"的直接边缘 .

我们的想法是,如果在路径探索过程中的任何时刻我们发现其子节点中的节点已经定位 node_t ,我们会记录此路径并停止从该节点探索路径,否则继续探索 .

简单的解决方案是使用PostgreSQL的递归CTE来构造图的传递闭包:

WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
  ( 
   --non-recurive term
   SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
   FROM best_path b
   WHERE b.node_x = node_s --source_node

   UNION

   --recurive term
   SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
   FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
   WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
  )
  SELECT * 
  FROM transitive_closure tc
  WHERE tc.target = node_t --target_node

上面的代码将从源节点 node_s 发现 all 可能的路径 . 只有在传递闭包构造之后,我才能从源节点到目标节点选择所需的路径行(参见最后的SELECT语句) .

例:

best_path表包含以下数据:

node_x | node_y | strength 
--------+--------+----------
      1 |      2 |        1
      1 |      3 |        1
      2 |      4 |        1
      2 |      5 |        1
      3 |      6 |        1
      3 |      7 |        1
      4 |      8 |        1
      4 |      9 |        1
      5 |     10 |        1
      5 |     11 |        1

查询:

找到从源节点= 1到目标节点= 4的路径

结果:

source | target | strength | path_string
--------+--------+----------+------------
      1 |      2 |        1 | 1.2.
      1 |      3 |        1 | 1.3.
      1 |      4 |        1 | 1.2.4.
      1 |      5 |        1 | 1.2.5.
      1 |      6 |        1 | 1.3.6.
      1 |      7 |        1 | 1.3.7.
      1 |      8 |        1 | 1.2.4.8.
      1 |      9 |        1 | 1.2.4.9.
      1 |     10 |        1 | 1.2.5.10.
      1 |     11 |        1 | 1.2.5.11.

这不是我需要的 . 由于已经存在从节点2到节点4(目标)的直接边缘,因此我不需要路径1.2.5 . ,1.2.4.8 . ,1.2.4.9 . ,1.2.5.10 . ,1.2.5.11 . ,路径探索对于节点2,应该在发现从2到4的路径时停止 .

总而言之,如果节点已经具有到目标节点的直接边缘,我不想发现节点的路径 . 这意味着在CTE的递归术语中,我希望有一些条件可以说下面的伪代码如下:

WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
  ( 
   --non-recurive term (as before)
   SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
   FROM best_path b
   WHERE b.node_x = node_s --source_node

   UNION

   --recurive term
   SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
   FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
   WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
     AND b.node_y = node_t 
     --will select only rows with direct edge to target

   UNION (second union is not allowed in CTE)

   SELECT those rows which do not have direct edge to target 
   AND which parents did not contribute to constructing the query above. 
   i.e. b.node_x = tc.target where there is no direct edge between b.node_x to node_t
  )

作为查询从源节点= 1到目标节点= 4的查询的结果,我想要具有以下内容:

source | target | strength | path_string
--------+--------+----------+------------
      1 |      2 |        1 | 1.2.
      1 |      3 |        1 | 1.3.
      1 |      4 |        1 | 1.2.4.
      1 |      6 |        1 | 1.3.6.
      1 |      7 |        1 | 1.3.7.

在此先感谢您的帮助!

我已经尝试了很多方法:例如FROM / WHERE子句中的条件,试图将CTE传递给函数,但没有成功 .

任何建议将不胜感激 .

我有自己的递归函数来实现我想要的,然而,它在大量数据上非常慢;而PostgreSQL的CTE显然已经过优化,所以我想深入研究一下 .

4 回答

  • 2

    优化的解决方案,最终结果不再有WHERE子句;尽管是特定于Postgresql的解决方案,即我们使用DISTINCT ON来选择特定的行:

    鉴于此数据:

    CREATE TABLE best_path
        (node_x int, node_y int, strength int);
    
    INSERT INTO best_path
        (node_x, node_y, strength)
    VALUES
        (1, 2, 1),
        (1, 3, 1),
        (2, 4, 1),
        (2, 5, 1),
        (3, 6, 1),
        (3, 7, 1),
        (4, 8, 1),
        (4, 9, 1),
        (5, 10, 1),
        (5, 11, 1),
        (5, 12, 1);
    

    第一阶段的查询显示在幕后(source = 1,target = 11):

    with recursive -- this denotes that at least one CTE is using recursion
    
    inputs as ( select 1 as d_source, 11 as d_target )
    ,traverse_path(filter, source, target, path, bingo, distincter) as
    (
      select                      
          bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
          ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool            
          ,coalesce(
                   nullif(
                       max((bp.node_y = i.d_target)::int) 
                       over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                       , 0)          
                   ,bp.node_y)                           
      from best_path bp cross join inputs i
      where bp.node_x = i.d_source -- filter      
      union      
      select 
          tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
          ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool        
          ,coalesce(
                   nullif(
                       max((bp.node_y = i.d_target)::int) 
                       over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                       , 0)          
                   ,bp.node_y)                       
      from best_path bp cross join inputs i
      join traverse_path tp on bp.node_x = tp.target
          and not tp.bingo
    )    
    select tp.*
    from traverse_path tp
    

    source = 1,target = 11的输出:http://www.sqlfiddle.com/#!1/db290/56

    FILTER   SOURCE   TARGET   PATH     BINGO    DISTINCTER
    1        1        2        1.2      0        2
    1        1        3        1.3      0        3
    1        2        4        1.2.4    0        4
    1        2        5        1.2.5    0        5
    1        3        6        1.3.6    0        6
    1        3        7        1.3.7    0        7
    1        4        8        1.2.4.8  0        8
    1        4        9        1.2.4.9  0        9
    1        5        11       1.2.5.11 1        1
    1        5        10       1.2.5.10 1        1
    1        5        12       1.2.5.12 1        1
    

    正如我们所看到的,目标11是5的源中的第一行 . 这是怎么发生的?在我们的DISTINCTER列上,我们在MAX上使用ORDER BY,这是MAX窗口上的ORDER有意义的罕见情况之一 . 我们只是使用它来对结果进行排序 . 我尝试在查询结束时放置ORDER BY,但数据库抱怨它无法在CTE上使用ORDER . CTE禁止放置ORDER BY子句 . 但是众所周知,我们可以影响 OVER() 子句的排序,这就是我们的结果如何排序 . 在上面的结果中,在5的来源中,数字11在10和12之前排序,因为11是我们的目标 . 因此,当我们执行 DISTINCT ON (Postgresql特定功能)时,Postgres将拾取第一行,因此它将拾取目标11 .


    这是我们的最终查询,经过优化,尽管是Postgresql特有的:

    with recursive -- this denotes that at least one CTE is using recursion    
    inputs as ( select 1 as d_source, 11 as d_target )
    ,traverse_path(filter, source, target, path, bingo) as (
      select 
          distinct on (
            bp.node_x,            
            coalesce(
                   nullif(
                       max((bp.node_y = i.d_target)::int) 
                       over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                       , 0)          
                   ,bp.node_y)
          )          
          bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
          ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
      from best_path bp cross join inputs i
      where bp.node_x = i.d_source -- filter
      union
      select       
          distinct on (
            bp.node_x,            
            coalesce(
                   nullif(
                       max((bp.node_y = i.d_target)::int) 
                       over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                       , 0)          
                   ,bp.node_y)
          )    
          tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
          ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
      from best_path bp cross join inputs i
      join traverse_path tp on bp.node_x = tp.target
          and not tp.bingo
    ) select tp.* from traverse_path tp
    

    source = 1,target = 11 http://www.sqlfiddle.com/#!1/db290/55的输出

    FILTER   SOURCE   TARGET   PATH     BINGO
    1        1        2        1.2      0
    1        1        3        1.3      0
    1        2        4        1.2.4    0
    1        2        5        1.2.5    0
    1        3        6        1.3.6    0
    1        3        7        1.3.7    0
    1        4        8        1.2.4.8  0
    1        4        9        1.2.4.9  0
    1        5        11       1.2.5.11 1
    

    source = 1,target = 4:http://www.sqlfiddle.com/#!1/db290/53的输出

    FILTER  SOURCE  TARGET  PATH    BINGO
    1       1       2       1.2     0
    1       1       3       1.3     0
    1       2       4       1.2.4   1
    1       3       6       1.3.6   0
    1       3       7       1.3.7   0
    

    source = 2,target = 5的输出:http://www.sqlfiddle.com/#!1/db290/54

    FILTER  SOURCE  TARGET  PATH    BINGO
    2       2       5       2.5     1
    


    另一种方法,而不是BINGO方法 . 使用continue_search作为继续travesal的逻辑 . 并使用每个聚合函数来确定是否需要继续遍历图形 .

    以下是它的工作原理:http://www.sqlfiddle.com/#!1/db290/84

    最终查询:http://www.sqlfiddle.com/#!1/db290/85

    有趣的是,每个人都非常喜欢英语:

    对比如下:

    ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool
    

    使用每个:

    ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)
    

    哪一个更容易阅读?

    这是输出,说明了使用EVERY来促进DISTINCT ON的原理:

    Source = 1,Target = 5. Note that 5 在属于同一源的其他数字之前先排序,稍后将由DISTINCT ON选择 .

    FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH     DISTINCTER
    1         1         2         1.2       1                   2
    1         1         3         1.3       1                   3
    1         2         5         1.2.5     0                   0
    1         2         4         1.2.4     0                   0
    1         3         6         1.3.6     1                   6
    1         3         7         1.3.7     1                   7
    

    这是执行该原则的查询:

    with recursive -- this denotes that at least one CTE is using recursion
    
    inputs as ( select 1 as d_source, 5 as d_target )
    ,traverse_path(filter, source, target, path, continue_search, distincter) as
    (
      select                      
          bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
          ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
          ,coalesce(        
                  cast(nullif(
                       every(bp.node_y <> i.d_target) 
                       over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                       , true) as int)
                   ,bp.node_y)                           
      from best_path bp cross join inputs i
      where bp.node_x = i.d_source -- filter      
      union      
      select 
          tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
          ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)    
          ,coalesce(
                  cast(nullif(
                       every(bp.node_y <> i.d_target) 
                       over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                       , true) as int)
                   ,bp.node_y)                       
      from best_path bp cross join inputs i
      join traverse_path tp on bp.node_x = tp.target
         and tp.continue_search
    )    
    select tp.*
    from traverse_path tp
    

    最终查询:

    with recursive -- this denotes that at least one CTE is using recursion
    
    inputs as ( select 1 as d_source, 5 as d_target )
    ,traverse_path(filter, source, target, path, continue_search) as
    (
      select                      
          distinct on
          (
            bp.node_x
            ,coalesce(
                  cast(nullif(
                       every(bp.node_y <> i.d_target) 
                       over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                       , true) as int)
                   ,bp.node_y)                       
          )    
          bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
          ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
      from best_path bp cross join inputs i
      where bp.node_x = i.d_source -- filter      
      union      
      select   
          distinct on
          (
            bp.node_x
            ,coalesce(
                  cast(nullif(
                       every(bp.node_y <> i.d_target) 
                       over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                       , true) as int)
                   ,bp.node_y)                       
          )  
          tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
          ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
      from best_path bp cross join inputs i
      join traverse_path tp on bp.node_x = tp.target
         and tp.continue_search
    )    
    select tp.*
    from traverse_path tp
    

    输出:

    FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH
    1         1         2         1.2       1
    1         1         3         1.3       1
    1         2         5         1.2.5     0
    1         3         6         1.3.6     1
    1         3         7         1.3.7     1
    
  • 1

    在重新阅读OP的问题后,我想出了这个解决方案:

    source

    with recursive -- this denotes that at least one CTE is using recursion
    
    inputs as ( select 1 as d_source, 4 as d_target )
    ,traverse_path(filter, source, target, path, bingo) as
    (
      select bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y,
    
        max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
    
      from best_path bp cross join inputs i
      where bp.node_x = i.d_source -- filter
    
      union
    
      select tp.filter, bp.node_x, bp.node_y, path || '.' || node_y,   
    
          max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool   
    
      from best_path bp cross join inputs i
      join traverse_path tp on bp.node_x = tp.target
          and not tp.bingo
    )    
    select tp.*
    from traverse_path tp cross join inputs i    
    -- if Postgresql has Oracle KEEP windowing, 
    -- we don't need to use WHERE clause here     
    where 
      not tp.bingo   
      or
      (
        tp.bingo and tp.target = d_target
      )
    

    上述WHERE子句可以缩短为:

    WHERE not bingo or target = 4
    

    Output:

    FILTER  SOURCE  TARGET  PATH    BINGO
    1       1       2       1.2     0
    1       1       3       1.3     0
    1       2       4       1.2.4   1
    1       3       6       1.3.6   0
    1       3       7       1.3.7   0
    

    现场测试:http://www.sqlfiddle.com/#!1/cdde6/55


    这是Source = 2,Target = 5的结果:

    FILTER  SOURCE  TARGET  PATH    BINGO
    2       2       5       2.5     1
    

    Data:

    CREATE TABLE best_path
        (node_x int, node_y int, strength int);
    
    INSERT INTO best_path
        (node_x, node_y, strength)
    VALUES
        (1, 2, 1),
        (1, 3, 1),
        (2, 4, 1),
        (2, 5, 1),
        (3, 6, 1),
        (3, 7, 1),
        (4, 8, 1),
        (4, 9, 1),
        (5, 10, 1),
        (5, 11, 1);
    
  • 1

    如果从底部开始,您可以更有效地搜索路径 . 从孩子们开始 . 如果你从父母开始,那就需要遍历所有的孩子;而如果你从孩子那里搜索,它只有一个父母,因此不会浪费时间找到源和目标之间的路径 .

    with recursive find_parent(source, target, recentness) as
    (
        select source, target, 0 
        from tbl
        where target = 9
    
        union all
    
        select i.source, i.target, fp.recentness + 1
        from tbl i
        join find_parent fp on i.target = fp.source
    ),
    construct_path(source, target, recentness, path) as
    (
      select source, target, recentness, source || '.' || target
      from find_parent 
      where recentness = (select max(recentness) from find_parent)
    
      union
    
      select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
      from find_parent dd
      join construct_path cp on dd.recentness = cp.recentness - 1  
    )
    select source, target, path 
    from construct_path
    order by recentness desc
    

    输出:

    SOURCE   TARGET   PATH
    1        2        1.2
    2        4        1.2.4
    4        9        1.2.4.9
    

    现场测试:http://www.sqlfiddle.com/#!1/13e6b/1

    与此类似:How to get the parent given a child in SQL SERVER 2005


    这是优化,削减如果已经找到特定的(源),则递归到父级 .

    来源= 2

    目标= 9

    with recursive find_parent(source, target, recentness) as
    (
        select source, target, 0 
        from tbl
        where target = 9
    
        union all
    
        select i.source, i.target, fp.recentness + 1
        from tbl i
        join find_parent fp on i.target = fp.source 
             -- despite the name, this target is another one's source
             and i.target <> 2
    )
    ,construct_path(source, target, recentness, path) as
    (
        select source, target, recentness, source || '.' || target
        from find_parent 
        where recentness = (select max(recentness) from find_parent)
    
        union
    
        select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
        from find_parent dd
        join construct_path cp on dd.recentness = cp.recentness - 1  
    
    )
    select source, target, path
    from construct_path
    order by recentness desc
    

    输出:

    SOURCE   TARGET  PATH
    2        4       2.4
    4        9       2.4.9
    

    现场测试:http://www.sqlfiddle.com/#!1/13e6b/16

  • 0

    用于测试的临时表:

    CREATE TEMP TABLE best_path (node_x int, node_y int, strength int);
    INSERT INTO best_path  VALUES
     (1, 2,  1)
    ,(1, 3,  1)
    ,(2, 4,  1)
    ,(2, 5,  1)
    ,(3, 6,  1)
    ,(3, 7,  1)
    ,(4, 8,  1)
    ,(4, 9,  1)
    ,(5, 10, 1)
    ,(5, 11, 1);
    

    查询:
    修改以适应2 - 5的评论

    WITH RECURSIVE t AS (    -- for readability and convenience:
        SELECT 1 AS node_s   -- source_node
             , 4 AS node_t   -- target_node
        )
        , x AS (
        SELECT node_x
        FROM   t, best_path
        WHERE  node_y = node_t
        )
        , a AS  ( 
        SELECT b.node_x
             , b.node_y
             , b.strength AS weight
             , b.node_x || '.' || b.node_y || '.' AS path
        FROM   t, best_path b
        LEFT   JOIN x ON x.node_x = b.node_x
        WHERE  b.node_x = node_s
        AND    (x.node_x IS NULL                    -- no point with edge to target
                OR b.node_y = node_t)               -- except with target_node
    
        UNION ALL
        SELECT a.node_x
             , b.node_y
             , least(a.weight, b.strength)
             , a.path || b.node_y || '.' AS path
        FROM   t, a
        JOIN   best_path AS b ON b.node_x = a.node_y
        LEFT   JOIN x ON x.node_x = b.node_x
        WHERE  a.node_y <> node_t                   -- arrived at target
        AND    a.path !~~ ('%' || b.node_y || '.%') -- not visited yet
        AND    (x.node_x IS NULL                    -- no point with edge to target
                OR b.node_y = node_t)               -- except with target_node
        )
    TABLE a;
    

    完全生成请求的结果 .

    我也将断点条件添加到初始查询中,因为我们只能在一个步骤中到达目标 .

    这恰好与my answer to a previous, similar question非常相似 . 所有解释和链接均适用 . 这里的主要附加技巧是包括一个额外的CTE x 来收集节点......

    (a)直接到目标边缘 .

    用于在递归CTE的中断条件下重复使用 . 众所周知,无论关键字 RECURSIVE 如何,您都可以在递归CTE之上添加其他(非递归)CTE .

相关问题