
SQL query joining the same table onto each returned row to compute a SUM


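I have a task where I must return the total COUNT and SUM of issued policies for each day of a month, and compare them with the same days of the previous year.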

The table PolicyOrder contains the following fields:

- PolicyOrderId: primary key
- CreatedAt (DATETIME)
- CalculatedPremium: the cost of the policy, or "premium"
- PolicyOrderStatusId: the status of the policy (not relevant to the question, but listed anyway)

To solve this, I came up with a query that inner-joins the table to itself and groups by the DAY of the creation date.

SELECT 
      DATE(po1.CreatedAt) AS dayDate_2017, 
      SUM(po1.CalculatedPremium) AS premiumSum_2017,
      COUNT(po1.PolicyOrderId) AS policyCount_2017,
      po2.*
FROM 
      PolicyOrder po1
INNER JOIN (
           SELECT 
                DATE(CreatedAt) AS dayDate_2018, 
                SUM(CalculatedPremium) AS premiumSum_2018, 
                COUNT(PolicyOrderId) AS policyCount_2018
           FROM 
                PolicyOrder po2
           WHERE
                YEAR(CreatedAt) = 2018 AND 
                MONTH(CreatedAt) = 10 AND
                PolicyOrderStatusId = 6 
           GROUP BY
                DAY(CreatedAt)
       ) po2 ON ( 
           DAY(po2.dayDate_2018) = DAY(po1.CreatedAt) 
       )
WHERE   
       YEAR(po1.CreatedAt) = 2017 AND 
       MONTH(po1.CreatedAt) = 10 AND 
       PolicyOrderStatusId = 6 
GROUP BY 
       DAY(po1.CreatedAt)

The query above returns the following results:

dayDate_2017 | premiumSum_2017 | policyCount_2017 | dayDate_2018 | premiumSum_2018 | policyCount_2018
2017-10-01   | 4699.36         | 98               | 2018-10-01   | 8524.21         | 144
2017-10-02   | 9114.55         | 168              | 2018-10-02   | 7942.25         | 140
2017-10-03   | 9512.43         | 178              | 2018-10-03   | 9399.61         | 161
2017-10-04   | 9291.77         | 155              | 2018-10-04   | 6922.83         | 137
2017-10-05   | 8063.27         | 155              | 2018-10-05   | 9278.58         | 178
2017-10-06   | 9743.40         | 184              | 2018-10-06   | 6139.38         | 136
...
2017-10-31   | ...

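The problem is that I now have to add two columns that count the policies and sum the premiums from the beginning of the year UP UNTIL each returned row.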

Desired results:
dayDate_2017 | premiumSum_2017 | policyCount_2017 | sumFromYearBegining | countFromYearBegining 
2017-10-01   | 4699.36         | 98               | 150000.34           | 5332   
2017-10-02   | 9114.55         | 168              | 156230.55           | 5443
2017-10-03   | 9512.43         | 178              | 160232.44           | 5663
    ...
2017-10-31   | ...


WHERE:

sumFromYearBegining (150000.34) - SUM of premiumSum from 2017-01-01 until 2017-10-01 (excluding)
countFromYearBegining (5332) - COUNT of policies from 2017-01-01 until 2017-10-01 (excluding)

sumFromYearBegining (156230.55) - SUM of premiumSum from 2017-01-01 until 2017-10-02 (excluding)
countFromYearBegining (5443) - COUNT of policies from 2017-01-01 until 2017-10-02 (excluding)

sumFromYearBegining (160232.44) - SUM of premiumSum from 2017-01-01 until 2017-10-03 (excluding)
countFromYearBegining (5663) - COUNT of policies from 2017-01-01 until 2017-10-03 (excluding)
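To pin down the "excluding" semantics: each requested column is an exclusive running total, where the value for a day is the sum of all earlier days of the year and the day itself is left out. A minimal Python sketch with made-up daily numbers (not taken from the fiddle); the function name `from_year_beginning` is hypothetical:

```python
from itertools import accumulate

def from_year_beginning(daily_values):
    """For each day, return the total of all *previous* days (current day excluded).

    daily_values: per-day totals in chronological order, starting on 1 January.
    """
    # accumulate(..., initial=0) yields 0, v1, v1+v2, ...; dropping the last
    # element shifts the inclusive totals by one day so each day excludes itself.
    return list(accumulate(daily_values, initial=0))[:-1]

# Hypothetical daily policy counts for 1-4 January.
counts = [10, 20, 30, 40]
print(from_year_beginning(counts))  # [0, 10, 30, 60]
```

The `initial` keyword requires Python 3.8 or later.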

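I tried inner-joining the same table with COUNT and SUM, which failed because I could not specify the range to count and sum over. I also tried a LEFT JOIN followed by counting, which failed because the totals were computed not up to each result row but up to the last result, and so on...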

DB Fiddle: https://www.db-fiddle.com/f/ckM8HyTD6NjLbK41Mq1gct/5

Any help from you SQL ninjas is highly appreciated.

2 Answers

  • 1

    Since Window Functions are not available, we can use User-defined variables to compute a Rolling Sum / Count.

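    We first need to determine the sum and count for every day of 2017 (even though you only need the rows for a specific month), because to compute the rolling sum for a date in March we also need the sum/count values for January and February. One possible optimization is to limit the calculation to the months from January up to the required month.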

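    Note that ORDER BY daydate_2017 is necessary to compute the rolling sum correctly. Rows are not returned in any particular order by default, so without a defined order we cannot guarantee that the sum is correct.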
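Why ordering matters: an expression like `@totsum := @totsum + premiumsum_2017` is evaluated once per row, in whatever order the rows arrive, so the value attached to each date depends on that order. A small Python emulation (not MySQL itself; the helper name and data are hypothetical) makes this visible:

```python
def emulate_user_variable(rows):
    """Mimic `@totsum := @totsum + premium` evaluated once per row, in arrival order.

    rows: (day, premium) tuples in whatever order the engine delivers them.
    Returns {day: running total at the moment that row was processed}.
    """
    totsum = 0  # plays the role of @totsum, initialised to 0 by the CROSS JOIN
    result = {}
    for day, premium in rows:
        totsum += premium
        result[day] = totsum
    return result

rows = [("2017-01-01", 100), ("2017-01-02", 50), ("2017-01-03", 25)]
ordered = emulate_user_variable(sorted(rows))                # chronological order
shuffled = emulate_user_variable([rows[2], rows[0], rows[1]])  # arbitrary order
print(ordered)   # {'2017-01-01': 100, '2017-01-02': 150, '2017-01-03': 175}
print(shuffled)  # {'2017-01-03': 25, '2017-01-01': 125, '2017-01-02': 175}
```

Only the chronologically ordered run gives each date the total "from the beginning of the year".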

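    We also need two levels of sub-select queries. The first level computes the rolling-sum values; the second level restricts the results to February only. Because WHERE is executed before SELECT, we cannot restrict the results to February in the first level itself.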
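The two-level structure comes down to "accumulate first, filter afterwards". A Python sketch with hypothetical per-day premium sums contrasts the two orders; the inclusive running total mirrors what the query above computes:

```python
from itertools import accumulate

# Hypothetical per-day premium sums: two January days and two February days.
daily = [("2017-01-30", 100.0), ("2017-01-31", 200.0),
         ("2017-02-01", 10.0), ("2017-02-02", 20.0)]

def running(rows):
    """Attach an inclusive running total to already-ordered (day, value) rows."""
    totals = accumulate(value for _, value in rows)
    return [(day, value, total) for (day, value), total in zip(rows, totals)]

# Correct: accumulate over the whole year (inner level), then keep February (outer level).
right = [r for r in running(daily) if r[0].startswith("2017-02")]
# Wrong: filtering to February first drops January from the running total.
wrong = running([r for r in daily if r[0].startswith("2017-02")])
print(right)  # [('2017-02-01', 10.0, 310.0), ('2017-02-02', 20.0, 330.0)]
print(wrong)  # [('2017-02-01', 10.0, 10.0), ('2017-02-02', 20.0, 30.0)]
```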

    If you need a similar rolling sum for 2018, the same logic can be implemented in another set of sub-select queries.

    SELECT dt2_2017.*, dt_2018.*
    FROM 
    (
    SELECT dt_2017.*,
           @totsum := @totsum + dt_2017.premiumsum_2017 AS sumFromYearBegining_2017,
           @totcount := @totcount + dt_2017.policycount_2017 AS countFromYearBeginning_2017
    FROM   (SELECT Date(po1.createdat)        AS dayDate_2017,
                   Sum(po1.calculatedpremium) AS premiumSum_2017,
                   Count(po1.policyorderid)   AS policyCount_2017
            FROM   PolicyOrder AS po1
            WHERE  po1.policyorderstatusid = 6 AND 
                   YEAR(po1.createdat) = 2017 AND 
                   MONTH(po1.createdat) <= 2 -- calculate up to February for 2017
            GROUP  BY daydate_2017
            ORDER  BY daydate_2017) AS dt_2017
    CROSS JOIN (SELECT @totsum := 0, @totcount := 0) AS user_init_vars 
    ) AS dt2_2017 
    INNER JOIN (
                 SELECT 
                   DATE(po2.CreatedAt) AS dayDate_2018, 
                   SUM(po2.CalculatedPremium) AS premiumSum_2018, 
                   COUNT(po2.PolicyOrderId) AS policyCount_2018
                 FROM 
                   PolicyOrder po2
                 WHERE
                    YEAR(po2.CreatedAt) = 2018 AND 
                    MONTH(po2.CreatedAt) = 2 AND
                    po2.PolicyOrderStatusId = 6 
                 GROUP BY
                    dayDate_2018
               ) dt_2018 ON DAY(dt_2018.dayDate_2018) = DAY(dt2_2017.dayDate_2017)   
    WHERE YEAR(dt2_2017.daydate_2017) = 2017 AND 
          MONTH(dt2_2017.daydate_2017) = 2;
    

    RESULT: View on DB Fiddle

    | dayDate_2017 | premiumSum_2017 | policyCount_2017 | sumFromYearBegining_2017 | countFromYearBeginning_2017 | dayDate_2018 | premiumSum_2018 | policyCount_2018 |
    | ------------ | --------------- | ---------------- | ------------------------ | --------------------------- | ------------ | --------------- | ---------------- |
    | 2017-02-01   | 4131.16         | 131              | 118346.77                | 3627                        | 2018-02-01   | 8323.91         | 149              |
    | 2017-02-02   | 2712.74         | 85               | 121059.51000000001       | 3712                        | 2018-02-02   | 9469.33         | 153              |
    | 2017-02-03   | 3888.59         | 111              | 124948.1                 | 3823                        | 2018-02-03   | 6409.21         | 97               |
    | 2017-02-04   | 2447.99         | 74               | 127396.09000000001       | 3897                        | 2018-02-04   | 5693.69         | 120              |
    | 2017-02-05   | 1437.5          | 45               | 128833.59000000001       | 3942                        | 2018-02-05   | 8574.97         | 129              |
    | 2017-02-06   | 4254.48         | 127              | 133088.07                | 4069                        | 2018-02-06   | 8277.51         | 133              |
    | 2017-02-07   | 4746.49         | 136              | 137834.56                | 4205                        | 2018-02-07   | 9853.75         | 173              |
    | 2017-02-08   | 3898.05         | 125              | 141732.61                | 4330                        | 2018-02-08   | 9116.33         | 144              |
    | 2017-02-09   | 8306.86         | 286              | 150039.46999999997       | 4616                        | 2018-02-09   | 8818.32         | 166              |
    | 2017-02-10   | 6740.99         | 204              | 156780.45999999996       | 4820                        | 2018-02-10   | 7880.17         | 134              |
    | 2017-02-11   | 4290.38         | 133              | 161070.83999999997       | 4953                        | 2018-02-11   | 8394.15         | 180              |
    | 2017-02-12   | 3687.58         | 122              | 164758.41999999995       | 5075                        | 2018-02-12   | 10378.29        | 171              |
    | 2017-02-13   | 4939.31         | 159              | 169697.72999999995       | 5234                        | 2018-02-13   | 9383.15         | 160              |
    

  • 2

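    If you want an approach that avoids @variables in the select list and also avoids analytic (window) functions, which only MySQL 8 supports, you can use a semi-Cartesian product: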

    WITH prevYr AS(
        SELECT 
            YEAR(CreatedAt) AS year_prev,
            MONTH(CreatedAt) AS month_prev,
            DAY(CreatedAt) AS day_prev,
            SUM(CalculatedPremium) AS premiumSum_prev, 
            COUNT(PolicyOrderId) AS policyCount_prev
        FROM 
            PolicyOrder
        WHERE
            CreatedAt BETWEEN '2017-02-01' AND '2017-02-28 23:59:59' AND -- cover all of 28 Feb, not just midnight
            PolicyOrderStatusId = 6 
        GROUP BY
            YEAR(CreatedAt), MONTH(CreatedAt), DAY(CreatedAt)
    ),
    currYr AS (
        SELECT 
            YEAR(CreatedAt) AS year_curr,
            MONTH(CreatedAt) AS month_curr,
            DAY(CreatedAt) AS day_curr,
            SUM(CalculatedPremium) AS premiumSum_curr, 
            COUNT(PolicyOrderId) AS policyCount_curr
        FROM 
            PolicyOrder
        WHERE
            CreatedAt BETWEEN '2018-02-01' AND '2018-02-28 23:59:59' AND -- cover all of 28 Feb, not just midnight
            PolicyOrderStatusId = 6 
        GROUP BY
            YEAR(CreatedAt), MONTH(CreatedAt), DAY(CreatedAt)
    ) 
    
    
    SELECT 
          *
    FROM
           prevYr
           INNER JOIN 
           currYr
           ON  
               currYr.day_curr = prevYr.day_prev
    
           INNER JOIN
           (
               SELECT 
                    main.day_prev AS dayRolling_prev, 
                    SUM(pre.premiumSum_prev) AS premiumSumRolling_prev, 
                    SUM(pre.policyCount_prev) AS policyCountRolling_prev
               FROM 
                    prevYr main LEFT OUTER JOIN prevYr pre ON pre.day_prev < main.day_prev
               GROUP BY
                    main.day_prev
            ) rollingPrev
            ON  
               currYr.day_curr = rollingPrev.dayRolling_prev
    
    ORDER BY 1,2,3
    

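    We summarize the 2017 and 2018 data into two CTEs because it keeps things cleaner and tidier, especially for this kind of rolling count. The CTE logic is easy to follow because it is more or less lifted straight from your query. I only removed the DATE column in favor of a year/month/day triple, because that makes other things (the joins) cleaner, and the parts can be recombined into a date when needed. I also changed the WHERE clauses to use date BETWEEN x AND y, because that can use an index on the column, whereas YEAR(date) = x AND MONTH(date) = y probably cannot.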
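The index point can be checked on any engine that exposes its query plan. The sketch below uses SQLite (via Python's `sqlite3`) purely as a stand-in, since MySQL's optimizer is different; the index name `idx_createdat` is hypothetical, and `strftime()` plays the role of `YEAR()`/`MONTH()`:

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # detail is the last column

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PolicyOrder (
    PolicyOrderId INTEGER PRIMARY KEY,
    CreatedAt TEXT,                 -- ISO-8601 text standing in for MySQL DATETIME
    CalculatedPremium REAL,
    PolicyOrderStatusId INTEGER)""")
conn.execute("CREATE INDEX idx_createdat ON PolicyOrder (CreatedAt)")

# Range predicate on the bare column: the planner can seek into the index.
range_plan = plan(conn, """SELECT SUM(CalculatedPremium) FROM PolicyOrder
    WHERE CreatedAt BETWEEN '2017-02-01' AND '2017-02-28 23:59:59'""")
# Function wrapped around the column: no index seek, the table is scanned.
func_plan = plan(conn, """SELECT SUM(CalculatedPremium) FROM PolicyOrder
    WHERE strftime('%Y', CreatedAt) = '2017' AND strftime('%m', CreatedAt) = '02'""")
print(range_plan)
print(func_plan)
```

The exact plan wording varies between SQLite versions, but the range query should report a SEARCH using the index and the function-based query a SCAN.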

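    The rolling count works through something I call a semi-Cartesian product. It really is a kind of Cartesian product: any join that makes rows from the two tables multiply and appear repeatedly in the output is a Cartesian product. Here it is not a full product (every row crossed with every row); because the join uses less-than, each row is crossed only with a subset of rows. As the date increases, more rows match the predicate: the 30th has 29 dates less than it.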
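To see the row multiplication directly, the sketch below runs the same `LEFT OUTER JOIN ... ON pre.day_prev < main.day_prev` against a tiny hypothetical `prevYr` table in SQLite (a stand-in for MySQL) and counts how many joined rows each main day produces:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-day totals, shaped like the prevYr CTE (one row per day).
conn.execute("CREATE TABLE prevYr (day_prev INTEGER, policyCount_prev INTEGER)")
conn.executemany("INSERT INTO prevYr VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30), (4, 40)])

# The "less-than" join pairs each main day only with strictly earlier days.
# LEFT JOIN keeps day 1, which has no earlier day, as one NULL-padded row.
rows = conn.execute("""
    SELECT main.day_prev,
           COUNT(*)            AS joined_rows,
           COUNT(pre.day_prev) AS earlier_days
    FROM prevYr main LEFT OUTER JOIN prevYr pre ON pre.day_prev < main.day_prev
    GROUP BY main.day_prev
    ORDER BY main.day_prev""").fetchall()
print(rows)  # [(1, 1, 0), (2, 1, 1), (3, 2, 2), (4, 3, 3)]
```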

    This produces the following data pattern:

    maindate   predate    maincount precount
    2017-02-01 NULL       10        NULL
    
    2017-02-02 2017-02-01 20        10
    
    2017-02-03 2017-02-01 30        10
    2017-02-03 2017-02-02 30        20
    
    2017-02-04 2017-02-01 40        10
    2017-02-04 2017-02-02 40        20
    2017-02-04 2017-02-03 40        30
    

    As you can see, any given maindate is repeated N - 1 times, because there are N - 1 earlier dates that satisfy the join condition predate < maindate.

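    If we group by maindate and sum the precount associated with each predate, we get the rolling sum of all earlier counts for that maindate. So on day 4 of the month it is SUM(precount for days 1 to 3), i.e. 10 + 20 + 30 = 60. On day 5 we sum the counts for days 1 to 4, on day 6 the counts for days 1 to 5, and so on.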
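The grouping step can be sketched end to end in SQLite (a stand-in for MySQL) with the same 10/20/30/40 pattern plus a hypothetical fifth day. The `COALESCE` is an addition not present in the answer's query; it turns day 1's NULL into 0. The result excludes the current day, and it only reaches back as far as the rows present in `prevYr`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-day counts matching the 10/20/30/40 pattern, plus day 5.
conn.execute("CREATE TABLE prevYr (day_prev INTEGER, policyCount_prev INTEGER)")
conn.executemany("INSERT INTO prevYr VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)])

# Group the semi-Cartesian join by the main day and sum the earlier days' counts.
rolling = conn.execute("""
    SELECT main.day_prev,
           COALESCE(SUM(pre.policyCount_prev), 0) AS countBefore
    FROM prevYr main LEFT OUTER JOIN prevYr pre ON pre.day_prev < main.day_prev
    GROUP BY main.day_prev
    ORDER BY main.day_prev""").fetchall()
print(rolling)  # [(1, 0), (2, 10), (3, 30), (4, 60), (5, 100)]
```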
