首页 文章

MSSQL2012(Over / Partition)过去12个月,包括空值?

提问于
浏览
0

因此,我遇到了一个问题:在一个月内可能不存在的产品的情况下,获得过去12个月(计算正在处理的产品代码)的实际总额/平均值,通过产品代码进行分区 .

尝试从TL开始; DR,让我们去实际的事情:

  • 我的表有4个字段:年,月,产品代码和值;

  • 我需要去年的总和(过去11个月的选定月份);

  • 去年我需要平均(同样) .

我尝试过:

SELECT * FROM (
  SELECT year, month, product,
  AVG(value) OVER (
    PARTITION BY product 
    ORDER BY year, month
    ROWS 11 PRECEDING
  ) as average,
  SUM(value) OVER (
    PARTITION BY product 
    ORDER BY year, month
    ROWS 11 PRECEDING
  ) as sum
  FROM suchDB.muchUSER.awesomeTABLE
) q
where year = <insert year> and month = <month>

问题:

  • 如果任何查询的月份没有忽略产品是否有历史记录,它将不会给我带来过去几个月的平均值或总和 .

  • 如果任何过去一个月,这应该是被查询月份的历史记录的一部分,那么不会忽略那个月并继续到'last year'范围之后的一个月(一个没有七月的被查询的十月将返回它的前一年的版本而不是在11月停止) .

我们曾经使用 'GROUP BY product''WHERE ((year-1)*100)+month > queriedYear-1Month AND (year*100)+month <= queriedYearMonth' 来查询这个东西,直到有人指出我们在OVER / PARTITION方向并且我们改变了一切......但即使这样我们也遇到了一些问题,比如 avg() 函数会忽略NULL月......

救命?

FORGOT ONE VERY IMPORTANT THING

awesomeTABLE中的数据是版本化的 - 每年/每月可以有多个版本 . 只能使用最新版本 . 我通常通过加入 select distinct year, month, max(version) from awesomeTABLE group by year, month 来做到这一点,但它似乎杀死了一些可能的解决方案......

4 回答

  • 1

    使用date时要做的第一件事是拥有一个datetime字段,而不是年,月和日的分隔字段

    SELECT year, month, product, value
         , DATEFROMPARTS(year, month, 1) fullDate
    FROM   suchDB.muchUSER.awesomeTABLE
    

    由于行可以版本化,我们需要获得每年,每月和每个产品的最新版本,这可以通过多种方式完成,例如使用窗口函数或自动连接,后者的示例是

    SELECT b.[year], b.[month], b.[product], [value]
         , DATEFROMPARTS(b.[year], b.month, 1) fullDate
    FROM   suchDB.muchUSER.awesomeTABLE b
           INNER JOIN (SELECT [year], [month], [product], max([version]) lv
                       FROM   suchDB.muchUSER.awesomeTABLE
                       GROUP BY [year], [month], [product]
                      ) m ON b.[year] = m.year AND b.month = m.month 
                         AND b.product = m.product AND b.[version] = m.lv
    

    拥有产品清单也会派上用场

    SELECT DISTINCT product
    FROM   suchDB.muchUSER.awesomeTABLE
    

    获得去年的数据可以通过多种方式完成,我喜欢 CROSS APPLY

    Declare @_year int
    Declare @_month int
    
    Set @_year = 2015
    Set @_month = 11
    
    ;With baseDate AS (
      SELECT b.[year], b.[month], b.[product], [value]
           , DATEFROMPARTS(b.[year], b.month, 1) fullDate
      FROM   suchDB.muchUSER.awesomeTABLE b
             INNER JOIN (SELECT [year], [month], [product], max([version]) lv
                         FROM   suchDB.muchUSER.awesomeTABLE
                         GROUP BY [year], [month], [product]
                        ) m ON b.[year] = m.year AND b.month = m.month 
                           AND b.product = m.product AND b.[version] = m.lv
    ), Products AS (
      SELECT DISTINCT [product]
      FROM   suchDB.muchUSER.awesomeTABLE
    )
    SELECT @_year [Year], @_month [Month], p.[product]
         , ly.Average
         , ly.[Sum]
    FROM   Products p
           CROSS APPLY (SELECT Sum(lastYear.Value) / 12.0 Average
                             , Sum(lastYear.Value) [Sum]
                        FROM   baseDate lastYear
                        WHERE  lastYear.fullDate > DATEFROMPARTS(@_year - 1, @_month, 1)
                          AND  lastYear.fullDate <= DATEFROMPARTS(@_year, @_month, 1)
                          AND  lastYear.product = p.product
                       ) ly 
    WHERE  ly.[Sum] IS NOT NULL
    

    WHERE 条件删除参数前一年中没有行的产品

    要删除变量并获取每个月的值,需要一个日历表 . 如果表格中的所有产品都在那里,那么我们可以使用 DISTINCT 获取日期,使用第一个 CTE 而不是表格让我们也获得完整日期

    SELECT DISTINCT [year], [month], fullDate
    FROM   baseDate
    

    否则,有不同的方法来创建日历表 . 我们可以将日历表添加到主查询 CTE 并在 CROSS APPLY 中使用它而不是变量

    ;With baseDate AS (
      SELECT b.[year] _y, b.[month] _m, b.[product], [value]
           , DATEFROMPARTS(b.[year], b.month, 1) fullDate
      FROM   suchDB.muchUSER.awesomeTABLE b
             INNER JOIN (SELECT [year], [month], [product], max([version]) lv
                         FROM   suchDB.muchUSER.awesomeTABLE
                         GROUP BY [year], [month], [product]
                        ) m ON b.[year] = m.year AND b.month = m.month 
                           AND b.product = m.product AND b.[version] = m.lv
    ), Products AS (
      SELECT DISTINCT [product]
      FROM   suchDB.muchUSER.awesomeTABLE
    ), Months As (
      SELECT DISTINCT _y, _m, fullDate
      FROM   baseDate
    )
    SELECT _y [Year], _m [Month], p.[product]
         , ly.Average
         , ly.[Sum]
         , ly.[Count]
    FROM   Products p
           CROSS APPLY (SELECT m._y, m._m
                             , Sum(bd.Value) / 12.0 Average
                             , Sum(bd.Value) [Sum]
                             , Count(Value) [Count]
                        FROM   Months m
                               LEFT JOIN baseDate bd 
                                      ON bd.fullDate > DATEADD(YY, -1, m.fullDate)
                                     AND bd.fullDate <= m.fullDate
                        WHERE  bd.product = p.product
                        GROUP BY m._y, m._m
                       ) ly 
    WHERE  ly.[Sum] IS NOT NULL
    
  • 1

    最大的问题是,您需要获取两个列表才能正确汇总数据集中的值 - 一个日期列表和一个产品列表 . 如果没有这两个列表,最后一个月中缺少的产品意味着不会报告产品,或者(如您已发现的那样)可能会汇总错误的12个月(缺少7月意味着11个前置行包括开始月份) .

    下面是对生成这些列表的过程的全面扩展探索 . 它只使用源数据表(假设每个月都出售了一些东西 . 它可以更简洁(即计算上面shawnt示例中的日期),但写的是显示所有步骤和假设 . 将其封装到存储过程中,因为它显示了显式传递的值 .

    CREATE PROCEDURE DoTheThing 
        @startDate DATE  -- Should be Year-Month-01 or YearMonth01
        AS
        BEGIN
        DECLARE @yr INT, @mth INT,
            @yr2 INT, @mth2 INT,
            @endDate DATE   -- StartDt - 11 months
    
        -- if the date will be passed in with a day other than 01, add code here to set the day on the passed date to 01
        -- if only the high year and month are passed in, then create a @startDt value and continue.
    
        SET @endDate = DATEADD(MONTH, -11, @startDate)
    
        SELECT @yr = DATEPART(YEAR, @startDate),
            @mth = DATEPART(MONTH, @startdate),
            @yr2 = DATEPART(YEAR, @startDate),
            @mth2 = DATEPART(MONTH, @startdate)
    
        WITH mthYr AS (
            SELECT DISTINCT 
                YEAR, 
                MONTH
            FROM suchDB.muchUSER.awesomeTABLE   -- Get the data from the source table
            WHERE (
                YEAR = @yr              -- if in the passed-in year, then take all months less than or equal to the start month
                AND MONTH <= @mth
                )
                OR (
                YEAR = @yr2             -- if the period is Jan -- Dec in one year, this reiterates the above
                AND MONTH >= @mth2      -- if not, select the months in the second year where the month is greater than or equal to the calculated month
                )
            ), 
        prods AS (
            SELECT DISTINCT product     -- Return a list of products sold during the year.
            FROM suchDB.muchUSER.awesomeTABLE smt
            INNER JOIN mthYr
                ON mthYr.YEAR = smt.YEAR
                AND mthyr.MONTH = smt.MONTH
            )
    
        SELECT MAX(mthYr.YEAR) AS year,     -- current report only shows passed in year/month value
            MAX(mthYr.month) AS month, 
            prods.product,                  
            AVG(ISNULL(VALUE, 0.00)) average,   -- isNull value adds a zero into the list to be averaged
            SUM(ISNULL(VALUE, 0.00)) SUM        -- not really necessary, but no warnings about NULL values will be generated
        FROM mthYr CROSS JOIN prods         -- cross join of the two lists means all of the products sold will have a value for each month
        LEFT JOIN suchDB.muchUSER.awesomeTABLE smt  -- left join so missing productMonths will still be added in
            ON smt.YEAR = mthYr.year
            AND smt.MONTH = mthYr.month 
            AND prods.product = smt.product
        ORDER BY smt.product
    END
    
  • 2

    如果你有一个产品表和一个日期表,我会留下那些带有上述查询的人,以便表示所有产品和所有日期,然后用你拥有的方法对结果求和 .

    您的第一颗子弹应该被解决,因为所有日期和所有产品都将被代表

    我相信你的第二颗子弹是由“前面的第11行”真正得到过去的11个值而不管日期;这可以通过将0替换为当前缺失的月/产品组合的null来解决,但是在左连接方法的情况下将提供 .

  • 1

    对于缺失的月份有很多可能的修复方法 . 这是一种不会过多改变原始查询的方法:

    WITH ym as (
        select
             year(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as y,
            month(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as m
        from (values
             (0), (-1), (-2), (-3),  (-4),  (-5),
            (-6), (-7), (-8), (-9), (-10), (-11)
        ) ofs(n)
    )
    SELECT
        y, m, p.product,
        AVG(coalesce(value, 0)) OVER (
          PARTITION BY p.product 
          ORDER BY y, m
          ROWS 11 PRECEDING
        ) as average,
        SUM(value) OVER (
          PARTITION BY p.product 
          ORDER BY y, m
          ROWS 11 PRECEDING
        ) as sum
    FROM
        ym cross join
        (select distinct product from suchDB.muchUSER.awesomeTABLE) p
        left outer join suchDB.muchUSER.awesomeTABLE t
            on t."year" = ym.y and t."month" = ym.m and t.product = p.product
    where y = <insert year> and m = <month>
    

    而不是最后的 where 你可以做一个 order by y, m descselect top 1 . 我通常会尽量避免使用 top 但是我会在参数中丢弃它,并且避免使用某些编程库两次引用它们可能不那么麻烦 . 即使你手工做,你仍然必须这样做记得在长查询中查看两个地方 .

    由于看起来你只需要一个月的聚合,我想你可以逃脱这个不使用窗口函数的简单版本:

    WITH ym as (
        select
            dateadd(month, n, cast('<year>' + '<month>' + '01' as date) as dt,
             year(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as y,
            month(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as m
        from (values
             (0), (-1), (-2), (-3),  (-4),  (-5),
            (-6), (-7), (-8), (-9), (-10), (-11)
        ) ofs(n)
    )
    SELECT
        year(max(dt)) as "year", month(max(dt)) as "month", p.product,
        AVG(coalesce(value, 0)) as average,
        SUM(value) as "sum"
    FROM
        ym cross join
        (select distinct product from awesome) p
        left outer join awesome t
        on t."year" = ym.y and t."month" = ym.m and t.product = p.product
    GROUP BY p.product
    

    我不知道你是否只想限制去年实际销售的产品,所以我没有处理那里 .

    如果你开始考虑如何使它更通用和可重用,你可能最终会更喜欢这个 . 我继续将产品限制添加到前一年的活动中:

    WITH dt as (
        select dateadd(month, n, cast('<year>' + '<month>' + '01' as date) as dt
    ),
    ym as (
        select dt, year(dateadd(month, n, dt)) as y, month(dateadd(month, n, dt)) as m
        from (values
             (0), (-1), (-2), (-3),  (-4),  (-5),
            (-6), (-7), (-8), (-9), (-10), (-11)
        ) ofs(n), dt
    )
    SELECT
        year(max(dt)) as "year", month(max(dt)) as "month", p.product,
        AVG(coalesce(value, 0)) as average, SUM(value) as "sum"
    FROM
        ym cross join
        (
            select distinct product from awesome
            where cast("year" + "month" + '01' as date) between
                    (select min(dt) from ym) and (select max(dt) from ym)
        ) p
        left outer join (
            select distinct /* get the latest "version" only */
                first_value("year")
                    over (partition by "year", "month", product order by version desc),
                first_value("month")
                    over (partition by "year", "month", product order by version desc),
                product,
                first_value(value)
                    over (partition by "year", "month", product order by version desc)
            from awesome
        ) t
            on t."year" = ym.y and t."month" = ym.m and t.product = p.product
    GROUP BY p.product
    

    最终查询还会尝试仅针对最新版本处理您的过滤器 . 但是,对于 first_value() 函数,您需要更高版本的SQL Server .

相关问题