首页 文章

SparkSQL:日期范围的条件和

提问于
浏览
1

我有这样的数据帧:

| id | prodId |    date    | value |
| 1  |   a    | 2015-01-01 | 100   |
| 2  |   a    | 2015-01-02 | 150   |
| 3  |   a    | 2015-01-03 | 120   |
| 4  |   b    | 2015-01-01 | 100   |

而且我很想做一个小组,并且将“ Value ”总结为日期范围 . 换句话说,我需要构建一个包含以下列的表:

  • prodId

  • val_1:如果日期在date1和date2之间,则为sum值

  • val_2:如果日期在date2和date3之间,则为sum值

  • val_3:和以前一样

| prodId |      val_1       |      val_2       |
 |        | (01-01 to 01-02) | (01-03 to 01-04) |

 |   a    |        250       |        120       |
 |   b    |        100       |        0         |

在spark中是否有任何预定义的聚合函数允许进行条件求和?你建议开发一个aggr . UDF(如果是这样,任何建议)?非常感谢!

1 回答

  • 2

    首先让我们重新创建示例数据集

    import org.apache.spark.sql.functions.to_date
    
    val df = sc.parallelize(Seq(
      (1, "a", "2015-01-01", 100), (2, "a", "2015-01-02", 150),
      (3, "a", "2015-01-03", 120), (4, "b", "2015-01-01", 100)
    )).toDF("id", "prodId", "date", "value").withColumn("date", to_date($"date"))
    
    val dates = List(("2015-01-01", "2015-01-02"), ("2015-01-03", "2015-01-04"))
    

    你所要做的就是这样:

    import org.apache.spark.sql.functions.{when, lit, sum}
    
    val exprs = dates.map{
      case (x, y) => {
        // Create label for a column name
        val alias = s"${x}_${y}".replace("-", "_")
        // Convert strings to dates
        val xd = to_date(lit(x))
        val yd = to_date(lit(y))
        // Generate expression equivalent to
        // SUM(
        //   CASE
        //     WHEN date BETWEEN ... AND ... THEN value
        //     ELSE 0
        //   END
        // ) AS ...
        // for each pair of dates.
        sum(when($"date".between(xd, yd), $"value").otherwise(0)).alias(alias)
      }
    }
    
    df.groupBy($"prodId").agg(exprs.head, exprs.tail: _*).show
    
    // +------+---------------------+---------------------+
    // |prodId|2015_01_01_2015_01_02|2015_01_03_2015_01_04|
    // +------+---------------------+---------------------+
    // |     a|                  250|                  120|
    // |     b|                  100|                    0|
    // +------+---------------------+---------------------+
    

相关问题