
How to filter by date range in Spark SQL

I am trying to filter a date range from the following data using Databricks, and it returns null as the response. My CSV data looks like:

ID, Desc, Week_Ending_Date
100, AAA, 13-06-2015
101, BBB, 11-07-2015
102, CCC, 15-08-2015
103, DDD, 05-09-2015
100, AAA, 29-08-2015
100, AAA, 22-08-2015

My query is:

df.select(df("ID"), date_format(df("Week_Ending_Date"), "yyyy-MM-dd"))
.filter(date_format(df("Week_Ending_Date"), "yyyy-MM-  dd").between("2015-07-05", "2015-09-02"))

Any help is much appreciated.

1 Answer

  • 8

    Off the top of my head, I would convert the date column while building the DataFrame, and then apply the filter through an alias:

    import java.text.SimpleDateFormat
    // in spark-shell the implicits are already in scope; otherwise they are needed for .toDF
    import sqlContext.implicits._
    
    val format = new SimpleDateFormat("dd-MM-yyyy")
    // parse the dd-MM-yyyy strings into java.sql.Date while building the DataFrame
    val data = sc.parallelize(List(
        (100, "AAA", "13-06-2015"), (101, "BBB", "11-07-2015"), (102, "CCC", "15-08-2015"),
        (103, "DDD", "05-09-2015"), (100, "AAA", "29-08-2015"), (100, "AAA", "22-08-2015")))
      .map { r =>
        val date = new java.sql.Date(format.parse(r._3).getTime)
        (r._1, r._2, date)
      }.toDF("ID", "Desc", "Week_Ending_Date")
    
    data.show
    
    //+---+----+----------------+
    //| ID|Desc|Week_Ending_Date|
    //+---+----+----------------+
    //|100| AAA|      2015-06-13|
    //|101| BBB|      2015-07-11|
    //|102| CCC|      2015-08-15|
    //|103| DDD|      2015-09-05|
    //|100| AAA|      2015-08-29|
    //|100| AAA|      2015-08-22|
    //+---+----+----------------+
    
    import org.apache.spark.sql.functions.date_format
    
    // alias the formatted date column and filter on the alias
    val filteredData = data
      .select(data("ID"), date_format(data("Week_Ending_Date"), "yyyy-MM-dd").alias("date"))
      .filter($"date".between("2015-07-05", "2015-09-02"))
    filteredData.show
    
    //+---+----------+
    //| ID|      date|
    //+---+----------+
    //|101|2015-07-11|
    //|102|2015-08-15|
    //|100|2015-08-29|
    //|100|2015-08-22|
    //+---+----------+
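
    If the data comes from a CSV and Week_Ending_Date is therefore a string column, as in the question, the nulls most likely come from date_format implicitly casting the dd-MM-yyyy strings to a timestamp, which expects yyyy-MM-dd and fails. A minimal sketch of an alternative using only built-in functions, assuming Spark 1.5+ and the question's DataFrame df:

    import org.apache.spark.sql.functions.unix_timestamp
    
    // parse the dd-MM-yyyy strings into a proper date column, then filter on dates
    val parsed = df.withColumn("Week_Ending_Date",
      unix_timestamp($"Week_Ending_Date", "dd-MM-yyyy").cast("timestamp").cast("date"))
    val filtered = parsed.filter($"Week_Ending_Date".between("2015-07-05", "2015-09-02"))

    This keeps Week_Ending_Date as a real date column, so the between comparison is done on dates rather than on formatted strings.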
    
