
Spark SQL 2.3.1: strange behavior of the unix_timestamp function


I have a Spark SQL query similar to this:

SELECT
 CAST(UNIX_TIMESTAMP('2015-03-29T01:11:23Z', "yyyy-MM-dd'T'HH:mm:ss'Z'") AS TIMESTAMP) as column1,
 CAST(UNIX_TIMESTAMP('2015-03-29T02:11:23Z', "yyyy-MM-dd'T'HH:mm:ss'Z'") AS TIMESTAMP) as column2
FROM ...

The first column returns the correct result, but the second column is always NULL.

What am I missing?

UPDATE:

Full code (I removed the FROM clause to simplify):

val spark = SparkSession.builder.appName("My Application").getOrCreate()
spark.sql(
  "SELECT CAST(UNIX_TIMESTAMP('2015-03-29T01:11:23Z', \"yyyy-MM-dd'T'HH:mm:ss'Z'\") AS TIMESTAMP) as column1, " +
  "CAST(UNIX_TIMESTAMP('2015-03-29T02:11:23Z', \"yyyy-MM-dd'T'HH:mm:ss'Z'\") AS TIMESTAMP) as column2"
).show(false)

Full trace:

2018-07-20 10:26:52 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-07-20 10:26:52 INFO  SparkContext:54 - Running Spark version 2.3.1
2018-07-20 10:26:52 INFO  SparkContext:54 - Submitted application: My Application
2018-07-20 10:26:52 INFO  SecurityManager:54 - Changing view acls to: XXX
2018-07-20 10:26:52 INFO  SecurityManager:54 - Changing modify acls to: XXX
2018-07-20 10:26:52 INFO  SecurityManager:54 - Changing view acls groups to:
2018-07-20 10:26:52 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-07-20 10:26:52 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(XXX); groups with view permissions: Set(); users  with modify permissions: Set(XXX); groups with modify permissions: Set()
2018-07-20 10:26:52 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 36615.
2018-07-20 10:26:52 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-07-20 10:26:52 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-07-20 10:26:52 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-07-20 10:26:52 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-07-20 10:26:52 INFO  DiskBlockManager:54 - Created local directory at /tmp/blockmgr-d8b77ac8-53a3-4923-810e-d81539b30369
2018-07-20 10:26:52 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-07-20 10:26:52 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-07-20 10:26:52 INFO  log:192 - Logging initialized @1074ms
2018-07-20 10:26:52 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
2018-07-20 10:26:52 INFO  Server:414 - Started @1129ms
2018-07-20 10:26:52 INFO  AbstractConnector:278 - Started ServerConnector@7544a1e4{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-07-20 10:26:52 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1a6f5124{/jobs,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@781e7326{/jobs/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22680f52{/jobs/job,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@39c11e6c{/jobs/job/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@324dcd31{/stages,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@503d56b5{/stages/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@72bca894{/stages/stage,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2575f671{/stages/stage/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@329a1243{/stages/pool,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@ecf9fb3{/stages/pool/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2d35442b{/storage,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@27f9e982{/storage/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4593ff34{/storage/rdd,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@37d3d232{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@30c0ccff{/environment,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@581d969c{/environment/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22db8f4{/executors,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2b46a8c1{/executors/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1d572e62{/executors/threadDump,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@29caf222{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@46cf05f7{/static,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7048f722{/,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c074c0c{/api,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@58dea0a5{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2a2bb0eb{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-07-20 10:26:52 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://XXX:4040
2018-07-20 10:26:52 INFO  SparkContext:54 - Added JAR file:XXX.jar at spark://XXX with timestamp 1532075212997
2018-07-20 10:26:53 INFO  Executor:54 - Starting executor ID driver on host localhost
2018-07-20 10:26:53 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42981.
2018-07-20 10:26:53 INFO  NettyBlockTransferService:54 - Server created on XXX:42981
2018-07-20 10:26:53 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-07-20 10:26:53 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, XXX, 42981, None)
2018-07-20 10:26:53 INFO  BlockManagerMasterEndpoint:54 - Registering block manager XXX:42981 with 366.3 MB RAM, BlockManagerId(driver, XXX, 42981, None)
2018-07-20 10:26:53 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, XXX, 42981, None)
2018-07-20 10:26:53 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, XXX, 42981, None)
2018-07-20 10:26:53 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@66bfd864{/metrics/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:53 INFO  SharedState:54 - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:XXX').
2018-07-20 10:26:53 INFO  SharedState:54 - Warehouse path is 'file:XXX'.
2018-07-20 10:26:53 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2dbd803f{/SQL,null,AVAILABLE,@Spark}
2018-07-20 10:26:53 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e48e859{/SQL/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:53 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22d1886d{/SQL/execution,null,AVAILABLE,@Spark}
2018-07-20 10:26:53 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7df60067{/SQL/execution/json,null,AVAILABLE,@Spark}
2018-07-20 10:26:53 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@748fe51d{/static/sql,null,AVAILABLE,@Spark}
2018-07-20 10:26:53 INFO  StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
2018-07-20 10:26:55 INFO  CodeGenerator:54 - Code generated in 238.939456 ms
2018-07-20 10:26:55 INFO  CodeGenerator:54 - Code generated in 14.895801 ms
2018-07-20 10:26:55 INFO  SparkContext:54 - Starting job: show at XXXTransform.scala:21
2018-07-20 10:26:55 INFO  DAGScheduler:54 - Got job 0 (show at XXXTransform.scala:21) with 1 output partitions
2018-07-20 10:26:55 INFO  DAGScheduler:54 - Final stage: ResultStage 0 (show at XXXTransform.scala:21)
2018-07-20 10:26:55 INFO  DAGScheduler:54 - Parents of final stage: List()
2018-07-20 10:26:55 INFO  DAGScheduler:54 - Missing parents: List()
2018-07-20 10:26:55 INFO  DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[4] at show at XXXTransform.scala:21), which has no missing parents
2018-07-20 10:26:55 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 7.3 KB, free 366.3 MB)
2018-07-20 10:26:55 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.5 KB, free 366.3 MB)
2018-07-20 10:26:55 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on XXX:42981 (size: 3.5 KB, free: 366.3 MB)
2018-07-20 10:26:55 INFO  SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1039
2018-07-20 10:26:55 INFO  DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at show at XXXTransform.scala:21) (first 15 tasks are for partitions Vector(0))
2018-07-20 10:26:55 INFO  TaskSchedulerImpl:54 - Adding task set 0.0 with 1 tasks
2018-07-20 10:26:55 INFO  TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 8067 bytes)
2018-07-20 10:26:55 INFO  Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-07-20 10:26:55 INFO  Executor:54 - Fetching spark://XXX.jar with timestamp 1532075212997
2018-07-20 10:26:55 INFO  TransportClientFactory:267 - Successfully created connection to XXX/10.1.97.135:36615 after 23 ms (0 ms spent in bootstraps)
2018-07-20 10:26:55 INFO  Utils:54 - Fetching spark://XXX.jar to /tmp/spark-853e3e5d-e228-473d-be50-c641d2b678a4/userFiles-7f3d8d8b-e9d4-4d9c-ac52-38e7fe59de7d/fetchFileTemp7750137601738248500.tmp
2018-07-20 10:26:55 INFO  Executor:54 - Adding file:/tmp/spark-853e3e5d-e228-473d-be50-c641d2b678a4/userFiles-7f3d8d8b-e9d4-4d9c-ac52-38e7fe59de7d/XXX.jar to class loader
2018-07-20 10:26:55 INFO  CodeGenerator:54 - Code generated in 5.345898 ms
2018-07-20 10:26:55 INFO  Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 1139 bytes result sent to driver
2018-07-20 10:26:55 INFO  TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 178 ms on localhost (executor driver) (1/1)
2018-07-20 10:26:55 INFO  TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2018-07-20 10:26:55 INFO  DAGScheduler:54 - ResultStage 0 (show at XXXTransform.scala:21) finished in 0,321 s
2018-07-20 10:26:55 INFO  DAGScheduler:54 - Job 0 finished: show at XXXTransform.scala:21, took 0,356248 s
+-------------------+-------+
|column1            |column2|
+-------------------+-------+
|2015-03-29 01:11:23|null   |
+-------------------+-------+
2018-07-20 10:26:55 INFO  SparkContext:54 - Invoking stop() from shutdown hook
2018-07-20 10:26:55 INFO  AbstractConnector:318 - Stopped Spark@7544a1e4{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-07-20 10:26:55 INFO  SparkUI:54 - Stopped Spark web UI at http://XXX:4040
2018-07-20 10:26:55 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-07-20 10:26:55 INFO  MemoryStore:54 - MemoryStore cleared
2018-07-20 10:26:55 INFO  BlockManager:54 - BlockManager stopped
2018-07-20 10:26:55 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-07-20 10:26:55 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-07-20 10:26:55 INFO  SparkContext:54 - Successfully stopped SparkContext
2018-07-20 10:26:55 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-07-20 10:26:55 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-1b68790c-9242-480e-a0b4-1b9d7068b27a
2018-07-20 10:26:55 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-853e3e5d-e228-473d-be50-c641d2b678a4

2 Answers

  • 1

    I tried to reproduce the error and I succeeded. I started my investigation by finding out which dates were problematic, and after running a small for loop I ended up with:

    val spark = SparkSession.builder.appName("My Application").getOrCreate()
    for (hour <- 0 to 23) {
      println(s"checking ${hour}")
      // zero-pad single-digit hours so the literal matches the HH pattern
      val queryHour = f"${hour}%02d"
      spark.sql("SELECT CAST(UNIX_TIMESTAMP('2015-03-29T01:11:23Z', \"yyyy-MM-dd'T'HH:mm:ss'Z'\") AS TIMESTAMP) as column1, " +
        "CAST(UNIX_TIMESTAMP('2015-03-29T" + queryHour + ":11:23Z', \"yyyy-MM-dd'T'HH:mm:ss'Z'\") AS TIMESTAMP) as column2").show(false)
    }
    

    It returns:

    checking 0
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 00:11:23|
    +-------------------+-------------------+
    
    checking 1
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 01:11:23|
    +-------------------+-------------------+
    
    checking 2
    +-------------------+-------+
    |column1            |column2|
    +-------------------+-------+
    |2015-03-29 01:11:23|null   |
    +-------------------+-------+
    
    checking 3
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 03:11:23|
    +-------------------+-------------------+
    
    checking 4
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 04:11:23|
    +-------------------+-------------------+
    
    checking 5
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 05:11:23|
    +-------------------+-------------------+
    
    checking 6
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 06:11:23|
    +-------------------+-------------------+
    
    checking 7
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 07:11:23|
    +-------------------+-------------------+
    
    checking 8
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 08:11:23|
    +-------------------+-------------------+
    
    checking 9
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 09:11:23|
    +-------------------+-------------------+
    
    checking 10
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 10:11:23|
    +-------------------+-------------------+
    
    checking 11
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 11:11:23|
    +-------------------+-------------------+
    
    checking 12
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 12:11:23|
    +-------------------+-------------------+
    
    checking 13
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 13:11:23|
    +-------------------+-------------------+
    
    checking 14
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 14:11:23|
    +-------------------+-------------------+
    
    checking 15
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 15:11:23|
    +-------------------+-------------------+
    
    checking 16
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 16:11:23|
    +-------------------+-------------------+
    
    checking 17
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 17:11:23|
    +-------------------+-------------------+
    
    checking 18
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 18:11:23|
    +-------------------+-------------------+
    
    checking 19
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 19:11:23|
    +-------------------+-------------------+
    
    checking 20
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 20:11:23|
    +-------------------+-------------------+
    
    checking 21
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 21:11:23|
    +-------------------+-------------------+
    
    checking 22
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 22:11:23|
    +-------------------+-------------------+
    
    checking 23
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 01:11:23|2015-03-29 23:11:23|
    +-------------------+-------------------+
    

    So only 2 o'clock was impacted. From there I dug into Spark's datetimeExpressions.scala file, where the datetime conversions are performed. More exactly, after adding some breakpoints, I found that the problem was caused by a silent parsing error in the org.apache.spark.sql.catalyst.expressions.UnixTime#eval method, more precisely here:

    case StringType if right.foldable =>
      if (constFormat == null || formatter == null) {
        null
      } else {
        try {
          formatter.parse(t.asInstanceOf[UTF8String].toString).getTime / 1000L
        } catch {
          // any parse failure is swallowed here and surfaces as a null column
          case NonFatal(_) => null
        }
      }
    

    And the formatter is created with a time zone:

    private lazy val formatter: DateFormat =
      try {
        DateTimeUtils.newDateFormat(constFormat.toString, timeZone)
      } catch {
        case NonFatal(_) => null
      }
    

    Next, I isolated the code that introduces the problem:

    val formatterParseError: DateFormat = DateTimeUtils.newDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", DateTimeUtils.getTimeZone("Europe/Paris"))
    val formattedDateParseError = formatterParseError.parse("2015-03-29T02:11:23Z")
    

    And indeed, I got a ParseException:

    Unparseable date: "2015-03-29T02:11:23Z"
    java.text.ParseException: Unparseable date: "2015-03-29T02:11:23Z"
    at java.text.DateFormat.parse(DateFormat.java:366)
    at   com.waitingforcode.stackoverflow.SchemaTest$$anonfun$2.apply$mcV$sp(SchemaTest.scala:225)
    at com.waitingforcode.stackoverflow.SchemaTest$$anonfun$2.apply(SchemaTest.scala:218)
    at com.waitingforcode.stackoverflow.SchemaTest$$anonfun$2.apply(SchemaTest.scala:218)
    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
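
    The same failure can be reproduced without Spark's helper. A minimal sketch, assuming (as DateTimeUtils.newDateFormat does in Spark 2.3) that lenient parsing is disabled; since 'Z' is quoted in the pattern, it is matched as a literal character and the wall-clock time is resolved in Europe/Paris:

    import java.text.SimpleDateFormat
    import java.util.{Locale, TimeZone}

    val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US)
    fmt.setTimeZone(TimeZone.getTimeZone("Europe/Paris"))
    fmt.setLenient(false) // strict parsing, as in DateTimeUtils.newDateFormat

    // 02:11:23 local time does not exist on 2015-03-29 in Europe/Paris,
    // so strict parsing throws java.text.ParseException
    fmt.parse("2015-03-29T02:11:23Z")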
    

    From there I checked the time zone handling in the Javadoc: https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html#timezone. It appears that when the time zone is expressed as 'Z' (an ISO 8601 time zone), the correct pattern letter to use is X. When I changed it in my isolated test case, the date was parsed correctly:

    val formatter: DateFormat = DateTimeUtils.newDateFormat("yyyy-MM-dd'T'HH:mm:ssX", DateTimeUtils.getTimeZone("Europe/Paris"))
    val formattedDate = formatter.parse("2015-03-29T02:11:23Z")
    println(s"formattedDate = ${formattedDate}")
    // formattedDate = Sun Mar 29 04:11:23 CEST 2015
    

    It also works in Spark SQL, as sketched below (the results are converted to the Europe/Paris time zone; you can change it with the spark.sql.session.timeZone option):
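
    A minimal sketch of the change to the loop above (only the quoted literal 'Z' in the pattern is replaced by the X letter):

    spark.sql("SELECT CAST(UNIX_TIMESTAMP('2015-03-29T01:11:23Z', \"yyyy-MM-dd'T'HH:mm:ssX\") AS TIMESTAMP) as column1, " +
      "CAST(UNIX_TIMESTAMP('2015-03-29T" + queryHour + ":11:23Z', \"yyyy-MM-dd'T'HH:mm:ssX\") AS TIMESTAMP) as column2").show(false)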

    checking 0
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 01:11:23|
    +-------------------+-------------------+
    
    checking 1
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 03:11:23|
    +-------------------+-------------------+
    
    checking 2
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 04:11:23|
    +-------------------+-------------------+
    
    checking 3
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 05:11:23|
    +-------------------+-------------------+
    
    checking 4
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 06:11:23|
    +-------------------+-------------------+
    
    checking 5
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 07:11:23|
    +-------------------+-------------------+
    
    checking 6
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 08:11:23|
    +-------------------+-------------------+
    
    checking 7
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 09:11:23|
    +-------------------+-------------------+
    
    checking 8
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 10:11:23|
    +-------------------+-------------------+
    
    checking 9
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 11:11:23|
    +-------------------+-------------------+
    
    checking 10
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 12:11:23|
    +-------------------+-------------------+
    
    checking 11
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 13:11:23|
    +-------------------+-------------------+
    
    checking 12
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 14:11:23|
    +-------------------+-------------------+
    
    checking 13
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 15:11:23|
    +-------------------+-------------------+
    
    checking 14
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 16:11:23|
    +-------------------+-------------------+
    
    checking 15
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 17:11:23|
    +-------------------+-------------------+
    
    checking 16
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 18:11:23|
    +-------------------+-------------------+
    
    checking 17
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 19:11:23|
    +-------------------+-------------------+
    
    checking 18
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 20:11:23|
    +-------------------+-------------------+
    
    checking 19
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 21:11:23|
    +-------------------+-------------------+
    
    checking 20
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 22:11:23|
    +-------------------+-------------------+
    
    checking 21
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-29 23:11:23|
    +-------------------+-------------------+
    
    checking 22
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-30 00:11:23|
    +-------------------+-------------------+
    
    checking 23
    +-------------------+-------------------+
    |column1            |column2            |
    +-------------------+-------------------+
    |2015-03-29 03:11:23|2015-03-30 01:11:23|
    +-------------------+-------------------+
    

    To recap the fact that null is returned on failure, we can learn it from this comment in the source:

    /**
     * Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
     * representing the timestamp of that moment in the current system time zone in the given
     * format. If the format is missing, using format like "1970-01-01 00:00:00".
     * Note that hive Language Manual says it returns 0 if fail, but in fact, it returns null.
     */

    We can solve it by using the ISO-like time zone pattern letter (X).

  • 2

    It is a bit silly, but the hour between 2015-03-29 02:00:00 and 2015-03-29 02:59:59 simply does not exist! It is related to the DST change that took place on March 29, 2015.

    See for example: Timestamp 2015-03-29 02:00:00 can not be stored
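
    A small java.time sketch illustrating the gap (hypothetical, not from the original answer): local times falling into the skipped hour are resolved by shifting them forward.

    import java.time.{LocalDateTime, ZoneId}

    val paris = ZoneId.of("Europe/Paris")
    // 02:30 never happened in Paris that night: clocks jumped from 02:00 to 03:00
    val gapTime = LocalDateTime.of(2015, 3, 29, 2, 30)
    println(gapTime.atZone(paris)) // 2015-03-29T03:30+02:00[Europe/Paris]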

    If in your context all timestamps should be handled as UTC, where no DST transition occurs, set:

    spark.conf.set("spark.sql.session.timeZone", "UTC")

    and it will work.
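
    A minimal sketch, assuming the same session as in the question:

    val spark = SparkSession.builder.appName("My Application").getOrCreate()
    // with a UTC session time zone, 02:11:23 is a valid wall-clock time
    spark.conf.set("spark.sql.session.timeZone", "UTC")
    spark.sql("SELECT CAST(UNIX_TIMESTAMP('2015-03-29T02:11:23Z', \"yyyy-MM-dd'T'HH:mm:ss'Z'\") AS TIMESTAMP) as column2").show(false)
    // expected: 2015-03-29 02:11:23 instead of null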

    See also: Spark Strutured Streaming automatically converts timestamp to local time
