
Timestamp not loading from Glue into a Redshift table


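I have timestamps in YYYY-MM-DD XX:XX:XX format in a CSV file stored in S3, but when I load the file into a Redshift database with Glue using the timestamp data type, the timestamp column comes through as null. The format looks valid, but I have also tried the YYYYMMDD XXXXXX and YYMMDD XX:XX:XX formats.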

My mapping in Glue goes from timestamp to timestamp, and the table column's data type is also timestamp. The data as provided in the CSV:

1,2016 Summer,2016-06-22 00:00:00

Actual output:

Line |    Term     |    Date
-----+-------------+------------
   1 | 2016 Summer |

Expected output:

Line |    Term     |        Date
-----+-------------+---------------------
   1 | 2016 Summer | 2016-06-22 00:00:00
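A quick check outside Glue supports the point that the value itself is well-formed. This is a minimal plain-Scala sketch with no Glue or Spark involved; the object name CheckSampleRow, the helper parseDateField, and the pattern yyyy-MM-dd HH:mm:ss are assumptions chosen for illustration.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper: checks whether the third CSV field parses as a date-time.
object CheckSampleRow {
  // Assumed pattern matching values such as "2016-06-22 00:00:00"
  val pattern: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Returns the parsed third field, or None if the line is too short or the text does not match the pattern.
  def parseDateField(csvLine: String): Option[LocalDateTime] = {
    val fields = csvLine.split(",", -1)
    if (fields.length < 3) None
    else Try(LocalDateTime.parse(fields(2).trim, pattern)).toOption
  }

  def main(args: Array[String]): Unit = {
    println(parseDateField("1,2016 Summer,2016-06-22 00:00:00"))
  }
}
```

If the sample row parses here, the null in Redshift more likely comes from the Glue mapping or catalog types than from the data itself.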

This seems like it should be a simple task, but I can't get it right, so I would really appreciate it if someone could spot my mistake.

Code:

// Read the crawled table from the Glue Data Catalog
val datasource37 = glueContext.getCatalogSource(database = "data", tableName = "term", redshiftTmpDir = "", transformationContext = "datasource37").getDynamicFrame()
// Map columns; the source type of "date" is declared here as timestamp
val applymapping37 = datasource37.applyMapping(mappings = Seq(("id", "bigint", "id", "bigint"), ("name", "string", "name", "varchar(256)"), ("date", "timestamp", "date_start", "timestamp")), caseSensitive = false, transformationContext = "applymapping37")
val resolvechoice37 = applymapping37.resolveChoice(choiceOption = Some(ChoiceOption("make_cols")), transformationContext = "resolvechoice37")
// Drop fields that are null in every record
val dropnullfields37 = resolvechoice37.dropNulls(transformationContext = "dropnullfields37")
// Write to Redshift through the catalog connection
val datasink37 = glueContext.getJDBCSink(catalogConnection = "dataConnection", options = JsonOptions("""{"dbtable": "term", "database": "data"}"""), redshiftTmpDir = args("TempDir"), transformationContext = "datasink37").writeDynamicFrame(dropnullfields37)

1 Answer


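    I ended up mapping from string -> timestamp, and it worked. Glue had automatically generated a timestamp -> timestamp mapping, so I assumed that was correct.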

    For example:

    val applymapping37 = datasource37.applyMapping(
      mappings = Seq(
        ("id", "bigint", "id", "bigint"),
        ("name", "string", "name", "varchar(256)"),
        // Declare the source as string so the CSV text is converted to timestamp
        ("date", "string", "date_start", "timestamp")),
      caseSensitive = false,
      transformationContext = "applymapping37")
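For intuition, the mapping tuple ("date", "string", "date_start", "timestamp") amounts to converting each text value into a timestamp. The following is a plain-Scala analogue, not Glue's actual implementation: the case classes RawTerm and Term and the helper toTerm are hypothetical, and java.sql.Timestamp.valueOf stands in for whatever conversion Glue performs internally. Values that fail to parse become None, the analogue of a NULL column.

```scala
import java.sql.Timestamp
import scala.util.Try

object StringToTimestamp {
  // Hypothetical row shapes: the raw row keeps the date as text, the target row holds a timestamp.
  final case class RawTerm(id: Long, name: String, date: String)
  final case class Term(id: Long, name: String, dateStart: Option[Timestamp])

  // Analogue of ("date", "string", "date_start", "timestamp"):
  // parse the text; anything Timestamp.valueOf rejects becomes None (i.e. NULL).
  def toTerm(raw: RawTerm): Term =
    Term(raw.id, raw.name, Try(Timestamp.valueOf(raw.date.trim)).toOption)

  def main(args: Array[String]): Unit = {
    println(toTerm(RawTerm(1L, "2016 Summer", "2016-06-22 00:00:00")))
  }
}
```

Timestamp.valueOf accepts text of the form yyyy-[m]m-[d]d hh:mm:ss[.f...], which matches the sample value; a variant such as "20160622 000000" would be rejected and yield None.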
    
