
How can a table in the Glue catalog cause a NullPointerException?


I am currently trying to join two tables from an RDS instance in AWS Glue. After successfully crawling the database structure, I set up my job like this:

# Load table1 from the Glue Data Catalog and print its schema and row count.
table1 = (
    glueContext.create_dynamic_frame
    .from_catalog(database="transact", table_name="transact_table1")
)
table1.printSchema()
print "Count: ", table1.count()

# Same for table2.
table2 = (
    glueContext.create_dynamic_frame
    .from_catalog(database="transact", table_name="transact_table2")
)
table2.printSchema()
print "Count: ", table2.count()

Strangely, the job fails:

File "/mnt/yarn/usercache/root/appcache/application_1520463557704_0008/container_1520463557704_0008_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 275, in count
File "/mnt/yarn/usercache/root/appcache/application_1520463557704_0008/container_1520463557704_0008_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1520463557704_0008/container_1520463557704_0008_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/mnt/yarn/usercache/root/appcache/application_1520463557704_0008/container_1520463557704_0008_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o64.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 26, ip-10-1-2-4.ec2.internal, executor 1): java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:427)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:425)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:286)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:268)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)

The odd thing is that it only happens on the second count(): I can see in the logs that it correctly prints the table1 schema and count, and it also prints the table2 schema.

The reason I'm doing this is that I'm trying to join those two DynamicFrames together, and that also failed with a NullPointerException. What gives? Am I missing some configuration?
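
For context, the join itself looks roughly like this (a minimal sketch; the key column 'id' is a placeholder for the real join key):

from awsglue.transforms import Join

# Join the two DynamicFrames on a common key column.
# 'id' is a placeholder; substitute the actual join key.
joined = Join.apply(table1, table2, 'id', 'id')
joined.printSchema()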

UPDATE 1: it appears to be something specific to the second table. Picking some other table from the catalog as table2 makes the job succeed.

So I'll rephrase the question: what can make a table in the catalog different from the others in a way that causes it to throw this error?
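
In case it helps narrow things down, here is roughly how I plan to isolate the offending column (a sketch; it assumes Spark pushes the single-column projection down into the JDBC query, so only that column's getter gets exercised):

# Convert to a Spark DataFrame and count each column one at a time;
# with the projection pushed down to the JDBC source, the failure
# should surface only for the problematic column.
df = table2.toDF()
for col_name in df.columns:
    print "Column: ", col_name
    print "Count: ", df.select(col_name).count()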

1 Answer

  • 0
    print "Count: ", table2count()
    

Is that a typo from when you posted the question to SO? If not, check your code: shouldn't it be table2.count()?
