
Spark jdbc.write to MySQL fails with a null error


I am creating a column in a DataFrame that is set to null (via None), but when it is written out over JDBC I get "Can't get JDBC type for null". Any help would be appreciated.

from pyspark.sql.functions import col, when

update_func = (when(col("SN") != col("SNORIGINAL"), None))

aPACKAGEDF = aPACKAGEDF.withColumn('SNORIGINAL_TEMPCOL', update_func)
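
The error is raised by the JDBC save itself; below is a minimal sketch of such a write, assuming placeholder connection details (the MySQL URL, table name, and properties are stand-ins, not the actual ones from this job):

# Hypothetical sketch of the JDBC save that triggers the error; the URL,
# table name, and connection properties are placeholders, not the real ones.
aPACKAGEDF.write.jdbc(
    url="jdbc:mysql://localhost:3306/testdb",
    table="package_table",
    mode="append",
    properties={"user": "user", "password": "password", "driver": "com.mysql.jdbc.Driver"})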

java.lang.IllegalArgumentException: Can't get JDBC type for null
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:175)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:175)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType(JdbcUtils.scala:174)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$20.apply(JdbcUtils.scala:635)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$20.apply(JdbcUtils.scala:635)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:635)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:821)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:821)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

1 Answer


    That is because None in

    update_func = (when(col("SN") != col("SNORIGINAL"), None))
    

    has no defined type. Use a casted literal instead. For example, if the type should be a string (VARCHAR or similar):

    from pyspark.sql.functions import col, lit, when
    
    update_func = when(col("SN") != col("SNORIGINAL"), lit(None).cast("string"))
    
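    One way to verify the fix before writing is to check the schema of the new column; with the cast in place it should show up as string rather than null (a minimal sketch, reusing update_func from above):

    # Optional sanity check: the casted column should now be StringType
    # rather than NullType, which is what JdbcUtils.getJdbcType failed on.
    aPACKAGEDF = aPACKAGEDF.withColumn('SNORIGINAL_TEMPCOL', update_func)
    aPACKAGEDF.select('SNORIGINAL_TEMPCOL').printSchema()
    # root
    #  |-- SNORIGINAL_TEMPCOL: string (nullable = true)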
