
Spark DataFrame JSON-to-ORC conversion fails with "ambiguous column" exception


I am using a Spark DataFrame to read JSON data and then save it as ORC. The code is very simple:

// Read the JSON input into a DataFrame, then write it back out as ORC.
DataFrame json = sqlContext.read().json(input);

json.write().format("orc").save(output);

The job fails with the exception below. What is going wrong here? Thanks.

Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'Canonical_URL' is ambiguous, could be: Canonical_URL#960, Canonical_URL#1010.
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:279)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:116)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$16.apply(Analyzer.scala:350)
    at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:341)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:108)
    ... (Scala collection frames elided)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:127)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8.applyOrElse(Analyzer.scala:341)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:243)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:61)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:51)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:933)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:933)
    at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:132)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:336)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
    at com.es.infrastructure.spark.orc.transformer.JsonTransformer.run(JsonTransformer.java:22)
    at Main.main(Main.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
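
Before changing anything, it is worth confirming which column is duplicated. A minimal diagnostic sketch against the json DataFrame already shown; printSchema() and columns() are standard DataFrame methods, and nothing beyond the code above is assumed:

// The read itself succeeds; the analysis error only fires on write,
// so the inferred schema can still be inspected for the duplicate field.
json.printSchema();

// List the top-level column names explicitly; if the input itself
// carries the duplicate, 'Canonical_URL' should appear twice here.
for (String name : json.columns()) {
    System.out.println(name);
}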

1 Answer

  • 1

    You must have two keys with the same name. If the df was created with a join, you need to drop one of the duplicated keys; a sketch of the fix follows.
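
    A minimal sketch of that fix, assuming the duplicated 'Canonical_URL' comes from joining two JSON inputs on a shared key. The inputs inputA/inputB and the join key "id" are made up for illustration; withColumnRenamed, join-by-column-name, and drop(String) are standard DataFrame methods in Spark 1.4+:

    // Hypothetical reproduction: two inputs that both carry 'Canonical_URL'.
    DataFrame left = sqlContext.read().json(inputA);
    DataFrame right = sqlContext.read().json(inputB);

    // Rename the right-hand copy before joining so no two columns share a name.
    DataFrame rightRenamed = right.withColumnRenamed("Canonical_URL", "Canonical_URL_r");

    // Joining on the column name (not an expression) keeps a single 'id' column.
    DataFrame joined = left.join(rightRenamed, "id");

    // Drop the redundant copy; the ORC write then succeeds.
    joined.drop("Canonical_URL_r").write().format("orc").save(output);

    Renaming before the join, rather than dropping blindly afterwards, also keeps both values available in case the two copies can disagree.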
