
Spark Scala unable to push data into a Hive table

I am trying to insert data into an existing Hive table. I have already created an ORC table in Hive, but I cannot push data into it. The code works if I copy and paste it into the Spark shell, but it fails when run through spark-submit.
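
The DDL of the existing table is not shown in the question; based on the column names used in the code below, it is assumed to look roughly like this (a hypothetical sketch, run through a Hive-aware context or the Hive CLI, not the asker's actual DDL):

    // Hypothetical DDL; adjust names and types to the real table definition.
    sqlContext.sql(
      """CREATE TABLE IF NOT EXISTS l_sequence (
        |  t_als_s_path STRING,
        |  t_als_s_cmd  STRING,
        |  t_als_s_pd   STRING
        |) STORED AS ORC""".stripMargin)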

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    object TestCode {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("first example").setMaster("local")
        val sc = new SparkContext(conf)
        val sqlContext = new org.apache.spark.sql.SQLContext(sc)
        for (i <- 0 until 100) {
          // Sample values; in the real job these come from business logic
          // (the loop stands in for that logic) before being pushed into the table.
          val fstring = "fstring" + i
          val cmd = "cmd" + i
          val idpath = "idpath" + i
          import sqlContext.implicits._
          val sDF = Seq((fstring, cmd, idpath)).toDF("t_als_s_path", "t_als_s_cmd", "t_als_s_pd")
          sDF.write.insertInto("l_sequence")
          //sDF.write.format("orc").saveAsTable("l_sequence")
          println("write data ==> " + i)
        }
      }
    }

It gives the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: l_sequence;
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:449)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:455)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:453)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:60)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:453)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:443)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:65)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:63)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:51)
        at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:69)
        at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:259)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:239)
        at com.hq.bds.Helloword$$anonfun$main$1.apply$mcVI$sp(Helloword.scala:16)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
        at com.hq.bds.Helloword$.main(Helloword.scala:10)
        at com.hq.bds.Helloword.main(Helloword.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

1 Answer

    You need to link hive-site.xml into the Spark conf, or copy hive-site.xml into the Spark conf directory. Spark cannot find your Hive metastore (by default it falls back to a local Derby database), so the Hive configuration has to be linked into the Spark conf directory.

    Finally, to connect Spark SQL to an existing Hive installation, you must copy the hive-site.xml file into Spark's configuration directory ($SPARK_HOME/conf). If you do not have an existing Hive installation, Spark SQL will still run.
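
    Besides the configuration file, note that the code in the question builds a plain SQLContext, which does not talk to the Hive metastore. A minimal sketch of the same job using a Hive-aware context, assuming the spark-hive module is on the classpath (HiveContext on Spark 1.x; on Spark 2.x the equivalent is SparkSession.builder().enableHiveSupport().getOrCreate()):

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.sql.hive.HiveContext

        object TestCode {
          def main(args: Array[String]): Unit = {
            val conf = new SparkConf().setAppName("first example").setMaster("local")
            val sc = new SparkContext(conf)
            // HiveContext picks up hive-site.xml from Spark's conf directory and
            // resolves table names against the Hive metastore.
            val sqlContext = new HiveContext(sc)
            import sqlContext.implicits._

            for (i <- 0 until 100) {
              val sDF = Seq(("fstring" + i, "cmd" + i, "idpath" + i))
                .toDF("t_als_s_path", "t_als_s_cmd", "t_als_s_pd")
              sDF.write.insertInto("l_sequence")
            }
            sc.stop()
          }
        }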

    Sudo to the root user and then copy hive-site.xml into the Spark conf directory:

        sudo su - root
        cp /etc/hive/conf/hive-site.xml /etc/spark/conf
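
    After copying the file and rerunning spark-submit, a quick sanity check (assuming the Hive-aware context sketched above) is to list the tables Spark can see before writing:

        // If hive-site.xml is being picked up, the existing Hive tables,
        // including l_sequence, should be listed here instead of an empty
        // local catalog.
        sqlContext.sql("SHOW TABLES").show()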
    
