
spark-submit command including the mysql connector

I have a Scala object that internally queries a MySQL table, performs a join, and writes the data to S3. When I test the code locally it runs fine, but when I submit it to the cluster it throws the following error:

Exception in thread "main" java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getDriver(DriverManager.java:315)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:53)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
    at QuaterlyAudit$.main(QuaterlyAudit.scala:51)
    at QuaterlyAudit.main(QuaterlyAudit.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
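For context, the read that fails is a plain DataFrameReader JDBC load. A minimal sketch of what the job does is shown below; the host, database, table names, credentials and S3 paths are placeholders, not the actual values.

import org.apache.spark.sql.SparkSession

object QuaterlyAudit {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("QuaterlyAudit").getOrCreate()

    // Read the MySQL table over JDBC (host, database, table and credentials are placeholders)
    val mysqlDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/mydb")
      .option("dbtable", "audit")
      .option("user", "dbuser")
      .option("password", "dbpass")
      .load()

    // Join with another dataset and write the result to S3 (paths are placeholders)
    val otherDf = spark.read.parquet("s3a://my-bucket/input/")
    mysqlDf.join(otherDf, Seq("id")).write.parquet("s3a://my-bucket/output/")

    spark.stop()
  }
}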

Below is my spark-submit command:

nohup spark-submit --class QuaterlyAudit --master yarn-client --num-executors 8 
--driver-memory 16g --executor-memory 20g --executor-cores 10 /mypath/campaign.jar &

I am using sbt and include the mysql connector in the sbt assembly. Below is my build.sbt file:

name := "mobilewalla"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided",
  "org.apache.hadoop" % "hadoop-aws" % "2.6.0" intransitive(),
  "mysql" % "mysql-connector-java" % "5.1.37")

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) =>
    xs.map(_.toLowerCase) match {
      case ("manifest.mf" :: Nil) |
           ("index.list" :: Nil) |
           ("dependencies" :: Nil) |
           ("license" :: Nil) |
           ("notice" :: Nil) => MergeStrategy.discard
      case _ => MergeStrategy.first // was 'discard' previously
    }
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
assemblyJarName in assembly := "campaign.jar"
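
A quick way to confirm that the connector classes actually ended up in the assembly is to list the fat jar's contents, for example:

jar tf /mypath/campaign.jar | grep -i mysql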

I have also tried:

nohup spark-submit --driver-class-path /mypath/mysql-connector-java-5.1.37.jar 
--class QuaterlyAudit --master yarn-client --num-executors 8 --driver-memory   16g 
--executor-memory 20g --executor-cores 10 /mypath/campaign.jar &

But still no luck. What am I missing here?

1 Answer


    Clearly Spark is not able to find the JDBC jar. There are a few ways to fix this, and plenty of people have run into the same problem. It happens because the jar is not shipped to the driver and the executors.

    • You may want to assemble the application with a build manager (Maven, SBT), so that you do not have to add the dependencies on the spark-submit CLI.

    • You can use the following option on the spark-submit CLI: --jars $(echo ./lib/*.jar | tr ' ' ','); see the full command after this list.

    • You can also try setting these two properties in the SPARK_HOME/conf/spark-defaults.conf file: spark.driver.extraClassPath and spark.executor.extraClassPath, pointing both at the path of the jar file; see the sample configuration after this list. Make sure the same path exists on the worker nodes.
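
    For example, the --jars option applied to the command from the question would look roughly like this (assuming the connector jar sits at the path already used with --driver-class-path above):

      nohup spark-submit --class QuaterlyAudit --master yarn-client --num-executors 8 \
        --driver-memory 16g --executor-memory 20g --executor-cores 10 \
        --jars /mypath/mysql-connector-java-5.1.37.jar /mypath/campaign.jar &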
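
    And a sample entry for the third option in SPARK_HOME/conf/spark-defaults.conf, again assuming the jar is available at the same path on every node:

      spark.driver.extraClassPath    /mypath/mysql-connector-java-5.1.37.jar
      spark.executor.extraClassPath  /mypath/mysql-connector-java-5.1.37.jar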
