
Spark java.lang.NoClassDefFoundError with spark-cassandra-connector: com/datastax/driver/core/ProtocolOptions$Compression


I get this error when I try to connect to Cassandra with the spark-cassandra-connector:

Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/driver/core/ProtocolOptions$Compression
    at com.datastax.spark.connector.cql.CassandraConnectorConf$.<init>(CassandraConnectorConf.scala:112)
    at com.datastax.spark.connector.cql.CassandraConnectorConf$.<clinit>(CassandraConnectorConf.scala)
    at com.datastax.spark.connector.cql.CassandraConnector$.apply(CassandraConnector.scala:192)
    at com.datastax.spark.connector.SparkContextFunctions.cassandraTable$default$(SparkContextFunctions.scala:48)
    at main.scala.TestSpark$.main(TestSpark.scala:19)
    at main.scala.TestSpark.main(TestSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.driver.core.ProtocolOptions$Compression
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 15 more

I added the jar spark-cassandra-connector_2.11-1.5.0-M2.jar to the Spark classpath.

I added the dependencies to my sbt build file:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.1"

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M2"

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.5.0-M2"

Here is the Scala program I am trying to run:

package main.scala


import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector._

/**
 * Created by Simo on 01.12.15.
 */
object TestSpark {
  def main(args: Array[String]) {
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "54.229.218.236")
      .setAppName("Simple Application")
    val sc = new SparkContext("local", "test", conf)
    val rdd = sc.cassandraTable("test", "kv")
    println(rdd.count)
    println(rdd.first)
    println(rdd.map(_.getInt("value")).sum)
  }
}

This is how I run it:

$ sbt package
$ $SPARK_HOME/bin/spark-submit --class "main.scala.TestSpark" target/scala-2.11/simple-project_2.11-1.0.jar
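
(For reference, a minimal sketch of putting the connector jar on the runtime classpath at submit time via spark-submit's --jars flag; the jar path below is a placeholder, not the asker's actual path:)

$ $SPARK_HOME/bin/spark-submit --class "main.scala.TestSpark" \
    --jars /path/to/spark-cassandra-connector_2.11-1.5.0-M2.jar \
    target/scala-2.11/simple-project_2.11-1.0.jar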

Can you help me understand what I'm doing wrong?

Thanks!

Edit:

I tried adding the DataStax driver to the dependency list and to the Spark classpath:

libraryDependencies += "com.datastax.cassandra" % "cassandra-driver-core" % "2.1.9"
libraryDependencies += "com.datastax.cassandra" % "cassandra-driver-mapping" % "2.1.9"

The previous error no longer appears, but now I get another one:

Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.zero()Lscala/runtime/ObjectRef;
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
    at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:120)
    at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:241)
    at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.tableDef(CassandraTableRowReaderProvider.scala:51)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef$lzycompute(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.verify(CassandraTableRowReaderProvider.scala:146)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.verify(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:143)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1919)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
    at main.scala.TestSpark$.main(TestSpark.scala:20)
    at main.scala.TestSpark.main(TestSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Edit 2: The NoSuchMethodError on scala.runtime.ObjectRef.zero() is a typical symptom of mixing Scala binary versions, so I recompiled against Scala 2.10.6 (the same Scala version my Spark build uses). The previous error no longer appears, but I get this new one:

Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/AsyncFunction
    at com.datastax.spark.connector.cql.DefaultConnectionFactory$.clusterBuilder(CassandraConnectionFactory.scala:36)
    at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:85)
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
    at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:120)
    at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:241)
    at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.tableDef(CassandraTableRowReaderProvider.scala:51)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef$lzycompute(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.verify(CassandraTableRowReaderProvider.scala:150)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.verify(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:143)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1919)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
    at main.scala.TestSpark$.main(TestSpark.scala:20)
    at main.scala.TestSpark.main(TestSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.AsyncFunction
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 34 more

2 Answers

  • 2

    Finally solved it using sbt-assembly, as suggested by @Odomontois.

    Here is the final build.sbt:

    name := "Simple Project"
    
    version := "1.0"
    
    scalaVersion := "2.10.6"
    
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1" % "provided"
    
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.1" % "provided"
    
    libraryDependencies += "com.datastax.cassandra" % "cassandra-driver-core" % "2.1.9"
    
    libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.5.0-M2"
    
    
    
    jarName in assembly :="my-project-assembly.jar"
    
    assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    
    
    resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
    
    mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
      {
        case PathList("netty", "handler", xs @ _*)   => MergeStrategy.first
        case PathList("netty", "buffer", xs @ _*)    => MergeStrategy.first
        case PathList("netty", "common", xs @ _*)    => MergeStrategy.first
        case PathList("netty", "transport", xs @ _*) => MergeStrategy.first
        case PathList("netty", "codec", xs @ _*)     => MergeStrategy.first

        case PathList("META-INF", "io.netty.versions.properties") => MergeStrategy.first
        case x => old(x)
      }
    }
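
    For this to work, sbt-assembly must also be enabled as a plugin; a minimal sketch in project/plugins.sbt, assuming an sbt-assembly release from the same era (the version below is an assumption, pick one compatible with your sbt):

    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

    Then build the fat jar and submit it in place of the package jar (the path assumes sbt-assembly's default output directory for Scala 2.10):

    $ sbt assembly
    $ $SPARK_HOME/bin/spark-submit --class "main.scala.TestSpark" target/scala-2.10/my-project-assembly.jar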
    
  • 0

    You also need to add the dependency for the DataStax Cassandra driver (matching your spark-cassandra-connector version): https://repo1.maven.org/maven2/com/datastax/cassandra/cassandra-driver-core/
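
    A minimal sketch in sbt, assuming the 1.5.0-M2 connector from the question (which pairs with the 2.1.x driver; check the connector's compatibility notes for other versions):

    libraryDependencies += "com.datastax.cassandra" % "cassandra-driver-core" % "2.1.9"

    Note that the driver has transitive dependencies of its own, e.g. Guava, which provides the com.google.common.util.concurrent.AsyncFunction class missing in Edit 2; that is why bundling everything into one fat jar with sbt-assembly, as in the answer above, is the more robust fix.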
