I installed Cassandra 2.1.11, spark-2.0.0-bin-hadoop2.7, and Java 1.8.0_101 on my Ubuntu 14.04. For the Spark Cassandra Connector, I installed git
sudo apt-get install git
git clone https://github.com/datastax/spark-cassandra-connector.git
and built it
cd spark-cassandra-connector
git checkout v1.4.0
./sbt/sbt assembly
and copied the Scala jar to my home directory
cp spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.4.0-SNAPSHOT.jar ~
and launched the shell with the connector
bin/spark-shell --jars ~/spark-cassandra-connector-assembly-1.4.0-SNAPSHOT.jar
and at the Scala prompt ran
sc.stop
import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
val sc = new SparkContext(conf)
From cqlsh I created the keyspace test and the table my_table. To test the connection, I ran the following command
val test_spark_rdd = sc.cassandraTable("test", "my_table")
and got the error
error: missing or invalid dependency detected while loading class file 'CassandraConnector.class'.
Could not access type Logging in package org.apache.spark,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'CassandraConnector.class' was compiled against an incompatible version of org.apache.spark.
Is this due to a version mismatch between Spark and Cassandra?
1 Answer
This is a mismatch between Spark and the connector. You are using the 1.4.0 library with Spark 2.0.0.
Use the 2.0.0 version of the connector, and pull it in as a Spark package.
https://spark-packages.org/package/datastax/spark-cassandra-connector
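With Spark packages there is no need to build the connector from source or copy jars by hand; the `--packages` flag resolves and downloads the artifact at launch. A minimal sketch of the invocation, assuming a 2.0.x connector release for Scala 2.11 (check the package page above for the exact coordinate available for your Spark version):

```shell
# Launch spark-shell, letting --packages fetch the connector from Spark Packages.
# The version coordinate below is an example; pick the 2.0.x release that
# matches your Spark build.
bin/spark-shell \
  --conf spark.cassandra.connection.host=localhost \
  --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.11
```

Setting `spark.cassandra.connection.host` via `--conf` also avoids stopping and recreating the SparkContext inside the shell; the pre-built `sc` can then be used directly:

```scala
import com.datastax.spark.connector._

val test_spark_rdd = sc.cassandraTable("test", "my_table")
```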