I am trying to connect Spark and Cassandra using Scala, following the steps described here: http://www.planetcassandra.org/blog/kindling-an-introduction-to-spark-with-cassandra/. I hit an error at the step under the heading:
"Loading the connector into the Spark Shell:"
val test_spark_rdd = sc.cassandraTable("test_spark", "test")
test_spark_rdd.first
Running the commands above (shown in bold in the tutorial) throws:
Exception in task 0.0 in stage 0.0 (TID 0) java.lang.NullPointerException
I have uploaded the full stack trace here:
https://docs.google.com/document/d/1UjGXKifD6chq7-WrHd3GT3LoNcw8GawxAPeOtiEjKvM/edit?usp=sharing
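For context, the tutorial's full REPL sequence is roughly the following. This is a sketch of a spark-shell session, not standalone code: it assumes the shell was started with the connector jar on the classpath and a Cassandra node running on localhost, and `sc` is the context the shell provides.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Stop the shell's default context and rebuild it
// pointing at the local Cassandra node.
sc.stop()
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
val sc = new SparkContext(conf)

// Read the table "test" from keyspace "test_spark" as an RDD.
val test_spark_rdd = sc.cassandraTable("test_spark", "test")
test_spark_rdd.first
```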
Some of the rpc settings in my cassandra.yaml file:
rpc_address: localhost
# rpc_interface: eth1
# rpc_interface_prefer_ipv6: false
# port for Thrift to listen for clients on
rpc_port: 9160
My spark-defaults config file
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
#spark.serializer org.apache.spark.serializer.KryoSerializer
#spark.driver.memory 5g
#spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.cassandra.connection.host localhost
1 Answer
It looks like the problem is that the underlying forked executor process either fails to start or cannot perform some operation on the local filesystem. Make sure the executor process has access to the default Spark directories.
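One quick way to check that second point is a sanity test run as the same user that launches the executors. A minimal sketch, assuming the default scratch location `/tmp` (which Spark uses for local/shuffle files unless `spark.local.dir` is set):

```scala
import java.io.File

// Directory Spark uses for scratch files; "/tmp" is the default
// and an assumption here -- adjust if spark.local.dir is set.
val localDir = new File(sys.props.getOrElse("spark.local.dir", "/tmp"))

// The executor needs to list the directory and create files in it.
val readable = localDir.canRead && localDir.canExecute
val writable = {
  val probe = File.createTempFile("spark-probe", ".tmp", localDir)
  val ok = probe.exists()
  probe.delete()
  ok
}

println(s"local dir: $localDir readable=$readable writable=$writable")
```

If either flag comes back false, fix the directory's ownership/permissions (or point `spark.local.dir` in spark-defaults.conf at a directory the executor user owns) before retrying the cassandraTable call.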