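Our product runs on a 5-node Cassandra cluster, and we access the Cassandra tables from Spark Scala code through the DataStax connector. Our Spark job was taking longer than usual, and we found that one of the Cassandra nodes was down, which was causing the delay.
Versions: Spark 2.1.0, "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0-M3", Scala 2.11.8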
import org.apache.spark.sql.SparkSession

// Session with a single Cassandra contact point and consistency level ONE
val spark = SparkSession.builder().
  appName(jobName).
  config("spark.cassandra.connection.host", "27.0.5.126").
  config("spark.cassandra.connection.port", "9042").
  config("spark.cassandra.input.consistency.level", "ONE").
  config("spark.cassandra.output.consistency.level", "ONE").
  config("hive.exec.dynamic.partition", "true").
  config("hive.exec.dynamic.partition.mode", "nonstrict").
  config("hive.enforce.bucketing", "true").
  enableHiveSupport().getOrCreate()
val alerts = spark.read.format("org.apache.spark.sql.cassandra").
options(Map("table"->tableName, "keyspace" -> keyspace)).
load().collect()
When we list all the nodes in the Spark configuration, the connector waits 2 minutes while creating the connection pool before it times out.
config("spark.cassandra.connection.host", "27.0.5.126,27.0.4.223,27.0.6.85,27.0.6.59").
18/03/02 19:10:40 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
18/03/02 19:10:40 INFO Cluster: New Cassandra host /27.0.5.126:9042 added
18/03/02 19:10:40 INFO Cluster: New Cassandra host /27.0.4.223:9042 added
18/03/02 19:10:40 INFO Cluster: New Cassandra host /27.0.6.85:9042 added
18/03/02 19:10:40 INFO Cluster: New Cassandra host /27.0.6.59:9042 added
18/03/02 19:10:40 INFO Cluster: New Cassandra host /27.0.6.187:9042 added
18/03/02 19:10:40 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
18/03/02 19:12:40 WARN Session: Error creating pool to /27.0.6.187:9042
com.datastax.driver.core.exceptions.ConnectionException: [/27.0.6.187] Pool was closed during initialization
    at com.datastax.driver.core.HostConnectionPool$2.onSuccess(HostConnectionPool.java:149)
    at com.datastax.driver.core.HostConnectionPool$2.onSuccess(HostConnectionPool.java:135)
    at com.datastax.spark.connector.google.common.util.concurrent.Futures$6.run(Futures.java:1319)
    at
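When I create the Cassandra connection with a single Cassandra host (one that is working), the load balancing policy still makes it try to connect to every node in datacenter1, and it times out on the bad node, which takes almost 2 minutes.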
18/03/02 21:17:37 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
18/03/02 21:17:37 INFO Cluster: New Cassandra host /27.0.5.126:9042 added
18/03/02 21:17:37 INFO LocalNodeFirstLoadBalancingPolicy: Added host 27.0.5.126 (datacenter1)
18/03/02 21:17:37 INFO Cluster: New Cassandra host /27.0.4.223:9042 added
18/03/02 21:17:37 INFO LocalNodeFirstLoadBalancingPolicy: Added host 27.0.4.223 (datacenter1)
18/03/02 21:17:37 INFO Cluster: New Cassandra host /27.0.6.85:9042 added
18/03/02 21:17:37 INFO Cluster: New Cassandra host /27.0.6.59:9042 added
18/03/02 21:17:37 INFO LocalNodeFirstLoadBalancingPolicy: Added host 27.0.6.59 (datacenter1)
18/03/02 21:17:37 INFO Cluster: New Cassandra host /27.0.6.187:9042 added
18/03/02 21:17:37 INFO LocalNodeFirstLoadBalancingPolicy: Added host 27.0.6.187 (datacenter1)
18/03/02 21:17:37 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
How can I avoid this delay when any one of the nodes in the Cassandra cluster is down?
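
For anyone who wants to experiment with this, below is a minimal sketch of where the connector's timeout settings would go. It assumes the property names spark.cassandra.connection.timeout_ms and spark.cassandra.read.timeout_ms from the 2.0.x connector reference (later connector versions rename them), and the values 3000 and 30000 are arbitrary illustrations, not recommendations. The logs above do not establish that either setting is behind the 2-minute wait, so treat this as something to test rather than a confirmed fix. The app name is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Assumed property names (connector 2.0.x); check them against your version's reference.
// The values are illustrative only.
val cassandraTimeouts = Map(
  "spark.cassandra.connection.timeout_ms" -> "3000",  // max time to attempt connecting to a node
  "spark.cassandra.read.timeout_ms"       -> "30000"  // max time to wait for a read to return
)

// Start from the same contact point as in the question.
val baseBuilder = SparkSession.builder().
  appName("cassandra-timeout-test").  // hypothetical app name
  config("spark.cassandra.connection.host", "27.0.5.126").
  config("spark.cassandra.connection.port", "9042")

// Apply each timeout setting to the builder in turn.
val spark = cassandraTimeouts.
  foldLeft(baseBuilder) { case (b, (key, value)) => b.config(key, value) }.
  getOrCreate()
```

This is meant to be run through spark-submit or pasted into spark-shell, where the master is supplied externally. Comparing the gap between the "Connected to Cassandra cluster" line and the "Error creating pool" warning before and after the change would show whether either setting has any effect.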