I have a DSE cluster with 3 machines: 1, 2 and 3.
When I submit an application to the master, if I understand it correctly, the following happens:
- The master receives the application and allocates resources
- The driver starts running on the allocated worker
- The driver launches executors on the other nodes of the cluster to share the workload
So we have this configuration in the cluster:
- 1 is the master and has Worker 1
- 2 is a slave and has Worker 2
- 3 is a slave and has Worker 3
When Spark picks worker 1 (the master) for the driver, everything runs fine. But when Spark assigns worker 2 (slave) or worker 3 (slave) to the driver, it tries to bind to the master's IP and fails every time:
INFO 16:20:45 Changing view acls to: cassandra
INFO 16:20:45 Changing modify acls to: cassandra
INFO 16:20:45 SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cassandra); users with modify permissions: Set(cassandra)
INFO 16:20:45 Slf4jLogger started
ERROR 16:20:46 failed to bind to /10.1.1.1:0, shutting down Netty transport
WARN 16:20:46 Service 'Driver' could not bind on port 0. Attempting port 1.
INFO 16:20:46 Slf4jLogger started
ERROR 16:20:46 failed to bind to /10.1.1.1:0, shutting down Netty transport
WARN 16:20:46 Service 'Driver' could not bind on port 0. Attempting port 1.
The configuration of each node is quite simple:
export SPARK_LOCAL_IP="10.1.1.1"    # or .2 or .3
export SPARK_PUBLIC_DNS="xx.xx.xx.xx"
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=7080
export SPARK_DRIVER_HOST="10.1.1.1" # or .2 or .3
export SPARK_WORKER_INSTANCES=1
export SPARK_DRIVER_MEMORY="10G"
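To make the "or .2 or .3" concrete, here is a sketch of what the spark-env.sh on node 2 would look like under this scheme; the IP 10.1.1.2 is taken from the cluster layout described above, and the public DNS stays elided as in the original:

```shell
# spark-env.sh on node 2 (sketch -- values assumed from the cluster layout above)
export SPARK_LOCAL_IP="10.1.1.2"      # this node's own private IP, not the master's
export SPARK_PUBLIC_DNS="xx.xx.xx.xx"
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=7080
export SPARK_DRIVER_HOST="10.1.1.2"   # again this node's own IP
export SPARK_WORKER_INSTANCES=1
export SPARK_DRIVER_MEMORY="10G"
```

Each node's file differs only in the two IP-valued variables; everything else is identical across the cluster.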
I tried setting spark.driver.port in spark-defaults.conf, but it had no effect.
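For reference, the attempted override looked like the fragment below; spark.driver.port is a real Spark property, but the port value here is only an illustrative example, not the one actually used:

```shell
# spark-defaults.conf -- attempted override (port value is an example)
spark.driver.port   40000
```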
Here is the submit call:
/usr/bin/dse spark-submit --properties-file production.conf --master spark://10.1.1.1:7077 --deploy-mode cluster --class "com.company.SignalIO" aggregation.jar 2015-6-1-00:00:00 2015-6-2-00:00:00 signal_table
Any ideas?