I set up a Spark cluster on Spark 1.4.1. When starting the workers, I do the following:

1) Set up the conf file:

# A Spark Worker will be started on each of the machines listed below.
localhost
10.0.0.4

2) Start the cluster:

$ ./sbin/start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /home/romain/informatique/zoo/spark/spark-1.4.1-bin-hadoop2.6/sbin/../logs/spark-romain-org.apache.spark.deploy.master.Master-1-romain-ProLiant-ML150-G6.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/romain/informatique/zoo/spark/spark-1.4.1-bin-hadoop2.6/sbin/../logs/spark-romain-org.apache.spark.deploy.worker.Worker-1-romain-ProLiant-ML150-G6.out
10.0.0.4: starting org.apache.spark.deploy.worker.Worker, logging to /home/romain/informatique/zoo/spark/spark-1.4.1-bin-hadoop2.6/sbin/../logs/spark-romain-org.apache.spark.deploy.worker.Worker-1-romain-wks.out

3) The worker does not show up in the UI (10.0.0.6:8080), and I get the following error messages in the log file on 10.0.0.4:

$ cat spark-romain-org.apache.spark.deploy.worker.Worker-1-romain-wks.out
    ormatique/zoo/spark/spark-1.4.1-bin-hadoop2.6/lib/spark-assembly-1.4.1-hadoop2.6.0.jar:/home/romain/informatique/zoo/spark/spark-1.4.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/romain/informatique/zoo/spark/spark-1.4.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/home/romain/informatique/zoo/spark/spark-1.4.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar -XX:+UseCompressedOops -Xms512m -Xmx512m -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://romain-ProLiant-ML150-G6:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/09/23 01:35:43 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/09/23 01:35:43 WARN Utils: Your hostname, romain-wks resolves to a loopback address: 127.0.1.1; using 10.0.0.4 instead (on interface wlan0)
15/09/23 01:35:43 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/09/23 01:35:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/23 01:35:43 INFO SecurityManager: Changing view acls to: romain
15/09/23 01:35:43 INFO SecurityManager: Changing modify acls to: romain
15/09/23 01:35:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(romain); users with modify permissions: Set(romain)
15/09/23 01:35:44 INFO Slf4jLogger: Slf4jLogger started
15/09/23 01:35:44 INFO Remoting: Starting remoting
15/09/23 01:35:44 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@10.0.0.4:35860]
15/09/23 01:35:44 INFO Utils: Successfully started service 'sparkWorker' on port 35860.
15/09/23 01:35:44 INFO Worker: Starting Spark worker 10.0.0.4:35860 with 8 cores, 26.5 GB RAM
15/09/23 01:35:44 INFO Worker: Running Spark version 1.4.1
15/09/23 01:35:44 INFO Worker: Spark home: /home/romain/informatique/zoo/spark/spark-1.4.1-bin-hadoop2.6
15/09/23 01:35:44 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/09/23 01:35:44 INFO WorkerWebUI: Started WorkerWebUI at http://10.0.0.4:8081
15/09/23 01:35:44 INFO Worker: Connecting to master akka.tcp://sparkMaster@romain-ProLiant-ML150-G6:7077/user/Master...
15/09/23 01:35:45 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@romain-ProLiant-ML150-G6:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: romain-ProLiant-ML150-G6: Name or service not known
15/09/23 01:35:54 INFO Worker: Retrying connection to master (attempt # 1)
15/09/23 01:35:54 INFO Worker: Connecting to master akka.tcp://sparkMaster@romain-ProLiant-ML150-G6:7077/user/Master...
15/09/23 01:35:54 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@romain-ProLiant-ML150-G6:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: romain-ProLiant-ML150-G6
15/09/23 01:36:04 INFO Worker: Retrying connection to master (attempt # 2)
15/09/23 01:36:04 INFO Worker: Connecting to master akka.tcp://sparkMaster@romain-ProLiant-ML150-G6:7077/user/Master...
15/09/23 01:36:04 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@romain-ProLiant-ML150-G6:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: romain-ProLiant-ML150-G6: Name or service not known
15/09/23 01:36:14 INFO Worker: Retrying connection to master (attempt # 3)
15/09/23 01:36:14 INFO Worker: Connecting to master akka.tcp://sparkMaster@romain-ProLiant-ML150-G6:7077/user/Master...
15/09/23 01:36:14 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@romain-ProLiant-ML150-G6:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: romain-ProLiant-ML150-G6
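
The "Name or service not known" reason in the log above suggests the worker cannot resolve the master's host name. A minimal check that could be run on the worker, assuming `getent` is available there; the `resolves` helper is hypothetical and not part of Spark:

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): report whether a host name
# resolves on this machine via /etc/hosts or DNS.
resolves() {
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "resolves: $1"
    else
        echo "does NOT resolve: $1"
        return 1
    fi
}

# Master host name taken from the worker log above.
resolves romain-ProLiant-ML150-G6 || true
```

If the name does not resolve, the worker has no way to reach the address the master advertises, which would match the retries in the log.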

4) On the worker machine, running [$ ./sbin/start-slave.sh spark://10.0.0.6:7077] gives me:

15/09/23 02:19:13 INFO Worker: Connecting to master akka.tcp://sparkMaster@10.0.0.6:7077/user/Master...
15/09/23 02:19:13 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@10.0.0.6:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /10.0.0.6:7077

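My master advertises itself by its host name rather than by its IP address, but there is no DNS in my cluster to map that name to the actual IP. How can I make the master send its IP address to the workers instead of its name?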
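
One possible direction, sketched under the assumption that the Spark 1.x standalone start scripts read `SPARK_MASTER_IP` from `conf/spark-env.sh` on the master and use it as the master's address; whether this resolves the problem here is untested:

```shell
# conf/spark-env.sh on the master machine (10.0.0.6 in this setup).
# Assumption: sbin/start-master.sh passes this value to the Master as the
# address it binds to and advertises, instead of the output of `hostname`.
export SPARK_MASTER_IP=10.0.0.6

# Alternative (also an assumption): without DNS, map the name by hand by
# adding this line to /etc/hosts on each worker:
#   10.0.0.6  romain-ProLiant-ML150-G6
```

With either change, the cluster would need to be restarted so that the workers connect to an address they can actually reach.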
Thanks, Romain.