
Cannot connect to slaves in a Spark cluster

I want to set up a Spark standalone cluster. I have two workstations and a laptop, all running Ubuntu, and each machine has a different username. I followed this blog: Spark cluster. I edited the hosts file:

sudo gedit /etc/hosts

10.8.9.13 master  
10.8.19.23 slave01  
10.8.5.158 slave02 

user-name of Master: lab
user-name of Slave01: lab-zero
user-name of Slave02: computer

I also generated a key pair with ssh-keygen -t rsa and added the public key to the .ssh/authorized_keys file on each machine, so when I ssh into the two machines I can log in without a password. However, when I run ./start-all.sh, it gives:

lab@slave02's password: lab@slave01's password: localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/lab/Downloads/spark-2.1.1-bin-hadoop2.7/logs/spark-acs-lab-rg.apache.spark.deploy.worker.Worker-1-M1.out

It gets stuck here, and it uses my default username lab instead of the remote hosts' usernames (in this case the slaves' usernames, lab-zero and computer) to reach the two slaves.
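
The start scripts launch the workers over plain ssh, so per-host usernames can be supplied in ~/.ssh/config on the master; a minimal sketch, assuming OpenSSH and the hostnames from the /etc/hosts file above:

# ~/.ssh/config on the master
Host slave01
    User lab-zero
Host slave02
    User computer

With entries like these, ssh slave01 (and hence ./start-all.sh) would log in as lab-zero without the username being spelled out.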

When I check the Spark Master UI, it gives me an error:

The requested URL could not be retrieved

When I run ./stop-slaves.sh it returns:

no org.apache.spark.deploy.worker.Worker to stop

Here is my worker log:

17/11/30 01:53:40 INFO Worker: Retrying connection to master (attempt # 16)
17/11/30 01:53:40 INFO Worker: Connecting to master 10.8.9.13:7077...
17/11/30 01:53:40 WARN Worker: Failed to connect to master 10.8.9.13:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)  
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)  
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)    
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)  
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)  
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)    
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)     
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)  
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)   
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:218)  
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)  
at java.util.concurrent.FutureTask.run(FutureTask.java:266)     
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)  
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)  
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)  
at java.io.DataInputStream.readUTF(DataInputStream.java:609)    
at java.io.DataInputStream.readUTF(DataInputStream.java:564)    
at org.apache.spark.rpc.netty.RequestMessage$.readRpcAddress(NettyRpcEnv.scala:582)     
at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:592)  
at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:651)    
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:636)    
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:157)  
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105)     
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)    
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)     
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)   
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)     
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)   
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)     
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)   
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)   
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)     
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)   
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)    
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)     
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)     
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)  
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)  
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)    
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)     
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)     
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)     
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)    
at java.lang.Thread.run(Thread.java:748)
at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:189)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:120)    
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)     
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)   
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)     
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)   
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)     
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)   
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)   
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)     
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)   
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)    
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)     
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)     
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)     
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)  
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652)  
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)    
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)     
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)     
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)     
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)    
    ... 1 more
17/11/30 01:54:43 ERROR Worker: All masters are unresponsive! Giving up.

2 Answers

  • 0

    Solved the problem by changing all the systems to use the same user name. Also created a slaves file on the master in ~/spark-2.0.2-bin-hadoop2.7/conf. The slaves file has the following contents:

    # A Spark Worker will be started on each of the machines listed below.
    10.8.9.13   
    10.8.19.23   
    10.8.5.158
    

    Also added the master's IP address to the ~/spark-2.0.2-bin-hadoop2.7/conf/spark-env.sh file:

    export SPARK_MASTER_HOST=10.8.9.13
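
    With the slaves file and SPARK_MASTER_HOST in place, restarting the standalone daemons from the master should bring the workers up; a sketch assuming the same install path as above:

    # run on the master; stops and restarts the master plus all workers listed in conf/slaves
    ~/spark-2.0.2-bin-hadoop2.7/sbin/stop-all.sh
    ~/spark-2.0.2-bin-hadoop2.7/sbin/start-all.sh
    # the workers should then appear on the master UI at http://10.8.9.13:8080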
    
  • 1
    • Run ./sbin/start-master.sh on the master host and check that the UI is available; the default port is 8080.

    • Run ./sbin/start-slave.sh spark://10.8.9.13:7077 on each slave host.
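
    If a worker still fails to register, it helps to first confirm that the master's RPC port is reachable from the slave; a quick check, assuming netcat is installed:

    # from a slave host: is the master's RPC port (7077 by default) open?
    nc -zv 10.8.9.13 7077
    # after start-slave.sh, the worker log (under logs/ in the Spark install)
    # should show "Successfully registered with master"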

    See this link for details.
