When I start HBase on my cluster, the HMaster and HQuorumPeer processes start on the master node, while only the HQuorumPeer process starts on the slaves.

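In the Tasks section of the web UI, I can see the master (node0) with state RUNNING and the status "Waiting for region servers count to settle; currently checked in 0, slept for 250920 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms". In the Software Attributes section, I can find all my nodes under the ZooKeeper quorum entry, whose description is "Addresses of all registered ZK servers". So ZooKeeper seems to be working, but the log files suggest that it is where the problem lies.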

Log hbase-clusterhadoop-master:

    2016-09-08 12:26:14,875 INFO  [main-SendThread(node0:2181)] zookeeper.ClientCnxn: Opening socket connection to server node0/192.168.1.113:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Impossibile trovare una configurazione di login)
    2016-09-08 12:26:14,882 WARN  [main-SendThread(node0:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
    java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
    2016-09-08 12:26:14,994 WARN  [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node3:2181,node2:2181,node1:2181,node0:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
    ........
    2016-09-08 12:32:53,063 INFO  [master:node0:60000] zookeeper.ZooKeeper: Initiating client connection, connectString=node3:2181,node2:2181,node1:2181,node0:2181 sessionTimeout=90000 watcher=replicationLogCleaner0x0, quorum=node3:2181,node2:2181,node1:2181,node0:2181, baseZNode=/hbase
    2016-09-08 12:32:53,064 INFO  [master:node0:60000-SendThread(node3:2181)] zookeeper.ClientCnxn: Opening socket connection to server node3/192.168.1.112:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Impossibile trovare una configurazione di login)
    2016-09-08 12:32:53,065 INFO  [master:node0:60000-SendThread(node3:2181)] zookeeper.ClientCnxn: Socket connection established to node3/192.168.1.112:2181, initiating session
    2016-09-08 12:32:53,069 INFO  [master:node0:60000-SendThread(node3:2181)] zookeeper.ClientCnxn: Session establishment complete on server node3/192.168.1.112:2181, sessionid = 0x357095a4b940001, negotiated timeout = 90000
    2016-09-08 12:32:53,072 INFO  [master:node0:60000] zookeeper.RecoverableZooKeeper: Node /hbase/replication/rs already exists and this is not a retry
    2016-09-08 12:32:53,072 DEBUG [master:node0:60000] cleaner.CleanerChore: initialize cleaner=org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner
    2016-09-08 12:32:53,075 DEBUG [master:node0:60000] cleaner.CleanerChore: initialize cleaner=org.apache.hadoop.hbase.master.snapshot.SnapshotLogCleaner
    2016-09-08 12:32:53,076 DEBUG [master:node0:60000] cleaner.CleanerChore: initialize cleaner=org.apache.hadoop.hbase.master.cleaner.HFileLinkCleaner
    2016-09-08 12:32:53,077 DEBUG [master:node0:60000] cleaner.CleanerChore: initialize cleaner=org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner
    2016-09-08 12:32:53,078 DEBUG [master:node0:60000] cleaner.CleanerChore: initialize cleaner=org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner
    2016-09-08 12:32:53,078 INFO  [master:node0:60000] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2016-09-08 12:32:54,607 INFO  [master:node0:60000] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1529 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2016-09-08 12:32:56,137 INFO  [master:node0:60000] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3059 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.

Log hbase-clusterhadoop-zookeeper-node0 (master):

2016-09-08 12:26:18,315 WARN  [WorkerSender[myid=0]] quorum.QuorumCnxManager: Cannot open channel to 1 at election address node1/192.168.1.156:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431)
    at java.net.Socket.connect(Socket.java:527)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:695)

Log hbase-clusterhadoop-regionserver-node1 (one of the slaves):

2016-09-08 12:33:32,690 INFO  [regionserver60020-SendThread(node3:2181)] zookeeper.ClientCnxn: Opening socket connection to server node3/192.168.1.112:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Impossibile trovare una configurazione di login)
2016-09-08 12:33:32,691 INFO  [regionserver60020-SendThread(node3:2181)] zookeeper.ClientCnxn: Socket connection established to node3/192.168.1.112:2181, initiating session
2016-09-08 12:33:32,692 INFO  [regionserver60020-SendThread(node3:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-09-08 12:33:32,793 WARN  [regionserver60020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node3:2181,node2:2181,node1:2181,node0:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2016-09-08 12:33:32,794 ERROR [regionserver60020] zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2016-09-08 12:33:32,794 WARN  [regionserver60020] zookeeper.ZKUtil: regionserver:600200x0, quorum=node3:2181,node2:2181,node1:2181,node0:2181, baseZNode=/hbase Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:222)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:427)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:778)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:751)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:884)
    at java.lang.Thread.run(Thread.java:695)
2016-09-08 12:33:32,794 ERROR [regionserver60020] zookeeper.ZooKeeperWatcher: regionserver:600200x0, quorum=node3:2181,node2:2181,node1:2181,node0:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:222)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:427)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:778)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:751)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:884)
    at java.lang.Thread.run(Thread.java:695)
2016-09-08 12:33:32,795 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server node1,60020,1473330794709: Unexpected exception during initialization, aborting
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:222)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:427)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:778)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:751)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:884)
    at java.lang.Thread.run(Thread.java:695)
2016-09-08 12:33:32,798 FATAL [regionserver60020] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2016-09-08 12:33:32,798 INFO  [regionserver60020] regionserver.HRegionServer: STOPPED: Unexpected exception during initialization, aborting
2016-09-08 12:33:32,867 INFO  [regionserver60020-SendThread(node0:2181)] zookeeper.ClientCnxn: Opening socket connection to server node0/192.168.1.113:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Impossibile trovare una configurazione di login)

Log hbase-clusterhadoop-zookeeper-node1:

2016-09-08 12:33:32,075 WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0%0:2181] quorum.Learner: Unexpected exception, tries=0, connecting to node3/192.168.1.112:2888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431)
    at java.net.Socket.connect(Socket.java:527)
    at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:225)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2016-09-08 12:33:32,227 INFO  [node1/192.168.1.156:3888] quorum.QuorumCnxManager: Received connection request /192.168.1.113:49844
2016-09-08 12:33:32,233 INFO  [WorkerReceiver[myid=1]] quorum.FastLeaderElection: Notification: 1 (message format version), 0 (n.leader), 0x10000002d (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x1 (n.peerEpoch) FOLLOWING (my state)
2016-09-08 12:33:32,239 INFO  [WorkerReceiver[myid=1]] quorum.FastLeaderElection: Notification: 1 (message format version), 3 (n.leader), 0x10000002d (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x1 (n.peerEpoch) FOLLOWING (my state)
2016-09-08 12:33:32,725 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.1.111:49534
2016-09-08 12:33:32,725 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2016-09-08 12:33:32,725 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.1.111:49534 (no session established for client)
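
Since the logs above show "Connection refused" on ports 3888 and 2888 and "ZooKeeperServer not running" on 2181, one thing I could do is probe those ports from each machine. The following is only a minimal sketch: the function names `port_open` and `probe` are my own, and the port list is taken from the log lines rather than from any ZooKeeper configuration file.

```python
import socket

# Ports that appear in the logs: 2181 (client), 2888 (follower to leader),
# 3888 (leader election). As far as I understand, 2888 is normally only
# listening on the current leader, so a refusal there is not always an error.
ZK_PORTS = (2181, 2888, 3888)


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe(hosts, ports=ZK_PORTS, timeout=2.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

Running `probe(["node0", "node1", "node2", "node3"])` on every node and comparing the results should show whether the refusals depend on which machine the connection comes from.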

Conf file hbase-site:

<configuration>

    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>

    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://node0:9000/hbase</value>
    </property>

    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>node0,node1,node2,node3</value>
    </property>

    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/Users/clusterhadoop/usr/local/zookeeper</value>
    </property>

    <property>
        <name>hbase.tmp.dir</name>
        <value>/Users/clusterhadoop/usr/local/hbtmp</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <property>
        <name>hbase.master</name>
        <value>node0:60000</value>
    </property>

    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>

    <property>
        <name>hbase.zookeeper.property.maxClientCnxns</name>
        <value>1000</value>
    </property>

</configuration>
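
To double-check what this configuration implies for the quorum, here is a small sketch that reads the `hbase.zookeeper.quorum` property and computes how many members must be up for a majority. The helper names `read_properties` and `quorum_summary` are hypothetical, and the XML string is a shortened copy of the file above. I am not claiming the even member count is the cause, only that four members need three to be running, which tolerates the same single failure as a three-member quorum.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse Hadoop-style <property><name/><value/></property> entries into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def quorum_summary(props):
    """Summarise quorum size, the majority needed, and tolerated failures."""
    raw = props.get("hbase.zookeeper.quorum", "")
    members = [h.strip() for h in raw.split(",") if h.strip()]
    size = len(members)
    majority = size // 2 + 1 if size else 0
    return {
        "members": members,
        "size": size,
        "majority": majority,
        "tolerated_failures": size - majority,
    }


SAMPLE_XML = """<configuration>
  <property><name>hbase.zookeeper.quorum</name><value>node0,node1,node2,node3</value></property>
  <property><name>hbase.zookeeper.property.clientPort</name><value>2181</value></property>
</configuration>"""

summary = quorum_summary(read_properties(SAMPLE_XML))
print(summary["size"], summary["majority"], summary["tolerated_failures"])  # prints: 4 3 1
```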

Hosts file:

127.0.0.1       localhost
127.0.0.1       node3
192.168.1.112 node3
192.168.1.156 node1
192.168.1.111 node2
192.168.1.113 node0
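
I am not sure whether it matters, but node3 appears twice in this file, once on 127.0.0.1 and once on 192.168.1.112. A sketch of a check I could run against each node's hosts file to spot names that map to both a loopback and a non-loopback address follows; `loopback_conflicts` is a name I made up, and the sample text is the file shown above.

```python
import ipaddress
from collections import defaultdict


def loopback_conflicts(hosts_text):
    """Return {hostname: sorted addresses} for names mapped to both
    a loopback address and a non-loopback address."""
    by_name = defaultdict(set)
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            by_name[name].add(addr)

    conflicts = {}
    for name, addrs in by_name.items():
        flags = set()
        for addr in addrs:
            try:
                flags.add(ipaddress.ip_address(addr).is_loopback)
            except ValueError:
                continue  # skip entries that are not plain IP addresses
        if flags == {True, False}:
            conflicts[name] = sorted(addrs)
    return conflicts


SAMPLE_HOSTS = """\
127.0.0.1       localhost
127.0.0.1       node3
192.168.1.112 node3
192.168.1.156 node1
192.168.1.111 node2
192.168.1.113 node0
"""

print(loopback_conflicts(SAMPLE_HOSTS))  # {'node3': ['127.0.0.1', '192.168.1.112']}
```

If a daemon on that machine resolves its own hostname to 127.0.0.1, it may bind to or advertise the loopback address, which could explain why other nodes get "Connection refused". That is an assumption on my part, not something the logs confirm.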

What is the problem, and how can I solve it?