我试图在8节点IB(OFED-1.5.3-4.0.42)集群上部署Hadoop-RDMA并遇到以下问题(a.k.a File ...只能复制到0个节点,而不是1个):
frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfs -copyFromLocal ../pg132.txt /user/frolo/input/pg132.txt
Warning: $HADOOP_HOME is deprecated.
14/02/05 19:06:30 WARN hdfs.DFSClient: DataStreamer Exception: java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown Source)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.From.Code(Unknown Source)
at org.apache.hadoop.hdfs.From.F(Unknown Source)
at org.apache.hadoop.hdfs.From.F(Unknown Source)
at org.apache.hadoop.hdfs.The.run(Unknown Source)
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/frolo/input/pg132.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown Source)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
at org.apache.hadoop.ipc.rdma.madness.Code(Unknown Source)
at org.apache.hadoop.ipc.rdma.madness.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(Unknown Source)
at org.apache.hadoop.ipc.rdma.be.run(Unknown Source)
at org.apache.hadoop.ipc.rdma.RDMAClient.Code(Unknown Source)
at org.apache.hadoop.ipc.rdma.RDMAClient.call(Unknown Source)
at org.apache.hadoop.ipc.Tempest.invoke(Unknown Source)
... 12 more`
14/02/05 19:06:30 WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
14/02/05 19:06:30 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/frolo/input/pg132.txt" - Aborting...
14/02/05 19:06:30 INFO hdfs.DFSClient: exception in isClosed
当我开始从本地文件系统复制到HDFS时,似乎数据没有传输到DataNode . 我测试了DataNodes的可用性:
frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0`
-------------------------------------------------
Datanodes available: 0 (4 total, 4 dead)`
`Name: 10.10.1.13:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014
Name: 10.10.1.14:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014
Name: 10.10.1.16:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014
Name: 10.10.1.11:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:55 MSK 2014
并尝试在HDFS文件系统中成功的mkdir . 重新启动Hadoop守护进程并没有产生任何积极影响 .
你能帮我解决这个问题吗?谢谢 .
最好,亚历克斯
1 回答
我发现了我的问题 . 该问题与已配置为NFS分区的hadoop.tmp.dir的配置有关 . 默认情况下,它配置为/ tmp,即本地fs . 从core-site.xml中删除hadoop.tmp.dir之后,问题就解决了 .