
MapReduce jobs fail after being accepted by YARN


Even a simple WordCount MapReduce job fails with the same error.

Hadoop 2.6.0

Below are the YARN logs.

Some kind of timeout seems to occur during resource negotiation, but I am unable to verify what exactly causes the timeout.

    2016-11-11 15:38:09,313 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1478856936677_000002. Got exception: java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054 remote=platform-demo/10.0.37.145:60487]; Host Details : local host is: "platform-demo/10.0.37.145"; destination host is: "platform-demo":60487;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
        at org.apache.hadoop.ipc.Client.call(Client.java:1472)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy79.startContainers(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054 remote=platform-demo/10.0.37.145:60487]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
        at org.apache.hadoop.ipc.Client.call(Client.java:1438)
        ... 9 more
    Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054 remote=platform-demo/10.0.37.145:60487]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
        at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
        ... 12 more
    2016-11-11 15:38:09,319 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1478856936677_000002 with final state: FAILED, and exit status: -1000
    2016-11-11 15:38:09,319 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478856936677_000002 State change from ALLOCATED to FINAL_SAVING

I have tried changing the following properties:

    yarn.nodemanager.resource.memory-mb = 2200   (amount of physical memory, in MB, that can be allocated for containers)
    yarn.scheduler.minimum-allocation-mb = 500
    dfs.datanode.socket.write.timeout = 3000000
    dfs.socket.timeout = 3000000
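
For reference, a minimal sketch of where these settings would normally live (an assumption about this cluster's layout, not taken from the question): the yarn.* entries belong in yarn-site.xml, and the dfs.* timeouts, in milliseconds, in hdfs-site.xml.

    <!-- yarn-site.xml (assumed location for the yarn.* properties) -->
    <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>2200</value>
    </property>
    <property>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>500</value>
    </property>

    <!-- hdfs-site.xml (assumed location for the dfs.* timeouts, values in ms) -->
    <property>
            <name>dfs.datanode.socket.write.timeout</name>
            <value>3000000</value>
    </property>
    <property>
            <name>dfs.socket.timeout</name>
            <value>3000000</value>
    </property>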

1 Answer


Q1. MapReduce jobs fail after being accepted by YARN

The cause was that a large number of connections, around 130, were stuck on port 60487 (the destination port of the timed-out RPC in the log above).

Q2. MapReduce jobs fail after being accepted by YARN

The problem was due to the Hadoop tmp directory /app/hadoop/tmp. After emptying this directory and re-running the MapReduce job, the job executed successfully.
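
In case it helps, /app/hadoop/tmp is the kind of path that is normally configured through the hadoop.tmp.dir property in core-site.xml; a minimal sketch, assuming that is how the directory above was set on this cluster:

    <!-- core-site.xml (assumed; hadoop.tmp.dir is the standard property for this path) -->
    <property>
            <name>hadoop.tmp.dir</name>
            <value>/app/hadoop/tmp</value>
    </property>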

Q3. Unhealthy Node: local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir

Edit yarn-site.xml with the following property, which raises the disk-utilization threshold (90% by default) above which the NodeManager marks a local dir as bad:

    <property>
            <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
            <value>98.5</value>
    </property>
    

Refer to: Why does Hadoop report "Unhealthy Node local-dirs and log-dirs are bad"?
