运行distcp作业我遇到以下问题:几乎所有的map任务都标记为成功,但注意说Container已被杀死 .
在联机界面上, Map 作业的日志显示:Progress 100.00 State SUCCEEDED
但是注意它几乎每次尝试(~200)容器被ApplicationMaster杀死 . ApplicationMaster杀死的容器 . 根据要求杀死容器 . 退出代码是143
在与该尝试相关联的日志文件中,我可以看到一个日志,说任务'attempt_xxxxxxxxx_0'已完成 .
对于所有作业/尝试,stderr输出为空 .
在查看应用程序主日志并执行其中一次成功(但已杀死)尝试后,我会找到以下日志:
2017-01-05 10:27:22,772 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1483370705805_4012_m_000000_0
2017-01-05 10:27:22,773 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1483370705805_4012_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1483370705805_4012Job Transitioned from RUNNING to COMMITTING
2017-01-05 10:27:22,776 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
2017-01-05 10:27:23,118 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,125 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e116_1483370705805_4012_01_000002
2017-01-05 10:27:24,126 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,126 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1483370705805_4012_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
我设置了“mapreduce.map.speculative = false”!
All MAP task are SUCCEEDED(distcp job has no REDUCE),but MAPREDUCE is going for a long time(several hours) , then it is succeeded and distcp job is done.
我正在运行'yarn version'= Hadoop 2.5.0-cdh5.3.1
我应该担心吗?是什么导致容器被杀?任何建议将不胜感激!
1 回答
那些被杀的企图可能是由于投机性的执行 . 在这种情况下,没有什么可担心的 .
要确保是这种情况,请尝试运行您的distcp,如下所示:
你应该停止看到那些被杀的企图 .