
Spark Streaming job fails after being stopped by the driver


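I have a Spark Streaming job that reads data from Kafka and performs some operations on it. I am running the job with Spark 1.4.1 on a YARN cluster of two nodes, each with 16 GB of RAM and 16 cores.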

I pass these options to spark-submit:

--master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 3
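To put these flags in perspective, here is a rough sketch (in Java, matching the snippet in the answers) that totals what the job asks YARN for. The per-container overhead rule, max(384 MB, 10% of the heap), is an assumption based on the Spark 1.4-era defaults for `spark.yarn.executor.memoryOverhead` and `spark.yarn.driver.memoryOverhead`; the class and method names are hypothetical.

```java
// Hypothetical helper: rough YARN request sizing for the spark-submit flags above.
// ASSUMPTION: overhead = max(384 MB, 10% of heap), the Spark 1.4-era default for
// spark.yarn.executor.memoryOverhead and spark.yarn.driver.memoryOverhead.
final class YarnRequestEstimate {
    static final int OVERHEAD_MIN_MB = 384;
    static final double OVERHEAD_FACTOR = 0.10;

    // Memory requested per container: JVM heap plus off-heap overhead (truncated to int).
    static int containerMb(int heapMb) {
        int overhead = Math.max(OVERHEAD_MIN_MB, (int) (OVERHEAD_FACTOR * heapMb));
        return heapMb + overhead;
    }

    // In yarn-cluster mode the driver runs inside the ApplicationMaster container.
    static int totalMemoryMb(int numExecutors, int executorMb, int driverMb) {
        return numExecutors * containerMb(executorMb) + containerMb(driverMb);
    }

    public static void main(String[] args) {
        // --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 3
        int executors = 3, executorMb = 2048, driverMb = 4096, coresPerExecutor = 3;
        System.out.println("executor container MB: " + containerMb(executorMb));
        System.out.println("driver/AM container MB: " + containerMb(driverMb));
        System.out.println("total requested MB: " + totalMemoryMb(executors, executorMb, driverMb));
        System.out.println("executor cores: " + executors * coresPerExecutor);
        System.out.println("raw cluster RAM MB: " + 2 * 16 * 1024);
    }
}
```

Under that assumption the job requests 2432 MB per executor container, 4505 MB for the driver/ApplicationMaster container, 11801 MB in total and 9 executor cores. YARN may round each request up to a multiple of `yarn.scheduler.minimum-allocation-mb`, and what it can grant is bounded by `yarn.nodemanager.resource.memory-mb` on each node rather than the physical 16 GB, so the totals should be compared against those settings.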

After running for a while, the job fails with this error and exits:

INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 11,
(reason: Max number of executor failures reached)

.....

ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0:
Stopped by driver

Update:

I also found these log lines:

INFO yarn.YarnAllocator: Received 3 containers from YARN, launching executors on 3 of them.....

INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down.

....

INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.

INFO yarn.ExecutorRunnable: Starting Executor Container.....

INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down...

INFO yarn.YarnAllocator: Completed container container_e10_1453801197604_0104_01_000006 (state: COMPLETE, exit status: 1)

INFO yarn.YarnAllocator: Container marked as failed: container_e10_1453801197604_0104_01_000006. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e10_1453801197604_0104_01_000006
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
    at org.apache.hadoop.util.Shell.run(Shell.java:487)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1
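The `Max number of executor failures reached` message comes from the YARN ApplicationMaster giving up once too many executor containers have failed. A minimal sketch of that check, assuming the Spark 1.x default of `spark.yarn.max.executor.failures` = max(2 × num-executors, 3) when dynamic allocation is off; the class is hypothetical, and counting `Container marked as failed` lines is only an approximation of the AM's internal failure counter.

```java
import java.util.List;

// Hypothetical helper for reading an ApplicationMaster log like the one above.
// ASSUMPTION: without dynamic allocation, Spark 1.x defaults
// spark.yarn.max.executor.failures to max(2 * numExecutors, 3).
final class ExecutorFailureCheck {
    static int defaultMaxFailures(int numExecutors) {
        return Math.max(2 * numExecutors, 3);
    }

    // Counts AM log lines that report a failed executor container.
    static long countFailedContainers(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("Container marked as failed"))
                .count();
    }

    // True once the approximate failure count reaches the assumed default threshold.
    static boolean wouldAbort(List<String> logLines, int numExecutors) {
        return countFailedContainers(logLines) >= defaultMaxFailures(numExecutors);
    }
}
```

With `--num-executors 3` the assumed threshold is 6, so each container that exits with status 1, like `container_e10_1453801197604_0104_01_000006` above, counts toward it.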

What could be causing this? Any help would be appreciated.

Thanks

2 Answers

  • -3

    Can you show the Scala/Java code you use to read from Kafka? I suspect you may not be creating the SparkConf correctly.

    Try something like:

    import org.apache.spark.SparkConf;
    SparkConf sparkConf = new SparkConf().setAppName("ApplicationName");
    

    Also try running the application in yarn-client mode and share the output.

  • 1

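    I ran into the same problem. I worked around it by removing the sparkContext.stop() call at the end of the main function and leaving the stop action to the GC.

    The Spark team has fixed this issue in Spark core, but so far the fix has only been merged into the master branch. We need to wait until it ships in a new release.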

    https://issues.apache.org/jira/browse/SPARK-12009
