Why doesn't Spark allocate a new YARN container when the current memory is insufficient?

I have a Cloudera cluster with a YARN capacity of 600 vcores and 3600 GiB of memory. However, the admin team has capped the maximum memory of a YARN container at 6 GB. My user is allowed to allocate as many containers as needed.

When I try to run a Spark job on a dataset of about 50 GB, the job fails with an Executor Memory Overhead error.

When one container does not have enough memory, why doesn't it just try a new container?

1 Answer

    When one container does not have enough memory, why doesn't it just try a new container?

    ...because Spark does not do that by default (and you have not configured it to).

    The number of executors and, more importantly, the total number of CPU cores and the amount of RAM are under your control at spark-submit time. That is what --driver-memory, --executor-memory, --driver-cores, --total-executor-cores, --executor-cores, --num-executors and friends are for.

    $ ./bin/spark-submit --help
    ...
      --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 1024M).
      --driver-java-options       Extra Java options to pass to the driver.
      --driver-library-path       Extra library path entries to pass to the driver.
      --driver-class-path         Extra class path entries to pass to the driver. Note that
                                  jars added with --jars are automatically included in the
                                  classpath.
    
      --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
    ...
     Spark standalone with cluster deploy mode only:
      --driver-cores NUM          Cores for driver (Default: 1).
    ...
     Spark standalone and Mesos only:
      --total-executor-cores NUM  Total cores for all executors.
    
     Spark standalone and YARN only:
      --executor-cores NUM        Number of cores per executor. (Default: 1 in YARN mode,
                                  or all available cores on the worker in standalone mode)
    
     YARN-only:
      --driver-cores NUM          Number of cores used by the driver, only in cluster mode
                                  (Default: 1).
      --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
      --num-executors NUM         Number of executors to launch (Default: 2).
                                  If dynamic allocation is enabled, the initial number of
                                  executors will be at least NUM.
    ...
    

    Some of these options are specific to a deploy mode, while others depend on the cluster manager in use (which in your case is YARN).
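    For example, to stay within the 6 GB per-container cap your admins configured, you can size each executor so that the JVM heap plus the memory overhead fit into a single container, and simply request more executors. The figures below are purely illustrative (a sketch, not a tuned recommendation), and the application class and jar are placeholders:

    # Illustrative sizing only: 5 GB heap + ~1 GB overhead adds up to the 6 GB YARN container cap.
    # com.example.MyJob and my-job.jar are placeholders for your own application.
    $ ./bin/spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --num-executors 20 \
        --executor-cores 2 \
        --executor-memory 5G \
        --conf spark.executor.memoryOverhead=1024 \
        --class com.example.MyJob \
        my-job.jar

    The overhead portion is exactly what the "Executor Memory Overhead" error is complaining about; on older Spark releases the property is spelled spark.yarn.executor.memoryOverhead instead of spark.executor.memoryOverhead.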

    To sum it up... it is you who decides how many resources to allocate to a Spark application, using the spark-submit options.
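    If you want Spark itself to ask YARN for additional containers while the job is running, that is what dynamic allocation does (the --num-executors help text above already alludes to it). A minimal sketch, assuming the external shuffle service that dynamic allocation on YARN requires is available on your cluster; the executor bounds are made-up values:

    # Hypothetical bounds; dynamic allocation on YARN needs the external shuffle service.
    $ ./bin/spark-submit \
        --master yarn \
        --conf spark.dynamicAllocation.enabled=true \
        --conf spark.shuffle.service.enabled=true \
        --conf spark.dynamicAllocation.minExecutors=2 \
        --conf spark.dynamicAllocation.maxExecutors=50 \
        ...

    Note that dynamic allocation adds executors when tasks queue up, not when a single executor runs out of memory, so it does not lift the 6 GB per-container limit either.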

    Read Submitting Applications in the official Spark documentation.
