
Spark executors, driver, executor cores, and executor memory values

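I have some questions about the values for Spark executors, driver, executor cores, and executor memory.

  • If no application is running on the cluster and you submit a job, what are the default values for the number of executors, executor cores, and executor memory?

  • How would you calculate the number of executors, executor cores, and executor memory needed for a job you want to submit?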

2 Answers

  • 0

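    If no application is running on the cluster and you submit a job, what are the default values for the number of executors, executor cores, and executor memory?

    The defaults are stored in spark-defaults.conf on the cluster where Spark is installed, so you can verify the values there. Any property not set in that file falls back to Spark's built-in default.

    To check the default values, refer to this document.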
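As a rough illustration of how the effective values could be checked, here is a minimal Python sketch that parses the text of a spark-defaults.conf file and falls back to built-in defaults. The fallback values are the ones listed in the Spark configuration documentation for a YARN deployment and may differ by Spark version and cluster manager. The sample file content and the helper names (`parse_spark_defaults`, `effective_value`) are hypothetical, and the parser assumes the common whitespace-separated `key value` layout.

```python
# Built-in fallbacks as listed in the Spark configuration docs.
# Assumption: YARN deployment; values can differ by version and cluster manager.
BUILTIN_DEFAULTS = {
    "spark.driver.memory": "1g",
    "spark.executor.memory": "1g",
    "spark.executor.cores": "1",
    "spark.executor.instances": "2",
}


def parse_spark_defaults(text):
    """Parse spark-defaults.conf content with one 'key value' pair per line."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def effective_value(settings, key):
    """Return the cluster override if present, else the built-in fallback."""
    return settings.get(key, BUILTIN_DEFAULTS.get(key))


# Hypothetical file content, for illustration only.
SAMPLE_CONF = """\
# cluster-wide overrides
spark.executor.memory   4g
spark.executor.cores    2
"""

settings = parse_spark_defaults(SAMPLE_CONF)
for key in BUILTIN_DEFAULTS:
    print(key, "=", effective_value(settings, key))
```

On a real cluster you would read the file from Spark's conf directory instead of using the inline sample.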

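    How would you calculate the number of executors, executor cores, and executor memory needed for a job you want to submit?

    It depends on the following:

    • The type of job, i.e. whether it is shuffle-intensive or only does map operations. A shuffle-heavy job may need more memory.

    • The data size. The larger the data, the higher the memory usage.

    • Cluster constraints, i.e. how much memory you can afford.

    Based on these factors, start with some numbers, then check the Spark UI to find the bottlenecks and increase or decrease the memory footprint accordingly.

    One caveat: keeping executor memory above 40 GB can be counterproductive, because JVM GC becomes slower. Too many cores can also slow the process down.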

  • 0

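    Avishek's answer covers the defaults. I will focus on calculating optimal values. Let's take an example:

    Example: 6 nodes, each with 16 cores and 64 GB RAM

    Each executor is a JVM instance, so multiple executors can run on a single node.

    Let's start by choosing the number of cores per executor: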

    Number of cores = number of concurrent tasks an executor can run
    
    One might think that higher concurrency means better performance. However, experiments have shown that Spark jobs perform well when the number of cores = 5.
    
    If the number of cores > 5, performance suffers.
    
    Note that 1 core and 1 GB of RAM per node are needed for the OS and Hadoop daemons.
    

    Now, calculate the number of executors:

    As discussed earlier, 15 cores are available on each node (16 minus the 1 reserved for the OS and Hadoop daemons), and we are planning for 5 cores per executor.
    
    Thus number of executors per node = 15/5 = 3
    Total number of executors = 3*6 = 18
    
    Out of all executors, 1 executor slot is needed for the YARN ApplicationMaster (AM).
    Thus, final executor count = 18-1 = 17 executors.
    

    Memory per executor:

    Executors per node = 3
    RAM available per node = 63 GB (as 1 GB is needed for the OS and Hadoop daemons)
    Memory per executor = 63/3 = 21 GB
    
    Spark also requires some memory overhead per executor: max(384 MB, 7% of memory per executor).
    Thus, 7% of 21 GB = 1.47 GB
    As 1.47 GB > 384 MB, subtract 1.47 from 21.
    Hence, 21 - 1.47 = 19.53, rounded down to 19 GB
    

    Final numbers:

    Executors - 17, Cores 5, Executor Memory - 19 GB
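The arithmetic above can be sketched as a small Python function. This is only an illustration of the answer's rule of thumb, not an official Spark formula: the function and parameter names are hypothetical, the 7% overhead is the figure used in this answer (recent Spark configuration docs list a 0.10 default factor, so it is kept as a parameter), and 384 MB is treated as 384/1024 GB.

```python
import math


def size_executors(nodes, cores_per_node, ram_gb_per_node,
                   cores_per_executor=5, reserved_cores=1, reserved_ram_gb=1,
                   overhead_fraction=0.07, min_overhead_gb=384 / 1024):
    """Return (executors, cores per executor, executor memory in GB)."""
    # Leave one core per node for the OS and Hadoop daemons: 16 - 1 = 15.
    usable_cores = cores_per_node - reserved_cores
    # 15 // 5 = 3 executors per node.
    executors_per_node = usable_cores // cores_per_executor
    # 3 * 6 = 18, minus one slot for the YARN ApplicationMaster = 17.
    total_executors = executors_per_node * nodes - 1
    # 64 - 1 = 63 GB per node, split across 3 executors = 21 GB each.
    ram_per_executor = (ram_gb_per_node - reserved_ram_gb) / executors_per_node
    # Overhead is max(384 MB, 7% of 21 GB) = 1.47 GB.
    overhead = max(min_overhead_gb, overhead_fraction * ram_per_executor)
    # 21 - 1.47 = 19.53, rounded down to 19 GB.
    executor_memory_gb = math.floor(ram_per_executor - overhead)
    return total_executors, cores_per_executor, executor_memory_gb


def spark_submit_flags(executors, cores, memory_gb):
    """Format the result as spark-submit options (YARN)."""
    return ["--num-executors", str(executors),
            "--executor-cores", str(cores),
            "--executor-memory", f"{memory_gb}g"]


result = size_executors(nodes=6, cores_per_node=16, ram_gb_per_node=64)
print(result)  # (17, 5, 19)
print(" ".join(spark_submit_flags(*result)))
```

For the example cluster this reproduces the final numbers: 17 executors, 5 cores each, 19 GB of executor memory.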
    

    Notes:

    1. Sometimes you may want to allocate less than 19 GB per executor. As memory per executor decreases, the number of executors increases and the number of cores per executor decreases. As discussed earlier, 5 cores per executor is the best value, but a lower value will still give good results. Just don't exceed 5.
    
    2. Memory per executor should be less than 40 GB, otherwise there will be considerable GC overhead.
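To see how the two notes interact, the sketch below sweeps a few cores-per-executor choices for the same example cluster (6 nodes, 15 usable cores and 63 GB usable RAM per node). It is a hedged illustration: the `plan` helper is hypothetical, and the 7% overhead, the 5-core ceiling, and the 40 GB limit are this answer's rules of thumb rather than limits enforced by Spark.

```python
def plan(cores_per_executor, nodes=6, usable_cores=15, usable_ram_gb=63,
         overhead_fraction=0.07, max_cores=5, gc_limit_gb=40):
    """Size executors for one cores-per-executor choice and flag rule violations."""
    per_node = usable_cores // cores_per_executor
    total = per_node * nodes - 1               # one slot goes to the YARN AM
    container_gb = usable_ram_gb / per_node    # RAM share per executor
    overhead_gb = max(384 / 1024, overhead_fraction * container_gb)
    heap_gb = int(container_gb - overhead_gb)  # round down to whole GB
    return {
        "cores": cores_per_executor,
        "executors": total,
        "heap_gb": heap_gb,
        "too_many_cores": cores_per_executor > max_cores,
        "over_gc_limit": heap_gb >= gc_limit_gb,
    }


# Fewer cores per executor means more, smaller executors.
for cores in (1, 3, 5, 15):
    print(plan(cores))
```

With 3 cores per executor the cluster runs 29 executors of 11 GB each, while a single 15-core executor per node gets 58 GB and breaks both rules.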
    
