Memory allocation to executors and tasks in Spark

My cluster configuration is as follows: 7 nodes, each with 32 cores and 252 GB of memory.

The YARN configuration is as follows:

yarn.scheduler.maximum-allocation-mb - 10GB
yarn.scheduler.minimum-allocation-mb - 2GB
yarn.nodemanager.vmem-pmem-ratio - 2.1
yarn.nodemanager.resource.memory-mb - 22GB
yarn.scheduler.maximum-allocation-vcores - 25
yarn.scheduler.minimum-allocation-vcores - 1
yarn.nodemanager.resource.cpu-vcores - 25

The MapReduce configuration is as follows:

mapreduce.map.java.opts - -Xmx1638m
mapreduce.map.memory.mb - 2GB
mapreduce.reduce.java.opts - -Xmx3276m
mapreduce.reduce.memory.mb - 4GB

The Spark configuration is as follows:

spark.yarn.driver.memoryOverhead 384
spark.yarn.executor.memoryOverhead 384

I then tried running spark-shell with --master yarn and different values for --executor-memory, --num-executors, and --executor-cores.

spark-shell --master yarn --executor-memory 9856M --num-executors 175 --executor-cores 1

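In this case, the executor memory plus 384 MB must not exceed the YARN scheduler's 10 GB maximum. Here 9856 MB + 384 MB = 10240 MB = 10 GB, so it works fine. Once spark-shell has started, the total number of executors is 124 instead of the requested 175. The storage memory shown for each executor in the spark-shell startup log and in the Spark UI is 6.7 GB (i.e. 67% of 10 GB).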
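The sizing check described above can be sketched as follows. This is a minimal sketch, not Spark or YARN code: the object and method names are hypothetical, the limits are copied from the YARN configuration listed earlier, and the rounding step assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, which depends on the scheduler in use.

```scala
// Hypothetical helper for illustration; not part of Spark or YARN.
object ContainerSizing {
  // Values from the YARN configuration listed above, in MB.
  val MinAllocationMb = 2048
  val MaxAllocationMb = 10240

  /** Memory requested per executor container: executor heap plus overhead. */
  def requestedMb(executorMemoryMb: Int, overheadMb: Int): Int =
    executorMemoryMb + overheadMb

  /** Assumption: requests are rounded up to a multiple of the minimum allocation. */
  def roundedMb(requestMb: Int, minMb: Int = MinAllocationMb): Int =
    ((requestMb + minMb - 1) / minMb) * minMb

  /** True when the request does not exceed the scheduler's maximum allocation. */
  def fits(requestMb: Int, maxMb: Int = MaxAllocationMb): Boolean =
    requestMb <= maxMb

  def main(args: Array[String]): Unit = {
    // The two --executor-memory values used in this question, with 384 MB overhead.
    for (heapMb <- Seq(9856, 4096)) {
      val request = requestedMb(heapMb, 384)
      println(s"heap=$heapMb MB request=$request MB " +
        s"rounded=${roundedMb(request)} MB fits=${fits(request)}")
    }
  }
}
```

Under these assumptions the 9856 MB executor requests exactly 10240 MB, the largest container the scheduler will grant, while the 4096 MB executor requests 4480 MB and would be rounded up to 6144 MB.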

The top command output for the spark-shell process is as follows:

PID     USER      PR    NI  VIRT  RES   SHR S  %CPU %MEM  TIME+  
8478    hdp66-ss  20    0   13.5g 1.1g  25m S  1.9  0.4   2:11.28

So the virtual memory is 13.5 GB and the physical memory is 1.1 GB.

spark-shell --master yarn --executor-memory 9856M --num-executors 35 --executor-cores 5

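In this case, the executor memory plus 384 MB must not exceed the YARN scheduler's 10 GB maximum. Here 9856 MB + 384 MB = 10240 MB = 10 GB, so it works fine. Once spark-shell has started, the total number of executors is 35. The storage memory shown for each executor in the spark-shell startup log and in the Spark UI is 6.7 GB (i.e. 67% of 10 GB).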

The top command output for the spark-shell process is as follows:

PID     USER      PR    NI  VIRT  RES   SHR S  %CPU %MEM  TIME+  
5256    hdp66-ss  20    0   13.2g 1.1g  25m S  2.6  0.4   1:25.25

So the virtual memory is 13.2 GB and the physical memory is 1.1 GB.

spark-shell --master yarn --executor-memory 4096M --num-executors 200 --executor-cores 1

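In this case, the executor memory plus 384 MB must not exceed the YARN scheduler's 10 GB maximum. Here 4096 MB + 384 MB = 4480 MB (roughly 4 GB), well under the limit, so it works fine. Once spark-shell has started, the total number of executors is 200. The storage memory shown for each executor in the spark-shell startup log and in the Spark UI is 2.7 GB (i.e. 67% of 4 GB).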

The top command output for the spark-shell process is as follows:

PID     USER      PR    NI  VIRT  RES   SHR S  %CPU %MEM  TIME+  
21518   hdp66-ss  20    0   19.2g 1.4g  25m S  3.9  0.6   2:24.46

So the virtual memory is 19.2 GB and the physical memory is 1.4 GB.

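Can someone explain how these memory figures and executor counts come about? Why is the memory shown in the Spark UI 67% of the executor memory? And how are the virtual and physical memory determined for each executor?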

1 Answer

Spark almost always allocates 65% to 70% of the memory that a user requests for the executors. This behavior is due to the Spark JIRA ticket "SPARK-12579".

The code below is from the Scala file in the Apache Spark repository that calculates executor memory, among other things.

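// Excerpt: systemMemory, reservedMemory and minSystemMemory are defined earlier in the enclosing method (not shown).
if (conf.contains("spark.executor.memory")) {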
  val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
  if (executorMemory < minSystemMemory) {
    throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
      s"$minSystemMemory. Please increase executor memory using the " +
      s"--executor-memory option or spark.executor.memory in Spark configuration.")
  }
}
val usableMemory = systemMemory - reservedMemory
val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
(usableMemory * memoryFraction).toLong

}

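The code above is responsible for the behavior you are seeing. It is a safeguard for cases where the cluster may not have as much memory as the user requested.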
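To make the calculation concrete, here is a minimal stand-alone sketch of the same formula. It is not Spark's own code: the names UnifiedMemoryEstimate and usableForSparkMb are hypothetical, the 300 MB reserved amount is an assumption, and the fraction is passed in as a parameter because the default of spark.memory.fraction differs between Spark versions (0.75 in 1.6, 0.6 from 2.0 onward).

```scala
// Hypothetical stand-alone sketch of the formula in the excerpt; not Spark's own code.
object UnifiedMemoryEstimate {
  // Assumption: 300 MB is reserved before the fraction is applied.
  val ReservedMb = 300L

  /** (systemMemory - reservedMemory) * memoryFraction, in MB, truncated. */
  def usableForSparkMb(systemMemoryMb: Long, memoryFraction: Double): Long =
    ((systemMemoryMb - ReservedMb) * memoryFraction).toLong

  def main(args: Array[String]): Unit = {
    // systemMemory is the maximum heap the JVM reports, which is
    // usually somewhat below the value passed as --executor-memory.
    val reportedHeapMb = Runtime.getRuntime.maxMemory / (1024L * 1024L)
    println(s"this JVM, fraction 0.6: ${usableForSparkMb(reportedHeapMb, 0.6)} MB")

    // The two --executor-memory values from the question, used as upper bounds.
    for (heapMb <- Seq(9856L, 4096L); fraction <- Seq(0.6, 0.75)) {
      println(s"heap=$heapMb MB fraction=$fraction -> " +
        s"${usableForSparkMb(heapMb, fraction)} MB")
    }
  }
}
```

For the 9856 MB executor this gives 5733 MB with a fraction of 0.6 and 7167 MB with 0.75; for the 4096 MB executor it gives 2277 MB and 2847 MB. These are upper bounds computed from the requested heap, so the figure in the Spark UI depends on both the heap the JVM actually reports and the fraction in effect for the Spark version in use.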

Answered on 2024-04-25T10:21:03+08:00
