Apache Spark (GraphX) may not be using all cores and memory

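I am trying to work with the LiveJournal graph dataset, https://snap.stanford.edu/data/soc-LiveJournal1.html.

I have a cluster of 10 compute nodes. Each compute node has 64 GB of RAM and 32 cores.

When I run the PageRank algorithm with 9 worker nodes, it is slower than running it with just 1 worker node. I suspect that, because of some configuration problem, I am not using all of the memory and/or cores.

I have gone through Spark's configuration, tuning, and programming guides.

I use spark-shell to run my script, and I start it with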

./spark-shell --executor-memory 50g

The workers and the master are running. When I start spark-shell, I get the following log:

14/07/09 17:26:10 INFO Slf4jLogger: Slf4jLogger started
14/07/09 17:26:10 INFO Remoting: Starting remoting
14/07/09 17:26:10 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@node0472.local:60035]
14/07/09 17:26:10 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@node0472.local:60035]
14/07/09 17:26:10 INFO SparkEnv: Registering MapOutputTracker
14/07/09 17:26:10 INFO SparkEnv: Registering BlockManagerMaster
14/07/09 17:26:10 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140709172610-7f5e
14/07/09 17:26:10 INFO MemoryStore: MemoryStore started with capacity 294.4 MB.
14/07/09 17:26:10 INFO ConnectionManager: Bound socket to port 45700 with id = ConnectionManagerId(node0472.local,45700)
14/07/09 17:26:10 INFO BlockManagerMaster: Trying to register BlockManager
14/07/09 17:26:10 INFO BlockManagerInfo: Registering block manager node0472.local:45700 with 294.4 MB RAM
14/07/09 17:26:10 INFO BlockManagerMaster: Registered BlockManager
14/07/09 17:26:10 INFO HttpServer: Starting HTTP Server
14/07/09 17:26:10 INFO HttpBroadcast: Broadcast server started at http://172.16.104.72:48116
14/07/09 17:26:10 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7b4a7c3c-9fc9-4a64-b2ac-5f328abe9265
14/07/09 17:26:10 INFO HttpServer: Starting HTTP Server
14/07/09 17:26:11 INFO SparkUI: Started SparkUI at http://node0472.local:4040
14/07/09 17:26:12 INFO AppClient$ClientActor: Connecting to master spark://node0472.local:7077...
14/07/09 17:26:12 INFO SparkILoop: Created spark context..
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140709172612-0007
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/0 on worker-20140709162149-node0476.local-53728 (node0476.local:53728) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/0 on hostPort node0476.local:53728 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/1 on worker-20140709162145-node0475.local-56009 (node0475.local:56009) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/1 on hostPort node0475.local:56009 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/2 on worker-20140709162141-node0474.local-58108 (node0474.local:58108) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/2 on hostPort node0474.local:58108 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/3 on worker-20140709170011-node0480.local-49021 (node0480.local:49021) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/3 on hostPort node0480.local:49021 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/4 on worker-20140709165929-node0479.local-53886 (node0479.local:53886) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/4 on hostPort node0479.local:53886 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/5 on worker-20140709170036-node0481.local-60958 (node0481.local:60958) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/5 on hostPort node0481.local:60958 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/6 on worker-20140709162151-node0477.local-44550 (node0477.local:44550) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/6 on hostPort node0477.local:44550 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/7 on worker-20140709162138-node0473.local-42025 (node0473.local:42025) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/7 on hostPort node0473.local:42025 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/8 on worker-20140709162156-node0478.local-52943 (node0478.local:52943) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/8 on hostPort node0478.local:52943 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/1 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/0 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/2 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/3 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/6 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/4 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/5 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/8 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/7 is now RUNNING
Spark context available as sc.

scala> 14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0479.local:47343/user/Executor#1253632521] with ID 4
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0474.local:39431/user/Executor#1607018658] with ID 2
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0481.local:53722/user/Executor#-1846270627] with ID 5
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0477.local:40185/user/Executor#-111495591] with ID 6
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0473.local:36426/user/Executor#652192289] with ID 7
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0480.local:37230/user/Executor#-1581927012] with ID 3
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0475.local:46363/user/Executor#-182973444] with ID 1
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0476.local:58053/user/Executor#609775393] with ID 0
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0478.local:55152/user/Executor#-2126598605] with ID 8
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0474.local:60025 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0473.local:33992 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0481.local:46513 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0477.local:37455 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0475.local:33829 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0479.local:56433 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0480.local:38134 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0476.local:46284 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0478.local:43187 with 28.8 GB RAM

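Judging by the log, I believe my application is registered with the workers and each executor has 50 GB of RAM. I then run the following Scala code in the shell to load the data and compute PageRank: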

import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Load the edge list and cache the graph (a single cache() call is enough)
val startgraphloading = System.currentTimeMillis
val graph = GraphLoader.edgeListFile(sc, "filepath").cache()
val endgraphloading = System.currentTimeMillis

// Time one iteration of static PageRank
val startpr1 = System.currentTimeMillis
val prGraph1 = graph.staticPageRank(1)
val endpr1   = System.currentTimeMillis

// Time five iterations of static PageRank
val startpr2 = System.currentTimeMillis
val prGraph5 = graph.staticPageRank(5)
val endpr2   = System.currentTimeMillis

val loadingt = endgraphloading - startgraphloading
val firstt   = endpr1 - startpr1
val secondt  = endpr2 - startpr2

// Print each timing (in milliseconds) on its own line
println(loadingt)
println(firstt)
println(secondt)

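When I look at memory usage on each node, the RAM of only 2-3 compute nodes is actually being used. Is that right? The job also runs faster with just 1 worker than with 9 workers.

I am using Spark standalone cluster mode. Is there something wrong with the configuration?

Thanks in advance :)

Answers (1)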
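One way to look into this from the shell itself is sketched below. It assumes the `sc` and `graph` values from the session above and only uses calls that report how the data is split up; it does not change anything. If the graph turns out to have very few partitions, only a few executors can hold and process it, which would match seeing memory used on only a few nodes.

```scala
import org.apache.spark.graphx._

// Assumes `sc` (created by spark-shell) and `graph` (loaded in the script above).

// Number of partitions backing the edge and vertex RDDs of the graph.
println("edge partitions:   " + graph.edges.partitions.length)
println("vertex partitions: " + graph.vertices.partitions.length)

// For each block manager (the driver is listed too): the maximum memory
// available for caching and the memory still remaining, both in bytes.
sc.getExecutorMemoryStatus.foreach { case (hostPort, (max, remaining)) =>
  println(hostPort + ": " + (max - remaining) + " of " + max + " bytes used for caching")
}
```

The Storage tab of the Spark UI (port 4040 in the log above) shows similar per-executor information for cached RDDs.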

2 years ago

After reading the Spark source code, I found the problem. It was in the way my script used GraphX:

val graph = GraphLoader.edgeListFile(sc, "filepath").cache();

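When I looked at the signature of edgeListFile, I saw that its minimum-partitions parameter (minEdgePartitions) defaults to 1. I assumed it was only a lower bound on the number of partitions, but it is effectively the number of partitions you get. I set it to the number of nodes, which is the number of partitions I wanted, and that fixed it. One more thing to note: as mentioned in the GraphX programming guide, if you have not built Spark 1.0 from the master branch, you should use your own partitionBy function. If the graph is not partitioned properly, it will cause you problems.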
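A minimal sketch of that change, assuming the `sc` provided by spark-shell and the "filepath" placeholder from the question. The value 9 simply mirrors the nine workers in the question and is an illustration, not a tuned setting. The partition count is passed positionally because the parameter is named minEdgePartitions in Spark 1.0 and, as far as I know, numEdgePartitions in later releases.

```scala
import org.apache.spark.graphx._

// Assumption: one edge partition per worker node, as described above.
val numPartitions = 9

// Arguments: SparkContext, path, canonicalOrientation, number of edge partitions.
// `sc` comes from spark-shell; "filepath" is the placeholder from the question.
val graph = GraphLoader.edgeListFile(sc, "filepath", false, numPartitions).cache()

// Check how many edge partitions the loaded graph actually has.
println(graph.edges.partitions.length)
```

On builds where the built-in Graph.partitionBy works, calling `graph.partitionBy(PartitionStrategy.EdgePartition2D)` after loading is one way to redistribute the edges; on a stock 1.0 release, the programming guide's workaround mentioned above applies instead.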

It took me a while to figure this out; I hope this information saves someone some time :)