SparkPi job keeps running forever on Yarn / Spark / Google Compute Engine

I deployed a Hadoop (Yarn + Spark) cluster on Google Compute Engine, with one master server and two slave servers. When I run the following shell script:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 1 --driver-memory 1g --executor-memory 1g --executor-cores 1 /home/hadoop/spark-install/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10

The job just keeps running, and every second I receive a message like this:


15/02/06 22:47:12 INFO yarn.Client: Application report from ResourceManager:
         application identifier: application_1423247324488_0008
         appId: 8
         clientToAMToken: null
         appDiagnostics:
         appMasterHost: hadoop-w-zrem.c.myapp.internal
         appQueue: default
         appMasterRpcPort: 0
         appStartTime: 1423261517468
         yarnAppState: RUNNING
         distributedFinalState: UNDEFINED
         appTrackingUrl: http://hadoop-m-xxxx:8088/proxy/application_1423247324488_0008/
         appUser: achitre
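
For reference, the application's state can also be inspected from the shell while it sits in RUNNING. This is a sketch using the standard YARN CLI with the application id from the report above; yarn logs additionally requires log aggregation to be enabled and usually only returns output once the application has finished:

yarn application -status application_1423247324488_0008
yarn logs -applicationId application_1423247324488_0008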

Answers (2)

2 years ago

Instead of --master yarn-cluster, use --master yarn-client.
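
For example, the same submission from the question with only the master mode changed (a sketch; all other arguments are unchanged):

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 1 --driver-memory 1g --executor-memory 1g --executor-cores 1 /home/hadoop/spark-install/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10

In yarn-client mode the driver runs inside the local spark-submit process, so SparkPi's output is printed to your console instead of ending up only in the application master's logs on the cluster.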

2 years ago

It worked after I added the following line to my script:

export SPARK_JAVA_OPTS="-Dspark.yarn.executor.memoryOverhead=1024 -Dspark.local.dir=/tmp -Dspark.executor.memory=1024"

I think we should not use 'm', 'g', etc. when specifying memory; otherwise we get a NumberFormatException.
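
For reference, a minimal sketch of that failure mode, assuming Spark 1.1 on YARN, where spark.yarn.executor.memoryOverhead is read as a plain integer number of megabytes:

# OK: plain integer, interpreted as MB
export SPARK_JAVA_OPTS="-Dspark.yarn.executor.memoryOverhead=1024"

# NumberFormatException: this property does not accept unit suffixes like 'm' or 'g'
# export SPARK_JAVA_OPTS="-Dspark.yarn.executor.memoryOverhead=1g"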