Running Spark from IntelliJ IDEA on a standalone cluster with the master on the same Windows machine

I can run my Spark application successfully from IntelliJ IDEA when the master is set to local[*]. However, when I set the master to a separate Spark instance, an exception occurs.

The SparkPi app I am trying to execute is shown below.

import scala.math.random

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("spark://tjvrlaptop:7077").setAppName("Spark Pi") //.set("spark.scheduler.mode", "FIFO").set("spark.cores.max", "8")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 20
    val n = math.min(100000000L * slices, Int.MaxValue).toInt // avoid overflow

    for(j <- 1 to 1000000) {
      val count = spark.parallelize(1 until n, slices).map { i =>
        val x = random * 2 - 1
        val y = random * 2 - 1
        if (x * x + y * y < 1) 1 else 0
      }.reduce(_ + _)
      println("Pi is roughly " + 4.0 * count / n)
    }
    spark.stop()
  }
}

Here is the content of my build.sbt:

name := "SBTScalaSparkPi"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

Here is the content of my plugins.sbt:

logLevel := Level.Warn

I start the Spark master and the worker with the following commands, each in its own command prompt on the same machine.

spark-1.6.1-bin-hadoop2.6\bin>spark-class org.apache.spark.deploy.master.Master --host tjvrlaptop

spark-1.6.1-bin-hadoop2.6\bin>spark-class org.apache.spark.deploy.worker.Worker spark://tjvrlaptop:7077

[The master and the worker appear to start and run without any problems][1]

[1]: http://i.stack.imgur.com/B3BDZ.png

Next, I try to run the program in IntelliJ. After a while it fails with the following errors:

Command Prompt where the Master is running

16/03/27 14:44:33 INFO Master: Registering app Spark Pi
16/03/27 14:44:33 INFO Master: Registered app Spark Pi with ID app-20160327144433-0000
16/03/27 14:44:33 INFO Master: Launching executor app-20160327144433-0000/0 on worker worker-20160327140440-192.168.56.1-52701
16/03/27 14:44:38 INFO Master: Received unregister request from application app-20160327144433-0000
16/03/27 14:44:38 INFO Master: Removing app app-20160327144433-0000
16/03/27 14:44:38 INFO Master: TJVRLAPTOP:55368 got disassociated, removing it.
16/03/27 14:44:38 INFO Master: 192.168.56.1:55350 got disassociated, removing it.
16/03/27 14:44:38 WARN Master: Got status update for unknown executor app-20160327144433-0000/0

Command Prompt where the Worker is running

16/03/27 14:44:34 INFO Worker: Asked to launch executor app-20160327144433-0000/0 for Spark Pi
16/03/27 14:44:34 INFO SecurityManager: Changing view acls to: tjoha
16/03/27 14:44:34 INFO SecurityManager: Changing modify acls to: tjoha
16/03/27 14:44:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tjoha); users with modify permissions: Set(tjoha)
16/03/27 14:44:34 INFO ExecutorRunner: Launch command: "C:\Program Files\Java\jre1.8.0_77\bin\java" "-cp" "C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\conf;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\spark-assembly-1.6.1-hadoop2.6.0.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-api-jdo-3.2.6.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-core-3.2.10.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=55350" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.56.1:55350" "--executor-id" "0" "--hostname" "192.168.56.1" "--cores" "8" "--app-id" "app-20160327144433-0000" "--worker-url" "spark://Worker@192.168.56.1:52701"
16/03/27 14:44:38 INFO Worker: Asked to kill executor app-20160327144433-0000/0
16/03/27 14:44:38 INFO ExecutorRunner: Runner thread for executor app-20160327144433-0000/0 interrupted
16/03/27 14:44:38 INFO ExecutorRunner: Killing process!
16/03/27 14:44:38 INFO Worker: Executor app-20160327144433-0000/0 finished with state KILLED exitStatus 1
16/03/27 14:44:38 INFO Worker: Cleaning up local directories for application app-20160327144433-0000
16/03/27 14:44:38 INFO ExternalShuffleBlockResolver: Application app-20160327144433-0000 removed, cleanupLocalDirs = true

IntelliJ IDEA Output

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/27 15:06:04 INFO SparkContext: Running Spark version 1.6.1
16/03/27 15:06:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/27 15:06:05 INFO SecurityManager: Changing view acls to: tjoha
16/03/27 15:06:05 INFO SecurityManager: Changing modify acls to: tjoha
16/03/27 15:06:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tjoha); users with modify permissions: Set(tjoha)
16/03/27 15:06:06 INFO Utils: Successfully started service 'sparkDriver' on port 56183.
16/03/27 15:06:07 INFO Slf4jLogger: Slf4jLogger started
16/03/27 15:06:07 INFO Remoting: Starting remoting
16/03/27 15:06:07 INFO Remoting: Remoting started; listening on addresses: [akka.tcp://sparkDriverActorSystem@192.168.56.1:56196]
16/03/27 15:06:07 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 56196.
16/03/27 15:06:07 INFO SparkEnv: Registering MapOutputTracker
16/03/27 15:06:07 INFO SparkEnv: Registering BlockManagerMaster
16/03/27 15:06:07 INFO DiskBlockManager: Created local directory at C:\Users\tjoha\AppData\Local\Temp\blockmgr-9623b0f9-81f5-4a10-bbc7-ba077d53a2e5
16/03/27 15:06:07 INFO MemoryStore: MemoryStore started with capacity 2.4 GB
16/03/27 15:06:07 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/27 15:06:07 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
16/03/27 15:06:07 INFO Utils: Successfully started service 'SparkUI' on port 4041.
16/03/27 15:06:07 INFO SparkUI: Started SparkUI at http://192.168.56.1:4041
16/03/27 15:06:08 INFO AppClient$ClientEndpoint: Connecting to master spark://tjvrlaptop:7077...
16/03/27 15:06:09 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160327150608-0002
16/03/27 15:06:09 INFO AppClient$ClientEndpoint: Executor added: app-20160327150608-0002/0 on worker-20160327150550-192.168.56.1-56057 (192.168.56.1:56057) with 8 cores
16/03/27 15:06:09 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160327150608-0002/0 on hostPort 192.168.56.1:56057 with 8 cores, 1024.0 MB RAM
16/03/27 15:06:09 INFO AppClient$ClientEndpoint: Executor updated: app-20160327150608-0002/0 is now RUNNING
16/03/27 15:06:09 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56234.
16/03/27 15:06:09 INFO NettyBlockTransferService: Server created on 56234
16/03/27 15:06:09 INFO BlockManagerMaster: Trying to register BlockManager
16/03/27 15:06:09 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.56.1:56234 with 2.4 GB RAM, BlockManagerId(driver, 192.168.56.1, 56234)
16/03/27 15:06:09 INFO BlockManagerMaster: Registered BlockManager
16/03/27 15:06:09 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/03/27 15:06:10 INFO SparkContext: Starting job: reduce at SparkPi.scala:37
16/03/27 15:06:10 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:37) with 20 output partitions
16/03/27 15:06:10 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:37)
16/03/27 15:06:10 INFO DAGScheduler: Parents of final stage: List()
16/03/27 15:06:10 INFO DAGScheduler: Missing parents: List()
16/03/27 15:06:10 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:33), which has no missing parents
16/03/27 15:06:10 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1880.0 B, free 1880.0 B)
16/03/27 15:06:10 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1212.0 B, free 3.0 KB)
16/03/27 15:06:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.56.1:56234 (size: 1212.0 B, free: 2.4 GB)
16/03/27 15:06:10 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/03/27 15:06:10 INFO DAGScheduler: Submitting 20 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:33)
16/03/27 15:06:10 INFO TaskSchedulerImpl: Adding task set 0.0 with 20 tasks
16/03/27 15:06:14 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (TJVRLAPTOP:56281) with ID 0
16/03/27 15:06:14 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, TJVRLAPTOP, partition 0, PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:14 ... TJVRLAPTOP, partition 6, PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:14 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, TJVRLAPTOP, partition 7, PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:14 INFO BlockManagerMasterEndpoint: Registering block manager TJVRLAPTOP:56319 with 511.1 MB RAM, BlockManagerId(0, TJVRLAPTOP, 56319)
16/03/27 15:06:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on TJVRLAPTOP:56319 (size: 1212.0 B, free: 511.1 MB)
16/03/27 15:06:15 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, TJVRLAPTOP, partition 8, PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, TJVRLAPTOP, partition 9, PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15 INFO TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10, TJVRLAPTOP, partition 10, PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11, TJVRLAPTOP, partition 11, PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15 ...
java.lang.ClassNotFoundException: SparkPi$$anonfun$main$1$$anonfun$1
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Unknown Source)
	at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
	at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
	at java.io.ObjectInputStream.readClassDesc(Unknown Source)
	at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
	at java.io.ObjectInputStream.readObject0(Unknown Source)
	at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
	at java.io.ObjectInputStream.readSerialData(Unknown Source)
	at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
	at java.io.ObjectInputStream.readObject0(Unknown Source)
	at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
	at java.io.ObjectInputStream.readSerialData(Unknown Source)
	at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
	at java.io.ObjectInputStream.readObject0(Unknown Source)
	at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
	at java.io.ObjectInputStream.readSerialData(Unknown Source)
	at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
	at java.io.ObjectInputStream.readObject0(Unknown Source)
	at java.io.ObjectInputStream.readObject(Unknown Source)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
16/03/27 15:06:15 INFO TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5) on executor TJVRLAPTOP: java.lang.ClassNotFoundException (SparkPi$$anonfun$main$1$$anonfun$1) [duplicate 1]
16/03/27 15:06:15 INFO TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) on executor TJVRLAPTOP: java.lang.ClassNotFoundException ...
... INFO TaskSetManager: Starting task 10.1 in stage 0.0 (TID 20, TJVRLAPTOP, partition 10, PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15 ... TJVRLAPTOP, partition 3, PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:15 INFO TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) on executor TJVRLAPTOP: java.lang.ClassNotFoundException (SparkPi$$anonfun$main$1$$anonfun$1) [duplicate 8]
16/03/27 15:06:15 INFO TaskSetManager: Lost task 12.0 in stage 0.0 (TID 12) on executor TJVRLAPTOP: java.lang.ClassNotFoundException ...
... INFO TaskSetManager: Starting task 2.3 in stage 0.0 (TID 39, TJVRLAPTOP, partition 2, PROCESS_LOCAL, 2078 bytes)
16/03/27 15:06:16 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/03/27 15:06:16 WARN TransportChannelHandler: Exception in connection from TJVRLAPTOP/192.168.56.1:56281
java.io.IOException: An existing connection was forcibly closed by the remote host
16/03/27 15:06:17 INFO MemoryStore: MemoryStore cleared
16/03/27 15:06:17 INFO BlockManager: BlockManager stopped
16/03/27 15:06:17 INFO BlockManagerMaster: BlockManagerMaster stopped
16/03/27 15:06:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/03/27 15:06:17 INFO SparkContext: Successfully stopped SparkContext
16/03/27 15:06:17 INFO ShutdownHookManager: Shutdown hook called
16/03/27 15:06:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/03/27 15:06:17 INFO ShutdownHookManager: Deleting directory C:\Users\tjoha\AppData\Local\Temp\spark-11f8184f-23fb-43be-91bb-113fb74aa8b9

1 Answer

When you run in embedded mode (local[*]), Spark has all the required code on its classpath.

When you run against a standalone cluster, you must explicitly provide the jar to Spark, for example by copying it to the lib folder.
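
As a minimal sketch of the same idea from the driver side, you can also tell the SparkConf which jar to ship to the executors via setJars. The object name and the jar path below are assumptions for illustration: the path is where sbt package would typically write the jar for the build.sbt shown in the question, so adjust it to your actual output.

import org.apache.spark.{SparkConf, SparkContext}

object SparkPiWithJars { // hypothetical name, for illustration only
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("spark://tjvrlaptop:7077")
      .setAppName("Spark Pi")
      // Ship the application jar, which contains the compiled SparkPi
      // closures (e.g. SparkPi$$anonfun$main$1$$anonfun$1), to the
      // executors. The path is an assumption: it is the usual output
      // location of "sbt package" for the build.sbt shown above.
      .setJars(Seq("target/scala-2.10/sbtscalasparkpi_2.10-1.0.jar"))
    val spark = new SparkContext(conf)
    // ... run the same Pi computation as in the question ...
    spark.stop()
  }
}

Re-run sbt package before each launch so the executors get the current jar. Alternatively, packaging the app and launching it with spark-submit --master spark://tjvrlaptop:7077 sidesteps the problem entirely, because spark-submit distributes the application jar itself.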
