rdd_data = sc.parallelize([list(r)[2:-1] for r in data.itertuples()])
rdd_data.count()
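For context, this is what the list comprehension feeds into sc.parallelize — a minimal sketch using a small made-up DataFrame in place of `data` (pandas only, names are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the `data` DataFrame in the question.
data = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# itertuples() yields one tuple per row of the form (Index, a, b, c, d),
# so list(r)[2:-1] drops the index, the first column, and the last column.
rows = [list(r)[2:-1] for r in data.itertuples()]
print(rows)  # [[3, 5], [4, 6]]
```

Each inner list is then one element of the RDD, so `rdd_data.count()` should equal the number of DataFrame rows.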

I am running this on a standalone cluster on Windows 7 with Python 3.6, and it fails with the error below.

The error I get:

~\Anaconda2\envs\py36\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318             raise Py4JJavaError(
    319                 "An error occurred while calling {0}{1}{2}.\n".
--> 320                 format(target_id, ".", name), value)
    321         else:
    322             raise Py4JError(

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.spark.SparkException: Python worker did not connect back in time
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:138)
	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:67)
	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Accept timed out
	at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
	at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:135)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
	at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:199)
	at java.net.ServerSocket.implAccept(ServerSocket.java:545)
	at java.net.ServerSocket.accept(ServerSocket.java:513)
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:133)
	... 12 more

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
	at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:467)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Python worker did not connect back in time
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:138)
	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:67)
	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more
Caused by: java.net.SocketTimeoutException: Accept timed out
	at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
	at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:135)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
	at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:199)
	at java.net.ServerSocket.implAccept(ServerSocket.java:545)
	at java.net.ServerSocket.accept(ServerSocket.java:513)
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:133)
	... 12 more