I joined two DataFrames on a common column and then ran the show method:
df = df1.join(df2, df1.col1 == df2.col2, 'inner')
df.show()
The join then ran very slowly and finally raised an error: Slave lost.
Py4JJavaError: An error occurred while calling o109.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 4 times, most recent failure: Lost task 0.3 in stage 8.0 : ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:212)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456)
at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
After some searching, it seemed to be a memory-related problem. I then increased the repartition count to 3000, increased the executor memory, and increased memoryOverhead, but still no luck: I got the same Slave lost error. During df.show() I noticed that the shuffle write size of one executor was very high, while the others' were not. Any clue? Thanks.
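For reference, the tuning attempts described above would typically be applied like this (a hedged sketch: the actual values, job file name, and cluster manager are not stated in the question; YARN is assumed for the memoryOverhead setting):

```
# Sketch only -- values and file name are illustrative
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.sql.shuffle.partitions=3000 \
  my_job.py

# or, inside the job, repartitioning before the join:
# df1 = df1.repartition(3000, 'col1')
```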
1 Answer
If you are using Scala, try:
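The answer's original snippet was lost in translation; a plausible Scala reconstruction, assuming df1/df2 and the key columns col1/col2 from the question, is:

```scala
// Hedged sketch: df1/df2 are the existing DataFrames from the question.
val df = df1.join(df2, df1("col1") === df2("col2"), "inner")
// Or, if the key carries the same name in both frames, join on the name
// so the key column appears only once in the result:
// val df = df1.join(df2, Seq("col1"))
df.show()
```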
If PySpark:
or:
If you try to do the same thing in PySpark with SQL: