My scenario: read data from Elasticsearch, do some computation, and store the final result back in Elasticsearch.

I tested with a small amount of data and it worked, but switching to a large amount of data always produces this error. I'm really confused.

Spark version: 1.6.1
Elasticsearch version: 2.3.1
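For context, the job has the shape below. This is only a minimal sketch reconstructed from the description and the stack trace (which fails inside `saveToEs` while a scroll read is still in progress); the index/type names, the ES node address, and the computation are placeholders, not the real BothwayForPU code. The `es.scroll.*` settings are the connector options that govern the scroll reads that are failing here:

```scala
// Hypothetical sketch of the described job; names and settings are assumptions.
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // brings in sc.esRDD(...) and rdd.saveToEs(...)

object BothwayForPUSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("BothwayForPU")
      .set("es.nodes", "10.10.150.231")  // assumed: one of the ES nodes from the trace
      .set("es.scroll.size", "1000")     // documents fetched per scroll request
      .set("es.scroll.keepalive", "10m") // how long ES keeps each scroll context alive
    val sc = new SparkContext(conf)

    // Read: each element is (documentId, fieldMap).
    val source = sc.esRDD("source-index/doc")

    // Compute: placeholder for the real calculation.
    val result = source.mapValues(identity)

    // Write the results back to Elasticsearch.
    result.values.saveToEs("result-index/doc")

    sc.stop()
  }
}
```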

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 37, 10.10.150.231): org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null
c2NhbjsxOzMxMzY0OlpFSWVjWnh5Ukxtd1diMUdoVXJINVE7MTt0b3RhbF9oaXRzOjQ2NzIwOw==
    at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:478)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:426)
    at org.elasticsearch.hadoop.rest.RestClient.scroll(RestClient.java:496)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:454)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:284)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1922)
    at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:52)
    at org.elasticsearch.spark.package$SparkRDDFunctions.saveToEs(package.scala:37)
    at BothwayForPU$.main(BothwayForPU.scala:82)
    at BothwayForPU.main(BothwayForPU.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null
c2NhbjsxOzMxMzY0OlpFSWVjWnh5Ukxtd1diMUdoVXJINVE7MTt0b3RhbF9oaXRzOjQ2NzIwOw==
    at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:478)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:426)
    at org.elasticsearch.hadoop.rest.RestClient.scroll(RestClient.java:496)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:454)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:284)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
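One detail worth noting: the opaque Base64 token printed after `EsHadoopInvalidRequest: null` is the scroll id the connector was trying to continue. It is plain Base64, so it can be inspected directly (this only identifies what failed, it is not a fix):

```scala
// Decode the scroll id from the error message (Java 8+ Base64 API).
object DecodeScrollId extends App {
  val token =
    "c2NhbjsxOzMxMzY0OlpFSWVjWnh5Ukxtd1diMUdoVXJINVE7MTt0b3RhbF9oaXRzOjQ2NzIwOw=="
  val decoded = new String(java.util.Base64.getDecoder.decode(token), "UTF-8")
  println(decoded)
  // prints: scan;1;31364:ZEIecZxyRLmwWb1GhUrH5Q;1;total_hits:46720;
}
```

The decoded value is a scan-type scroll over 46,720 hits, so the task dies while paging through a scroll that only exists for the large dataset. A plausible reading (an assumption, not confirmed by the trace alone) is that the scroll context expired or was lost on the ES side mid-read, which is why only the large-data runs fail.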