I am new to Apache Spark and Hadoop. I want to classify remote sensing images with Spark. I am trying to use the Spark MLlib RandomForestClassifier, following the Spark MLlib documentation, but I get an error. Here is my code:

from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import IndexToString, StringIndexer, VectorIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

data = spark.read.format("libsvm").load("data/mllib/images/multi-channel/image_sat.jpg")

When I run my load call on the JPEG, this is the error message I get:

2018-06-17 16:01:34 WARN  LibSVMFileFormat:66 - 'numFeatures' option not specified, determining the number of features by going through the input. If you know the number in advance, please specify it via the 'numFeatures' option to avoid the extra scan.
2018-06-17 16:01:48 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
[Stage 0:>                                                          (0 + 1) / 1]
2018-06-17 16:01:55 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ArrayIndexOutOfBoundsException: 63
    at org.apache.spark.unsafe.types.UTF8String.numBytesForFirstByte(UTF8String.java:191)
    at org.apache.spark.unsafe.types.UTF8String.numChars(UTF8String.java:206)
    ... (same stack trace as in the "Caused by" section below)
2018-06-17 16:01:55 WARN  TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 63
    ... (same stack trace repeated)
2018-06-17 16:01:55 ERROR TaskSetManager:70 - Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/spark/python/pyspark/sql/readwriter.py", line 166, in load
    return self._df(self._jreader.load(path))
  File "/usr/local/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/usr/local/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o57.load.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 63
    ... (same stack trace repeated)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2124)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1029)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.reduce(RDD.scala:1011)
    at org.apache.spark.mllib.util.MLUtils$.computeNumFeatures(MLUtils.scala:94)
    at org.apache.spark.ml.source.libsvm.LibSVMFileFormat$$anonfun$1.apply$mcI$sp(LibSVMRelation.scala:104)
    at org.apache.spark.ml.source.libsvm.LibSVMFileFormat$$anonfun$1.apply(LibSVMRelation.scala:95)
    at org.apache.spark.ml.source.libsvm.LibSVMFileFormat$$anonfun$1.apply(LibSVMRelation.scala:95)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.ml.source.libsvm.LibSVMFileFormat.inferSchema(LibSVMRelation.scala:95)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:202)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:202)
    at scala.Option.orElse(Option.scala:289)
    at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:201)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:392)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 63
    at org.apache.spark.unsafe.types.UTF8String.numBytesForFirstByte(UTF8String.java:191)
    at org.apache.spark.unsafe.types.UTF8String.numChars(UTF8String.java:206)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1014)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1013)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2123)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2123)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
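
I wonder whether I should be reading the image with Spark's image reader instead of the libsvm source. Here is a rough sketch of what I think that would look like on Spark 2.3 (assuming pyspark.ml.image.ImageSchema is available in my install; I have not confirmed this is the right approach):

from pyspark.ml.image import ImageSchema

# Read every image under the directory into a DataFrame with a single
# "image" struct column (origin, height, width, nChannels, mode, data).
image_df = ImageSchema.readImages("data/mllib/images/multi-channel/")
image_df.printSchema()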

I am using Python on Ubuntu 16.04. Can someone please help me?

Thank you.