
Running wordcount on Spark

>>> lines = sc.textFile("README.md")
>>> lines.count()


Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/shubhranshu/Documents/spark/spark-1.6.1-bin-hadoop2.6/bin/README.md
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.api.python.PythonRDD.getPartitions(PythonRDD.scala:58)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
    at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
    at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)

1 Answer


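    You need to give the correct path to README.md. The error shows Spark looking for the file inside the bin directory, while README.md sits one level up, so the correct code is:

    >>> lines = sc.textFile("../README.md")
    >>> lines.count()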
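
    If you would rather not depend on which directory the shell was started from, one option is to resolve the path yourself and fail early with a readable message instead of a long Java stack trace. The following is only a sketch: resolve_input_path is a hypothetical helper name, not part of Spark, and it assumes the file lives on the local filesystem of the machine running the shell.

```python
import os


def resolve_input_path(relative_path, base_dir=None):
    """Turn a relative path into an absolute one, checking that it exists.

    resolve_input_path is a hypothetical helper for illustration only.
    base_dir defaults to the current working directory, which is also
    what a bare relative path is resolved against on the local filesystem.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    candidate = os.path.abspath(os.path.join(base_dir, relative_path))
    if not os.path.exists(candidate):
        # Fail here, in Python, rather than later inside the Spark job.
        raise IOError("Input path does not exist: " + candidate)
    return candidate


# In the pyspark shell, where sc is the SparkContext the shell provides:
# lines = sc.textFile("file://" + resolve_input_path("../README.md"))
# lines.count()
```

    The "file://" prefix makes it explicit that the path is local; whether you need it depends on how the default filesystem is configured in your setup.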
