I am running Spark 2.0 and zeppelin-0.6.1-bin-all on a Linux server. The default Spark notebook runs fine, but when I create a new notebook and try to use sqlContext in pyspark, I get the error "py4j.Py4JException: Method createDataFrame([class java.util.ArrayList, class java.util.ArrayList, null]) does not exist".
I tried running this simple piece of code:
%pyspark
wordsDF = sqlContext.createDataFrame([('cat',), ('elephant',), ('rat',), ('rat',), ('cat',)], ['word'])
wordsDF.show()
print type(wordsDF)
wordsDF.printSchema()
and I get the following error:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-7635635698598314374.py", line 266, in
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-7635635698598314374.py", line 259, in
exec(code)
File "", line 1, in
File "/spark/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/context.py", line 299, in createDataFrame
return self.sparkSession.createDataFrame(data, schema, samplingRatio)
File "/spark/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/spark/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/spark/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 316, in get_return_value
format(target_id, ".", name, value))
Py4JError: An error occurred while calling o48.createDataFrame. Trace:
py4j.Py4JException: Method createDataFrame([class java.util.ArrayList, class java.util.ArrayList, null]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:272)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)
When I try the same code after running "sqlContext = SQLContext(sc)", it works fine.
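For reference, a minimal sketch of that workaround (assuming sc is the SparkContext that Zeppelin injects into the notebook):

%pyspark
from pyspark.sql import SQLContext

# Rebuild the SQLContext from the SparkContext that Zeppelin provides as `sc`;
# this fresh context avoids the py4j method-lookup error shown above.
sqlContext = SQLContext(sc)
wordsDF = sqlContext.createDataFrame([('cat',), ('elephant',), ('rat',), ('rat',), ('cat',)], ['word'])
wordsDF.show()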
I have also tried setting the interpreter property "zeppelin.spark.useHiveContext" to false, but it did not help.
I must obviously be missing something, since this is such a simple operation. Please advise if there is any other configuration to set, or whatever else I am missing.
I tested the same code with Zeppelin 0.6.0 and it works fine.
1 Answer
SparkSession is the default entry point in Spark 2.0.0, and in Zeppelin 0.6.1 it is mapped to the variable spark (just as it is in the Spark shell). Have you tried spark.createDataFrame(...)?
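In other words, the snippet from the question can be rewritten against the session object directly (a sketch, assuming Zeppelin binds the Spark 2.x session as spark):

%pyspark
# `spark` is the SparkSession that Zeppelin 0.6.1 exposes for Spark 2.x,
# so createDataFrame is called on the session rather than the old SQLContext.
wordsDF = spark.createDataFrame([('cat',), ('elephant',), ('rat',), ('rat',), ('cat',)], ['word'])
wordsDF.show()
wordsDF.printSchema()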